How We Benchmark Freecell Solver (Version 1)

Summary


This article summarises how we benchmark two or more versions of our software to determine which one runs faster. We hope it provides some insights for your own benchmarking efforts, and we welcome further suggestions.


Introduction

Publishing Date: 3 October 2017.

This article aims to collect and share the insights we have accumulated from benchmarking Freecell Solver, a CPU- and RAM-intensive application written in C. It is written from the point of view of benchmarking on a modern GNU/Linux installation.

Points

Use the “sudo_renice” script

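sudo_renice is a shell wrapper for nice and ionice. It runs a command with the highest CPU scheduling priority (niceness -20) and in the real-time I/O scheduling class at its highest priority, and then switches back to the invoking user to execute it. This way the command runs faster and the results show less variance.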

It is reproduced here:

#!/bin/bash
sudo nice -n-20 ionice -c1 -n0 sudo -u "$USER" "$@"

Use the same machine and same operating system installation for both timings

It is important to run both the "before" and "after" versions of the benchmark on the same physical computer, with the same system installation, and under similar conditions, one right after the other.
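The following bash sketch shows one way to follow this advice. It alternates between the two builds, so that any slow drift in system conditions affects both of them about equally. The function name interleave_runs and its argument order are our own illustrative choices, not part of Freecell Solver, and the two binaries are assumed to accept the same arguments.

```shell
#!/bin/bash
# Hypothetical helper: time the "before" and "after" binaries alternately
# on the same machine and print one "label<TAB>seconds" line per run.
# Usage: interleave_runs RUNS BEFORE_BIN AFTER_BIN [ARGS...]
interleave_runs() {
    local runs=$1 before=$2 after=$3
    shift 3
    # Make bash's `time` keyword report only the elapsed wall-clock seconds.
    local TIMEFORMAT='%3R'
    local i t
    for ((i = 1; i <= runs; i++)); do
        # The program's own output is discarded; only the timing is captured.
        t=$( { time "$before" "$@" >/dev/null 2>&1; } 2>&1 )
        printf 'before\t%s\n' "$t"
        t=$( { time "$after" "$@" >/dev/null 2>&1; } 2>&1 )
        printf 'after\t%s\n' "$t"
    done
}
```

For example, interleave_runs 10 ./solver-old ./solver-new board.txt would run each build ten times; the binary and board file names here are placeholders.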

Make sure the benchmarked process is practically the only thing running

We noticed that running X with a resource-heavy desktop environment, such as KDE Plasma, can slow down the program and skew the results. It is therefore a good idea to stop X and work from a virtual console or a remote shell such as ssh. Use a process monitor such as htop to make sure that nothing else that consumes CPU or RAM is running (for example, system services, daemons, or stale processes that were not killed).
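As a complement to watching htop, a benchmark script can refuse to start until the machine looks idle. The bash sketch below is our own illustration: it polls the 1-minute load average from the Linux-specific /proc/loadavg file. The default threshold of 0.10 and the 5-second polling interval are arbitrary assumptions that you should tune for your machine. A low load average does not reveal processes that hold a lot of RAM while sleeping, so a process monitor is still useful.

```shell
#!/bin/bash
# Hypothetical helper: block until the 1-minute load average, read from
# Linux's /proc/loadavg, drops below THRESHOLD.
# Usage: wait_for_idle [THRESHOLD] [INTERVAL_SECONDS]
wait_for_idle() {
    local threshold=${1:-0.10} interval=${2:-5} load
    while :; do
        # The first field of /proc/loadavg is the 1-minute load average.
        read -r load _ < /proc/loadavg
        # Bash arithmetic is integer-only, so awk compares the decimals.
        if awk -v l="$load" -v t="$threshold" 'BEGIN { exit !(l + 0 < t + 0) }'; then
            return 0
        fi
        echo "load average $load >= $threshold, waiting ${interval}s" >&2
        sleep "$interval"
    done
}
```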

Make sure the system is not overheated

We noticed that once the computer overheats, the CPU gets throttled and performance decreases. Make sure this is not the case by using the "sensors" command from lm_sensors or PowerTOP, and if needed, let the machine cool down for a while (for example, with the UNIX sleep command) before running the benchmark.
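The waiting can also be automated. The following bash sketch reads the temperatures that the Linux kernel exposes under /sys/class/thermal (in millidegrees Celsius) instead of parsing the output of "sensors", whose format varies between machines. The function names, the 50 °C limit and the 30-second interval are hypothetical choices. On machines where the relevant sensors are only visible through lm_sensors, this sketch sees no thermal zones, reports 0 and does not wait at all.

```shell
#!/bin/bash
# Hypothetical helpers: report the hottest thermal zone known to the Linux
# kernel, and wait until it is below a limit before benchmarking.

# Print the highest temperature, in whole degrees Celsius, over all zones.
max_temp_c() {
    local f t max=0
    for f in /sys/class/thermal/thermal_zone*/temp; do
        # Skip unreadable zones (including the unexpanded glob if none exist).
        [ -r "$f" ] || continue
        read -r t 2>/dev/null < "$f" || continue
        t=$((t / 1000))   # sysfs reports millidegrees Celsius
        if ((t > max)); then
            max=$t
        fi
    done
    echo "$max"
}

# Usage: wait_until_cool [LIMIT_C] [INTERVAL_SECONDS]
wait_until_cool() {
    local limit=${1:-50} interval=${2:-30} t
    while t=$(max_temp_c); ((t >= limit)); do
        echo "hottest zone at ${t} C, sleeping ${interval}s" >&2
        sleep "$interval"
    done
}
```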

Run each benchmarked process several times

Record the results of all runs and check which version is generally faster (by its minimal time, on average, etc.). See also a previous discussion of this on the Linux-IL mailing list.
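For instance, the timings from several runs can be reduced to a few summary numbers. The bash and awk sketch below, with the illustrative name summarize_times, reads elapsed times in seconds, one per line, and prints the minimum, mean and maximum. The minimum is often the most stable figure for CPU-bound programs, since background noise can only add time, but comparing both the minimum and the mean gives a fuller picture.

```shell
#!/bin/bash
# Hypothetical helper: summarise run times (seconds, one per line on stdin)
# by their count, minimum, mean and maximum.
summarize_times() {
    awk '
        NF == 0 { next }   # ignore blank lines
        {
            t = $1 + 0
            if (n == 0 || t < min) min = t
            if (n == 0 || t > max) max = t
            sum += t
            n++
        }
        END {
            if (n == 0) { print "no data" > "/dev/stderr"; exit 1 }
            printf "runs=%d min=%.3f mean=%.3f max=%.3f\n", n, min, sum / n, max
        }'
}
```

For example, printf '3.0\n1.5\n2.5\n' | summarize_times prints runs=3 min=1.500 mean=2.333 max=3.000.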

Compile flags

You should build both versions using compiler flags for maximal performance, such as -O3, -march=native, -flto, -fwhole-program, and possibly -fomit-frame-pointer. Profile-guided optimization may prove useful as well.

malloc library

One should link against a fast malloc library. TCMalloc performs best for us, but your kilometrage may vary. Other prominent malloc implementations, which are not mentioned in the first link, are the one from Lockless Inc. and ltalloc.
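Besides relinking with -ltcmalloc, a quick way to try TCMalloc without rebuilding is to preload it through the dynamic linker's LD_PRELOAD variable. The bash sketch below looks the library up with ldconfig -p and falls back to the default malloc, with a warning, if it cannot be found. The function names and the TCMALLOC_LIB override variable are hypothetical, and the library's file name and location differ between distributions. Preloading only affects dynamically linked programs.

```shell
#!/bin/bash
# Hypothetical helpers: run a command with TCMalloc preloaded, if available.

# Print the path of libtcmalloc.so (TCMALLOC_LIB overrides the lookup).
find_tcmalloc() {
    if [ -n "${TCMALLOC_LIB:-}" ]; then
        printf '%s\n' "$TCMALLOC_LIB"
        return 0
    fi
    local ldconfig_bin
    ldconfig_bin=$(command -v ldconfig || echo /sbin/ldconfig)
    # Lines look like "libtcmalloc.so.4 (libc6,x86-64) => /path/to/lib".
    "$ldconfig_bin" -p 2>/dev/null | awk '/libtcmalloc\.so/ { print $NF; exit }'
}

run_with_tcmalloc() {
    local lib
    lib=$(find_tcmalloc)
    if [ -n "$lib" ] && [ -e "$lib" ]; then
        # Prepend to any existing LD_PRELOAD instead of overwriting it.
        LD_PRELOAD="$lib${LD_PRELOAD:+:$LD_PRELOAD}" "$@"
    else
        echo "warning: TCMalloc not found, using the default malloc" >&2
        "$@"
    fi
}
```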

Credits

Licence


This document is Copyright by Shlomi Fish, 2017, and is available under the terms of the Creative Commons Attribution License 3.0 Unported (or at your option any later version of that licence).

For securing additional rights, please contact Shlomi Fish and see the explicit requirements that are spelt out for abiding by that licence.