How We Benchmark Freecell Solver (Version 1)
Summary
This article summarises how we benchmark two or more versions of our software to determine which one runs faster. It may provide some insights for your own benchmarking efforts, and we welcome further insights about it.
Introduction
Publishing Date: 3 October 2017.
This article aims to collect and share the insights we have accumulated from benchmarking Freecell Solver, a CPU- and RAM-intensive application written in C. It is written from the point of view of benchmarking it on a modern GNU/Linux installation.
Points
Use the “sudo_renice” script
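sudo_renice is a shell wrapper for nice and ionice. It runs the command at the highest CPU priority (niceness -20) and in the real-time I/O scheduling class at its highest priority level (ionice -c1 -n0), and then switches back to the invoking user to execute the command. This way the command runs faster and the results vary less between runs.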
It is reproduced here:
#!/bin/bash
sudo nice -n-20 ionice -c1 -n0 sudo -u "$USER" "$@"
Use the same machine and same operating system installation for both timings
It is important to run both the "before" and "after" versions of the benchmark on the same physical computer, with the same system installation and in similar conditions - one after the other.
Make sure the benchmarked process is practically the only thing running
We noticed that running an X environment with a resource-heavy desktop environment such as KDE Plasma can slow down the program and skew the results. It is therefore a good idea to stop X and use a virtual console or a remote shell such as ssh, and to use a process monitor such as htop to make sure nothing else that consumes CPU or RAM is running (such as system services or daemons, or stale processes that were not killed).
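As a rough complement to checking htop by eye, a script can refuse to start a benchmark while the machine is still busy. The following is a minimal sketch and not part of our tooling: it assumes a Linux /proc/loadavg file, and the function name is_system_idle and the 0.10 threshold are illustrative choices.

```shell
#!/bin/bash
# Succeed only if the 1-minute load average is below a threshold.
# Assumption: /proc/loadavg exists and its first field is the 1-minute load.
# The optional second argument (file to read) is there only to ease testing.
is_system_idle() {
    local max_load="${1:-0.10}"
    local loadavg_file="${2:-/proc/loadavg}"
    awk -v max="$max_load" '
        NR == 1 { ok = ($1 + 0 < max + 0) }
        END     { exit !ok }
    ' "$loadavg_file"
}
```

A benchmark script could then start with something like "is_system_idle 0.10 || exit 1". A low load average does not prove that the machine is idle, so it supplements the manual check rather than replacing it.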
Make sure the system is not overheated
We noticed that once the computer overheats, the CPU is throttled and performance decreases. Make sure this is not the case by using the "sensors" command from lm_sensors or PowerTOP, and, if needed, by waiting a little using the UNIX sleep command.
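The waiting step can be automated. The sketch below is one possible approach under stated assumptions: instead of parsing the output of "sensors" (whose format depends on the hardware), it reads the Linux sysfs file /sys/class/thermal/thermal_zone0/temp, which reports millidegrees Celsius. Whether thermal_zone0 is the CPU zone on a given machine, the 60 °C limit, and the function name wait_until_cool are all assumptions to adjust.

```shell
#!/bin/bash
# Poll a temperature file until the reading drops to a limit or we give up.
# Arguments (all optional): limit in millidegrees Celsius, file to read,
# seconds between polls, maximum number of polls.
# Returns 0 once cool, 1 if still hot after max_polls, 2 if unreadable.
wait_until_cool() {
    local max_millideg="${1:-60000}"
    local temp_file="${2:-/sys/class/thermal/thermal_zone0/temp}"
    local poll_seconds="${3:-5}"
    local max_polls="${4:-60}"
    local polls=0
    local temp

    while [ "$polls" -lt "$max_polls" ]; do
        temp="$(cat "$temp_file")" || return 2
        if [ "$temp" -le "$max_millideg" ]; then
            return 0
        fi
        polls=$((polls + 1))
        sleep "$poll_seconds"
    done
    return 1
}
```

Calling wait_until_cool before each timed run keeps a hot run of one version from penalising the next one.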
Run each benchmarked process several times
Keep track of the results, and compare them by the minimum, the average, and so on, to see which version is generally faster. Also see some previous discussion of it on the Linux-IL mailing list.
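Collecting and comparing the timings can be scripted. The following is a minimal sketch, assuming bash (for its "time" keyword and the TIMEFORMAT variable) and awk. The function names time_one_run and summarise_timings are hypothetical and not part of Freecell Solver.

```shell
#!/bin/bash
# Print the wall-clock time, in seconds, of one run of the given command.
# The command's own output is discarded. Relies on bash's "time" keyword.
time_one_run() {
    local TIMEFORMAT='%R'
    { time "$@" >/dev/null 2>&1; } 2>&1
}

# Read one timing per line on stdin; print the count, minimum, mean, maximum.
summarise_timings() {
    awk '
        { t = $1 + 0 }
        NR == 1 { min = t; max = t }
        {
            sum += t
            if (t < min) min = t
            if (t > max) max = t
        }
        END {
            if (NR > 0)
                printf "runs=%d min=%.3f mean=%.3f max=%.3f\n",
                    NR, min, sum / NR, max
        }
    '
}
```

For example, "for i in 1 2 3 4 5; do time_one_run ./old-version; done | summarise_timings" prints one summary line for a placeholder binary called old-version. Repeating it for the new version gives two lines to compare.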
Compile flags
You should build both versions using compiler flags for maximal performance such as -O3, -march=native, -flto, -fwhole-program, and possibly -fomit-frame-pointer. Profile-guided optimization may prove useful as well.
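To keep the "before" and "after" builds comparable, it helps to keep the flags in one place. The sketch below assumes GCC (or a compiler accepting the same flags) and a program small enough to build with a single command. The names OPT_CFLAGS and build_optimised are illustrative, and real projects would normally pass the flags through their build system instead.

```shell
#!/bin/bash
# Flags named in the text above; -fomit-frame-pointer can be appended here.
OPT_CFLAGS="-O3 -march=native -flto -fwhole-program"

# usage: build_optimised OUTPUT SOURCE...
# The compiler defaults to gcc and can be overridden through CC.
build_optimised() {
    local output="$1"
    shift
    # OPT_CFLAGS is deliberately unquoted so it splits into separate flags.
    "${CC:-gcc}" $OPT_CFLAGS -o "$output" "$@"
}
```

With GCC, profile-guided optimisation is typically a two-pass build: compile once with -fprofile-generate added, run a representative workload, then compile again with -fprofile-use.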
malloc library
One should link against a fast malloc library. TCMalloc is the best performer for us, but your kilometrage may vary. Other prominent mallocs, which are not mentioned in the first link, are the Lockless Inc. one and ltalloc.
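For a quick comparison of allocators without relinking, the dynamic loader's LD_PRELOAD variable can inject a malloc library at run time. This is a sketch of that alternative and not the linking approach described above. The function name run_with_malloc is hypothetical, and the library path differs between distributions.

```shell
#!/bin/bash
# usage: run_with_malloc /path/to/libsomemalloc.so COMMAND [ARGS...]
# Runs COMMAND with the given shared library preloaded, so that its
# malloc/free replace the C library's. Fails early if the file is unreadable.
run_with_malloc() {
    local lib="$1"
    shift
    if [ ! -r "$lib" ]; then
        echo "run_with_malloc: cannot read $lib" >&2
        return 1
    fi
    LD_PRELOAD="$lib" "$@"
}
```

For example, "run_with_malloc /usr/lib64/libtcmalloc.so ./your-benchmark", where both the path and the binary name are placeholders. For the final measurements, linking at build time (for instance with -ltcmalloc) matches what the text recommends.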
Links and References
The Phoronix Test Suite - a test suite that benchmarks Linux and other systems. We have no first-hand experience with it.
Optimizing Code for Speed - an earlier wikibook written by me.
Credits
Licence
This document is Copyright by Shlomi Fish, 2017, and is available under the terms of the Creative Commons Attribution License 3.0 Unported (or at your option any later version of that licence).
For securing additional rights, please contact Shlomi Fish and see the explicit requirements that are spelt out for abiding by that licence.