Hi all!
This is a review of the changes since the 2.30.0 release of Freecell Solver.
I tried to convert move_non_top_stack_cards_to_founds() to positions_by_rank,
but since it also used cards that were already on a true parent, I had to
generalise the positions_by_rank mechanism, which in turn made FCS
considerably slower. Thus, I had to revert that commit.
I added meaningful opening comments to the *.c and *.h files. This eliminated
a boatload of "TODO" comments in the code.
I performed a memory+speed optimisation of rand.c/rand.h. Now instead of
malloc()ing a pointer to a struct that only contains a long integer (how
wasteful), it uses an existing long and gets passed the pointer to it. This
proved to improve performance quite a bit.
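To illustrate the idea, here is a minimal sketch of the two layouts. All the
names below (heap_rand_t, inline_rand_t, lcg_step() and so on) are made up for
this example and are not the actual rand.h API, and the generator constants
are only an assumption (the common 15-bit linear congruential generator).

```c
#include <assert.h>
#include <stdlib.h>

/* One step of a 31-bit linear congruential generator (assumed constants). */
static long lcg_step(long seed)
{
    return (long)(((unsigned long)seed * 214013UL + 2531011UL) & 0x7fffffffUL);
}

/* Before: the state is a heap-allocated struct that only wraps one long. */
typedef struct { long seed; } heap_rand_t;

static heap_rand_t *heap_rand_alloc(unsigned int seed)
{
    heap_rand_t *r = malloc(sizeof(*r));
    if (r != NULL)
        r->seed = (long)(seed & 0x7fffffffU);
    return r; /* The caller must free() this. */
}

static int heap_rand_next15(heap_rand_t *r)
{
    assert(r != NULL);
    r->seed = lcg_step(r->seed);
    return (int)((r->seed >> 16) & 0x7fff);
}

/* After: the state is a plain long owned by the caller (for example, a
 * member of a larger struct), and the functions get a pointer to it. */
typedef long inline_rand_t;

static void inline_rand_init(inline_rand_t *r, unsigned int seed)
{
    assert(r != NULL);
    *r = (long)(seed & 0x7fffffffU);
}

static int inline_rand_next15(inline_rand_t *r)
{
    assert(r != NULL);
    *r = lcg_step(*r);
    return (int)((*r >> 16) & 0x7fff);
}
```

Both variants yield the same sequence. The second one saves a malloc()/free()
pair per generator and one pointer indirection per call.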
I performed some small refactorings on move.c.
Now the CMake build process calculates the "int" shift-width (log2 of
(sizeof(int)*8)) and defines the appropriate macro. There's a fallback macro in
ANSI C which doesn't handle all possible sizeof(int)'s properly.
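As a rough sketch of what such a fallback can look like: the macro names
INT_BIT_SHIFT and INT_BIT_MASK below are hypothetical (not necessarily the
ones used in the FCS sources), and the bit-array helpers are only there to
show why the shift-width is useful. Like the fallback mentioned above, it only
covers the usual widths (16, 32 and 64 bits) and is wrong for anything else.

```c
#include <limits.h>

/* Hypothetical fallback: log2 of the number of bits in an int.
 * Only correct for 16-, 32- and 64-bit ints. A build system that
 * measures sizeof(int) can define INT_BIT_SHIFT itself instead. */
#ifndef INT_BIT_SHIFT
#define INT_BIT_SHIFT \
    ((sizeof(int) * CHAR_BIT) >= 64 ? 6 : \
     (sizeof(int) * CHAR_BIT) >= 32 ? 5 : 4)
#endif

/* Mask that extracts the bit offset inside one word. */
#define INT_BIT_MASK ((1u << INT_BIT_SHIFT) - 1u)

/* Set bit number i in an array of unsigned ints used as a bit vector. */
static void bit_set(unsigned int *words, unsigned int i)
{
    words[i >> INT_BIT_SHIFT] |= 1u << (i & INT_BIT_MASK);
}

/* Return 1 if bit number i is set, 0 otherwise. */
static int bit_get(const unsigned int *words, unsigned int i)
{
    return (int)((words[i >> INT_BIT_SHIFT] >> (i & INT_BIT_MASK)) & 1u);
}
```

With the shift-width known at compile time, finding the word and the bit
offset takes a shift and a mask instead of a division and a modulo.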
I added support for icc's PGO (= Profile Guided Optimisations). The results
are even worse than gcc's PGO.
I revamped freecell.c by extracting an empty_two_cols_from_new_state()
function that moves cards to vacant freecells and columns. This reduced the
code considerably, and made it cleaner. It also eliminated a lot of code that
exhibited the Schlemiel the Painter syndrome:
* http://en.wikipedia.org/wiki/Schlemiel_the_painter's_Algorithm
* http://www.joelonsoftware.com/articles/fog0000000319.html
This function was later GCC_INLINE'd to make the code somewhat faster.
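For those unfamiliar with the Schlemiel the Painter pattern, here is a small
illustration. It is not the actual freecell.c code: the function names and
the slots[] representation (0 for a vacant slot, a nonzero value for a card)
are assumptions made up for this sketch. The first function rescans from the
first slot for every card it places, while the second one remembers where the
previous scan stopped.

```c
#include <stddef.h>

/* Schlemiel-style: restart the search for a vacant slot from index 0
 * for every card. Cost grows as (number of cards) * (number of slots). */
static size_t place_rescanning(int *slots, size_t n,
                               const int *cards, size_t count)
{
    size_t placed = 0;
    for (size_t c = 0; c < count; c++) {
        size_t i = 0;
        while (i < n && slots[i] != 0)
            i++;
        if (i == n)
            break; /* No vacant slot left. */
        slots[i] = cards[c];
        placed++;
    }
    return placed;
}

/* Keep a cursor: every slot is examined at most once overall.
 * Cost grows as (number of cards) + (number of slots). */
static size_t place_with_cursor(int *slots, size_t n,
                                const int *cards, size_t count)
{
    size_t placed = 0;
    size_t i = 0;
    for (size_t c = 0; c < count; c++) {
        while (i < n && slots[i] != 0)
            i++;
        if (i == n)
            break; /* No vacant slot left. */
        slots[i++] = cards[c];
        placed++;
    }
    return placed;
}
```

Both functions fill the same slots in the same order and return the number
of cards placed. Only the amount of rescanning differs.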
There are also some new benchmarks:
{{{{{{{{{
r1864 trunk after ./Tatzer -l p4b:
Non-threaded:
118.82307100296s
Threaded:
107.07123208046s
With icc:
Non-threaded:
115.439105987549s
Threaded:
103.577337026596s
========================================================================
r1878 after ./Tatzer -l p4b: (and after the empty_two_cols_from_new_state()
refactoring)
Non-threaded:
119.177080869675s
========================================================================
r1878 after making empty_two_cols_from_new_state() GCC_INLINE:
./Tatzer -l p4b:
118.614977121353s
}}}}}}}}}
Note that the gcc serial code is now below the two-minute (120 seconds)
threshold on my P4-2.4GHz machine.
-----------------------------------------------
Finally, I measured the binary size again, with the latest freecell.c
function extraction:
{{{{{{{{
gcc-Os - before strip - 86464
gcc-Os - after strip - 74584
gcc-Os-fc-only - before strip - 72256
gcc-Os-fc-only - after strip - 61128
gcc-Os-no-simple-simon - before strip - 74484
gcc-Os-no-simple-simon - after strip - 63356
gcc-Os-fc-only-no-flips - before strip - 71440
gcc-Os-fc-only-no-flips - after strip - 60312
default - before strip - 245793
default - after strip - 115680
release - before strip - 123918
release - after strip - 111484
r-fc-only - before strip - 105534
r-fc-only - after strip - 93836
r-no-simple-simon - before strip - 103134
r-no-simple-simon - after strip - 91436
r-fc-only-arch-omit-frame - before strip - 105950
r-fc-only-arch-omit-frame - after strip - 94252
r-fc-only-omit-frame - before strip - 108958
r-fc-only-omit-frame - after strip - 97260
r-no-simple-simon-omit-frame - before strip - 106782
r-no-simple-simon-omit-frame - after strip - 95084
}}}}}}}}
So we're down to 60,312 bytes for the Freecell-only preset with -Os.
Regards,
Shlomi Fish
--
-----------------------------------------------------------------
Shlomi Fish http://www.shlomifish.org/
Funny Anti-Terrorism Story - http://xrl.us/bjn7t
God gave us two eyes and ten fingers so we will type five times as much as we
read.
Received on Mon Jun 08 2009 - 15:21:16 IDT