On Friday 23 April 2004 18:44, Randall Parker wrote:
> Shlomi Fish wrote:
> >>5) Your solver takes about 70% longer to run with the MS Visual C++
> >>Toolkit 2003 as compared to MSVC v6. I've tried all sorts of compiler
> >>flags and haven't figured out why yet. That web page claims it is
> >>optimizing. I've tried
> >> CC=cl /GL /G7 /Ox
> >> and it was even slightly (a couple of percent) slower than not having
> >>all that there. I do not know why this is the case. The later compiler
> >>from MS is supposed to generate better code. Maybe
> >
> >Maybe what? In any case, I don't see a reason why it should be so much worse.
> > It's quite interesting.
>
> The truncated "Maybe": I don't know why that happened.
>
> Here's my speculation: I realize that the solver has to use a lot of
> memory to check out lots of moves. Well, the problem is not the sheer
> amount of memory used. Rather for some reason perhaps the newer compiler
> is slower at free/malloc. It might cache less and turn more back over to
> the OS and thereby cause more context switches into the OS kernel. Those
> are costly.
Well, free and malloc are implemented in the Microsoft C Run-Time Library,
which is a DLL common to both the new and the old compiler. Lower-level
functions reside in lower-level DLLs (like KERNEL32.DLL), which are also
common. Freecell Solver is dynamically linked against the ANSI C library and
the rest of the libraries, so in this regard there should be no difference
between it and the FCS compiled with the other compiler. But with Microsoft's
products anything is possible... ;-)
>
> You might consider implementing a way to do fewer larger malloc calls
> that then hand out smaller chunks of memory. I don't know the structure
> of your code and how difficult that would be to do.
>
This is actually already being done for every possible resource. You can tweak
it using the following settings:
ALLOCED_SIZE in alloc.c: currently set to somewhat below 8K. The reason it is
not exactly 8K is the behaviour of malloc/free/realloc on UNIX (and possibly on
Windows as well): a small amount of memory adjacent to each allocated block is
used to keep meta-information about that block. A request of exactly a power
of 2 therefore no longer fits in a block of that size once the overhead is
added, and the allocator can end up physically allocating a block twice as
large.
The second place is hard_thread->state_pack_len (a variable, not a macro) in
fcs_isa.c. By changing the line:
    hard_thread->state_pack_len = 0x010000 / sizeof(fcs_state_with_locations_t);
to assign some other value, you can change the number of states that fit
within this memory segment. Having written it, I still don't know why I did
not write:
    hard_thread->state_pack_len = 0x010000 / sizeof(fcs_state_with_locations_t) - 1;
there instead (to be certain that there's room for the overhead).
You can tweak them and see if it improves things.
> If you want to suggest any build or run flags for me to try that might
> provide insight into the MS Visual C++ Toolkit 2003 performance problem
> I'd be happy to try them.
Can't think of any except what I said, sorry.
Regards,
Shlomi Fish
--
---------------------------------------------------------------------
Shlomi Fish shlomif_at_iglu.org.il
Homepage: http://shlomif.il.eu.org/
Quidquid latine dictum sit, altum videtur.
[Whatever is said in Latin sounds profound.]
Received on Fri Apr 23 2004 - 11:32:05 IDT