Re: Hash from WKRfresno_at_aol.com on 2001-12-11 (stdin)

From: <WKRfresno_at_aol.com>
Date: Tue, 11 Dec 2001 13:46:32 EST

In a message dated 12/11/01 5:46:28 AM Pacific Standard Time,
aettlinger_at_worldnet.att.net writes:

> Hey, I'm enjoying reading this technical discussion, although a lot of
> it is over my head. I never heard of MD5, and I'm not sure I really want
to
> learn about it.

Do a Google search: md5 hash.

> But....... let me, at my sophomoric level,

Ooooooh, pleeeeeeeease. Your specialty just lies elsewhere. I'm a generalist
focused on speed and space. If I can assume that Dr. Tom's "Dr." is in one of
the numerical/logical sciences (as opposed to gastronomy, astrology or, say,
divinity), he can discuss this on a more theoretical level.

> ... talk a bit about my
> experience with hash tables. I first learned about the technique from Don
> Woods' code. I assume the term "collision" means when two different
> positions happen to generate an identical hash value. Don's code provided
> only 19001 different hash values, so with 100,000 positions in storage,
> there have to be many, many collisions. The way Don's code handles the
> problem is that, whenever a collision is generated, a chaining sequence is
> originated so that all positions with identical hash values are compared
> brute force.

That's standard technique for all hash schemes that I know of except "perfect
hash" which produces no collisions in a minimum number of hash values.

> This is the main factor which caused the solver to run very,
> very slow with large trees, slowing down rather rapidly to the pace of a...

I like that expression, "slowing down ... rapidly."

> ...crawl. I once mentioned this problem to a "guru" friend, and he tipped
me
> off to a process he called "double hash", where you compute two hash values
> based on two different formulae, and compare the combination of the two
> values. I introduced this technique, and it yielded a very dramatic
> improvement in the FcPro solver's ability to run much larger trees at a
much
> faster pace. Chaining is still used, but the maximum chain that develops
> becomes much much shorter. Rereading Bill's latest, I see the term
> "secondary search". May I assume that that's the accepted term for what
I'm
> calling "chaining"?

You (and I) can't count on me for "accepted terms." I had misunderstood your
explanation of "double hash" a year or two ago. At that time I had thought
that the second hash was applied only within a bucket. Now it appears that
your second hash actually extends the 19001 (a prime?) values to some larger
number (probably not a prime), but with twice the effort. If true, the same
result could have been obtained, but more quickly by just increasing the
19001 and not expending the extra work.

Didn't Fish use a word different than "MD5" to describe his hash function?

Bill Raymond
Received on Tue Dec 11 2001 - 10:46:49 IST