On Thu, 13 Dec 2001, Tom Holroyd wrote:
> On Thu, 13 Dec 2001, Shlomi Fish wrote:
>
> > I believe you misunderstand the standard defintion of a perfect hash
> > function. What it means is that for every two distinct keys k1 and k2,
> > there is a chance of 1/M (where M is the number of values the hash
> > accepts) of collision in the hash value.
>
> http://hissa.nist.gov/dads/HTML/perfecthash.html
> http://burtleburtle.net/bob/hash/perfect.html
> http://www.amk.ca/python/code/perfect-hash.html
> http://www.cslab.vt.edu/manuals/libg/gperf_4.html
> http://www.codebits.com/bit.cfm?BitID=64
> man gperf
>
> Perfect hash means no collisions. Minimal perfect hash means your hash
> table is equal in size to the number of keys.
>
Very interesting. Maybe I confused perfect hashing with something else.
The material of a course tends to get a little mixed up after a course.
> > Again, I believe you are mistaken. That's not what I was told a perfect
> > hash is.
>
> Look it up.
>
> > I use only the first 32 bits of the MD5 return code (or 64 bits in rare
> > cases where an int is defined as a 64-bits integer). I believe Ron Rivest
> > also showed that any subset of bits of the MD5 checksum is a perfect hash.
>
> MD5 is not perfect, and the first 32 bits are amazingly not perfect.
> And, are you using those 32 bits to index a bucket that you search (I
> doubt it), or are you using those 32 bits AS a bucket? In that case a
> collision may lead to a false impossible.
>
I use chaining. So, if I see that the hash value is the same, I still
compare the keys themselves.
> Collisions are very easy to find with only 32 bits, for example,
>
> % msdeal 123209 | tee /dev/tty | md5sum
> 5S KD 7S 7H QD 3H AH
> TC 5C 2D 3C 6S KH 9H
> 8H KS JD AD 8C AS TH
> 2H 5D TD AC 7C 2S TS
> 6H 4D 6D QC 3D 4C
> 9D 6C 8D 4S 7D 8S
> JS 4H JC KC 9C 9S
> QS JH 2C 5H QH 3S
> 8532ed958a67e1c564ed1fdcf4b0ddd5 -
> ^^^^^^^^
> % msdeal 72767 | tee /dev/tty | md5sum
> 6D 7C 7H AC AH 2S 9C
> TH 6H JH JS 8C 9S KH
> 7D 6S TD 4H QH 4S QD
> 9H AS 4D KC 6C 4C KS
> 5S 2D 3H 3S 5D JD
> 2C 8H KD AD 7S TC
> 3D 8D 2H 3C 5C QC
> QS TS 9D 5H 8S JC
> 8532ed959f9842ec41d4da8bae926106 -
> ^^^^^^^^
> but it would be more relevant to find two intermediate boards from the
> same game that collide. (Note there are probably spaces at the ends of
> those lines, in case anybody is bored enough to try to verify this.)
>
There probably are such cases. But that would probably occur in any other
hash function I can choose. The problem is that I'll get a very big
performance decrease if I use more bits than 32 (at least on i386's). And
I'm not in the mood for maintaining a complex #ifdef scheme for 64-bit
machines.
Regards,
Shlomi Fish
> Dr. Tom Holroyd
> "I am, as I said, inspired by the biological phenomena in which
> chemical forces are used in repetitious fashion to produce all
> kinds of weird effects (one of which is the author)."
> -- Richard Feynman, _There's Plenty of Room at the Bottom_
>
>
>
> To unsubscribe from this group, send an email to:
> fc-solve-discuss-unsubscribe_at_yahoogroups.com
>
>
>
> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
>
>
--
----------------------------------------------------------------------
Shlomi Fish shlomif_at_vipe.technion.ac.il
Home Page: http://t2.technion.ac.il/~shlomif/
Home E-mail: shlomif_at_techie.com
He who re-invents the wheel, understands much better how a wheel works.
Received on Thu Dec 13 2001 - 23:15:26 IST