## Text preview for : CSL-76-3_The_Analysis_of_Hashing_Algorithms.pdf part of xerox CSL-76-3 The Analysis of Hashing Algorithms xerox parc techReports CSL-76-3_The_Analysis_of_Hashing_Algorithms.pdf

Back to : CSL-76-3_The_Analysis_of_ | Home

THE ANALYSIS OF

HASHING ALGORITHMS

BY LEONIDAS J. GUIBAS

CSL-76-3 JULY 1976 '

See abstract on next page.

KEY WORDS AND PHRASES

algorithmic analysis, arithmetic progression, hashing, information storage and retrieval,

generating function, recurrence relation, Farey series, clustering, hypergeometric

distribution, table overflow

CR CATEGORIES

3.74, 5.24, 5.25, 5.30

XEROX

PALO ALTO RESEARCH CENTER

3333 COYOTE HILL ROAD / PALO ALTO / CALIFORNIA 94304

ABSTRACT

In this thesis we relate the performance of hashing algorithms to the notion of

clustering, that is the pile-up phenomenon that occurs because many keys may probe

the table locations in the same sequence. We will say that a hashing technique exhibits

k-ary clustering if the search for a key begins with k independent random probes and

the subsequent sequence of probes is completely determined by the location of the k

initial probes. Such techniques may be very bad; for instance, the average number of

probes necessary for insertion may grow linearly with the table size. However, on the

average (that is if the permutations describing the method are randomly chosen), k-ary

clustering techniques for k) 1 are very good. In fact the average performance is

asymptotically equivalent to the performance of uniform probing, a method that exhibits

no clustering and is known to be optimal in a certain sense.

Perharps the most famous among tertiary clustering techniques is double hashing, the

method in which we probe the hash table along arithmetic progressions where the initial

element and the increment of the progression are chosen randomly and independently

depending only on the key K of the search. We prove that double hashing