CSL-76-3_The_Analysis_of_Hashing_Algorithms.pdf part of xerox CSL-76-3 The Analysis of Hashing Algorithms CSL-76-3_The_Analysis_of_Hashing_Algorithms.pdf, xerox parc techReports CSL-76-3_The_Analysis_of_Hashing

Text preview for : CSL-76-3_The_Analysis_of_Hashing_Algorithms.pdf part of xerox CSL-76-3 The Analysis of Hashing Algorithms xerox parc techReports CSL-76-3_The_Analysis_of_Hashing_Algorithms.pdf

Back to : CSL-76-3_The_Analysis_of_ | Home

THE ANALYSIS OF
HASHING ALGORITHMS
BY LEONIDAS J. GUIBAS

CSL-76-3 JULY 1976 '

See abstract on next page.

KEY WORDS AND PHRASES

algorithmic analysis, arithmetic progression, hashing, information storage and retrieval,
generating function, recurrence relation, Farey series, clustering, hypergeometric
distribution, table overflow

CR CATEGORIES

3.74, 5.24, 5.25, 5.30

XEROX
PALO ALTO RESEARCH CENTER
3333 COYOTE HILL ROAD / PALO ALTO / CALIFORNIA 94304
ABSTRACT

In this thesis we relate the performance of hashing algorithms to the notion of
clustering, that is the pile-up phenomenon that occurs because many keys may probe
the table locations in the same sequence. We will say that a hashing technique exhibits
k-ary clustering if the search for a key begins with k independent random probes and
the subsequent sequence of probes is completely determined by the location of the k
initial probes. Such techniques may be very bad; for instance, the average number of
probes necessary for insertion may grow linearly with the table size. However, on the
average (that is if the permutations describing the method are randomly chosen), k-ary
clustering techniques for k) 1 are very good. In fact the average performance is
asymptotically equivalent to the performance of uniform probing, a method that exhibits
no clustering and is known to be optimal in a certain sense.

Perharps the most famous among tertiary clustering techniques is double hashing, the
method in which we probe the hash table along arithmetic progressions where the initial
element and the increment of the progression are chosen randomly and independently
depending only on the key K of the search. We prove that double hashing