Patricia trie symbol tables?

John Moser john.r.moser@gmail.com
Mon Jun 26 20:14:00 GMT 2006
On Mon, 2006-06-26 at 18:25 +0200, Florian Weimer wrote:
> * John Moser:
> 
> > Reading what Drepper says about -Wl,-O1 and why to use it, I get the
> > gist of this:  You'll do comparisons against
> > "_ZN14some_namespace22some_longer_class_name" twice, what a waste.  The
> > point of -Wl,-O1 is to make the buckets smaller so this happens less
> > often, as a side effect of making the hash table bigger (in other words,
> > no guarantees; but it'll probably happen).
> 
> The technique described at <http://www.cs.princeton.edu/~rs/strings/>
> might also be a win, compared to hash tables.  (Quoting from my
> bookmark file, as I'm offline at the moment.)

A Radix trie gives a pretty straightforward O(n) number of character
comparisons, and NR_COMMON_PREFIXES branches (i.e. "foobarbaz" when you
have "foojin" "foobarjin" "foobarbaz" forces you to branch 3 times),
which is again O(n) (but you'll never see that case in a hash bucket);
so we're looking at total O(2n) ~= O(n) operations (in other words, it's
about equivalent to having a single symbol in each hash bucket).

A quick glance at the SODA paper gives me the impression that they were
trying to make something fast for sorting and searching changing data
sets.  I can't tell if it's more efficient than radix in our case;
although, it looks like they intend every node to be a single character.
With a radix trie, the nodes contain however many characters go before
you need a branch; this avoids a lot of jumping around.

SODA:
r    --a
o   /   \  (at, an)
o  /     t .-.n
t /       \
V/         --t--r--i--b--u--t--e (attribute..?  It'll probably balloon
i--s (is)                           into a tree here, like with isnt)
  / \
 n   t (isnt)

Radix:

root
V     (at)   (attribute)
---a--t--tribute
 |  |
 |  |-n (an)
 | (is)  (isn't)
 |-is--nt

At least, that's what I get at a quick glance.

Anyone more familiar with the stuff in the SODA paper?
-- 
John Moser <john.r.moser@gmail.com>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 829 bytes
Desc: This is a digitally signed message part
URL: <https://sourceware.org/pipermail/binutils/attachments/20060626/21bd4d4b/attachment.sig>


More information about the Binutils mailing list