PATCH: PR ld/3111: LD very slow linking object files containing dwarf2 symbols

H. J. Lu hjl@lucon.org
Fri Jan 19 15:03:00 GMT 2007

Previous message (by thread): PATCH: PR ld/3111: LD very slow linking object files containing dwarf2 symbols
Next message (by thread): PATCH: Remove duplications in ldgram.y
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, Jan 19, 2007 at 03:46:09PM +0100, Jakub Jelinek wrote:
> On Tue, Aug 29, 2006 at 04:00:12PM -0700, H. J. Lu wrote:
> > When we are comparing symbols in a section against another section in
> > a different file, we swap in 2 symbol tables, sort them and free the
> > symbol tables for each section. It is a O^2 problem. When there are
> > many sections and symbols in a file, it can be very slow. This patch
> > caches the symbol table unless --reduce-memory-overheads is used. The
> > results for the testcase are
> > 
> > /usr/bin/time ./ld -shared -o libfoo.so b2dynamic_nonlinear_solver.o b2dynamic_nonlinear_utile.o b2static_nonlinear_utile.o b2static_nonlinear_solver.o b2free_vibration_solver.o b2frequency_dependent_free_vibration_solver.o b2linearized_prebuckling_solver.o
> > 1.06user 0.13system 0:01.20elapsed 99%CPU
> > /usr/bin/time ./ld --reduce-memory-overheads -shared -o libbar.so b2dynamic_nonlinear_solver.o b2dynamic_nonlinear_utile.o b2static_nonlinear_utile.o b2static_nonlinear_solver.o b2free_vibration_solver.o b2frequency_dependent_free_vibration_solver.o b2linearized_prebuckling_solver.o
> > 139.24user 8.93system 2:28.19elapsed 99%CPU
> > 
> > It is about 130X speed up. The only concern I have is the memory
> > overhead. It seems small for the testcase.
> 
> I was debugging this too on older (FC6) binutils for
> http://bugzilla.redhat.com/bugzilla/223181
> (initially thought the main problem is caching, but as backport of
> http://sources.redhat.com/ml/binutils/2006-11/msg00190.html
> didn't help, I looked at why is bfd_elf_match_symbols_in_sections
> so slow) and came up with a very different patch.
> 
> What the routine used to do is:
> 1) swap in both symbol tables
> 2) qsort both of them, with huge entry size (32 bytes on 64-bit host)
>    such that SHN_UNDEF symbols come last, other symbols are sorted
>    based on st_shndx and when st_shndx is equal based on binding
>    (this was actually the most expensive part, on the RH#223181
>    testcase oprofile said 55.98% of total time spent in mempcpy,
>    19.26% in msort_with_tmp, 10.12% in memcpy and 4.23% in
>    elf_sort_elf_symbol, all of which is from this qsort)
> 3) then we scan the sorted array and find the consecutive
>    region in it which has st_shndx equal to the section,
>    count how many symbols there are
> 4) if both sections have the same number of such symbols,
>    create symtable{1,2} arrays and for symbols with st_shndx
>    equal to shndx{1,2} store in the array a pointer to the
>    isym and name pointer
> 5) qsort symtable{1,2} arrays based solely on symbol name
> 6) compare the symbol names, st_info, st_other
> 
> Now, unless I grossly misunderstood this, we are doing step 2)
> only to select isyms with isym->st_shndx == shndx{1,2} and count them
> (we don't care about SHN_UNDEF symbols, those don't have isym->st_shndx
> equal to shndx{1,2}, we don't care about the ST_BIND sorting, because
> we shortly after that re-sort the array based on name.
> But, to select isyms with isym->st_shndx == shndx{1,2} we don't need
> to qsort at all, that can be done linearly.

My original idea is if we sort the symbol table first, we can stop
counting symbols when isym->st_shndx != shndx{1,2}. But if sorting
itself is very expensive, it isn't a good idea. Memory usage was
also a concern.

> 
> So the patch below (against 2.17.50.0.6) removes the expensive 2) step
> and step 3) at the expense of allocating bigger temporary symtable{1,2} arrays
> (previously they were just 2 * sizeof (void *) * count{1,2}, with
> this patch they are * symcount{1,2}).  But those are freed before
> the function returns and it pays off a lot.  I'm getting similar
> speedup as you are getting with your patch.
> 
> Now, your patch instead caches when not reduce_memory_overheads
> steps 1) and 2), which is quite memory hungry (some sections have
> huge number of (mostly SHN_UNDEF or other section) symbols).
> 
> I wonder if we can't combine the benefits of both approaches.
> 

Let me give it a try.


H.J.

Previous message (by thread): PATCH: PR ld/3111: LD very slow linking object files containing dwarf2 symbols
Next message (by thread): PATCH: Remove duplications in ldgram.y
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the Binutils mailing list