PATCH: PR ld/3111: LD very slow linking object files containing dwarf2 symbols
H. J. Lu
hjl@lucon.org
Fri Jan 19 15:03:00 GMT 2007
More information about the Binutils mailing list
Fri Jan 19 15:03:00 GMT 2007
- Previous message (by thread): PATCH: PR ld/3111: LD very slow linking object files containing dwarf2 symbols
- Next message (by thread): PATCH: Remove duplications in ldgram.y
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Fri, Jan 19, 2007 at 03:46:09PM +0100, Jakub Jelinek wrote: > On Tue, Aug 29, 2006 at 04:00:12PM -0700, H. J. Lu wrote: > > When we are comparing symbols in a section against another section in > > a different file, we swap in 2 symbol tables, sort them and free the > > symbol tables for each section. It is a O^2 problem. When there are > > many sections and symbols in a file, it can be very slow. This patch > > caches the symbol table unless --reduce-memory-overheads is used. The > > results for the testcase are > > > > /usr/bin/time ./ld -shared -o libfoo.so b2dynamic_nonlinear_solver.o b2dynamic_nonlinear_utile.o b2static_nonlinear_utile.o b2static_nonlinear_solver.o b2free_vibration_solver.o b2frequency_dependent_free_vibration_solver.o b2linearized_prebuckling_solver.o > > 1.06user 0.13system 0:01.20elapsed 99%CPU > > /usr/bin/time ./ld --reduce-memory-overheads -shared -o libbar.so b2dynamic_nonlinear_solver.o b2dynamic_nonlinear_utile.o b2static_nonlinear_utile.o b2static_nonlinear_solver.o b2free_vibration_solver.o b2frequency_dependent_free_vibration_solver.o b2linearized_prebuckling_solver.o > > 139.24user 8.93system 2:28.19elapsed 99%CPU > > > > It is about 130X speed up. The only concern I have is the memory > > overhead. It seems small for the testcase. > > I was debugging this too on older (FC6) binutils for > http://bugzilla.redhat.com/bugzilla/223181 > (initially thought the main problem is caching, but as backport of > http://sources.redhat.com/ml/binutils/2006-11/msg00190.html > didn't help, I looked at why is bfd_elf_match_symbols_in_sections > so slow) and came up with a very different patch. > > What the routine used to do is: > 1) swap in both symbol tables > 2) qsort both of them, with huge entry size (32 bytes on 64-bit host) > such that SHN_UNDEF symbols come last, other symbols are sorted > based on st_shndx and when st_shndx is equal based on binding > (this was actually the most expensive part, on the RH#223181 > testcase oprofile said 55.98% of total time spent in mempcpy, > 19.26% in msort_with_tmp, 10.12% in memcpy and 4.23% in > elf_sort_elf_symbol, all of which is from this qsort) > 3) then we scan the sorted array and find the consecutive > region in it which has st_shndx equal to the section, > count how many symbols there are > 4) if both sections have the same number of such symbols, > create symtable{1,2} arrays and for symbols with st_shndx > equal to shndx{1,2} store in the array a pointer to the > isym and name pointer > 5) qsort symtable{1,2} arrays based solely on symbol name > 6) compare the symbol names, st_info, st_other > > Now, unless I grossly misunderstood this, we are doing step 2) > only to select isyms with isym->st_shndx == shndx{1,2} and count them > (we don't care about SHN_UNDEF symbols, those don't have isym->st_shndx > equal to shndx{1,2}, we don't care about the ST_BIND sorting, because > we shortly after that re-sort the array based on name. > But, to select isyms with isym->st_shndx == shndx{1,2} we don't need > to qsort at all, that can be done linearly. My original idea is if we sort the symbol table first, we can stop counting symbols when isym->st_shndx != shndx{1,2}. But if sorting itself is very expensive, it isn't a good idea. Memory usage was also a concern. > > So the patch below (against 2.17.50.0.6) removes the expensive 2) step > and step 3) at the expense of allocating bigger temporary symtable{1,2} arrays > (previously they were just 2 * sizeof (void *) * count{1,2}, with > this patch they are * symcount{1,2}). But those are freed before > the function returns and it pays off a lot. I'm getting similar > speedup as you are getting with your patch. > > Now, your patch instead caches when not reduce_memory_overheads > steps 1) and 2), which is quite memory hungry (some sections have > huge number of (mostly SHN_UNDEF or other section) symbols). > > I wonder if we can't combine the benefits of both approaches. > Let me give it a try. H.J.
- Previous message (by thread): PATCH: PR ld/3111: LD very slow linking object files containing dwarf2 symbols
- Next message (by thread): PATCH: Remove duplications in ldgram.y
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Binutils mailing list