RFC: [PATCH] ELF: Don't require section header on ELF objects

Kaylee Blake klkblake@gmail.com
Mon Mar 9 04:59:04 GMT 2020
On 9/3/20 2:44 pm, H.J. Lu wrote:
> On Sun, Mar 8, 2020 at 7:35 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>> On Sun, Mar 8, 2020 at 7:23 PM Alan Modra <amodra@gmail.com> wrote:
>>>
>>> On Mon, Mar 09, 2020 at 12:29:48PM +1030, Kaylee Blake wrote:
>>>> On 9/3/20 12:06 pm, H.J. Lu wrote:
>>>>> On Sun, Mar 8, 2020 at 5:05 PM Alan Modra <amodra@gmail.com> wrote:
>>>>>> Well we certainly don't do such sorting.  For example, from a freshly
>>>>>> built ld/ld-new --enable-targets=all
>>>>>>
>>>>>>    148: 0000000000f08380     4 OBJECT  GLOBAL DEFAULT   25 opterr@GLIBC_2.2.5 (3)
>>>>>>    149: 0000000000402f80     0 FUNC    GLOBAL DEFAULT  UND calloc@GLIBC_2.2.5 (3)
>>>>>>    150: 0000000000881536    35 FUNC    GLOBAL DEFAULT   13 _obstack_allocated_p
>>>>>>
>>>>>
>>>>> I will make 2 changes:
>>>>>
>>>>> 1.  Update -z nosectionheader to guarantee that the last entry in
>>>>> dynamic symbol table
>>>>> is defined.
>>>>> 2.  Update --remove-section-header to issue an error if the last entry
>>>>> in dynamic symbol
>>>>> table is undefined.
>>>>>
>>>>
>>>> With some testing, it seems like ld will emit an ordered symbol table
>>>> iff it's using the DT_GNU_HASH hash table style
>>>
>>> It doesn't.  The snippet of .dynsym I posted was from a binary with
>>> DT_GNU_HASH.  elflink.c:_bfd_elf_link_renumber_dynsyms should convince
>>> you that any ordering seen is by chance.
>>>
>>>> , and my understanding is
>>>> that DT_GNU_HASH in fact requires this behaviour.
>>>
>>> Apparently not.  ;-)
>>>
>>>> So in that case, we
>>>> don't need to do an additional check, because we only need the ordering
>>>> if we are looking up through DT_GNU_HASH instead of DT_HASH.
>>>>
>>>> --
>>>> Kaylee Blake <klkblake@gmail.com>
>>>> C is the worst language, except for all the others.
>>>
>>
>> x86 backend does:
>>
>>  if (!local_undefweak
>>       && !h->def_regular
>>       && (h->plt.offset != (bfd_vma) -1
>>           || eh->plt_got.offset != (bfd_vma) -1))
>>     {
>>       /* Mark the symbol as undefined, rather than as defined in
>>          the .plt section.  Leave the value if there were any
>>          relocations where pointer equality matters (this is a clue
>>          for the dynamic linker, to make function pointer
>>          comparisons work between an application and shared
>>          library), otherwise set it to zero.  If a function is only
>>          called from a binary, there is no need to slow down
>>          shared libraries because of that.  */
>>       sym->st_shndx = SHN_UNDEF;
>>       if (!h->pointer_equality_needed)
>>         sym->st_value = 0;
>>     }
>>
>> Entries in DT_GNU_HASH were originally defined.  A backend
>> may change some entries to undefined.  I think my patch is OK.
>>
> 
> [hjl@gnu-cfl-2 pr25617]$ cat y.s
> .data
> bar:
> .dc.a foo
> [hjl@gnu-cfl-2 pr25617]$ gcc -c y.s
> [hjl@gnu-cfl-2 pr25617]$ ./ld -shared y.o --hash-style=sysv
> [hjl@gnu-cfl-2 pr25617]$ readelf -D -s  a.out
> 
> Symbol table for image:
>   Num Buc:    Value          Size   Type   Bind Vis      Ndx Name
>     1   0: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT UND foo
> [hjl@gnu-cfl-2 pr25617]$ ./ld -shared y.o --hash-style=gnu
> [hjl@gnu-cfl-2 pr25617]$ readelf -D -s  a.out
> [hjl@gnu-cfl-2 pr25617]$
> 
> I will update my patch to not generate such a binary without a section
> header.
> 

A possible alternative if we have DT_GNU_HASH is to scan through the
relocation list. Every symbol we care about for linking must either be
something this library is providing (in which case it's in the range
covered by DT_GNU_HASH), or something it needs (in which case there will
be a relocation referencing it). So if there is no DT_HASH, we can take the
max of the highest DT_GNU_HASH symbol and the highest symbol referenced
by a relocation entry. Theoretically there could be a symbol which is
undefined but never referenced in a relocation, but the dynamic linker
doesn't have any information we don't, so it can't affect anything if we
don't have a way to get it.
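
To make that concrete, here is a rough sketch in C of the calculation I
have in mind. It is illustrative only: the struct and function names are
made up for this mail, it assumes a 64-bit ELF (so bloom filter words are
8 bytes), and it looks at a single RELA array, whereas a real version
would need to cover DT_RELA/DT_REL as well as DT_JMPREL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Elf64_Rela so the sketch needs no <elf.h>.  */
struct rela64 {
    uint64_t r_offset;
    uint64_t r_info;    /* symbol index lives in the upper 32 bits */
    int64_t  r_addend;
};

/* One past the highest symbol index reachable through a 64-bit
   DT_GNU_HASH table.  Assumed layout: nbuckets, symoffset, bloom_size,
   bloom_shift, then bloom_size 64-bit bloom words, nbuckets buckets,
   and finally the chain array.  */
static uint32_t gnu_hash_symbol_count(const uint32_t *table)
{
    assert(table != NULL);
    uint32_t nbuckets = table[0];
    uint32_t symoffset = table[1];
    uint32_t bloom_size = table[2];
    /* Each 64-bit bloom word occupies two 32-bit slots.  */
    const uint32_t *buckets = table + 4 + (size_t)bloom_size * 2;
    const uint32_t *chains = buckets + nbuckets;
    uint32_t last = 0;

    /* The bucket holding the largest index starts the last chain.  */
    for (uint32_t i = 0; i < nbuckets; i++)
        if (buckets[i] > last)
            last = buckets[i];

    /* No hashed symbols at all: symoffset is the best lower bound.  */
    if (last < symoffset)
        return symoffset;

    /* Walk that chain; its final entry has the low bit set.  */
    while ((chains[last - symoffset] & 1) == 0)
        last++;
    return last + 1;
}

/* One past the highest symbol index referenced by any relocation,
   or 0 if there are no relocations.  */
static uint32_t rela_symbol_count(const struct rela64 *relocs, size_t n)
{
    uint32_t count = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t sym = (uint32_t)(relocs[i].r_info >> 32);
        if (sym + 1 > count)
            count = sym + 1;
    }
    return count;
}

/* Estimated number of .dynsym entries when DT_HASH is absent.  */
static uint32_t dynsym_count(const uint32_t *gnu_hash,
                             const struct rela64 *relocs, size_t n)
{
    uint32_t from_hash = gnu_hash_symbol_count(gnu_hash);
    uint32_t from_rela = rela_symbol_count(relocs, n);
    return from_hash > from_rela ? from_hash : from_rela;
}
```

The result is a count rather than an index, so it can be used directly
as the number of dynamic symbol table entries to read.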

-- 
Kaylee Blake <klkblake@gmail.com>
C is the worst language, except for all the others.