[PATCH v10 06/28] gas: implement parsing of object attributes v2

Jan Beulich jbeulich@suse.com
Mon Dec 22 08:19:20 GMT 2025
On 19.12.2025 15:31, Matthieu Longo wrote:
> On 08/12/2025 07:43, Jan Beulich wrote:
>> On 05.12.2025 17:27, Matthieu Longo wrote:
>>> On 02/12/2025 08:19, Jan Beulich wrote:
>>>> On 01.12.2025 17:28, Matthieu Longo wrote:
>>>>> On 28/11/2025 14:29, Jan Beulich wrote:
>>>>>> On 20.11.2025 18:58, Matthieu Longo wrote:
>>>>>>> +#if (TC_OBJ_ATTR_v2)
>>>>>>> +/* Look up attribute tags defined in the backend (object attribute v2).  */
>>>>>>> +static bool
>>>>>>> +obj_attr_v2_lookup_known_attr_tag_symbol (const char *identifier,
>>>>>>> +					  arg_token_t token_type,
>>>>>>> +					  arg_t *val_out)
>>>>>>> +{
>>>>>>> +  obj_attr_subsection_v2_t *subsec = elf_obj_attr_subsections (stdoutput).last;
>>>>>>> +  /* If there is no current subsection, this function was called wrongly before
>>>>>>> +     setting one (usually via the subsection directive).  */
>>>>>>> +  gas_assert (subsec != NULL);
>>>>>>
>>>>>> The comment (in particular what is being said in parentheses) suggests to me that
>>>>>> here you're asserting correctness of user input. That would want to be an as_bad()
>>>>>> then, though.
>>>>>>
>>>>>
>>>>> In my understanding, as you stated, as_bad() is used to emit an error due to a user input.
>>>>> However, this comment refers to a logic error. The function is not supposed to be called if there is no current subsection.
>>>>> It happened to me at some point during the development, and got a mysterious segfault instead of an explicit message pointing to my mistake. I decided to add this comment and assert just in case it happens to someone else.
>>>>
>>>> May I then ask that you re-word the comment such that it's unambiguously talking
>>>> about internal state, not input we're processing? To me "If there is no current
>>>> subsection" can mean either.
>>>
>>> What about this ? Is it clearer ?
>>>
>>>     /* This function is called when setting a value for an attribute, and it
>>>        requires an active subsection.  If 'subsec' is NULL, the caller wrongly
>>>        invoked this function before selecting a current subsection (normally via
>>>        the subsection directive in assembly).  This is a logic error, most likely
>>>        caused by a misvalidation of a user input.  */
>>
>> Sadly not. What is said in parentheses again refers to user input. The
>> "misvalidation" part then recovers some, but it still leaves things
>> ambiguous imo.
>>
> 
> Ok what about this ?
> 
>    /* This function is called when setting a value for an attribute, and it
>       requires an active subsection.  Calling this function without setting
>       'subsec' is a logical error.  Invalid user code, where the subsection
>       has not been selected must be handled by the caller.  We require 'subsec'
>       to be non-NULL here.  */

Yes, better. Thanks.

>>>>>>> +/* Parse an argument, and set its type accordingly depending on the input
>>>>>>> +   value, and the constraints on the expected argument.  */
>>>>>>> +static bool
>>>>>>> +obj_attr_parse_arg (arg_token_t expected_ttype,
>>>>>>> +		    bool (*match_identifier) (char c),
>>>>>>> +		    bool resolve_identifier,
>>>>>>> +		    bool optional,
>>>>>>> +		    arg_t *arg_out)
>>>>>>> +{
>>>>>>> +  const arg_token_t low_ttype = (expected_ttype & LT_MASK);
>>>>>>> +
>>>>>>> +  if (optional && is_end_of_stmt (*input_line_pointer))
>>>>>>> +    {
>>>>>>> +      arg_out->vtype = VALUE_OPTIONAL_ABSENT;
>>>>>>> +      return true;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Check whether this looks like a string literal
>>>>>>> +     Note: symbol look-up for string literals is not available.  */
>>>>>>> +  if (*input_line_pointer == '"')
>>>>>>> +    {
>>>>>>> +      bool status = extract_string_literal (arg_out);
>>>>>>> +      if (status && (low_ttype & STRING))
>>>>>>> +	return true;
>>>>>>> +
>>>>>>> +      if (status)
>>>>>>> +	{
>>>>>>> +	  char sbuffer[100];
>>>>>>> +	  as_bad (_("unexpected `string' \"%s\", expected %s instead"),
>>>>>>> +		  arg_out->val.string,
>>>>>>> +		  expectations_to_string (low_ttype, sbuffer, sizeof(sbuffer)));
>>>>>>> +	  free ((char *) arg_out->val.string);
>>>>>>> +	  arg_out->val.string = NULL;
>>>>>>> +	  arg_out->vtype = VALUE_UNDEFINED;
>>>>>>> +	}
>>>>>>> +      return false;
>>>>>>> +    }
>>>>>>> +  /* Check whether this looks like an identifier.  */
>>>>>>> +  else if (ISALPHA (*input_line_pointer) || *input_line_pointer == '_')
>>>>>>
>>>>>> Aren't you open-coding is_tag_identifier() here? Oh, wait - it's is ALNUM()
>>>>>> there and ISALPHA() here. Why the difference? A symbol (identifier) can't
>>>>>> start with a number, can it (seeing that is_tag_identifier() is only an
>>>>>> alias of is_symbol())?
>>>>>
>>>>> Exactly what you said, i.e. a symbol cannot start with a number, hence the call to ISALPHA() instead of ALNUM().
>>>>
>>>> Yet then exactly as I said: is_tag_identifier() being an alias of is_symbol(),
>>>> why is ISALNUM() used there? I.e. my original question remains: Are you perhaps
>>>> open-coding is_tag_identifier() here? (That's the only meaning of "identifier"
>>>> in the comment that I can derive. I can't exclude though that there are [again]
>>>> multiple different meanings of "identifier" throughout this code.)
>>>
>>> Sorry to repeat myself but I don't understand what you mean in this comment.
>>>
>>> Are we agreeing on the following statements ?
>>> 1. An identifier can start with [a-zA-Z_] but not [0-9], so the first character's check is using ISALPHA().
>>> 2. Any following characters can use [a-zA-Z0-9_] hence ALNUM().
>>
>> Yes (leaving aside the question of whether further characters may be legitimate to
>> use).
>>
>>> Is using "ISALPHA (*input_line_pointer) || *input_line_pointer == '_'" in the condition the issue you try to point at ?
>>> Do you want me to move it to a function/macro and call it "is_identifier_start()" ?
>>
>> Not exactly. First, as said, that expression feels like it's open-coding something.
>> That "something" may be is_tag_identifier(). Which aliases is_symbol(). Which uses
>> ISALNUM(), irrespective of the character being looked at being first in a symbol
>> name.
>>
>> IOW is_symbol() violates 1. in your reply. Why? If that was changed, in turn
>> is_tag_identifier() could be used here.
>>
>> is_symbol() of course is further confusing to see: Why would you open-code (and
>> further constrain) is_name_beginner() / is_part_of_name() / is_name_ender() here?
>> The only possible explanation I can find is that "symbol" means something else
>> here than in the rest of the assembler. Which wouldn't be very helpful.
> 
> I tried to use is_name_beginner() along get_symbol_name().
> The parsing falls apart, and I get a lot of failures in the tests.

Which may point to a more general problem with the parsing (here and/or what's
available as general machinery in gas). Without you giving details, I of course
can't even guess what the problem may be.

> I don't have the courage to debug what is going on anymore. FYI I am morally exhausted with this patch particularly.
> If we want to be able to ship this feature for the next release of binutils, maybe we should stop changing everything.

Hmm, I'm really of two minds here. On one hand I can understand your frustration
as well as your goal to "just get this in". Otoh other kinds of open-coded
parsing have been causing issues elsewhere in the assembler. I'm therefore
rather hesitant to accept anything else going such a route.

Jan


More information about the Binutils mailing list