[PATCH] x86: Optimize the encoder of the vvvv register

Cui, Lili lili.cui@intel.com
Mon Apr 22 08:42:45 GMT 2024
> On 19.04.2024 09:36, Cui, Lili wrote:
> > This patch wants to optimize the encoder of the vvvv register.
> > Previously we used Vexvvvv, SWAP_SOURCES and extension_opcode to
> help
> > encode the vvvv register, this patch simplified the logic to only use
> > vexvvvv and added appropriate Vexvvvv values for the related
> > instructions.
> 
> This looks to be a good move, yet you're not fully leveraging the potential:
> Afaict all uses of SwapSources now go away. Hence SwapSources itself should
> go away too, together with SWAP_SOURCES.
> 

I forgot to delete them. Thanks.

> Beyond that largely (but not only) cosmetic comments:
> 
> > @@ -10426,40 +10426,35 @@ build_modrm_byte (void)
> >  				     || i.encoding == encoding_evex));
> >      }
> >
> > -  if (i.tm.opcode_modifier.vexvvvv == VexVVVV_DST)
> > +  switch (i.tm.opcode_modifier.vexvvvv)
> >      {
> > -      v = dest;
> > -      dest-- ;
> > -    }
> > -  else
> > -    {
> > -      for (v = source + 1; v < dest; ++v)
> > -	if (v != reg_slot)
> > +    case VexVVVV_SRC2:
> > +      if (source != op)
> > +	{
> > +	  v = source++;
> >  	  break;
> > -      if (v >= dest)
> > -	v = ~0;
> > -    }
> > -  if (i.tm.extension_opcode != None)
> > -    {
> > -      if (dest != source)
> > -	v = dest;
> > -      dest = ~0;
> > +	}
> > +      /* For XOP: vpshl* and vpsha*.  */
> > +      else
> > +      /* Fall through.  */
> > +    case VexVVVV_SRC1:
> > +      v =  dest - 1;
> 
> Indentation here wants to fit the "else", not the "case ...". Even better would
> be to avoid the "else", seeing that there already is a "break" inside the if()'s
> body.
> 

Dropped 'else'.

> > --- a/opcodes/i386-opc.h
> > +++ b/opcodes/i386-opc.h
> > @@ -640,10 +640,13 @@ enum
> >    Vex,
> >    /* How to encode VEX.vvvv:
> >       0: VEX.vvvv must be 1111b.
> > -     1: VEX.vvvv encodes one of the src register operands.
> > -     2: VEX.vvvv encodes the dest register operand.
> > +     1: VEX.vvvv encodes the src1 register operand.
> > +     2: VEX.vvvv encodes the src2 register operand.
> > +     3: VEX.vvvv encodes the dest register operand.
> >     */
> > -#define VexVVVV_DST   2
> > +#define VexVVVV_SRC1   1
> > +#define VexVVVV_SRC2   2
> > +#define VexVVVV_DST    3
> >    VexVVVV,
> 
> While I'm not overly fussed on the names used here, ...
> 
> > --- a/opcodes/i386-opc.tbl
> > +++ b/opcodes/i386-opc.tbl
> > @@ -141,7 +141,9 @@
> >
> >  #define Disp8ShiftVL Disp8MemShift=DISP8_SHIFT_VL
> >
> > -#define DstVVVV VexVVVV=VexVVVV_DST
> > +#define VexVVVV_Src1 VexVVVV=VexVVVV_SRC1 #define VexVVVV_Src2
> > +VexVVVV=VexVVVV_SRC2 #define VexVVVV_Dst  VexVVVV=VexVVVV_DST
> 
> ... I am here, due to the line length issues we already have. Please can you
> keep DstVVVV as a name (thus reducing the churn on the table below) and
> add Src1VVVV and Src2VVVV, all being 4 characters shorter than what you
> presently have?
> 

Src1VVVV, Src2VVVV and DstVVVV are better.

> > @@ -1000,13 +1002,13 @@ pause, 0xf390, i186, NoSuf, {}
> >
> >  // MMX/SSE2 instructions.
> >
> > -<mmx:cpu:pfx:attr:reg:mem, +
> > -    $avx:AVX:66:Vex128|VexVVVV|VexW0|SSE2AVX:RegXMM:Xmmword, +
> > -    $sse:SSE2:66::RegXMM:Xmmword, +
> > -    $mmx:MMX:::RegMMX:Qword>
> > +<mmx:cpu:pfx:attr:scal:reg:mem, +
> > +
> $avx:AVX:66:Vex128|VexVVVV_Src1|VexW0|SSE2AVX:Vex128|VexVVVV_Dst
> |VexW0|SSE2AVX:RegXMM:Xmmword, +
> > +    $sse:SSE2:66:::RegXMM:Xmmword, +
> > +    $mmx:MMX::::RegMMX:Qword>
> >
> >  <sse2:cpu:attr:scal:vvvv, +
> > -    $avx:AVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVV, +
> > +
> $avx:AVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVV_Src1,
> > + +
> >      $sse:SSE2:::>
> >
> >  <bw:opc:vexw:elem:kcpu:kpfx:cpubmi, + @@ -1058,7 +1060,7 @@
> > pmulhw<mmx>, 0x<mmx:pfx>0fe5, <mmx:cpu>,
> Modrm|<mmx:attr>|C|NoSuf, {
> > <mmx:reg>|<  pmullw<mmx>, 0x<mmx:pfx>0fd5, <mmx:cpu>,
> > Modrm|<mmx:attr>|C|NoSuf,
> { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex,
> > <mmx:reg> }  por<mmx>, 0x<mmx:pfx>0feb, <mmx:cpu>,
> > Modrm|<mmx:attr>|C|NoSuf,
> { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex,
> > <mmx:reg> }  psllw<mmx>, 0x<mmx:pfx>0ff1, <mmx:cpu>,
> > Modrm|<mmx:attr>|NoSuf,
> { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex,
> > <mmx:reg> } -psllw<mmx>, 0x<mmx:pfx>0f71/6, <mmx:cpu>,
> > Modrm|<mmx:attr>|NoSuf, { Imm8, <mmx:reg> }
> > +psllw<mmx>, 0x<mmx:pfx>0f71/6, <mmx:cpu>,
> Modrm|<mmx:scal>|NoSuf, {
> > +Imm8, <mmx:reg> }
> 
> This is not a scalar instruction, hence "scal" as a template parameter name is
> misleading. It's not really clear to me anyway why this needs fiddling with -
> there was no SwapSources here, and none of its siblings are being touched
> either.
> 

'psllw' has an extended opcode and two non-immediate operands, to delete the corresponding code below.

-  if (i.tm.extension_opcode != None)
-    {
-      if (dest != source)
-       v = dest;
-      dest = ~0;
-    }

> To help recognizing such anomalies (possible problems), could I talk you into
> splitting the patch in two pieces? First a purely mechanical one introducing
> (perhaps simply as an alias of VexVVVV) / using Src1VVVV wherever it is
> meant to be used. Then the remaining changes, with a much smaller diff on
> the actual opcode templates, in the 2nd one.
> 

How about splitting into 3 patches:
1. Introduce VexVVVV , Src1VVVV and Src2VVVV. 
2. Replace SwapSources. 
3. Replace extension_opcode part.

Thanks,
Lili.



More information about the Binutils mailing list