[PATCH] x86: Optimize the encoder of the vvvv register
Cui, Lili
lili.cui@intel.com
Mon Apr 22 08:42:45 GMT 2024
More information about the Binutils mailing list
Mon Apr 22 08:42:45 GMT 2024
- Previous message (by thread): [PATCH] x86: Optimize the encoder of the vvvv register
- Next message (by thread): [PATCH] x86: Optimize the encoder of the vvvv register
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
> On 19.04.2024 09:36, Cui, Lili wrote:
> > This patch wants to optimize the encoder of the vvvv register.
> > Previously we used Vexvvvv, SWAP_SOURCES and extension_opcode to
> help
> > encode the vvvv register, this patch simplified the logic to only use
> > vexvvvv and added appropriate Vexvvvv values for the related
> > instructions.
>
> This looks to be a good move, yet you're not fully leveraging the potential:
> Afaict all uses of SwapSources now go away. Hence SwapSources itself should
> go away too, together with SWAP_SOURCES.
>
I forgot to delete them. Thanks.
> Beyond that largely (but not only) cosmetic comments:
>
> > @@ -10426,40 +10426,35 @@ build_modrm_byte (void)
> > || i.encoding == encoding_evex));
> > }
> >
> > - if (i.tm.opcode_modifier.vexvvvv == VexVVVV_DST)
> > + switch (i.tm.opcode_modifier.vexvvvv)
> > {
> > - v = dest;
> > - dest-- ;
> > - }
> > - else
> > - {
> > - for (v = source + 1; v < dest; ++v)
> > - if (v != reg_slot)
> > + case VexVVVV_SRC2:
> > + if (source != op)
> > + {
> > + v = source++;
> > break;
> > - if (v >= dest)
> > - v = ~0;
> > - }
> > - if (i.tm.extension_opcode != None)
> > - {
> > - if (dest != source)
> > - v = dest;
> > - dest = ~0;
> > + }
> > + /* For XOP: vpshl* and vpsha*. */
> > + else
> > + /* Fall through. */
> > + case VexVVVV_SRC1:
> > + v = dest - 1;
>
> Indentation here wants to fit the "else", not the "case ...". Even better would
> be to avoid the "else", seeing that there already is a "break" inside the if()'s
> body.
>
Dropped 'else'.
> > --- a/opcodes/i386-opc.h
> > +++ b/opcodes/i386-opc.h
> > @@ -640,10 +640,13 @@ enum
> > Vex,
> > /* How to encode VEX.vvvv:
> > 0: VEX.vvvv must be 1111b.
> > - 1: VEX.vvvv encodes one of the src register operands.
> > - 2: VEX.vvvv encodes the dest register operand.
> > + 1: VEX.vvvv encodes the src1 register operand.
> > + 2: VEX.vvvv encodes the src2 register operand.
> > + 3: VEX.vvvv encodes the dest register operand.
> > */
> > -#define VexVVVV_DST 2
> > +#define VexVVVV_SRC1 1
> > +#define VexVVVV_SRC2 2
> > +#define VexVVVV_DST 3
> > VexVVVV,
>
> While I'm not overly fussed on the names used here, ...
>
> > --- a/opcodes/i386-opc.tbl
> > +++ b/opcodes/i386-opc.tbl
> > @@ -141,7 +141,9 @@
> >
> > #define Disp8ShiftVL Disp8MemShift=DISP8_SHIFT_VL
> >
> > -#define DstVVVV VexVVVV=VexVVVV_DST
> > +#define VexVVVV_Src1 VexVVVV=VexVVVV_SRC1 #define VexVVVV_Src2
> > +VexVVVV=VexVVVV_SRC2 #define VexVVVV_Dst VexVVVV=VexVVVV_DST
>
> ... I am here, due to the line length issues we already have. Please can you
> keep DstVVVV as a name (thus reducing the churn on the table below) and
> add Src1VVVV and Src2VVVV, all being 4 characters shorter than what you
> presently have?
>
Src1VVVV, Src2VVVV and DstVVVV are better.
> > @@ -1000,13 +1002,13 @@ pause, 0xf390, i186, NoSuf, {}
> >
> > // MMX/SSE2 instructions.
> >
> > -<mmx:cpu:pfx:attr:reg:mem, +
> > - $avx:AVX:66:Vex128|VexVVVV|VexW0|SSE2AVX:RegXMM:Xmmword, +
> > - $sse:SSE2:66::RegXMM:Xmmword, +
> > - $mmx:MMX:::RegMMX:Qword>
> > +<mmx:cpu:pfx:attr:scal:reg:mem, +
> > +
> $avx:AVX:66:Vex128|VexVVVV_Src1|VexW0|SSE2AVX:Vex128|VexVVVV_Dst
> |VexW0|SSE2AVX:RegXMM:Xmmword, +
> > + $sse:SSE2:66:::RegXMM:Xmmword, +
> > + $mmx:MMX::::RegMMX:Qword>
> >
> > <sse2:cpu:attr:scal:vvvv, +
> > - $avx:AVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVV, +
> > +
> $avx:AVX:Vex128|VexW0|SSE2AVX:VexLIG|VexW0|SSE2AVX:VexVVVV_Src1,
> > + +
> > $sse:SSE2:::>
> >
> > <bw:opc:vexw:elem:kcpu:kpfx:cpubmi, + @@ -1058,7 +1060,7 @@
> > pmulhw<mmx>, 0x<mmx:pfx>0fe5, <mmx:cpu>,
> Modrm|<mmx:attr>|C|NoSuf, {
> > <mmx:reg>|< pmullw<mmx>, 0x<mmx:pfx>0fd5, <mmx:cpu>,
> > Modrm|<mmx:attr>|C|NoSuf,
> { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex,
> > <mmx:reg> } por<mmx>, 0x<mmx:pfx>0feb, <mmx:cpu>,
> > Modrm|<mmx:attr>|C|NoSuf,
> { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex,
> > <mmx:reg> } psllw<mmx>, 0x<mmx:pfx>0ff1, <mmx:cpu>,
> > Modrm|<mmx:attr>|NoSuf,
> { <mmx:reg>|<mmx:mem>|Unspecified|BaseIndex,
> > <mmx:reg> } -psllw<mmx>, 0x<mmx:pfx>0f71/6, <mmx:cpu>,
> > Modrm|<mmx:attr>|NoSuf, { Imm8, <mmx:reg> }
> > +psllw<mmx>, 0x<mmx:pfx>0f71/6, <mmx:cpu>,
> Modrm|<mmx:scal>|NoSuf, {
> > +Imm8, <mmx:reg> }
>
> This is not a scalar instruction, hence "scal" as a template parameter name is
> misleading. It's not really clear to me anyway why this needs fiddling with -
> there was no SwapSources here, and none of its siblings are being touched
> either.
>
'psllw' has an extended opcode and two non-immediate operands, to delete the corresponding code below.
- if (i.tm.extension_opcode != None)
- {
- if (dest != source)
- v = dest;
- dest = ~0;
- }
> To help recognizing such anomalies (possible problems), could I talk you into
> splitting the patch in two pieces? First a purely mechanical one introducing
> (perhaps simply as an alias of VexVVVV) / using Src1VVVV wherever it is
> meant to be used. Then the remaining changes, with a much smaller diff on
> the actual opcode templates, in the 2nd one.
>
How about splitting into 3 patches:
1. Introduce VexVVVV , Src1VVVV and Src2VVVV.
2. Replace SwapSources.
3. Replace extension_opcode part.
Thanks,
Lili.
- Previous message (by thread): [PATCH] x86: Optimize the encoder of the vvvv register
- Next message (by thread): [PATCH] x86: Optimize the encoder of the vvvv register
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Binutils mailing list