[PATCH] Support Intel AVX10.2 convert instructions

Jan Beulich jbeulich@suse.com
Fri Oct 11 11:41:25 GMT 2024
On 09.10.2024 04:51, Haochen Jiang wrote:
> In this patch, we will support AVX10.2 convert instructions. All
> of them are new instruction forms.
> 
> Among all the instructions, vcvtbiasph2[b,h]f8[,s] needs extra care.
> Since Operand 2 could indicate memory size, we do not need suffix
> under ATTmode. However, CheckOperandSize could not be used since the
> dst operand size are the same for EVex128/256. Therefore, the templates
> could not be merged.

You can't merge all three, yes. But going by the last conditional in
operand_type_register_match(), and by how, for example, VPSLL* with the
shift value in a register is handled, it ought to be possible to fold the
X and Y forms.

> gas/
> 	* testsuite/gas/i386/i386.exp: Add AVX10.2 tests.
> 	* testsuite/gas/i386/x86-64.exp: Ditto.
> 	* testsuite/gas/i386/avx10_2-256-2-intel.d: New.
> 	* testsuite/gas/i386/avx10_2-256-2.d: Ditto.
> 	* testsuite/gas/i386/avx10_2-256-2.s: Ditto.
> 	* testsuite/gas/i386/avx10_2-512-2-intel.d: Ditto.
> 	* testsuite/gas/i386/avx10_2-512-2.d: Ditto.
> 	* testsuite/gas/i386/avx10_2-512-2.s: Ditto.
> 	* testsuite/gas/i386/x86-64-avx10_2-256-2-intel.d: Ditto.
> 	* testsuite/gas/i386/x86-64-avx10_2-256-2.d: Ditto.
> 	* testsuite/gas/i386/x86-64-avx10_2-256-2.s: Ditto.
> 	* testsuite/gas/i386/x86-64-avx10_2-512-2-intel.d: Ditto.
> 	* testsuite/gas/i386/x86-64-avx10_2-512-2.d: Ditto.
> 	* testsuite/gas/i386/x86-64-avx10_2-512-2.s: Ditto.

Instead of the -2 base name suffix, could we use something more
descriptive, e.g. -cvt?

> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -3433,4 +3433,32 @@ pop2p, 0x8f/0, APX_F, Modrm|VexW1|EVexMap4|DstVVVV|ImplicitStackOp|No_bSuf|No_wS
>  vdpphps, 0x52, AVX10_2, Modrm|Space0F38|Src1VVVV|VexW0|Masking|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
>  vmpsadbw, 0xf342, AVX10_2, Modrm|Space0F3A|Src1VVVV|VexW0|Masking|Disp8ShiftVL|CheckOperandSize|NoSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
>  
> +vcvt2ps2phx, 0x6667, AVX10_2, Modrm|Space0F38|Masking|Src1VVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf|StaticRounding|SAE, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
> +
> +vcvtbiasph2bf8, 0x74, AVX10_2, Modrm|Space0F38|EVex128|Masking|Src1VVVV|VexW0|Broadcast|Disp8MemShift=4|NoSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
> +vcvtbiasph2bf8, 0x74, AVX10_2, Modrm|Space0F38|EVex256|Masking|Src1VVVV|VexW0|Broadcast|Disp8MemShift=5|NoSuf, { RegYMM|Word|Unspecified|BaseIndex, RegYMM, RegXMM }
> +vcvtbiasph2bf8, 0x74, AVX10_2, Modrm|Space0F38|EVex512|Masking|Src1VVVV|VexW0|Broadcast|Disp8MemShift=6|NoSuf, { RegZMM|Word|Unspecified|BaseIndex, RegZMM, RegYMM }
> +vcvtbiasph2bf8s, 0x74, AVX10_2, Modrm|EVexMap5|EVex128|Masking|Src1VVVV|VexW0|Broadcast|Disp8MemShift=4|NoSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
> +vcvtbiasph2bf8s, 0x74, AVX10_2, Modrm|EVexMap5|EVex256|Masking|Src1VVVV|VexW0|Broadcast|Disp8MemShift=5|NoSuf, { RegYMM|Word|Unspecified|BaseIndex, RegYMM, RegXMM }
> +vcvtbiasph2bf8s, 0x74, AVX10_2, Modrm|EVexMap5|EVex512|Masking|Src1VVVV|VexW0|Broadcast|Disp8MemShift=6|NoSuf, { RegZMM|Word|Unspecified|BaseIndex, RegZMM, RegYMM }
> +vcvtbiasph2hf8, 0x18, AVX10_2, Modrm|EVexMap5|EVex128|Masking|Src1VVVV|VexW0|Broadcast|Disp8MemShift=4|NoSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
> +vcvtbiasph2hf8, 0x18, AVX10_2, Modrm|EVexMap5|EVex256|Masking|Src1VVVV|VexW0|Broadcast|Disp8MemShift=5|NoSuf, { RegYMM|Word|Unspecified|BaseIndex, RegYMM, RegXMM }
> +vcvtbiasph2hf8, 0x18, AVX10_2, Modrm|EVexMap5|EVex512|Masking|Src1VVVV|VexW0|Broadcast|Disp8MemShift=6|NoSuf, { RegZMM|Word|Unspecified|BaseIndex, RegZMM, RegYMM }
> +vcvtbiasph2hf8s, 0x1b, AVX10_2, Modrm|EVexMap5|EVex128|Masking|Src1VVVV|VexW0|Broadcast|Disp8MemShift=4|NoSuf, { RegXMM|Word|Unspecified|BaseIndex, RegXMM, RegXMM }
> +vcvtbiasph2hf8s, 0x1b, AVX10_2, Modrm|EVexMap5|EVex256|Masking|Src1VVVV|VexW0|Broadcast|Disp8MemShift=5|NoSuf, { RegYMM|Word|Unspecified|BaseIndex, RegYMM, RegXMM }
> +vcvtbiasph2hf8s, 0x1b, AVX10_2, Modrm|EVexMap5|EVex512|Masking|Src1VVVV|VexW0|Broadcast|Disp8MemShift=6|NoSuf, { RegZMM|Word|Unspecified|BaseIndex, RegZMM, RegYMM }
> +
> +vcvtne2ph2bf8, 0xf274, AVX10_2, Modrm|Space0F38|Masking|Src1VVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Word|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
> +vcvtne2ph2bf8s, 0xf274, AVX10_2, Modrm|EVexMap5|Masking|Src1VVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Word|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
> +vcvtne2ph2hf8, 0xf218, AVX10_2, Modrm|EVexMap5|Masking|Src1VVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Word|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
> +vcvtne2ph2hf8s, 0xf21b, AVX10_2, Modrm|EVexMap5|Masking|Src1VVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Word|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
> +vcvtneph2bf8<Exy>, 0xf374, AVX10_2, Modrm|Space0F38|<Exy:attr>|Masking|VexW0|Broadcast|NoSuf, { <Exy:src>|Word, <Exy:dst> }
> +vcvtneph2bf8s<Exy>, 0xf374, AVX10_2, Modrm|EVexMap5|<Exy:attr>|Masking|VexW0|Broadcast|NoSuf, { <Exy:src>|Word, <Exy:dst> }
> +vcvtneph2hf8<Exy>, 0xf318, AVX10_2, Modrm|EVexMap5|<Exy:attr>|Masking|VexW0|Broadcast|NoSuf, { <Exy:src>|Word, <Exy:dst> }
> +vcvtneph2hf8s<Exy>, 0xf31b, AVX10_2, Modrm|EVexMap5|<Exy:attr>|Masking|VexW0|Broadcast|NoSuf, { <Exy:src>|Word, <Exy:dst> }

With the exception of vcvt2ps2phx at the top, I think all of the above would
benefit from templatizing. The four forms are very regular: there are always
ph2bf8, ph2bf8s, ph2hf8, and ph2hf8s, uniformly distributed across major
opcodes / encoding spaces. Untested:

<cvt8:opc:spc, +
    bf8:74:Space0F38, +
    bf8s:74:EVexMap5, +
    hf8:18:EVexMap5, +
    hf8s:1b:EVexMap5>
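
As an illustration only (equally untested, and not part of the patch as
posted): assuming the cvt8 template above is defined ahead of the entries
that use it, the four vcvtne2ph2* lines might collapse into a single one
along the lines below. <cvt8:opc> would supply the low opcode byte after
the 0xf2 prefix byte, and <cvt8:spc> the encoding space. The other
attributes and the operand list are copied unchanged from the quoted
vcvtne2ph2bf8 entry.

```
vcvtne2ph2<cvt8>, 0xf2<cvt8:opc>, AVX10_2, Modrm|<cvt8:spc>|Masking|Src1VVVV|VexW0|Broadcast|Disp8ShiftVL|CheckOperandSize|NoSuf, { RegXMM|RegYMM|RegZMM|Word|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
```

Expanded over the four template rows this should give back vcvtne2ph2bf8
(0xf274, Space0F38), vcvtne2ph2bf8s (0xf274, EVexMap5), vcvtne2ph2hf8
(0xf218, EVexMap5), and vcvtne2ph2hf8s (0xf21b, EVexMap5), matching the
entries in the quoted hunk. Whether the same approach combines cleanly with
the <Exy> template used by the vcvtneph2* entries is not shown here.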

Jan

