[PATCH v2] Support Intel AVX10.2 BF16 instructions
Jiang, Haochen
haochen.jiang@intel.com
Thu Oct 31 07:54:08 GMT 2024
> From: Jiang, Haochen
> Sent: Wednesday, October 30, 2024 1:54 PM
>
> > > From: Jan Beulich <jbeulich@suse.com>
> > > Sent: Friday, October 18, 2024 9:07 PM
> > >
> > > While this matches the spec, I still wonder: Not VSCALEFNEPBF16? I
> > > guess what I'm struggling with is the pattern when the NE infix would be added.
> > > My best present guess is that it means "no embedded rounding", and
> > > is intended to be used when the counterpart PS/PD/PH insns would
> > > support {er}. The issue would then extend to at least
> > > VMINMAXNEPBF16, which imo wants to be VMINMAXPBF16 (matching
> > > V{MIN,MAX}PBF16) and VRNDSCALENEPBF16 (ought to be
> > > VRNDSCALEPBF16); I may have overlooked others.
> > >
> > > While this matches the spec, I further wonder: No VUCOMSBF16?
> >
> > Let me have a quick check with my colleagues for these opens.
> >
> > The NE does look strange here since I have similar understanding on that.
> > BTW, NE should be "nearest even" here, which means a default rounding.
> >
>
> An update on that:
>
> The check is not that quick as I expected since the original owner is impacted
> by layoff.
>
> The current clue I get in the puzzle of BF16 makes me more convinced that
> your and my understanding on NE are correct. I suppose SCALEF should have
> NE and RNDSCALE should not have NE unless there is some info I did not find.
> MINMAX for BF16 might not have NE also. Let me push that forward to get
> the final decision or explanation on that.
>
> For VUCOMSBF16, I did not get any clue currently.

This is the info I get from my colleagues:

For VSCALEFPBF16, the reason it doesn't include NE in the mnemonic is that
it is actually not an NE inst. It is always exact since underflow is FTZed.

However, the confusion here is that VSCALEFPD/S/H got this:

"The overflow and underflow responses are dependent on the rounding mode
(for IEEE-compliant rounding), as well as on other settings in MXCSR
(exception mask bits), and on the SAE bit."

I suppose this is not the normal understanding of rounding. It has given
others the impression that the SCALEF insts have different roundings, which
makes the mnemonic of VSCALEFPBF16 confusing. The discussion on whether to
add NE is still ongoing.

For VRNDSCALENEPBF16, the NE should be kept after I read the SDM. The
original VRNDSCALEPS/D/H actually uses imm8[1:0] as the commonly understood
rounding control: 00 as round to nearest even, 01 as round down, 10 as round
up and 11 as truncate. This is also reflected in how the SDM introduction
emulates this inst. For the PD version, we got:

  ROUND(x) = 2^(-M)*Round_to_INT(x*2^M, round_ctrl),
  round_ctrl = imm[3:0]; M=imm[7:4];

However, on BF16, the rounding is always RNE. You can see the rounding is
restricted to RNE in the emulation:

  ROUND(x) = 2^(-M)*Round_to_INT(x*2^M, RNE), M=imm[7:4];

That is why NE appears in the mnemonic.

Also in the discussion, maybe the best way would be to explicitly state
globally in the doc that all BF16 ISA obey the following rules: Disregard
MXCSR (i.e. neither reading nor updating it), implicit SAE, DAZ, FTZ and
RNE (when not exact).

For VUCOMSBF16, it seems like a miss for now. Even if we want to add that,
it is hard to get into AVX10.2.

Thx,
Haochen
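
To make the difference between the two RNDSCALE emulation formulas concrete, here is a rough Python sketch. It is only an illustration under assumptions, not the SDM pseudocode: the function names are made up, only imm8[1:0] of round_ctrl is modelled (the other control bits are ignored), the math is done on Python floats rather than real PD or BF16 values, and NaN/infinity, DAZ and FTZ are not handled.

```python
import math

def round_to_int(value, mode):
    """Round value to an integer using a 2-bit rounding mode (imm8[1:0])."""
    if mode == 0b00:
        # Python's round() on a float rounds halfway cases to even (RNE).
        return float(round(value))
    if mode == 0b01:
        return float(math.floor(value))   # round down
    if mode == 0b10:
        return float(math.ceil(value))    # round up
    return float(math.trunc(value))       # 11: truncate

def rndscale(x, imm8):
    """PS/PD/PH-style: ROUND(x) = 2^(-M) * Round_to_INT(x * 2^M, round_ctrl)."""
    m = (imm8 >> 4) & 0xF                 # M = imm8[7:4]
    mode = imm8 & 0x3                     # simplification: only imm8[1:0]
    return math.ldexp(round_to_int(math.ldexp(x, m), mode), -m)

def rndscale_bf16(x, imm8):
    """BF16-style: same formula, but the rounding is always RNE."""
    m = (imm8 >> 4) & 0xF                 # M = imm8[7:4]; low bits are not used
    return math.ldexp(round_to_int(math.ldexp(x, m), 0b00), -m)
```

With M=2 and x=1.375, rndscale() gives 1.25 when imm8[1:0] selects round down, while rndscale_bf16() returns 1.5 for the same imm8, because the low bits no longer choose the rounding.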