[PATCH v2] Support Intel AVX10.2 BF16 instructions

Jiang, Haochen haochen.jiang@intel.com
Thu Oct 31 07:54:08 GMT 2024
> From: Jiang, Haochen
> Sent: Wednesday, October 30, 2024 1:54 PM
> 
> > > From: Jan Beulich <jbeulich@suse.com>
> > > Sent: Friday, October 18, 2024 9:07 PM
> > >
> > > While this matches the spec, I still wonder: Not VSCALEFNEPBF16? I
> > > guess what I'm struggling with is the pattern when the NE infix would be added.
> > > My best present guess is that it means "no embedded rounding", and
> > > is intended to be used when the counterpart PS/PD/PH insns would
> > > support {er}. The issue would then extend to at least
> > > VMINMAXNEPBF16, which imo wants to be VMINMAXPBF16 (matching
> > > V{MIN,MAX}PBF16) and VRNDSCALENEPBF16 (ought to be
> > > VRNDSCALEPBF16); I may have overlooked others.
> > >
> > > While this matches the spec, I further wonder: No VUCOMSBF16?
> >
> > Let me have a quick check with my colleagues for these opens.
> >
> > The NE does look strange here since I have similar understanding on that.
> > BTW, NE should be "nearest even" here, which means a default rounding.
> >
> 
> An update on that:
> 
> The check is not that quick as I expected since the original owner is impacted
> by layoff.
> 
> The current clue I get in the puzzle of BF16 makes me more convinced that
> your and my understanding on NE are correct. I suppose SCALEF should have
> NE and RNDSCALE should not have NE unless there is some info I did not find.
> MINMAX for BF16 might not have NE also. Let me push that forward to get
> the final decision or explanation on that.
> 
> For VUCOMSBF16, I did not get any clue currently.

This is the info I get from my colleagues:

For VSCALEFPBF16, the reason the mnemonic doesn't include NE is that it is
actually not an NE insn. The result is always exact, since underflow is
flushed to zero (FTZ).

However, the confusion here is that the VSCALEFPD/S/H description says: "The
overflow and underflow responses are dependent on the rounding mode (for
IEEE-compliant rounding), as well as on other settings in MXCSR (exception mask
bits), and on the SAE bit." I suppose that is not the usual understanding of
rounding. It has given others the impression that the SCALEF insns have
different roundings, which makes the VSCALEFPBF16 mnemonic confusing. The
discussion on whether to add NE is still ongoing.
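
To illustrate the "always exact" point, here is a rough Python sketch, not
taken from the spec. It assumes the usual SCALEF definition (x * 2^floor(y)),
finite non-NaN inputs that are already BF16-representable, and DAZ/FTZ as
described above. The names flush and scalef_bf16 are made up for the example.

```python
import math

BF16_MIN_NORMAL = 2.0 ** -126                    # smallest normal BF16 magnitude
BF16_MAX = (2.0 - 2.0 ** -7) * 2.0 ** 127        # largest finite BF16 magnitude


def flush(v):
    """Flush magnitudes below the normal range to a signed zero."""
    return math.copysign(0.0, v) if abs(v) < BF16_MIN_NORMAL else v


def scalef_bf16(x, y):
    """Hypothetical model of SCALEF on BF16 values: x * 2**floor(y)."""
    x, y = flush(x), flush(y)                    # DAZ on both inputs
    # Clamp the exponent so ldexp stays inside double range; anything this
    # far out overflows or underflows in BF16 anyway.
    e = max(-300, min(300, math.floor(y)))
    r = math.ldexp(x, e)                         # exact power-of-two scaling
    if abs(r) > BF16_MAX:
        return math.copysign(math.inf, r)        # overflow
    return flush(r)                              # FTZ on the result
```

Scaling by a power of two never changes the significand, so in this model the
only non-exact outcomes are overflow and the flush to zero; no rounding mode is
ever consulted.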

For VRNDSCALENEPBF16, the NE should be kept, having read the SDM. The original
VRNDSCALEPS/D/H actually use imm8[1:0] as the commonly understood rounding
control: 00 is round to nearest even, 01 is round down, 10 is round up and 11
is truncate. This is also reflected in how the SDM describes emulating this
insn. For the PD version, we have:

ROUND(x) = 2^(-M)*Round_to_INT(x*2^M, round_ctrl),
round_ctrl = imm[3:0];
M=imm[7:4];

However, for BF16 the rounding is always RNE. You can see that the rounding is
restricted to RNE in the emulation:

ROUND(x) = 2^(-M)*Round_to_INT(x*2^M, RNE), M=imm[7:4];

That is why NE appears in the mnemonic.
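
The difference between the two emulation formulas can be sketched in Python as
below. This is only an illustration under simplifying assumptions: it works on
Python floats rather than real PD/BF16 encodings, handles finite inputs only,
and looks at imm8[1:0] alone (it ignores the other low immediate bits). The
names rndscale and rndscale_ne are invented for the example.

```python
import math

# imm8[1:0] rounding control, as listed above.
ROUNDERS = {
    0b00: round,        # Python's round() on a float rounds half to even
    0b01: math.floor,   # round down
    0b10: math.ceil,    # round up
    0b11: math.trunc,   # truncate
}


def rndscale(x, imm8):
    """PS/PD/PH-style: rounding control comes from the immediate."""
    m = (imm8 >> 4) & 0xF                 # M = imm8[7:4]
    rc = imm8 & 0x3                       # simplified: imm8[1:0] only
    return ROUNDERS[rc](x * 2.0 ** m) / 2.0 ** m


def rndscale_ne(x, imm8):
    """BF16-style: only M is taken from the immediate, rounding is RNE."""
    m = (imm8 >> 4) & 0xF
    return round(x * 2.0 ** m) / 2.0 ** m
```

For example, rndscale(2.5, 0x02) rounds up to 3.0, while rndscale_ne(2.5, 0x02)
ignores the low bits and gives 2.0.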

Also from the discussion: maybe the best way would be to state explicitly and
globally in the doc that all BF16 insns obey the following rules: disregard
MXCSR (i.e. neither read nor update it), implicit SAE, DAZ, FTZ and RNE (when
not exact).
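
As a rough picture of what "DAZ, FTZ and RNE (when not exact)" means for a
single value, here is a Python sketch that narrows a float32-representable
value to a BF16 bit pattern. It is not from the spec: it assumes finite,
non-NaN inputs within float32 range, and the name to_bf16_bits is made up.

```python
import struct


def to_bf16_bits(x):
    """Hypothetical helper: float32 value -> BF16 bits with DAZ/FTZ and RNE."""
    (u,) = struct.unpack("<I", struct.pack("<f", x))   # raw float32 bits
    sign = (u >> 31) & 1
    if (u >> 23) & 0xFF == 0:
        # Zero or denormal: treated as (signed) zero, which also means the
        # result can never be a denormal.
        return sign << 15
    # Round to nearest even on the 16 bits being dropped: add 0x7FFF plus
    # the lowest kept bit, then truncate.
    bias = 0x7FFF + ((u >> 16) & 1)
    return ((u + bias) >> 16) & 0xFFFF
```

No MXCSR-like state appears anywhere in the function, which matches the idea
that these insns neither read nor update it.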

For VUCOMSBF16, it seems to be an omission for now. Even if we wanted to add
it, it would be hard to get it into AVX10.2.

Thx,
Haochen

