From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from zen.linaro.local ([81.128.185.34]) by smtp.gmail.com with ESMTPSA id t132-v6sm41399wme.40.2018.05.03.12.41.30 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 03 May 2018 12:41:30 -0700 (PDT) Received: from zen (localhost [127.0.0.1]) by zen.linaro.local (Postfix) with ESMTPS id D8DD13E00FC; Thu, 3 May 2018 20:41:29 +0100 (BST) References: <20180502154344.10585-1-alex.bennee@linaro.org> <20180502154344.10585-3-alex.bennee@linaro.org> User-agent: mu4e 1.1.0; emacs 26.1 From: Alex =?utf-8?Q?Benn=C3=A9e?= To: Peter Maydell Cc: Richard Henderson , qemu-arm , QEMU Developers , Aurelien Jarno Subject: Re: [PATCH v2 2/3] fpu/softfloat: support ARM Alternative half-precision In-reply-to: Date: Thu, 03 May 2018 20:41:29 +0100 Message-ID: <871sesa5na.fsf@linaro.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-TUID: CIvnVfCToNEq Peter Maydell writes: > On 2 May 2018 at 16:43, Alex Benn=C3=A9e wrote: >> For float16 ARM supports an alternative half-precision format which >> sacrifices the ability to represent NaN/Inf in return for a higher >> dynamic range. To support this I've added an additional >> FloatFmt (float16_params_ahp). >> >> The new FloatFmt flag (arm_althp) is then used to modify the behaviour >> of canonicalize and round_canonical with respect to representation and >> exception raising. >> >> Finally the float16_to_floatN and floatN_to_float16 conversion >> routines select the new alternative FloatFmt when !ieee. >> >> Signed-off-by: Alex Benn=C3=A9e >> --- >> fpu/softfloat.c | 97 +++++++++++++++++++++++++++++++++++++------------ >> 1 file changed, 74 insertions(+), 23 deletions(-) > > I found some corner cases where this patchset introduces > regressions; details below... They're not all althp related > but I started composing this email in reply to patch 2/3 and > don't want to try to move it to replying to the cover letter now :-) > > > (1) Here's an example of a wrong 32->16 conversion in alt-hp mode: > for FCVT h5, s0 where s0 is an SNaN, then your code gives 0x7E00 > when it should give 0x0, because float16a_round_pack_canonical() > tries to return a NaN, which doesn't exist in alt-HP. > > In the Arm ARM pseudocode this case is handled by the > FPConvert pseudocode, which treats alt_hp specially > when the input is a NaN. On that analogy I put this into > float_to_float(), which seems to have the desired effect: > > diff --git a/fpu/softfloat.c b/fpu/softfloat.c > index 25a331158f..1cc368175d 100644 > --- a/fpu/softfloat.c > +++ b/fpu/softfloat.c > @@ -1256,6 +1256,17 @@ static FloatParts float_to_float(FloatParts a, > s->float_exception_flags |=3D float_flag_invalid; > } > > + if (dstf->arm_althp) { > + /* There is no NaN in the destination format: raise Invalid > + * and return a zero with the sign of the input NaN. > + */ > + s->float_exception_flags |=3D float_flag_invalid; > + a.cls =3D float_class_zero; > + a.frac =3D 0; > + a.exp =3D 0; > + return a; > + } > + Doh. The previous version had handling for this in float_to_float but it was clear on my test case and I thought I'd handled it all in the unpack/canonicalize step. > if (s->default_nan_mode) { > a.cls =3D float_class_dnan; > return a; > > (2) You're also failing to set the Inexact flag for cases like > fcvt h1, s0 where s0 is 0x33000000 > which should return result of 0, flags Inexact | Underflow, > but is after this patchset returning just Underflow. More cases for the test case ;-) > > I think this is because you're not dealing with the > odd handling of flush-to-zero for half-precision: > in the v8A Arm ARM pseudocode, float-to-float conversion > uses the FPRoundCV() function, which squashes FZ16 to 0, > and FPRoundBase() looks at fpcr.FZ for 32 and 64 bit floats > but fpcr.FZ16 for 16 bit floats. (FZ16 exists only with the > halfprec extension, and effectively applies only for the > data processing and int<->fp conversions -- see FPRound().) > > In our old code we handled this implicitly by having > roundAndPackFloat16() not check status->flush_to_zero the way > that roundAndPackFloat32/64 did. Now you've combined them all > into one code path you need to do some special casing, I think. > This change fixes things for the fcvt case, but (a) is a > dubious hack and (b) you'll want to do something different > to handle FZ16 for actual arithmetic operations on halfprec. We handle FZ16 by passing a different float_status for halfprec. But I guess the fcvt helpers can do the save/squash/restore dance. > If architectures other than Arm use halfprec (MIPS seems to) > then we should check what their semantics are to see if they > match Arm. There are helpers but I couldn't see them actually being called. I was half tempted to delete the helpers and rationalise the softfloat API convention but decided against it to avoid churn. > > --- a/fpu/softfloat.c > +++ b/fpu/softfloat.c > @@ -453,7 +453,7 @@ static FloatParts round_canonical(FloatParts p, > float_status *s, > flags |=3D float_flag_inexact; > } > } > - } else if (s->flush_to_zero) { > + } else if (s->flush_to_zero && parm->exp_size !=3D 5) { > flags |=3D float_flag_output_denormal; > p.cls =3D float_class_zero; > goto do_zero; > > (3) Here's a NaN case we get wrong now: 64 to IEEE-16 conversion, > input is 0x7ff0000000000001 (an SNaN), we produce > 0x7c00 (infinity) but should produce 0x7e00 (a QNaN). OK. > > This is because this code in float16a_round_pack_canonical(): > > case float_class_msnan: > return float16_maybe_silence_nan(float16_pack_raw(p), s); > > doesn't consider the possibility that float16_pack_raw() > ends up with something that's not a NaN. In this case > because the float-to-float conversion has thrown away the > bottom bits of the double's mantissa, we have p.frac =3D=3D 0, > and float16_pack_raw() gives 0x7c00, which is an infinity, > not a NaN. So when float16_maybe_silence_nan() calls > float16_is_signaling_nan() on it it returns false and then > we don't change the SNaN bit. > > The code as of this patch seems to be a bit confused: > it does part of the conversion of NaNs from one format > to the other in float_to_float() (which is where it's > fiddling with the frac bits) and part of it in > the round_canonical() case (where it then messes > about with quietening the NaN). In an ideal world > this would all be punted out to the softfloat-specialize > code to convert with access to the full details of the > input number, because it's impdef how NaN conversion handles > the fraction bits. Arm happens to choose to use the > most significant bits of the fraction field, but there's > no theoretical reason why you couldn't have an > implementation that wanted to preserve the least > significant bits, for instance. > > Note also that we currently have workarounds at the target/arm > level for the softfloat code not quietening input NaNs for > fp-to-fp conversion: see the uses of float*_maybe_silence_nan() > after float*_to_float* calls in target/arm/helper.c. > If the softfloat code is now going to get these correct then > we can drop those. HPPA, MIPS, RISCV and S390x have similar > workarounds also. Overall, the maybe_silence_nan function > was a dubious workaround for not having been able to do > the NaN handling when we had a fully unpacked value, and > perhaps we can minimise its use or even get rid of it... > (target/i386 notably does not do this, we should check how > SSE and x87 handle NaNs in fp conversions first.) I guess it is time to expose some of the details for the unpacked float handling to specialize so its not an after the fact hack. > > thanks > -- PMM -- Alex Benn=C3=A9e