* [Qemu-devel] [PATCH 0/9] target-ppc: VSX Bug Fixes
@ 2014-03-26 20:45 Tom Musta
  2014-03-26 20:45 ` [Qemu-devel] [PATCH 1/9] softfloat: Introduce float32_to_uint64_round_to_zero Tom Musta
                   ` (9 more replies)
  0 siblings, 10 replies; 15+ messages in thread
From: Tom Musta @ 2014-03-26 20:45 UTC (permalink / raw)
To: qemu-devel; +Cc: Tom Musta, qemu-ppc

This patch series addresses bugs in the recently added VSX instructions.
Two general defects are fixed:

(1) The VSX Convert to Integer instructions truncate the source floating
point number to an integer value and hence should use a round-to-zero
rounding algorithm.  The existing implementation erroneously uses the
rounding mode as set by the FPSCR.  This patch was previously published
separately (see discussion at
http://lists.nongnu.org/archive/html/qemu-devel/2014-03/msg04526.html).

(2) There is a pervasive problem with the VSX helpers on Little Endian
hosts.  The bug can be seen when using a mixture of VSRs from the upper
and lower halves of the register file (i.e. a mixture of VSRs 0-31, which
overlay the Floating Point register array, and VSRs 32-63, which overlay
the Altivec AVR array).  This patch series introduces VSR field accessors
that are consistent with the Power ISA notation irrespective of host
endianness.  The VSX instruction macros are updated to use these
accessors.

For example, the Power ISA has pictures like this, describing a VSR:

     0          32         64         96        127
    +----------+----------+----------+----------+
    | .word[0] | .word[1] | .word[2] | .word[3] |
    +----------+----------+----------+----------+

The updated code will now refer to .word[0] as v->VsrW(0), .word[1] as
v->VsrW(1), and so on.  This change makes the code correct and also more
consistent with the ISA document.

Tom Musta (9):
  softfloat: Introduce float32_to_uint64_round_to_zero
  target-ppc: Bug: VSX Convert to Integer Should Truncate
  target-ppc: Define Endian-Correct Accessors for VSR Field Acess
  target-ppc: Correct LE Host Inversion of Lower VSRs
  target-ppc: Correct Simple VSR LE Host Inversions
  target-ppc: Correct VSX Scalar Compares
  target-ppc: Correct VSX FP to FP Conversions
  target-ppc: Correct VSX FP to Integer Conversion
  target-ppc: Correct VSX Integer to FP Conversion

 fpu/softfloat.c         |  54 +++++
 include/fpu/softfloat.h |   1 +
 target-ppc/fpu_helper.c | 494 +++++++++++++++++++++++------------------------
 3 files changed, 298 insertions(+), 251 deletions(-)

^ permalink raw reply	[flat|nested] 15+ messages in thread
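A minimal, self-contained C sketch of the accessor scheme described in the cover letter, assuming the union layout and VsrW()/VsrD() macro definitions that patch 3/9 of this series introduces; HOST_WORDS_BIGENDIAN is QEMU's build-config symbol, and the demo program and values are purely illustrative, not QEMU code:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef union {
        uint64_t u64[2];
        uint32_t u32[4];
    } ppc_vsr_t;

    /* HOST_WORDS_BIGENDIAN is QEMU's config symbol; define it by hand if
     * building this sketch on a big-endian machine. */
    #if defined(HOST_WORDS_BIGENDIAN)
    #define VsrW(i) u32[i]        /* ISA word i maps straight to host word i */
    #define VsrD(i) u64[i]        /* likewise for doublewords                */
    #else
    #define VsrW(i) u32[3 - (i)]  /* LE host: invert the word index          */
    #define VsrD(i) u64[1 - (i)]  /* LE host: invert the doubleword index    */
    #endif

    int main(void)
    {
        ppc_vsr_t v = { .u64 = { 0, 0 } };

        /* Power ISA numbering: word 0 is bits 0:31, the most significant word. */
        v.VsrW(0) = 0x11111111;
        v.VsrW(1) = 0x22222222;

        /* Prints 0x1111111122222222 on both big- and little-endian hosts. */
        printf("ISA doubleword 0 = 0x%016" PRIx64 "\n", v.VsrD(0));
        return 0;
    }

The point of the design is that helper code indexes fields by their Power ISA position and the macro absorbs the host byte-order difference, instead of every helper carrying its own "j = 2*i + JOFFSET" style correction.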
* [Qemu-devel] [PATCH 1/9] softfloat: Introduce float32_to_uint64_round_to_zero 2014-03-26 20:45 [Qemu-devel] [PATCH 0/9] target-ppc: VSX Bug Fixes Tom Musta @ 2014-03-26 20:45 ` Tom Musta 2014-03-31 17:26 ` [Qemu-devel] [Qemu-ppc] " Alexander Graf 2014-03-31 17:48 ` [Qemu-devel] " Peter Maydell 2014-03-26 20:45 ` [Qemu-devel] [PATCH 2/9] target-ppc: Bug: VSX Convert to Integer Should Truncate Tom Musta ` (8 subsequent siblings) 9 siblings, 2 replies; 15+ messages in thread From: Tom Musta @ 2014-03-26 20:45 UTC (permalink / raw) To: qemu-devel; +Cc: Tom Musta, qemu-ppc This change adds the float32_to_uint64_round_to_zero function to the softfloat library. This function fills out the complement of float32 to INT round-to-zero conversion rountines, where INT is {int32_t, uint32_t, int64_t, uint64_t}. This contribution can be licensed under either the softfloat-2a or -2b license. Signed-off-by: Tom Musta <tommusta@gmail.com> Tested-by: Tom Musta <tommusta@gmail.com> --- fpu/softfloat.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++ include/fpu/softfloat.h | 1 + 2 files changed, 55 insertions(+), 0 deletions(-) diff --git a/fpu/softfloat.c b/fpu/softfloat.c index 5f02c16..d6df78a 100644 --- a/fpu/softfloat.c +++ b/fpu/softfloat.c @@ -1628,6 +1628,60 @@ uint64 float32_to_uint64(float32 a STATUS_PARAM) /*---------------------------------------------------------------------------- | Returns the result of converting the single-precision floating-point value +| `a' to the 64-bit unsigned integer format. The conversion is +| performed according to the IEC/IEEE Standard for Binary Floating-Point +| Arithmetic, except that the conversion is always rounded toward zero. If +| `a' is a NaN, the largest unsigned integer is returned. Otherwise, if the +| conversion overflows, the largest unsigned integer is returned. If the +| 'a' is negative, the result is rounded and zero is returned; values that do +| not round to zero will raise the inexact flag. +*----------------------------------------------------------------------------*/ + +uint64 float32_to_uint64_round_to_zero(float32 a STATUS_PARAM) +{ + flag aSign; + int_fast16_t aExp, shiftCount; + uint32_t aSig; + uint64_t aSig64; + uint64_t z; + a = float32_squash_input_denormal(a STATUS_VAR); + + aSig = extractFloat32Frac(a); + aExp = extractFloat32Exp(a); + aSign = extractFloat32Sign(a); + if ((aSign) && (aExp > 126)) { + float_raise(float_flag_invalid STATUS_VAR); + if (float32_is_any_nan(a)) { + return LIT64(0xFFFFFFFFFFFFFFFF); + } else { + return 0; + } + } + shiftCount = 0xBE - aExp; + if (aExp) { + aSig |= 0x00800000; + } + if (shiftCount < 0) { + float_raise(float_flag_invalid STATUS_VAR); + return LIT64(0xFFFFFFFFFFFFFFFF); + } else if (aExp <= 0x7E) { + if (aExp | aSig) { + STATUS(float_exception_flags) |= float_flag_inexact; + } + return 0; + } + + aSig64 = aSig; + aSig64 <<= 40; + z = aSig64 >> shiftCount; + if (shiftCount && ((uint64_t)(aSig64 << (-shiftCount & 63)))) { + STATUS(float_exception_flags) |= float_flag_inexact; + } + return z; +} + +/*---------------------------------------------------------------------------- +| Returns the result of converting the single-precision floating-point value | `a' to the 64-bit two's complement integer format. The conversion is | performed according to the IEC/IEEE Standard for Binary Floating-Point | Arithmetic, except that the conversion is always rounded toward zero. 
If diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h index db878c1..4b3090c 100644 --- a/include/fpu/softfloat.h +++ b/include/fpu/softfloat.h @@ -342,6 +342,7 @@ uint32 float32_to_uint32( float32 STATUS_PARAM ); uint32 float32_to_uint32_round_to_zero( float32 STATUS_PARAM ); int64 float32_to_int64( float32 STATUS_PARAM ); uint64 float32_to_uint64(float32 STATUS_PARAM); +uint64 float32_to_uint64_round_to_zero(float32 STATUS_PARAM); int64 float32_to_int64_round_to_zero( float32 STATUS_PARAM ); float64 float32_to_float64( float32 STATUS_PARAM ); floatx80 float32_to_floatx80( float32 STATUS_PARAM ); -- 1.7.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] [PATCH 1/9] softfloat: Introduce float32_to_uint64_round_to_zero 2014-03-26 20:45 ` [Qemu-devel] [PATCH 1/9] softfloat: Introduce float32_to_uint64_round_to_zero Tom Musta @ 2014-03-31 17:26 ` Alexander Graf 2014-03-31 17:48 ` [Qemu-devel] " Peter Maydell 1 sibling, 0 replies; 15+ messages in thread From: Alexander Graf @ 2014-03-31 17:26 UTC (permalink / raw) To: Tom Musta; +Cc: Peter Maydell, qemu-ppc, qemu-devel On 03/26/2014 09:45 PM, Tom Musta wrote: > This change adds the float32_to_uint64_round_to_zero function to the softfloat > library. This function fills out the complement of float32 to INT round-to-zero > conversion rountines, where INT is {int32_t, uint32_t, int64_t, uint64_t}. > > This contribution can be licensed under either the softfloat-2a or -2b > license. > > Signed-off-by: Tom Musta <tommusta@gmail.com> > Tested-by: Tom Musta <tommusta@gmail.com> Peter, could you please ack? This is a requirement for a bug fix series that I'd like to see in 2.0. Alex > --- > fpu/softfloat.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++ > include/fpu/softfloat.h | 1 + > 2 files changed, 55 insertions(+), 0 deletions(-) > > diff --git a/fpu/softfloat.c b/fpu/softfloat.c > index 5f02c16..d6df78a 100644 > --- a/fpu/softfloat.c > +++ b/fpu/softfloat.c > @@ -1628,6 +1628,60 @@ uint64 float32_to_uint64(float32 a STATUS_PARAM) > > /*---------------------------------------------------------------------------- > | Returns the result of converting the single-precision floating-point value > +| `a' to the 64-bit unsigned integer format. The conversion is > +| performed according to the IEC/IEEE Standard for Binary Floating-Point > +| Arithmetic, except that the conversion is always rounded toward zero. If > +| `a' is a NaN, the largest unsigned integer is returned. Otherwise, if the > +| conversion overflows, the largest unsigned integer is returned. If the > +| 'a' is negative, the result is rounded and zero is returned; values that do > +| not round to zero will raise the inexact flag. > +*----------------------------------------------------------------------------*/ > + > +uint64 float32_to_uint64_round_to_zero(float32 a STATUS_PARAM) > +{ > + flag aSign; > + int_fast16_t aExp, shiftCount; > + uint32_t aSig; > + uint64_t aSig64; > + uint64_t z; > + a = float32_squash_input_denormal(a STATUS_VAR); > + > + aSig = extractFloat32Frac(a); > + aExp = extractFloat32Exp(a); > + aSign = extractFloat32Sign(a); > + if ((aSign) && (aExp > 126)) { > + float_raise(float_flag_invalid STATUS_VAR); > + if (float32_is_any_nan(a)) { > + return LIT64(0xFFFFFFFFFFFFFFFF); > + } else { > + return 0; > + } > + } > + shiftCount = 0xBE - aExp; > + if (aExp) { > + aSig |= 0x00800000; > + } > + if (shiftCount < 0) { > + float_raise(float_flag_invalid STATUS_VAR); > + return LIT64(0xFFFFFFFFFFFFFFFF); > + } else if (aExp <= 0x7E) { > + if (aExp | aSig) { > + STATUS(float_exception_flags) |= float_flag_inexact; > + } > + return 0; > + } > + > + aSig64 = aSig; > + aSig64 <<= 40; > + z = aSig64 >> shiftCount; > + if (shiftCount && ((uint64_t)(aSig64 << (-shiftCount & 63)))) { > + STATUS(float_exception_flags) |= float_flag_inexact; > + } > + return z; > +} > + > +/*---------------------------------------------------------------------------- > +| Returns the result of converting the single-precision floating-point value > | `a' to the 64-bit two's complement integer format. 
The conversion is > | performed according to the IEC/IEEE Standard for Binary Floating-Point > | Arithmetic, except that the conversion is always rounded toward zero. If > diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h > index db878c1..4b3090c 100644 > --- a/include/fpu/softfloat.h > +++ b/include/fpu/softfloat.h > @@ -342,6 +342,7 @@ uint32 float32_to_uint32( float32 STATUS_PARAM ); > uint32 float32_to_uint32_round_to_zero( float32 STATUS_PARAM ); > int64 float32_to_int64( float32 STATUS_PARAM ); > uint64 float32_to_uint64(float32 STATUS_PARAM); > +uint64 float32_to_uint64_round_to_zero(float32 STATUS_PARAM); > int64 float32_to_int64_round_to_zero( float32 STATUS_PARAM ); > float64 float32_to_float64( float32 STATUS_PARAM ); > floatx80 float32_to_floatx80( float32 STATUS_PARAM ); ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 1/9] softfloat: Introduce float32_to_uint64_round_to_zero 2014-03-26 20:45 ` [Qemu-devel] [PATCH 1/9] softfloat: Introduce float32_to_uint64_round_to_zero Tom Musta 2014-03-31 17:26 ` [Qemu-devel] [Qemu-ppc] " Alexander Graf @ 2014-03-31 17:48 ` Peter Maydell 2014-03-31 18:07 ` Tom Musta 1 sibling, 1 reply; 15+ messages in thread From: Peter Maydell @ 2014-03-31 17:48 UTC (permalink / raw) To: Tom Musta; +Cc: qemu-ppc@nongnu.org, QEMU Developers On 26 March 2014 20:45, Tom Musta <tommusta@gmail.com> wrote: > This change adds the float32_to_uint64_round_to_zero function to the softfloat > library. This function fills out the complement of float32 to INT round-to-zero > conversion rountines, where INT is {int32_t, uint32_t, int64_t, uint64_t}. > > This contribution can be licensed under either the softfloat-2a or -2b > license. > > Signed-off-by: Tom Musta <tommusta@gmail.com> > Tested-by: Tom Musta <tommusta@gmail.com> > --- > fpu/softfloat.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++ > include/fpu/softfloat.h | 1 + > 2 files changed, 55 insertions(+), 0 deletions(-) > > diff --git a/fpu/softfloat.c b/fpu/softfloat.c > index 5f02c16..d6df78a 100644 > --- a/fpu/softfloat.c > +++ b/fpu/softfloat.c > @@ -1628,6 +1628,60 @@ uint64 float32_to_uint64(float32 a STATUS_PARAM) > > /*---------------------------------------------------------------------------- > | Returns the result of converting the single-precision floating-point value > +| `a' to the 64-bit unsigned integer format. The conversion is > +| performed according to the IEC/IEEE Standard for Binary Floating-Point > +| Arithmetic, except that the conversion is always rounded toward zero. If > +| `a' is a NaN, the largest unsigned integer is returned. Otherwise, if the > +| conversion overflows, the largest unsigned integer is returned. If the > +| 'a' is negative, the result is rounded and zero is returned; values that do > +| not round to zero will raise the inexact flag. > +*----------------------------------------------------------------------------*/ > + > +uint64 float32_to_uint64_round_to_zero(float32 a STATUS_PARAM) > +{ > + flag aSign; > + int_fast16_t aExp, shiftCount; > + uint32_t aSig; > + uint64_t aSig64; > + uint64_t z; So, float64_to_uint64_round_to_zero() works by temporarily fiddling with the rounding mode and then calling float64_to_uint64(). Is there a reason for doing this function like this rather than in the same way? thanks -- PMM ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 1/9] softfloat: Introduce float32_to_uint64_round_to_zero 2014-03-31 17:48 ` [Qemu-devel] " Peter Maydell @ 2014-03-31 18:07 ` Tom Musta 2014-03-31 18:12 ` Peter Maydell 0 siblings, 1 reply; 15+ messages in thread From: Tom Musta @ 2014-03-31 18:07 UTC (permalink / raw) To: Peter Maydell; +Cc: qemu-ppc@nongnu.org, QEMU Developers On 3/31/2014 12:48 PM, Peter Maydell wrote: > On 26 March 2014 20:45, Tom Musta <tommusta@gmail.com> wrote: >> This change adds the float32_to_uint64_round_to_zero function to the softfloat >> library. This function fills out the complement of float32 to INT round-to-zero >> conversion rountines, where INT is {int32_t, uint32_t, int64_t, uint64_t}. >> >> This contribution can be licensed under either the softfloat-2a or -2b >> license. >> >> Signed-off-by: Tom Musta <tommusta@gmail.com> >> Tested-by: Tom Musta <tommusta@gmail.com> >> --- >> fpu/softfloat.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++ >> include/fpu/softfloat.h | 1 + >> 2 files changed, 55 insertions(+), 0 deletions(-) >> >> diff --git a/fpu/softfloat.c b/fpu/softfloat.c >> index 5f02c16..d6df78a 100644 >> --- a/fpu/softfloat.c >> +++ b/fpu/softfloat.c >> @@ -1628,6 +1628,60 @@ uint64 float32_to_uint64(float32 a STATUS_PARAM) >> >> /*---------------------------------------------------------------------------- >> | Returns the result of converting the single-precision floating-point value >> +| `a' to the 64-bit unsigned integer format. The conversion is >> +| performed according to the IEC/IEEE Standard for Binary Floating-Point >> +| Arithmetic, except that the conversion is always rounded toward zero. If >> +| `a' is a NaN, the largest unsigned integer is returned. Otherwise, if the >> +| conversion overflows, the largest unsigned integer is returned. If the >> +| 'a' is negative, the result is rounded and zero is returned; values that do >> +| not round to zero will raise the inexact flag. >> +*----------------------------------------------------------------------------*/ >> + >> +uint64 float32_to_uint64_round_to_zero(float32 a STATUS_PARAM) >> +{ >> + flag aSign; >> + int_fast16_t aExp, shiftCount; >> + uint32_t aSig; >> + uint64_t aSig64; >> + uint64_t z; > > So, float64_to_uint64_round_to_zero() works by temporarily > fiddling with the rounding mode and then calling > float64_to_uint64(). Is there a reason for doing this > function like this rather than in the same way? > > thanks > -- PMM > True. But not all of the *_round_to_zero() routines do this, e.g. float32_to_int64_round_to_zero(). So no matter what I do, it is inconsistent with something. Do you prefer the fiddle-and-reuse approach? (I think I do, actually). If so, I will respin the patch. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 1/9] softfloat: Introduce float32_to_uint64_round_to_zero 2014-03-31 18:07 ` Tom Musta @ 2014-03-31 18:12 ` Peter Maydell 0 siblings, 0 replies; 15+ messages in thread From: Peter Maydell @ 2014-03-31 18:12 UTC (permalink / raw) To: Tom Musta; +Cc: qemu-ppc@nongnu.org, QEMU Developers On 31 March 2014 19:07, Tom Musta <tommusta@gmail.com> wrote: > On 3/31/2014 12:48 PM, Peter Maydell wrote: >> So, float64_to_uint64_round_to_zero() works by temporarily >> fiddling with the rounding mode and then calling >> float64_to_uint64(). Is there a reason for doing this >> function like this rather than in the same way? > True. But not all of the *_round_to_zero() routines do this, e.g. > float32_to_int64_round_to_zero(). So no matter what I do, it is > inconsistent with something. > > Do you prefer the fiddle-and-reuse approach? (I think I do, actually). > If so, I will respin the patch. I think that would be easier to review for correctness :-) thanks -- PMM ^ permalink raw reply [flat|nested] 15+ messages in thread
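For reference, a rough sketch of the fiddle-and-reuse approach discussed above: temporarily force round-to-zero and reuse the existing conversion, the way float64_to_uint64_round_to_zero() does. It is written against plain softfloat types rather than QEMU's STATUS_PARAM macro plumbing, so it illustrates the idea only and is not the respun patch:

    #include "fpu/softfloat.h"

    uint64_t float32_to_uint64_round_to_zero(float32 a, float_status *status)
    {
        /* Save the caller's rounding mode, force truncation, convert,
         * then restore the original mode. */
        signed char old_mode = status->float_rounding_mode;
        uint64_t res;

        set_float_rounding_mode(float_round_to_zero, status);
        res = float32_to_uint64(a, status);
        set_float_rounding_mode(old_mode, status);
        return res;
    }

Reusing float32_to_uint64 this way keeps the NaN handling and exception-flag behavior on a single conversion path, which is the reviewability property being asked for here.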
* [Qemu-devel] [PATCH 2/9] target-ppc: Bug: VSX Convert to Integer Should Truncate
  2014-03-26 20:45 [Qemu-devel] [PATCH 0/9] target-ppc: VSX Bug Fixes Tom Musta
  2014-03-26 20:45 ` [Qemu-devel] [PATCH 1/9] softfloat: Introduce float32_to_uint64_round_to_zero Tom Musta
@ 2014-03-26 20:45 ` Tom Musta
  2014-03-26 20:45 ` [Qemu-devel] [PATCH 3/9] target-ppc: Define Endian-Correct Accessors for VSR Field Acess Tom Musta
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Tom Musta @ 2014-03-26 20:45 UTC (permalink / raw)
To: qemu-devel; +Cc: Tom Musta, qemu-ppc

The various VSX Convert to Integer instructions should truncate the floating
point number to an integer value, which is equivalent to a round-to-zero
rounding mode.  The existing VSX floating point to integer conversion helpers
erroneously use the rounding mode set in the PowerPC Floating Point Status
and Control Register (FPSCR).  This change corrects the defect by using the
appropriate float*_to_*_round_to_zero() routines from the softfloat library.

Signed-off-by: Tom Musta <tommusta@gmail.com>
Tested-by: Tom Musta <tommusta@gmail.com>

V2: Restored rounding mode prior to checking exceptions per Peter Maydell's
review.
V3: Changed to use float*_to_*_round_to_zero() routines from softfloat
instead of directly tweaking and restoring the rounding mode (per Peter's
review).

This bug was discovered when running wget, which does this:

    double maxtime;
    struct timeval tmout;
    ...
    tmout.tv_usec = 1000000 * (maxtime - (long) maxtime);

The newest PowerPC 64-bit gcc releases now use xscvdpsxds to perform the cast
of the double to long.  A timeout of 0.95 was erroneously rounded up to 1,
which produced a negative timeout value.

It would be great if we could still get this into 2.0.
---
 target-ppc/fpu_helper.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index fd91239..691d572 100644
--- a/target-ppc/fpu_helper.c
+++ b/target-ppc/fpu_helper.c
@@ -2568,7 +2568,8 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)         \
             fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXCVI, 0);          \
             xt.tfld = rnan;                                                 \
         } else {                                                            \
-            xt.tfld = stp##_to_##ttp(xb.sfld, &env->fp_status);             \
+            xt.tfld = stp##_to_##ttp##_round_to_zero(xb.sfld,               \
+                          &env->fp_status);                                 \
             if (env->fp_status.float_exception_flags & float_flag_invalid) { \
                 fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXCVI, 0);       \
             }                                                               \
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread
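To make the failure mode concrete, here is a small standalone C illustration of the wget scenario above. The variable names follow the snippet in the commit message; the round-to-nearest stand-in is purely illustrative, and none of this code comes from wget or QEMU:

    /* A C cast from double to long must truncate toward zero.  If the
     * emulated xscvdpsxds instead honors the current FPSCR rounding mode
     * (round-to-nearest), 0.95 becomes 1 and tv_usec goes negative. */
    #include <stdio.h>

    int main(void)
    {
        double maxtime = 0.95;

        long truncated = (long)maxtime;          /* what the cast requires: 0 */
        long rounded   = (long)(maxtime + 0.5);  /* stand-in for the buggy
                                                    round-to-nearest result: 1 */

        /* wget computes tv_usec = 1000000 * (maxtime - (long) maxtime). */
        printf("tv_usec with truncation:       %f\n",
               1000000 * (maxtime - truncated));   /* ~ 950000, valid */
        printf("tv_usec with round-to-nearest: %f\n",
               1000000 * (maxtime - rounded));     /* ~ -50000, negative timeout */
        return 0;
    }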
* [Qemu-devel] [PATCH 3/9] target-ppc: Define Endian-Correct Accessors for VSR Field Acess
  2014-03-26 20:45 [Qemu-devel] [PATCH 0/9] target-ppc: VSX Bug Fixes Tom Musta
  2014-03-26 20:45 ` [Qemu-devel] [PATCH 1/9] softfloat: Introduce float32_to_uint64_round_to_zero Tom Musta
  2014-03-26 20:45 ` [Qemu-devel] [PATCH 2/9] target-ppc: Bug: VSX Convert to Integer Should Truncate Tom Musta
@ 2014-03-26 20:45 ` Tom Musta
  2014-03-26 20:45 ` [Qemu-devel] [PATCH 4/9] target-ppc: Correct LE Host Inversion of Lower VSRs Tom Musta
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Tom Musta @ 2014-03-26 20:45 UTC (permalink / raw)
To: qemu-devel; +Cc: Tom Musta, qemu-ppc

This change defines accessors for VSR doubleword and word fields that are
correct from a host-endianness perspective.  This allows helper code to use
the Power ISA field numbering directly, regardless of host byte order.

For example, the xscvdpsxws instruction has a target VSR that looks like
this:

     0           32       64                     127
    +-----------+--------+-----------+-----------+
    | undefined |   SW   | undefined | undefined |
    +-----------+--------+-----------+-----------+

VSX helper code will use VsrW(1) to access this field.

Signed-off-by: Tom Musta <tommusta@gmail.com>
Tested-by: Tom Musta <tommusta@gmail.com>
---
 target-ppc/fpu_helper.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index 691d572..d79aae9 100644
--- a/target-ppc/fpu_helper.c
+++ b/target-ppc/fpu_helper.c
@@ -1782,6 +1782,14 @@ typedef union _ppc_vsr_t {
     float64 f64[2];
 } ppc_vsr_t;
 
+#if defined(HOST_WORDS_BIGENDIAN)
+#define VsrW(i) u32[i]
+#define VsrD(i) u64[i]
+#else
+#define VsrW(i) u32[3-(i)]
+#define VsrD(i) u64[1-(i)]
+#endif
+
 static void getVSR(int n, ppc_vsr_t *vsr, CPUPPCState *env)
 {
     if (n < 32) {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread
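As a concrete illustration of that last sentence, a hypothetical helper fragment (not the actual QEMU code) showing a scalar convert-to-word result being stored through VsrW(1); the union layout and macro are reduced to the little-endian-host case from the diff above:

    #include <stdint.h>

    /* Union layout and macro as introduced by the patch above, reduced to
     * the little-endian-host case for brevity. */
    typedef union {
        uint64_t u64[2];
        uint32_t u32[4];
    } ppc_vsr_t;

    #define VsrW(i) u32[3 - (i)]   /* on an LE host, ISA word 1 is u32[2] */

    /* Hypothetical fragment: store a convert-to-signed-word result into
     * ISA word 1 (bits 32:63) of the target VSR, the field xscvdpsxws
     * writes. */
    void store_word_result(ppc_vsr_t *xt, int32_t result)
    {
        xt->VsrW(1) = (uint32_t)result;
    }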
* [Qemu-devel] [PATCH 4/9] target-ppc: Correct LE Host Inversion of Lower VSRs 2014-03-26 20:45 [Qemu-devel] [PATCH 0/9] target-ppc: VSX Bug Fixes Tom Musta ` (2 preceding siblings ...) 2014-03-26 20:45 ` [Qemu-devel] [PATCH 3/9] target-ppc: Define Endian-Correct Accessors for VSR Field Acess Tom Musta @ 2014-03-26 20:45 ` Tom Musta 2014-03-26 20:45 ` [Qemu-devel] [PATCH 5/9] target-ppc: Correct Simple VSR LE Host Inversions Tom Musta ` (5 subsequent siblings) 9 siblings, 0 replies; 15+ messages in thread From: Tom Musta @ 2014-03-26 20:45 UTC (permalink / raw) To: qemu-devel; +Cc: Tom Musta, qemu-ppc This change properly orders the doublewords of the VSRs 0-31. Because these registers are constructed from separate doublewords, they must be inverted on Little Endian hosts. The inversion is performed both when the VSR is read and when it is written. Signed-off-by: Tom Musta <tommusta@gmail.com> Tested-by: Tom Musta <tommusta@gmail.com> --- target-ppc/fpu_helper.c | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c index d79aae9..9fc7dd8 100644 --- a/target-ppc/fpu_helper.c +++ b/target-ppc/fpu_helper.c @@ -1793,8 +1793,8 @@ typedef union _ppc_vsr_t { static void getVSR(int n, ppc_vsr_t *vsr, CPUPPCState *env) { if (n < 32) { - vsr->f64[0] = env->fpr[n]; - vsr->u64[1] = env->vsr[n]; + vsr->VsrD(0) = env->fpr[n]; + vsr->VsrD(1) = env->vsr[n]; } else { vsr->u64[0] = env->avr[n-32].u64[0]; vsr->u64[1] = env->avr[n-32].u64[1]; @@ -1804,8 +1804,8 @@ static void getVSR(int n, ppc_vsr_t *vsr, CPUPPCState *env) static void putVSR(int n, ppc_vsr_t *vsr, CPUPPCState *env) { if (n < 32) { - env->fpr[n] = vsr->f64[0]; - env->vsr[n] = vsr->u64[1]; + env->fpr[n] = vsr->VsrD(0); + env->vsr[n] = vsr->VsrD(1); } else { env->avr[n-32].u64[0] = vsr->u64[0]; env->avr[n-32].u64[1] = vsr->u64[1]; -- 1.7.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [Qemu-devel] [PATCH 5/9] target-ppc: Correct Simple VSR LE Host Inversions 2014-03-26 20:45 [Qemu-devel] [PATCH 0/9] target-ppc: VSX Bug Fixes Tom Musta ` (3 preceding siblings ...) 2014-03-26 20:45 ` [Qemu-devel] [PATCH 4/9] target-ppc: Correct LE Host Inversion of Lower VSRs Tom Musta @ 2014-03-26 20:45 ` Tom Musta 2014-03-26 20:45 ` [Qemu-devel] [PATCH 6/9] target-ppc: Correct VSX Scalar Compares Tom Musta ` (4 subsequent siblings) 9 siblings, 0 replies; 15+ messages in thread From: Tom Musta @ 2014-03-26 20:45 UTC (permalink / raw) To: qemu-devel; +Cc: Tom Musta, qemu-ppc A common pattern in the VSX helper code macros is the use of "x.fld[i]" where "x" is a VSR and "fld" is an argument to a macro ("f64" or "f32" is passed). This is not always correct on LE hosts. This change addresses all instances of this pattern to be "x.fld" where "fld" is: - "VsrD(0)" for scalar instructions accessing 64-bit numbers - "VsrD(i)" for vector instructions accessing 64-bit numbers - "VsrW(i)" for vector instructions accessing 32-bit numbers Signed-off-by: Tom Musta <tommusta@gmail.com> Tested-by: Tom Musta <tommusta@gmail.com> --- Note that there are no instances of this pattern where a scalar instruction accesses a 32-bit number. Note also that it would be correct to use "VsrD(i)" for scalar instructions since the loop index is only ever "0". I have choosen to use "VsrD(0)" instead ... it seems a little clearer. target-ppc/fpu_helper.c | 380 +++++++++++++++++++++++----------------------- 1 files changed, 190 insertions(+), 190 deletions(-) diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c index 9fc7dd8..1c37b30 100644 --- a/target-ppc/fpu_helper.c +++ b/target-ppc/fpu_helper.c @@ -1820,7 +1820,7 @@ static void putVSR(int n, ppc_vsr_t *vsr, CPUPPCState *env) * op - operation (add or sub) * nels - number of elements (1, 2 or 4) * tp - type (float32 or float64) - * fld - vsr_t field (f32 or f64) + * fld - vsr_t field (VsrD(*) or VsrW(*)) * sfprf - set FPRF */ #define VSX_ADD_SUB(name, op, nels, tp, fld, sfprf, r2sp) \ @@ -1837,44 +1837,44 @@ void helper_##name(CPUPPCState *env, uint32_t opcode) \ for (i = 0; i < nels; i++) { \ float_status tstat = env->fp_status; \ set_float_exception_flags(0, &tstat); \ - xt.fld[i] = tp##_##op(xa.fld[i], xb.fld[i], &tstat); \ + xt.fld = tp##_##op(xa.fld, xb.fld, &tstat); \ env->fp_status.float_exception_flags |= tstat.float_exception_flags; \ \ if (unlikely(tstat.float_exception_flags & float_flag_invalid)) { \ - if (tp##_is_infinity(xa.fld[i]) && tp##_is_infinity(xb.fld[i])) {\ + if (tp##_is_infinity(xa.fld) && tp##_is_infinity(xb.fld)) { \ fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXISI, sfprf); \ - } else if (tp##_is_signaling_nan(xa.fld[i]) || \ - tp##_is_signaling_nan(xb.fld[i])) { \ + } else if (tp##_is_signaling_nan(xa.fld) || \ + tp##_is_signaling_nan(xb.fld)) { \ fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXSNAN, sfprf); \ } \ } \ \ if (r2sp) { \ - xt.fld[i] = helper_frsp(env, xt.fld[i]); \ + xt.fld = helper_frsp(env, xt.fld); \ } \ \ if (sfprf) { \ - helper_compute_fprf(env, xt.fld[i], sfprf); \ + helper_compute_fprf(env, xt.fld, sfprf); \ } \ } \ putVSR(xT(opcode), &xt, env); \ helper_float_check_status(env); \ } -VSX_ADD_SUB(xsadddp, add, 1, float64, f64, 1, 0) -VSX_ADD_SUB(xsaddsp, add, 1, float64, f64, 1, 1) -VSX_ADD_SUB(xvadddp, add, 2, float64, f64, 0, 0) -VSX_ADD_SUB(xvaddsp, add, 4, float32, f32, 0, 0) -VSX_ADD_SUB(xssubdp, sub, 1, float64, f64, 1, 0) -VSX_ADD_SUB(xssubsp, sub, 1, float64, f64, 1, 1) -VSX_ADD_SUB(xvsubdp, sub, 2, 
float64, f64, 0, 0) -VSX_ADD_SUB(xvsubsp, sub, 4, float32, f32, 0, 0) +VSX_ADD_SUB(xsadddp, add, 1, float64, VsrD(0), 1, 0) +VSX_ADD_SUB(xsaddsp, add, 1, float64, VsrD(0), 1, 1) +VSX_ADD_SUB(xvadddp, add, 2, float64, VsrD(i), 0, 0) +VSX_ADD_SUB(xvaddsp, add, 4, float32, VsrW(i), 0, 0) +VSX_ADD_SUB(xssubdp, sub, 1, float64, VsrD(0), 1, 0) +VSX_ADD_SUB(xssubsp, sub, 1, float64, VsrD(0), 1, 1) +VSX_ADD_SUB(xvsubdp, sub, 2, float64, VsrD(i), 0, 0) +VSX_ADD_SUB(xvsubsp, sub, 4, float32, VsrW(i), 0, 0) /* VSX_MUL - VSX floating point multiply * op - instruction mnemonic * nels - number of elements (1, 2 or 4) * tp - type (float32 or float64) - * fld - vsr_t field (f32 or f64) + * fld - vsr_t field (VsrD(*) or VsrW(*)) * sfprf - set FPRF */ #define VSX_MUL(op, nels, tp, fld, sfprf, r2sp) \ @@ -1891,25 +1891,25 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ for (i = 0; i < nels; i++) { \ float_status tstat = env->fp_status; \ set_float_exception_flags(0, &tstat); \ - xt.fld[i] = tp##_mul(xa.fld[i], xb.fld[i], &tstat); \ + xt.fld = tp##_mul(xa.fld, xb.fld, &tstat); \ env->fp_status.float_exception_flags |= tstat.float_exception_flags; \ \ if (unlikely(tstat.float_exception_flags & float_flag_invalid)) { \ - if ((tp##_is_infinity(xa.fld[i]) && tp##_is_zero(xb.fld[i])) || \ - (tp##_is_infinity(xb.fld[i]) && tp##_is_zero(xa.fld[i]))) { \ + if ((tp##_is_infinity(xa.fld) && tp##_is_zero(xb.fld)) || \ + (tp##_is_infinity(xb.fld) && tp##_is_zero(xa.fld))) { \ fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXIMZ, sfprf); \ - } else if (tp##_is_signaling_nan(xa.fld[i]) || \ - tp##_is_signaling_nan(xb.fld[i])) { \ + } else if (tp##_is_signaling_nan(xa.fld) || \ + tp##_is_signaling_nan(xb.fld)) { \ fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXSNAN, sfprf); \ } \ } \ \ if (r2sp) { \ - xt.fld[i] = helper_frsp(env, xt.fld[i]); \ + xt.fld = helper_frsp(env, xt.fld); \ } \ \ if (sfprf) { \ - helper_compute_fprf(env, xt.fld[i], sfprf); \ + helper_compute_fprf(env, xt.fld, sfprf); \ } \ } \ \ @@ -1917,16 +1917,16 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ helper_float_check_status(env); \ } -VSX_MUL(xsmuldp, 1, float64, f64, 1, 0) -VSX_MUL(xsmulsp, 1, float64, f64, 1, 1) -VSX_MUL(xvmuldp, 2, float64, f64, 0, 0) -VSX_MUL(xvmulsp, 4, float32, f32, 0, 0) +VSX_MUL(xsmuldp, 1, float64, VsrD(0), 1, 0) +VSX_MUL(xsmulsp, 1, float64, VsrD(0), 1, 1) +VSX_MUL(xvmuldp, 2, float64, VsrD(i), 0, 0) +VSX_MUL(xvmulsp, 4, float32, VsrW(i), 0, 0) /* VSX_DIV - VSX floating point divide * op - instruction mnemonic * nels - number of elements (1, 2 or 4) * tp - type (float32 or float64) - * fld - vsr_t field (f32 or f64) + * fld - vsr_t field (VsrD(*) or VsrW(*)) * sfprf - set FPRF */ #define VSX_DIV(op, nels, tp, fld, sfprf, r2sp) \ @@ -1943,27 +1943,27 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ for (i = 0; i < nels; i++) { \ float_status tstat = env->fp_status; \ set_float_exception_flags(0, &tstat); \ - xt.fld[i] = tp##_div(xa.fld[i], xb.fld[i], &tstat); \ + xt.fld = tp##_div(xa.fld, xb.fld, &tstat); \ env->fp_status.float_exception_flags |= tstat.float_exception_flags; \ \ if (unlikely(tstat.float_exception_flags & float_flag_invalid)) { \ - if (tp##_is_infinity(xa.fld[i]) && tp##_is_infinity(xb.fld[i])) { \ + if (tp##_is_infinity(xa.fld) && tp##_is_infinity(xb.fld)) { \ fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXIDI, sfprf); \ - } else if (tp##_is_zero(xa.fld[i]) && \ - tp##_is_zero(xb.fld[i])) { \ + } else if (tp##_is_zero(xa.fld) && \ + tp##_is_zero(xb.fld)) { \ fload_invalid_op_excp(env, 
POWERPC_EXCP_FP_VXZDZ, sfprf); \ - } else if (tp##_is_signaling_nan(xa.fld[i]) || \ - tp##_is_signaling_nan(xb.fld[i])) { \ + } else if (tp##_is_signaling_nan(xa.fld) || \ + tp##_is_signaling_nan(xb.fld)) { \ fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXSNAN, sfprf); \ } \ } \ \ if (r2sp) { \ - xt.fld[i] = helper_frsp(env, xt.fld[i]); \ + xt.fld = helper_frsp(env, xt.fld); \ } \ \ if (sfprf) { \ - helper_compute_fprf(env, xt.fld[i], sfprf); \ + helper_compute_fprf(env, xt.fld, sfprf); \ } \ } \ \ @@ -1971,16 +1971,16 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ helper_float_check_status(env); \ } -VSX_DIV(xsdivdp, 1, float64, f64, 1, 0) -VSX_DIV(xsdivsp, 1, float64, f64, 1, 1) -VSX_DIV(xvdivdp, 2, float64, f64, 0, 0) -VSX_DIV(xvdivsp, 4, float32, f32, 0, 0) +VSX_DIV(xsdivdp, 1, float64, VsrD(0), 1, 0) +VSX_DIV(xsdivsp, 1, float64, VsrD(0), 1, 1) +VSX_DIV(xvdivdp, 2, float64, VsrD(i), 0, 0) +VSX_DIV(xvdivsp, 4, float32, VsrW(i), 0, 0) /* VSX_RE - VSX floating point reciprocal estimate * op - instruction mnemonic * nels - number of elements (1, 2 or 4) * tp - type (float32 or float64) - * fld - vsr_t field (f32 or f64) + * fld - vsr_t field (VsrD(*) or VsrW(*)) * sfprf - set FPRF */ #define VSX_RE(op, nels, tp, fld, sfprf, r2sp) \ @@ -1994,17 +1994,17 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ helper_reset_fpstatus(env); \ \ for (i = 0; i < nels; i++) { \ - if (unlikely(tp##_is_signaling_nan(xb.fld[i]))) { \ + if (unlikely(tp##_is_signaling_nan(xb.fld))) { \ fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXSNAN, sfprf); \ } \ - xt.fld[i] = tp##_div(tp##_one, xb.fld[i], &env->fp_status); \ + xt.fld = tp##_div(tp##_one, xb.fld, &env->fp_status); \ \ if (r2sp) { \ - xt.fld[i] = helper_frsp(env, xt.fld[i]); \ + xt.fld = helper_frsp(env, xt.fld); \ } \ \ if (sfprf) { \ - helper_compute_fprf(env, xt.fld[0], sfprf); \ + helper_compute_fprf(env, xt.fld, sfprf); \ } \ } \ \ @@ -2012,16 +2012,16 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ helper_float_check_status(env); \ } -VSX_RE(xsredp, 1, float64, f64, 1, 0) -VSX_RE(xsresp, 1, float64, f64, 1, 1) -VSX_RE(xvredp, 2, float64, f64, 0, 0) -VSX_RE(xvresp, 4, float32, f32, 0, 0) +VSX_RE(xsredp, 1, float64, VsrD(0), 1, 0) +VSX_RE(xsresp, 1, float64, VsrD(0), 1, 1) +VSX_RE(xvredp, 2, float64, VsrD(i), 0, 0) +VSX_RE(xvresp, 4, float32, VsrW(i), 0, 0) /* VSX_SQRT - VSX floating point square root * op - instruction mnemonic * nels - number of elements (1, 2 or 4) * tp - type (float32 or float64) - * fld - vsr_t field (f32 or f64) + * fld - vsr_t field (VsrD(*) or VsrW(*)) * sfprf - set FPRF */ #define VSX_SQRT(op, nels, tp, fld, sfprf, r2sp) \ @@ -2037,23 +2037,23 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ for (i = 0; i < nels; i++) { \ float_status tstat = env->fp_status; \ set_float_exception_flags(0, &tstat); \ - xt.fld[i] = tp##_sqrt(xb.fld[i], &tstat); \ + xt.fld = tp##_sqrt(xb.fld, &tstat); \ env->fp_status.float_exception_flags |= tstat.float_exception_flags; \ \ if (unlikely(tstat.float_exception_flags & float_flag_invalid)) { \ - if (tp##_is_neg(xb.fld[i]) && !tp##_is_zero(xb.fld[i])) { \ + if (tp##_is_neg(xb.fld) && !tp##_is_zero(xb.fld)) { \ fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXSQRT, sfprf); \ - } else if (tp##_is_signaling_nan(xb.fld[i])) { \ + } else if (tp##_is_signaling_nan(xb.fld)) { \ fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXSNAN, sfprf); \ } \ } \ \ if (r2sp) { \ - xt.fld[i] = helper_frsp(env, xt.fld[i]); \ + xt.fld = helper_frsp(env, xt.fld); \ } \ \ if (sfprf) { \ - 
helper_compute_fprf(env, xt.fld[i], sfprf); \ + helper_compute_fprf(env, xt.fld, sfprf); \ } \ } \ \ @@ -2061,16 +2061,16 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ helper_float_check_status(env); \ } -VSX_SQRT(xssqrtdp, 1, float64, f64, 1, 0) -VSX_SQRT(xssqrtsp, 1, float64, f64, 1, 1) -VSX_SQRT(xvsqrtdp, 2, float64, f64, 0, 0) -VSX_SQRT(xvsqrtsp, 4, float32, f32, 0, 0) +VSX_SQRT(xssqrtdp, 1, float64, VsrD(0), 1, 0) +VSX_SQRT(xssqrtsp, 1, float64, VsrD(0), 1, 1) +VSX_SQRT(xvsqrtdp, 2, float64, VsrD(i), 0, 0) +VSX_SQRT(xvsqrtsp, 4, float32, VsrW(i), 0, 0) /* VSX_RSQRTE - VSX floating point reciprocal square root estimate * op - instruction mnemonic * nels - number of elements (1, 2 or 4) * tp - type (float32 or float64) - * fld - vsr_t field (f32 or f64) + * fld - vsr_t field (VsrD(*) or VsrW(*)) * sfprf - set FPRF */ #define VSX_RSQRTE(op, nels, tp, fld, sfprf, r2sp) \ @@ -2086,24 +2086,24 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ for (i = 0; i < nels; i++) { \ float_status tstat = env->fp_status; \ set_float_exception_flags(0, &tstat); \ - xt.fld[i] = tp##_sqrt(xb.fld[i], &tstat); \ - xt.fld[i] = tp##_div(tp##_one, xt.fld[i], &tstat); \ + xt.fld = tp##_sqrt(xb.fld, &tstat); \ + xt.fld = tp##_div(tp##_one, xt.fld, &tstat); \ env->fp_status.float_exception_flags |= tstat.float_exception_flags; \ \ if (unlikely(tstat.float_exception_flags & float_flag_invalid)) { \ - if (tp##_is_neg(xb.fld[i]) && !tp##_is_zero(xb.fld[i])) { \ + if (tp##_is_neg(xb.fld) && !tp##_is_zero(xb.fld)) { \ fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXSQRT, sfprf); \ - } else if (tp##_is_signaling_nan(xb.fld[i])) { \ + } else if (tp##_is_signaling_nan(xb.fld)) { \ fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXSNAN, sfprf); \ } \ } \ \ if (r2sp) { \ - xt.fld[i] = helper_frsp(env, xt.fld[i]); \ + xt.fld = helper_frsp(env, xt.fld); \ } \ \ if (sfprf) { \ - helper_compute_fprf(env, xt.fld[i], sfprf); \ + helper_compute_fprf(env, xt.fld, sfprf); \ } \ } \ \ @@ -2111,16 +2111,16 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ helper_float_check_status(env); \ } -VSX_RSQRTE(xsrsqrtedp, 1, float64, f64, 1, 0) -VSX_RSQRTE(xsrsqrtesp, 1, float64, f64, 1, 1) -VSX_RSQRTE(xvrsqrtedp, 2, float64, f64, 0, 0) -VSX_RSQRTE(xvrsqrtesp, 4, float32, f32, 0, 0) +VSX_RSQRTE(xsrsqrtedp, 1, float64, VsrD(0), 1, 0) +VSX_RSQRTE(xsrsqrtesp, 1, float64, VsrD(0), 1, 1) +VSX_RSQRTE(xvrsqrtedp, 2, float64, VsrD(i), 0, 0) +VSX_RSQRTE(xvrsqrtesp, 4, float32, VsrW(i), 0, 0) /* VSX_TDIV - VSX floating point test for divide * op - instruction mnemonic * nels - number of elements (1, 2 or 4) * tp - type (float32 or float64) - * fld - vsr_t field (f32 or f64) + * fld - vsr_t field (VsrD(*) or VsrW(*)) * emin - minimum unbiased exponent * emax - maximum unbiased exponent * nbits - number of fraction bits @@ -2137,28 +2137,28 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ getVSR(xB(opcode), &xb, env); \ \ for (i = 0; i < nels; i++) { \ - if (unlikely(tp##_is_infinity(xa.fld[i]) || \ - tp##_is_infinity(xb.fld[i]) || \ - tp##_is_zero(xb.fld[i]))) { \ + if (unlikely(tp##_is_infinity(xa.fld) || \ + tp##_is_infinity(xb.fld) || \ + tp##_is_zero(xb.fld))) { \ fe_flag = 1; \ fg_flag = 1; \ } else { \ - int e_a = ppc_##tp##_get_unbiased_exp(xa.fld[i]); \ - int e_b = ppc_##tp##_get_unbiased_exp(xb.fld[i]); \ + int e_a = ppc_##tp##_get_unbiased_exp(xa.fld); \ + int e_b = ppc_##tp##_get_unbiased_exp(xb.fld); \ \ - if (unlikely(tp##_is_any_nan(xa.fld[i]) || \ - tp##_is_any_nan(xb.fld[i]))) { \ + if 
(unlikely(tp##_is_any_nan(xa.fld) || \ + tp##_is_any_nan(xb.fld))) { \ fe_flag = 1; \ } else if ((e_b <= emin) || (e_b >= (emax-2))) { \ fe_flag = 1; \ - } else if (!tp##_is_zero(xa.fld[i]) && \ + } else if (!tp##_is_zero(xa.fld) && \ (((e_a - e_b) >= emax) || \ ((e_a - e_b) <= (emin+1)) || \ (e_a <= (emin+nbits)))) { \ fe_flag = 1; \ } \ \ - if (unlikely(tp##_is_zero_or_denormal(xb.fld[i]))) { \ + if (unlikely(tp##_is_zero_or_denormal(xb.fld))) { \ /* XB is not zero because of the above check and */ \ /* so must be denormalized. */ \ fg_flag = 1; \ @@ -2169,15 +2169,15 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ env->crf[BF(opcode)] = 0x8 | (fg_flag ? 4 : 0) | (fe_flag ? 2 : 0); \ } -VSX_TDIV(xstdivdp, 1, float64, f64, -1022, 1023, 52) -VSX_TDIV(xvtdivdp, 2, float64, f64, -1022, 1023, 52) -VSX_TDIV(xvtdivsp, 4, float32, f32, -126, 127, 23) +VSX_TDIV(xstdivdp, 1, float64, VsrD(0), -1022, 1023, 52) +VSX_TDIV(xvtdivdp, 2, float64, VsrD(i), -1022, 1023, 52) +VSX_TDIV(xvtdivsp, 4, float32, VsrW(i), -126, 127, 23) /* VSX_TSQRT - VSX floating point test for square root * op - instruction mnemonic * nels - number of elements (1, 2 or 4) * tp - type (float32 or float64) - * fld - vsr_t field (f32 or f64) + * fld - vsr_t field (VsrD(*) or VsrW(*)) * emin - minimum unbiased exponent * emax - maximum unbiased exponent * nbits - number of fraction bits @@ -2194,25 +2194,25 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ getVSR(xB(opcode), &xb, env); \ \ for (i = 0; i < nels; i++) { \ - if (unlikely(tp##_is_infinity(xb.fld[i]) || \ - tp##_is_zero(xb.fld[i]))) { \ + if (unlikely(tp##_is_infinity(xb.fld) || \ + tp##_is_zero(xb.fld))) { \ fe_flag = 1; \ fg_flag = 1; \ } else { \ - int e_b = ppc_##tp##_get_unbiased_exp(xb.fld[i]); \ + int e_b = ppc_##tp##_get_unbiased_exp(xb.fld); \ \ - if (unlikely(tp##_is_any_nan(xb.fld[i]))) { \ + if (unlikely(tp##_is_any_nan(xb.fld))) { \ fe_flag = 1; \ - } else if (unlikely(tp##_is_zero(xb.fld[i]))) { \ + } else if (unlikely(tp##_is_zero(xb.fld))) { \ fe_flag = 1; \ - } else if (unlikely(tp##_is_neg(xb.fld[i]))) { \ + } else if (unlikely(tp##_is_neg(xb.fld))) { \ fe_flag = 1; \ - } else if (!tp##_is_zero(xb.fld[i]) && \ + } else if (!tp##_is_zero(xb.fld) && \ (e_b <= (emin+nbits))) { \ fe_flag = 1; \ } \ \ - if (unlikely(tp##_is_zero_or_denormal(xb.fld[i]))) { \ + if (unlikely(tp##_is_zero_or_denormal(xb.fld))) { \ /* XB is not zero because of the above check and */ \ /* therefore must be denormalized. */ \ fg_flag = 1; \ @@ -2223,15 +2223,15 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ env->crf[BF(opcode)] = 0x8 | (fg_flag ? 4 : 0) | (fe_flag ? 2 : 0); \ } -VSX_TSQRT(xstsqrtdp, 1, float64, f64, -1022, 52) -VSX_TSQRT(xvtsqrtdp, 2, float64, f64, -1022, 52) -VSX_TSQRT(xvtsqrtsp, 4, float32, f32, -126, 23) +VSX_TSQRT(xstsqrtdp, 1, float64, VsrD(0), -1022, 52) +VSX_TSQRT(xvtsqrtdp, 2, float64, VsrD(i), -1022, 52) +VSX_TSQRT(xvtsqrtsp, 4, float32, VsrW(i), -126, 23) /* VSX_MADD - VSX floating point muliply/add variations * op - instruction mnemonic * nels - number of elements (1, 2 or 4) * tp - type (float32 or float64) - * fld - vsr_t field (f32 or f64) + * fld - vsr_t field (VsrD(*) or VsrW(*)) * maddflgs - flags for the float*muladd routine that control the * various forms (madd, msub, nmadd, nmsub) * afrm - A form (1=A, 0=M) @@ -2267,43 +2267,43 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ /* Avoid double rounding errors by rounding the intermediate */ \ /* result to odd. 
*/ \ set_float_rounding_mode(float_round_to_zero, &tstat); \ - xt_out.fld[i] = tp##_muladd(xa.fld[i], b->fld[i], c->fld[i], \ + xt_out.fld = tp##_muladd(xa.fld, b->fld, c->fld, \ maddflgs, &tstat); \ - xt_out.fld[i] |= (get_float_exception_flags(&tstat) & \ + xt_out.fld |= (get_float_exception_flags(&tstat) & \ float_flag_inexact) != 0; \ } else { \ - xt_out.fld[i] = tp##_muladd(xa.fld[i], b->fld[i], c->fld[i], \ + xt_out.fld = tp##_muladd(xa.fld, b->fld, c->fld, \ maddflgs, &tstat); \ } \ env->fp_status.float_exception_flags |= tstat.float_exception_flags; \ \ if (unlikely(tstat.float_exception_flags & float_flag_invalid)) { \ - if (tp##_is_signaling_nan(xa.fld[i]) || \ - tp##_is_signaling_nan(b->fld[i]) || \ - tp##_is_signaling_nan(c->fld[i])) { \ + if (tp##_is_signaling_nan(xa.fld) || \ + tp##_is_signaling_nan(b->fld) || \ + tp##_is_signaling_nan(c->fld)) { \ fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXSNAN, sfprf); \ tstat.float_exception_flags &= ~float_flag_invalid; \ } \ - if ((tp##_is_infinity(xa.fld[i]) && tp##_is_zero(b->fld[i])) || \ - (tp##_is_zero(xa.fld[i]) && tp##_is_infinity(b->fld[i]))) { \ - xt_out.fld[i] = float64_to_##tp(fload_invalid_op_excp(env, \ + if ((tp##_is_infinity(xa.fld) && tp##_is_zero(b->fld)) || \ + (tp##_is_zero(xa.fld) && tp##_is_infinity(b->fld))) { \ + xt_out.fld = float64_to_##tp(fload_invalid_op_excp(env, \ POWERPC_EXCP_FP_VXIMZ, sfprf), &env->fp_status); \ tstat.float_exception_flags &= ~float_flag_invalid; \ } \ if ((tstat.float_exception_flags & float_flag_invalid) && \ - ((tp##_is_infinity(xa.fld[i]) || \ - tp##_is_infinity(b->fld[i])) && \ - tp##_is_infinity(c->fld[i]))) { \ + ((tp##_is_infinity(xa.fld) || \ + tp##_is_infinity(b->fld)) && \ + tp##_is_infinity(c->fld))) { \ fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXISI, sfprf); \ } \ } \ \ if (r2sp) { \ - xt_out.fld[i] = helper_frsp(env, xt_out.fld[i]); \ + xt_out.fld = helper_frsp(env, xt_out.fld); \ } \ \ if (sfprf) { \ - helper_compute_fprf(env, xt_out.fld[i], sfprf); \ + helper_compute_fprf(env, xt_out.fld, sfprf); \ } \ } \ putVSR(xT(opcode), &xt_out, env); \ @@ -2315,41 +2315,41 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ #define NMADD_FLGS float_muladd_negate_result #define NMSUB_FLGS (float_muladd_negate_c | float_muladd_negate_result) -VSX_MADD(xsmaddadp, 1, float64, f64, MADD_FLGS, 1, 1, 0) -VSX_MADD(xsmaddmdp, 1, float64, f64, MADD_FLGS, 0, 1, 0) -VSX_MADD(xsmsubadp, 1, float64, f64, MSUB_FLGS, 1, 1, 0) -VSX_MADD(xsmsubmdp, 1, float64, f64, MSUB_FLGS, 0, 1, 0) -VSX_MADD(xsnmaddadp, 1, float64, f64, NMADD_FLGS, 1, 1, 0) -VSX_MADD(xsnmaddmdp, 1, float64, f64, NMADD_FLGS, 0, 1, 0) -VSX_MADD(xsnmsubadp, 1, float64, f64, NMSUB_FLGS, 1, 1, 0) -VSX_MADD(xsnmsubmdp, 1, float64, f64, NMSUB_FLGS, 0, 1, 0) - -VSX_MADD(xsmaddasp, 1, float64, f64, MADD_FLGS, 1, 1, 1) -VSX_MADD(xsmaddmsp, 1, float64, f64, MADD_FLGS, 0, 1, 1) -VSX_MADD(xsmsubasp, 1, float64, f64, MSUB_FLGS, 1, 1, 1) -VSX_MADD(xsmsubmsp, 1, float64, f64, MSUB_FLGS, 0, 1, 1) -VSX_MADD(xsnmaddasp, 1, float64, f64, NMADD_FLGS, 1, 1, 1) -VSX_MADD(xsnmaddmsp, 1, float64, f64, NMADD_FLGS, 0, 1, 1) -VSX_MADD(xsnmsubasp, 1, float64, f64, NMSUB_FLGS, 1, 1, 1) -VSX_MADD(xsnmsubmsp, 1, float64, f64, NMSUB_FLGS, 0, 1, 1) - -VSX_MADD(xvmaddadp, 2, float64, f64, MADD_FLGS, 1, 0, 0) -VSX_MADD(xvmaddmdp, 2, float64, f64, MADD_FLGS, 0, 0, 0) -VSX_MADD(xvmsubadp, 2, float64, f64, MSUB_FLGS, 1, 0, 0) -VSX_MADD(xvmsubmdp, 2, float64, f64, MSUB_FLGS, 0, 0, 0) -VSX_MADD(xvnmaddadp, 2, float64, f64, NMADD_FLGS, 1, 0, 0) -VSX_MADD(xvnmaddmdp, 
2, float64, f64, NMADD_FLGS, 0, 0, 0) -VSX_MADD(xvnmsubadp, 2, float64, f64, NMSUB_FLGS, 1, 0, 0) -VSX_MADD(xvnmsubmdp, 2, float64, f64, NMSUB_FLGS, 0, 0, 0) - -VSX_MADD(xvmaddasp, 4, float32, f32, MADD_FLGS, 1, 0, 0) -VSX_MADD(xvmaddmsp, 4, float32, f32, MADD_FLGS, 0, 0, 0) -VSX_MADD(xvmsubasp, 4, float32, f32, MSUB_FLGS, 1, 0, 0) -VSX_MADD(xvmsubmsp, 4, float32, f32, MSUB_FLGS, 0, 0, 0) -VSX_MADD(xvnmaddasp, 4, float32, f32, NMADD_FLGS, 1, 0, 0) -VSX_MADD(xvnmaddmsp, 4, float32, f32, NMADD_FLGS, 0, 0, 0) -VSX_MADD(xvnmsubasp, 4, float32, f32, NMSUB_FLGS, 1, 0, 0) -VSX_MADD(xvnmsubmsp, 4, float32, f32, NMSUB_FLGS, 0, 0, 0) +VSX_MADD(xsmaddadp, 1, float64, VsrD(0), MADD_FLGS, 1, 1, 0) +VSX_MADD(xsmaddmdp, 1, float64, VsrD(0), MADD_FLGS, 0, 1, 0) +VSX_MADD(xsmsubadp, 1, float64, VsrD(0), MSUB_FLGS, 1, 1, 0) +VSX_MADD(xsmsubmdp, 1, float64, VsrD(0), MSUB_FLGS, 0, 1, 0) +VSX_MADD(xsnmaddadp, 1, float64, VsrD(0), NMADD_FLGS, 1, 1, 0) +VSX_MADD(xsnmaddmdp, 1, float64, VsrD(0), NMADD_FLGS, 0, 1, 0) +VSX_MADD(xsnmsubadp, 1, float64, VsrD(0), NMSUB_FLGS, 1, 1, 0) +VSX_MADD(xsnmsubmdp, 1, float64, VsrD(0), NMSUB_FLGS, 0, 1, 0) + +VSX_MADD(xsmaddasp, 1, float64, VsrD(0), MADD_FLGS, 1, 1, 1) +VSX_MADD(xsmaddmsp, 1, float64, VsrD(0), MADD_FLGS, 0, 1, 1) +VSX_MADD(xsmsubasp, 1, float64, VsrD(0), MSUB_FLGS, 1, 1, 1) +VSX_MADD(xsmsubmsp, 1, float64, VsrD(0), MSUB_FLGS, 0, 1, 1) +VSX_MADD(xsnmaddasp, 1, float64, VsrD(0), NMADD_FLGS, 1, 1, 1) +VSX_MADD(xsnmaddmsp, 1, float64, VsrD(0), NMADD_FLGS, 0, 1, 1) +VSX_MADD(xsnmsubasp, 1, float64, VsrD(0), NMSUB_FLGS, 1, 1, 1) +VSX_MADD(xsnmsubmsp, 1, float64, VsrD(0), NMSUB_FLGS, 0, 1, 1) + +VSX_MADD(xvmaddadp, 2, float64, VsrD(i), MADD_FLGS, 1, 0, 0) +VSX_MADD(xvmaddmdp, 2, float64, VsrD(i), MADD_FLGS, 0, 0, 0) +VSX_MADD(xvmsubadp, 2, float64, VsrD(i), MSUB_FLGS, 1, 0, 0) +VSX_MADD(xvmsubmdp, 2, float64, VsrD(i), MSUB_FLGS, 0, 0, 0) +VSX_MADD(xvnmaddadp, 2, float64, VsrD(i), NMADD_FLGS, 1, 0, 0) +VSX_MADD(xvnmaddmdp, 2, float64, VsrD(i), NMADD_FLGS, 0, 0, 0) +VSX_MADD(xvnmsubadp, 2, float64, VsrD(i), NMSUB_FLGS, 1, 0, 0) +VSX_MADD(xvnmsubmdp, 2, float64, VsrD(i), NMSUB_FLGS, 0, 0, 0) + +VSX_MADD(xvmaddasp, 4, float32, VsrW(i), MADD_FLGS, 1, 0, 0) +VSX_MADD(xvmaddmsp, 4, float32, VsrW(i), MADD_FLGS, 0, 0, 0) +VSX_MADD(xvmsubasp, 4, float32, VsrW(i), MSUB_FLGS, 1, 0, 0) +VSX_MADD(xvmsubmsp, 4, float32, VsrW(i), MSUB_FLGS, 0, 0, 0) +VSX_MADD(xvnmaddasp, 4, float32, VsrW(i), NMADD_FLGS, 1, 0, 0) +VSX_MADD(xvnmaddmsp, 4, float32, VsrW(i), NMADD_FLGS, 0, 0, 0) +VSX_MADD(xvnmsubasp, 4, float32, VsrW(i), NMSUB_FLGS, 1, 0, 0) +VSX_MADD(xvnmsubmsp, 4, float32, VsrW(i), NMSUB_FLGS, 0, 0, 0) #define VSX_SCALAR_CMP(op, ordered) \ void helper_##op(CPUPPCState *env, uint32_t opcode) \ @@ -2398,7 +2398,7 @@ VSX_SCALAR_CMP(xscmpudp, 0) * op - operation (max or min) * nels - number of elements (1, 2 or 4) * tp - type (float32 or float64) - * fld - vsr_t field (f32 or f64) + * fld - vsr_t field (VsrD(*) or VsrW(*)) */ #define VSX_MAX_MIN(name, op, nels, tp, fld) \ void helper_##name(CPUPPCState *env, uint32_t opcode) \ @@ -2411,9 +2411,9 @@ void helper_##name(CPUPPCState *env, uint32_t opcode) \ getVSR(xT(opcode), &xt, env); \ \ for (i = 0; i < nels; i++) { \ - xt.fld[i] = tp##_##op(xa.fld[i], xb.fld[i], &env->fp_status); \ - if (unlikely(tp##_is_signaling_nan(xa.fld[i]) || \ - tp##_is_signaling_nan(xb.fld[i]))) { \ + xt.fld = tp##_##op(xa.fld, xb.fld, &env->fp_status); \ + if (unlikely(tp##_is_signaling_nan(xa.fld) || \ + tp##_is_signaling_nan(xb.fld))) { \ 
fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXSNAN, 0); \ } \ } \ @@ -2422,18 +2422,18 @@ void helper_##name(CPUPPCState *env, uint32_t opcode) \ helper_float_check_status(env); \ } -VSX_MAX_MIN(xsmaxdp, maxnum, 1, float64, f64) -VSX_MAX_MIN(xvmaxdp, maxnum, 2, float64, f64) -VSX_MAX_MIN(xvmaxsp, maxnum, 4, float32, f32) -VSX_MAX_MIN(xsmindp, minnum, 1, float64, f64) -VSX_MAX_MIN(xvmindp, minnum, 2, float64, f64) -VSX_MAX_MIN(xvminsp, minnum, 4, float32, f32) +VSX_MAX_MIN(xsmaxdp, maxnum, 1, float64, VsrD(0)) +VSX_MAX_MIN(xvmaxdp, maxnum, 2, float64, VsrD(i)) +VSX_MAX_MIN(xvmaxsp, maxnum, 4, float32, VsrW(i)) +VSX_MAX_MIN(xsmindp, minnum, 1, float64, VsrD(0)) +VSX_MAX_MIN(xvmindp, minnum, 2, float64, VsrD(i)) +VSX_MAX_MIN(xvminsp, minnum, 4, float32, VsrW(i)) /* VSX_CMP - VSX floating point compare * op - instruction mnemonic * nels - number of elements (1, 2 or 4) * tp - type (float32 or float64) - * fld - vsr_t field (f32 or f64) + * fld - vsr_t field (VsrD(*) or VsrW(*)) * cmp - comparison operation * svxvc - set VXVC bit */ @@ -2450,23 +2450,23 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ getVSR(xT(opcode), &xt, env); \ \ for (i = 0; i < nels; i++) { \ - if (unlikely(tp##_is_any_nan(xa.fld[i]) || \ - tp##_is_any_nan(xb.fld[i]))) { \ - if (tp##_is_signaling_nan(xa.fld[i]) || \ - tp##_is_signaling_nan(xb.fld[i])) { \ + if (unlikely(tp##_is_any_nan(xa.fld) || \ + tp##_is_any_nan(xb.fld))) { \ + if (tp##_is_signaling_nan(xa.fld) || \ + tp##_is_signaling_nan(xb.fld)) { \ fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXSNAN, 0); \ } \ if (svxvc) { \ fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXVC, 0); \ } \ - xt.fld[i] = 0; \ + xt.fld = 0; \ all_true = 0; \ } else { \ - if (tp##_##cmp(xb.fld[i], xa.fld[i], &env->fp_status) == 1) { \ - xt.fld[i] = -1; \ + if (tp##_##cmp(xb.fld, xa.fld, &env->fp_status) == 1) { \ + xt.fld = -1; \ all_false = 0; \ } else { \ - xt.fld[i] = 0; \ + xt.fld = 0; \ all_true = 0; \ } \ } \ @@ -2479,12 +2479,12 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ helper_float_check_status(env); \ } -VSX_CMP(xvcmpeqdp, 2, float64, f64, eq, 0) -VSX_CMP(xvcmpgedp, 2, float64, f64, le, 1) -VSX_CMP(xvcmpgtdp, 2, float64, f64, lt, 1) -VSX_CMP(xvcmpeqsp, 4, float32, f32, eq, 0) -VSX_CMP(xvcmpgesp, 4, float32, f32, le, 1) -VSX_CMP(xvcmpgtsp, 4, float32, f32, lt, 1) +VSX_CMP(xvcmpeqdp, 2, float64, VsrD(i), eq, 0) +VSX_CMP(xvcmpgedp, 2, float64, VsrD(i), le, 1) +VSX_CMP(xvcmpgtdp, 2, float64, VsrD(i), lt, 1) +VSX_CMP(xvcmpeqsp, 4, float32, VsrW(i), eq, 0) +VSX_CMP(xvcmpgesp, 4, float32, VsrW(i), le, 1) +VSX_CMP(xvcmpgtsp, 4, float32, VsrW(i), lt, 1) #if defined(HOST_WORDS_BIGENDIAN) #define JOFFSET 0 @@ -2671,7 +2671,7 @@ VSX_CVT_INT_TO_FP(xvcvuxwsp, 4, uint32, float32, u32[j], f32[i], i, 0, 0) * op - instruction mnemonic * nels - number of elements (1, 2 or 4) * tp - type (float32 or float64) - * fld - vsr_t field (f32 or f64) + * fld - vsr_t field (VsrD(*) or VsrW(*)) * rmode - rounding mode * sfprf - set FPRF */ @@ -2688,14 +2688,14 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ } \ \ for (i = 0; i < nels; i++) { \ - if (unlikely(tp##_is_signaling_nan(xb.fld[i]))) { \ + if (unlikely(tp##_is_signaling_nan(xb.fld))) { \ fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXSNAN, 0); \ - xt.fld[i] = tp##_snan_to_qnan(xb.fld[i]); \ + xt.fld = tp##_snan_to_qnan(xb.fld); \ } else { \ - xt.fld[i] = tp##_round_to_int(xb.fld[i], &env->fp_status); \ + xt.fld = tp##_round_to_int(xb.fld, &env->fp_status); \ } \ if (sfprf) { \ - helper_compute_fprf(env, xt.fld[i], sfprf); \ + 
helper_compute_fprf(env, xt.fld, sfprf); \ } \ } \ \ @@ -2711,23 +2711,23 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ helper_float_check_status(env); \ } -VSX_ROUND(xsrdpi, 1, float64, f64, float_round_nearest_even, 1) -VSX_ROUND(xsrdpic, 1, float64, f64, FLOAT_ROUND_CURRENT, 1) -VSX_ROUND(xsrdpim, 1, float64, f64, float_round_down, 1) -VSX_ROUND(xsrdpip, 1, float64, f64, float_round_up, 1) -VSX_ROUND(xsrdpiz, 1, float64, f64, float_round_to_zero, 1) +VSX_ROUND(xsrdpi, 1, float64, VsrD(0), float_round_nearest_even, 1) +VSX_ROUND(xsrdpic, 1, float64, VsrD(0), FLOAT_ROUND_CURRENT, 1) +VSX_ROUND(xsrdpim, 1, float64, VsrD(0), float_round_down, 1) +VSX_ROUND(xsrdpip, 1, float64, VsrD(0), float_round_up, 1) +VSX_ROUND(xsrdpiz, 1, float64, VsrD(0), float_round_to_zero, 1) -VSX_ROUND(xvrdpi, 2, float64, f64, float_round_nearest_even, 0) -VSX_ROUND(xvrdpic, 2, float64, f64, FLOAT_ROUND_CURRENT, 0) -VSX_ROUND(xvrdpim, 2, float64, f64, float_round_down, 0) -VSX_ROUND(xvrdpip, 2, float64, f64, float_round_up, 0) -VSX_ROUND(xvrdpiz, 2, float64, f64, float_round_to_zero, 0) +VSX_ROUND(xvrdpi, 2, float64, VsrD(i), float_round_nearest_even, 0) +VSX_ROUND(xvrdpic, 2, float64, VsrD(i), FLOAT_ROUND_CURRENT, 0) +VSX_ROUND(xvrdpim, 2, float64, VsrD(i), float_round_down, 0) +VSX_ROUND(xvrdpip, 2, float64, VsrD(i), float_round_up, 0) +VSX_ROUND(xvrdpiz, 2, float64, VsrD(i), float_round_to_zero, 0) -VSX_ROUND(xvrspi, 4, float32, f32, float_round_nearest_even, 0) -VSX_ROUND(xvrspic, 4, float32, f32, FLOAT_ROUND_CURRENT, 0) -VSX_ROUND(xvrspim, 4, float32, f32, float_round_down, 0) -VSX_ROUND(xvrspip, 4, float32, f32, float_round_up, 0) -VSX_ROUND(xvrspiz, 4, float32, f32, float_round_to_zero, 0) +VSX_ROUND(xvrspi, 4, float32, VsrW(i), float_round_nearest_even, 0) +VSX_ROUND(xvrspic, 4, float32, VsrW(i), FLOAT_ROUND_CURRENT, 0) +VSX_ROUND(xvrspim, 4, float32, VsrW(i), float_round_down, 0) +VSX_ROUND(xvrspip, 4, float32, VsrW(i), float_round_up, 0) +VSX_ROUND(xvrspiz, 4, float32, VsrW(i), float_round_to_zero, 0) uint64_t helper_xsrsp(CPUPPCState *env, uint64_t xb) { -- 1.7.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [Qemu-devel] [PATCH 6/9] target-ppc: Correct VSX Scalar Compares 2014-03-26 20:45 [Qemu-devel] [PATCH 0/9] target-ppc: VSX Bug Fixes Tom Musta ` (4 preceding siblings ...) 2014-03-26 20:45 ` [Qemu-devel] [PATCH 5/9] target-ppc: Correct Simple VSR LE Host Inversions Tom Musta @ 2014-03-26 20:45 ` Tom Musta 2014-03-26 20:45 ` [Qemu-devel] [PATCH 7/9] target-ppc: Correct VSX FP to FP Conversions Tom Musta ` (3 subsequent siblings) 9 siblings, 0 replies; 15+ messages in thread From: Tom Musta @ 2014-03-26 20:45 UTC (permalink / raw) To: qemu-devel; +Cc: Tom Musta, qemu-ppc This change fixes the VSX scalar compare instructions. The existing usage of "x.f64[0]" is changed to "x.VsrD(0)". Signed-off-by: Tom Musta <tommusta@gmail.com> Tested-by: Tom Musta <tommusta@gmail.com> --- target-ppc/fpu_helper.c | 13 +++++++------ 1 files changed, 7 insertions(+), 6 deletions(-) diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c index 1c37b30..6233d5e 100644 --- a/target-ppc/fpu_helper.c +++ b/target-ppc/fpu_helper.c @@ -2360,10 +2360,10 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ getVSR(xA(opcode), &xa, env); \ getVSR(xB(opcode), &xb, env); \ \ - if (unlikely(float64_is_any_nan(xa.f64[0]) || \ - float64_is_any_nan(xb.f64[0]))) { \ - if (float64_is_signaling_nan(xa.f64[0]) || \ - float64_is_signaling_nan(xb.f64[0])) { \ + if (unlikely(float64_is_any_nan(xa.VsrD(0)) || \ + float64_is_any_nan(xb.VsrD(0)))) { \ + if (float64_is_signaling_nan(xa.VsrD(0)) || \ + float64_is_signaling_nan(xb.VsrD(0))) { \ fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXSNAN, 0); \ } \ if (ordered) { \ @@ -2371,9 +2371,10 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \ } \ cc = 1; \ } else { \ - if (float64_lt(xa.f64[0], xb.f64[0], &env->fp_status)) { \ + if (float64_lt(xa.VsrD(0), xb.VsrD(0), &env->fp_status)) { \ cc = 8; \ - } else if (!float64_le(xa.f64[0], xb.f64[0], &env->fp_status)) { \ + } else if (!float64_le(xa.VsrD(0), xb.VsrD(0), \ + &env->fp_status)) { \ cc = 4; \ } else { \ cc = 2; \ -- 1.7.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [Qemu-devel] [PATCH 7/9] target-ppc: Correct VSX FP to FP Conversions
  2014-03-26 20:45 [Qemu-devel] [PATCH 0/9] target-ppc: VSX Bug Fixes Tom Musta
                   ` (5 preceding siblings ...)
  2014-03-26 20:45 ` [Qemu-devel] [PATCH 6/9] target-ppc: Correct VSX Scalar Compares Tom Musta
@ 2014-03-26 20:45 ` Tom Musta
  2014-03-26 20:45 ` [Qemu-devel] [PATCH 8/9] target-ppc: Correct VSX FP to Integer Conversion Tom Musta
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Tom Musta @ 2014-03-26 20:45 UTC (permalink / raw)
To: qemu-devel; +Cc: Tom Musta, qemu-ppc

This change corrects the VSX double-precision to single-precision and
single-precision to double-precision conversion routines.  The endian-correct
accessors are now used.  The auxiliary "j" index is no longer necessary and
is eliminated.

Signed-off-by: Tom Musta <tommusta@gmail.com>
Tested-by: Tom Musta <tommusta@gmail.com>
---
 target-ppc/fpu_helper.c |    9 ++++-----
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index 6233d5e..12bec90 100644
--- a/target-ppc/fpu_helper.c
+++ b/target-ppc/fpu_helper.c
@@ -2512,7 +2512,6 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)        \
     getVSR(xT(opcode), &xt, env);                                  \
                                                                    \
     for (i = 0; i < nels; i++) {                                   \
-        int j = 2*i + JOFFSET;                                     \
         xt.tfld = stp##_to_##ttp(xb.sfld, &env->fp_status);        \
         if (unlikely(stp##_is_signaling_nan(xb.sfld))) {           \
             fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXSNAN, 0); \
@@ -2528,10 +2527,10 @@ void helper_##op(CPUPPCState *env, uint32_t opcode)       \
     helper_float_check_status(env);                                \
 }
 
-VSX_CVT_FP_TO_FP(xscvdpsp, 1, float64, float32, f64[i], f32[j], 1)
-VSX_CVT_FP_TO_FP(xscvspdp, 1, float32, float64, f32[j], f64[i], 1)
-VSX_CVT_FP_TO_FP(xvcvdpsp, 2, float64, float32, f64[i], f32[j], 0)
-VSX_CVT_FP_TO_FP(xvcvspdp, 2, float32, float64, f32[j], f64[i], 0)
+VSX_CVT_FP_TO_FP(xscvdpsp, 1, float64, float32, VsrD(0), VsrW(0), 1)
+VSX_CVT_FP_TO_FP(xscvspdp, 1, float32, float64, VsrW(0), VsrD(0), 1)
+VSX_CVT_FP_TO_FP(xvcvdpsp, 2, float64, float32, VsrD(i), VsrW(2*i), 0)
+VSX_CVT_FP_TO_FP(xvcvspdp, 2, float32, float64, VsrW(2*i), VsrD(i), 0)
 
 uint64_t helper_xscvdpspn(CPUPPCState *env, uint64_t xb)
 {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread
* [Qemu-devel] [PATCH 8/9] target-ppc: Correct VSX FP to Integer Conversion
  2014-03-26 20:45 [Qemu-devel] [PATCH 0/9] target-ppc: VSX Bug Fixes Tom Musta
                   ` (6 preceding siblings ...)
  2014-03-26 20:45 ` [Qemu-devel] [PATCH 7/9] target-ppc: Correct VSX FP to FP Conversions Tom Musta
@ 2014-03-26 20:45 ` Tom Musta
  2014-03-26 20:45 ` [Qemu-devel] [PATCH 9/9] target-ppc: Correct VSX Integer to FP Conversion Tom Musta
  2014-03-28  3:49 ` [Qemu-devel] [PATCH 0/9] target-ppc: VSX Bug Fixes Anton Blanchard
  9 siblings, 0 replies; 15+ messages in thread
From: Tom Musta @ 2014-03-26 20:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Tom Musta, qemu-ppc

This patch corrects the VSX floating-point to integer conversion
instructions by using the endian-correct accessors.  The auxiliary "j"
index used by the existing macros is now obsolete and is removed.

Signed-off-by: Tom Musta <tommusta@gmail.com>
Tested-by: Tom Musta <tommusta@gmail.com>
---
 target-ppc/fpu_helper.c |   36 +++++++++++++++---------------------
 1 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index 12bec90..abba703 100644
--- a/target-ppc/fpu_helper.c
+++ b/target-ppc/fpu_helper.c
@@ -2555,10 +2555,9 @@ uint64_t helper_xscvspdpn(CPUPPCState *env, uint64_t xb)
  * ttp - target type (int32, uint32, int64 or uint64)
  * sfld - source vsr_t field
  * tfld - target vsr_t field
- * jdef - definition of the j index (i or 2*i)
  * rnan - resulting NaN
  */
-#define VSX_CVT_FP_TO_INT(op, nels, stp, ttp, sfld, tfld, jdef, rnan) \
+#define VSX_CVT_FP_TO_INT(op, nels, stp, ttp, sfld, tfld, rnan) \
 void helper_##op(CPUPPCState *env, uint32_t opcode) \
 { \
     ppc_vsr_t xt, xb; \
@@ -2568,7 +2567,6 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \
     getVSR(xT(opcode), &xt, env); \
                                   \
     for (i = 0; i < nels; i++) { \
-        int j = jdef; \
         if (unlikely(stp##_is_any_nan(xb.sfld))) { \
             if (stp##_is_signaling_nan(xb.sfld)) { \
                 fload_invalid_op_excp(env, POWERPC_EXCP_FP_VXSNAN, 0); \
@@ -2588,27 +2586,23 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \
     helper_float_check_status(env); \
 }

-VSX_CVT_FP_TO_INT(xscvdpsxds, 1, float64, int64, f64[j], u64[i], i, \
+VSX_CVT_FP_TO_INT(xscvdpsxds, 1, float64, int64, VsrD(0), VsrD(0), \
                   0x8000000000000000ULL)
-VSX_CVT_FP_TO_INT(xscvdpsxws, 1, float64, int32, f64[i], u32[j], \
-                  2*i + JOFFSET, 0x80000000U)
-VSX_CVT_FP_TO_INT(xscvdpuxds, 1, float64, uint64, f64[j], u64[i], i, 0ULL)
-VSX_CVT_FP_TO_INT(xscvdpuxws, 1, float64, uint32, f64[i], u32[j], \
-                  2*i + JOFFSET, 0U)
-VSX_CVT_FP_TO_INT(xvcvdpsxds, 2, float64, int64, f64[j], u64[i], i, \
+VSX_CVT_FP_TO_INT(xscvdpsxws, 1, float64, int32, VsrD(0), VsrW(1), \
+                  0x80000000U)
+VSX_CVT_FP_TO_INT(xscvdpuxds, 1, float64, uint64, VsrD(0), VsrD(0), 0ULL)
+VSX_CVT_FP_TO_INT(xscvdpuxws, 1, float64, uint32, VsrD(0), VsrW(1), 0U)
+VSX_CVT_FP_TO_INT(xvcvdpsxds, 2, float64, int64, VsrD(i), VsrD(i), \
                   0x8000000000000000ULL)
-VSX_CVT_FP_TO_INT(xvcvdpsxws, 2, float64, int32, f64[i], u32[j], \
-                  2*i + JOFFSET, 0x80000000U)
-VSX_CVT_FP_TO_INT(xvcvdpuxds, 2, float64, uint64, f64[j], u64[i], i, 0ULL)
-VSX_CVT_FP_TO_INT(xvcvdpuxws, 2, float64, uint32, f64[i], u32[j], \
-                  2*i + JOFFSET, 0U)
-VSX_CVT_FP_TO_INT(xvcvspsxds, 2, float32, int64, f32[j], u64[i], \
-                  2*i + JOFFSET, 0x8000000000000000ULL)
-VSX_CVT_FP_TO_INT(xvcvspsxws, 4, float32, int32, f32[j], u32[j], i, \
+VSX_CVT_FP_TO_INT(xvcvdpsxws, 2, float64, int32, VsrD(i), VsrW(2*i), \
                   0x80000000U)
-VSX_CVT_FP_TO_INT(xvcvspuxds, 2, float32, uint64, f32[j], u64[i], \
-                  2*i + JOFFSET, 0ULL)
-VSX_CVT_FP_TO_INT(xvcvspuxws, 4, float32, uint32, f32[j], u32[i], i, 0U)
+VSX_CVT_FP_TO_INT(xvcvdpuxds, 2, float64, uint64, VsrD(i), VsrD(i), 0ULL)
+VSX_CVT_FP_TO_INT(xvcvdpuxws, 2, float64, uint32, VsrD(i), VsrW(2*i), 0U)
+VSX_CVT_FP_TO_INT(xvcvspsxds, 2, float32, int64, VsrW(2*i), VsrD(i), \
+                  0x8000000000000000ULL)
+VSX_CVT_FP_TO_INT(xvcvspsxws, 4, float32, int32, VsrW(i), VsrW(i), 0x80000000U)
+VSX_CVT_FP_TO_INT(xvcvspuxds, 2, float32, uint64, VsrW(2*i), VsrD(i), 0ULL)
+VSX_CVT_FP_TO_INT(xvcvspuxws, 4, float32, uint32, VsrW(i), VsrW(i), 0U)

 /* VSX_CVT_INT_TO_FP - VSX integer to floating point conversion
  * op - instruction mnemonic
--
1.7.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread
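As a side note on what the VSX_CVT_FP_TO_INT macro body above does with the
rnan argument: a NaN source produces the rnan pattern and out-of-range values
saturate, while in-range values are truncated.  The snippet below is a rough
scalar model of that flow for an xscvdpsxws-style double-to-signed-word
conversion, using plain C arithmetic and isnan() as stand-ins for QEMU's
softfloat calls; it ignores the FPSCR/exception side effects entirely.

    #include <stdint.h>
    #include <math.h>
    #include <stdio.h>

    /* Rough scalar model of a double -> signed-word VSX conversion:
     * NaN -> rnan (0x80000000), overflow saturates, everything else is
     * truncated toward zero (C's default float-to-int behaviour). */
    static int32_t cvt_dp_to_sxws(double src)
    {
        if (isnan(src)) {
            return INT32_MIN;             /* rnan for the signed-word case */
        }
        if (src >= 2147483648.0) {
            return INT32_MAX;             /* saturate on positive overflow */
        }
        if (src < -2147483648.0) {
            return INT32_MIN;             /* saturate on negative overflow */
        }
        return (int32_t)src;              /* truncation, i.e. round to zero */
    }

    int main(void)
    {
        printf("%d %d %d\n",
               cvt_dp_to_sxws(1.9),       /* 1, not 2 */
               cvt_dp_to_sxws(-1.9),      /* -1 */
               cvt_dp_to_sxws(NAN));      /* -2147483648 */
        return 0;
    }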
* [Qemu-devel] [PATCH 9/9] target-ppc: Correct VSX Integer to FP Conversion
  2014-03-26 20:45 [Qemu-devel] [PATCH 0/9] target-ppc: VSX Bug Fixes Tom Musta
                   ` (7 preceding siblings ...)
  2014-03-26 20:45 ` [Qemu-devel] [PATCH 8/9] target-ppc: Correct VSX FP to Integer Conversion Tom Musta
@ 2014-03-26 20:45 ` Tom Musta
  2014-03-28  3:49 ` [Qemu-devel] [PATCH 0/9] target-ppc: VSX Bug Fixes Anton Blanchard
  9 siblings, 0 replies; 15+ messages in thread
From: Tom Musta @ 2014-03-26 20:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: Tom Musta, qemu-ppc

This patch corrects the VSX integer to floating-point conversion
instructions by using the endian-correct accessors.  The auxiliary "j"
index used by the existing macros is now obsolete and is removed.  The
JOFFSET preprocessor macro is also obsolete and removed.

Signed-off-by: Tom Musta <tommusta@gmail.com>
Tested-by: Tom Musta <tommusta@gmail.com>
---
 target-ppc/fpu_helper.c |   37 +++++++++++++------------------------
 1 files changed, 13 insertions(+), 24 deletions(-)

diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c
index abba703..c6f484f 100644
--- a/target-ppc/fpu_helper.c
+++ b/target-ppc/fpu_helper.c
@@ -2487,12 +2487,6 @@ VSX_CMP(xvcmpeqsp, 4, float32, VsrW(i), eq, 0)
 VSX_CMP(xvcmpgesp, 4, float32, VsrW(i), le, 1)
 VSX_CMP(xvcmpgtsp, 4, float32, VsrW(i), lt, 1)

-#if defined(HOST_WORDS_BIGENDIAN)
-#define JOFFSET 0
-#else
-#define JOFFSET 1
-#endif
-
 /* VSX_CVT_FP_TO_FP - VSX floating point/floating point conversion
  * op - instruction mnemonic
  * nels - number of elements (1, 2 or 4)
@@ -2614,7 +2608,7 @@ VSX_CVT_FP_TO_INT(xvcvspuxws, 4, float32, uint32, VsrW(i), VsrW(i), 0U)
  * jdef - definition of the j index (i or 2*i)
  * sfprf - set FPRF
  */
-#define VSX_CVT_INT_TO_FP(op, nels, stp, ttp, sfld, tfld, jdef, sfprf, r2sp) \
+#define VSX_CVT_INT_TO_FP(op, nels, stp, ttp, sfld, tfld, sfprf, r2sp) \
 void helper_##op(CPUPPCState *env, uint32_t opcode) \
 { \
     ppc_vsr_t xt, xb; \
@@ -2624,7 +2618,6 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \
     getVSR(xT(opcode), &xt, env); \
                                   \
     for (i = 0; i < nels; i++) { \
-        int j = jdef; \
         xt.tfld = stp##_to_##ttp(xb.sfld, &env->fp_status); \
         if (r2sp) { \
             xt.tfld = helper_frsp(env, xt.tfld); \
@@ -2638,22 +2631,18 @@ void helper_##op(CPUPPCState *env, uint32_t opcode) \
     helper_float_check_status(env); \
 }

-VSX_CVT_INT_TO_FP(xscvsxddp, 1, int64, float64, u64[j], f64[i], i, 1, 0)
-VSX_CVT_INT_TO_FP(xscvuxddp, 1, uint64, float64, u64[j], f64[i], i, 1, 0)
-VSX_CVT_INT_TO_FP(xscvsxdsp, 1, int64, float64, u64[j], f64[i], i, 1, 1)
-VSX_CVT_INT_TO_FP(xscvuxdsp, 1, uint64, float64, u64[j], f64[i], i, 1, 1)
-VSX_CVT_INT_TO_FP(xvcvsxddp, 2, int64, float64, u64[j], f64[i], i, 0, 0)
-VSX_CVT_INT_TO_FP(xvcvuxddp, 2, uint64, float64, u64[j], f64[i], i, 0, 0)
-VSX_CVT_INT_TO_FP(xvcvsxwdp, 2, int32, float64, u32[j], f64[i], \
-                  2*i + JOFFSET, 0, 0)
-VSX_CVT_INT_TO_FP(xvcvuxwdp, 2, uint64, float64, u32[j], f64[i], \
-                  2*i + JOFFSET, 0, 0)
-VSX_CVT_INT_TO_FP(xvcvsxdsp, 2, int64, float32, u64[i], f32[j], \
-                  2*i + JOFFSET, 0, 0)
-VSX_CVT_INT_TO_FP(xvcvuxdsp, 2, uint64, float32, u64[i], f32[j], \
-                  2*i + JOFFSET, 0, 0)
-VSX_CVT_INT_TO_FP(xvcvsxwsp, 4, int32, float32, u32[j], f32[i], i, 0, 0)
-VSX_CVT_INT_TO_FP(xvcvuxwsp, 4, uint32, float32, u32[j], f32[i], i, 0, 0)
+VSX_CVT_INT_TO_FP(xscvsxddp, 1, int64, float64, VsrD(0), VsrD(0), 1, 0)
+VSX_CVT_INT_TO_FP(xscvuxddp, 1, uint64, float64, VsrD(0), VsrD(0), 1, 0)
+VSX_CVT_INT_TO_FP(xscvsxdsp, 1, int64, float64, VsrD(0), VsrD(0), 1, 1)
+VSX_CVT_INT_TO_FP(xscvuxdsp, 1, uint64, float64, VsrD(0), VsrD(0), 1, 1)
+VSX_CVT_INT_TO_FP(xvcvsxddp, 2, int64, float64, VsrD(i), VsrD(i), 0, 0)
+VSX_CVT_INT_TO_FP(xvcvuxddp, 2, uint64, float64, VsrD(i), VsrD(i), 0, 0)
+VSX_CVT_INT_TO_FP(xvcvsxwdp, 2, int32, float64, VsrW(2*i), VsrD(i), 0, 0)
+VSX_CVT_INT_TO_FP(xvcvuxwdp, 2, uint64, float64, VsrW(2*i), VsrD(i), 0, 0)
+VSX_CVT_INT_TO_FP(xvcvsxdsp, 2, int64, float32, VsrD(i), VsrW(2*i), 0, 0)
+VSX_CVT_INT_TO_FP(xvcvuxdsp, 2, uint64, float32, VsrD(i), VsrW(2*i), 0, 0)
+VSX_CVT_INT_TO_FP(xvcvsxwsp, 4, int32, float32, VsrW(i), VsrW(i), 0, 0)
+VSX_CVT_INT_TO_FP(xvcvuxwsp, 4, uint32, float32, VsrW(i), VsrW(i), 0, 0)

 /* For "use current rounding mode", define a value that will not be one of
  * the existing rounding model enums.
--
1.7.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread
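One detail worth calling out in the list above is the r2sp argument used by
xscvsxdsp and xscvuxdsp: the integer is first converted to double precision
and the result is then rounded to single precision (helper_frsp in the real
code).  The standalone sketch below models that double-then-single rounding
with plain C casts; the function name and the casts are assumptions made for
illustration, not the QEMU implementation.

    #include <stdint.h>
    #include <stdio.h>

    /* Rough model of the r2sp path: int64 -> float64, then round the result
     * to single precision while keeping it stored as a double. */
    static double cvt_sxdsp(int64_t src)
    {
        double d = (double)src;      /* stands in for int64_to_float64()   */
        return (double)(float)d;     /* stands in for helper_frsp() (r2sp) */
    }

    int main(void)
    {
        int64_t v = 16777217;        /* 2^24 + 1: exact as double, not as float */

        printf("no r2sp: %.1f\n", (double)v);      /* 16777217.0 */
        printf("   r2sp: %.1f\n", cvt_sxdsp(v));   /* 16777216.0 */
        return 0;
    }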
* Re: [Qemu-devel] [PATCH 0/9] target-ppc: VSX Bug Fixes
  2014-03-26 20:45 [Qemu-devel] [PATCH 0/9] target-ppc: VSX Bug Fixes Tom Musta
                   ` (8 preceding siblings ...)
  2014-03-26 20:45 ` [Qemu-devel] [PATCH 9/9] target-ppc: Correct VSX Integer to FP Conversion Tom Musta
@ 2014-03-28  3:49 ` Anton Blanchard
  9 siblings, 0 replies; 15+ messages in thread
From: Anton Blanchard @ 2014-03-28  3:49 UTC (permalink / raw)
  To: Tom Musta; +Cc: qemu-ppc, qemu-devel

Hi Tom,

> This patch series addresses bugs in the recently added VSX
> instructions. Two general defects are fixed:

Thanks! This series fixes the issue I had with wget.

Tested-by: Anton Blanchard <anton@samba.org>

Anton

^ permalink raw reply	[flat|nested] 15+ messages in thread
end of thread, other threads:[~2014-03-31 18:12 UTC | newest]

Thread overview: 15+ messages
2014-03-26 20:45 [Qemu-devel] [PATCH 0/9] target-ppc: VSX Bug Fixes Tom Musta
2014-03-26 20:45 ` [Qemu-devel] [PATCH 1/9] softfloat: Introduce float32_to_uint64_round_to_zero Tom Musta
2014-03-31 17:26   ` [Qemu-devel] [Qemu-ppc] " Alexander Graf
2014-03-31 17:48   ` [Qemu-devel] " Peter Maydell
2014-03-31 18:07     ` Tom Musta
2014-03-31 18:12       ` Peter Maydell
2014-03-26 20:45 ` [Qemu-devel] [PATCH 2/9] target-ppc: Bug: VSX Convert to Integer Should Truncate Tom Musta
2014-03-26 20:45 ` [Qemu-devel] [PATCH 3/9] target-ppc: Define Endian-Correct Accessors for VSR Field Acess Tom Musta
2014-03-26 20:45 ` [Qemu-devel] [PATCH 4/9] target-ppc: Correct LE Host Inversion of Lower VSRs Tom Musta
2014-03-26 20:45 ` [Qemu-devel] [PATCH 5/9] target-ppc: Correct Simple VSR LE Host Inversions Tom Musta
2014-03-26 20:45 ` [Qemu-devel] [PATCH 6/9] target-ppc: Correct VSX Scalar Compares Tom Musta
2014-03-26 20:45 ` [Qemu-devel] [PATCH 7/9] target-ppc: Correct VSX FP to FP Conversions Tom Musta
2014-03-26 20:45 ` [Qemu-devel] [PATCH 8/9] target-ppc: Correct VSX FP to Integer Conversion Tom Musta
2014-03-26 20:45 ` [Qemu-devel] [PATCH 9/9] target-ppc: Correct VSX Integer to FP Conversion Tom Musta
2014-03-28  3:49 ` [Qemu-devel] [PATCH 0/9] target-ppc: VSX Bug Fixes Anton Blanchard