[PATCH 0/2] target/i386: SSE floating-point fixes

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/2] target/i386: SSE floating-point fixes
@ 2020-06-25 23:57 Joseph Myers
  2020-06-25 23:57 ` [PATCH 1/2] target/i386: set SSE FTZ in correct floating-point state Joseph Myers
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Joseph Myers @ 2020-06-25 23:57 UTC (permalink / raw)
  To: qemu-devel, pbonzini, rth, ehabkost

Fix some issues relating to SSE floating-point emulation.  The first
patch fixes a problem with the handling of the FTZ bit that was found
through the testcase written for the second patch.  Rather than
writing a separate standalone test for that bug, it seemed sufficient
for the testcase in the second patch to cover both patches.

The style checker will produce its usual inapplicable warnings about
use of "volatile" in the testcase and about C99 hex float constants.

Joseph Myers (2):
  target/i386: set SSE FTZ in correct floating-point state
  target/i386: fix IEEE SSE floating-point exception raising

 target/i386/cpu.h                         |   1 +
 target/i386/fpu_helper.c                  |  35 +-
 target/i386/gdbstub.c                     |   1 +
 target/i386/helper.c                      |   1 +
 target/i386/helper.h                      |   1 +
 target/i386/ops_sse.h                     |  28 +-
 target/i386/translate.c                   |   1 +
 tests/tcg/i386/Makefile.target            |   4 +
 tests/tcg/i386/test-i386-sse-exceptions.c | 813 ++++++++++++++++++++++
 9 files changed, 872 insertions(+), 13 deletions(-)
 create mode 100644 tests/tcg/i386/test-i386-sse-exceptions.c

-- 
2.17.1


-- 
Joseph S. Myers
joseph@codesourcery.com


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/2] target/i386: set SSE FTZ in correct floating-point state
  2020-06-25 23:57 [PATCH 0/2] target/i386: SSE floating-point fixes Joseph Myers
@ 2020-06-25 23:57 ` Joseph Myers
  2020-06-25 23:58 ` [PATCH 2/2] target/i386: fix IEEE SSE floating-point exception raising Joseph Myers
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Joseph Myers @ 2020-06-25 23:57 UTC (permalink / raw)
  To: qemu-devel, pbonzini, rth, ehabkost

The code to set floating-point state when MXCSR changes calls
set_flush_to_zero on &env->fp_status, so affecting the x87
floating-point state rather than the SSE state.  Fix to call it for
&env->sse_status instead.

Signed-off-by: Joseph Myers <joseph@codesourcery.com>
---
 target/i386/fpu_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index 8ef5b463ea..6590ce482f 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -1830,7 +1830,7 @@ void update_mxcsr_status(CPUX86State *env)
     set_flush_inputs_to_zero((mxcsr & SSE_DAZ) ? 1 : 0, &env->sse_status);
 
     /* set flush to zero */
-    set_flush_to_zero((mxcsr & SSE_FZ) ? 1 : 0, &env->fp_status);
+    set_flush_to_zero((mxcsr & SSE_FZ) ? 1 : 0, &env->sse_status);
 }
 
 void helper_ldmxcsr(CPUX86State *env, uint32_t val)
-- 
2.17.1


-- 
Joseph S. Myers
joseph@codesourcery.com


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] target/i386: fix IEEE SSE floating-point exception raising
  2020-06-25 23:57 [PATCH 0/2] target/i386: SSE floating-point fixes Joseph Myers
  2020-06-25 23:57 ` [PATCH 1/2] target/i386: set SSE FTZ in correct floating-point state Joseph Myers
@ 2020-06-25 23:58 ` Joseph Myers
  2020-06-26  2:01 ` [PATCH 0/2] target/i386: SSE floating-point fixes no-reply
  2020-06-26 11:00 ` Paolo Bonzini
  3 siblings, 0 replies; 5+ messages in thread
From: Joseph Myers @ 2020-06-25 23:58 UTC (permalink / raw)
  To: qemu-devel, pbonzini, rth, ehabkost

The SSE instruction implementations all fail to raise the expected
IEEE floating-point exceptions because they do nothing to convert the
exception state from the softfloat machinery into the exception flags
in MXCSR.

Fix this by adding such conversions.  Unlike for x87, emulated SSE
floating-point operations might be optimized using hardware floating
point on the host, and so a different approach is taken that is
compatible with such optimizations.  The required invariant is that
all exceptions set in env->sse_status (other than "denormal operand",
for which the SSE semantics are different from those in the softfloat
code) are ones that are set in the MXCSR; the emulated MXCSR is
updated lazily when code reads MXCSR, while when code sets MXCSR, the
exceptions in env->sse_status are set accordingly.

A few instructions do not raise all the exceptions that would be
raised by the softfloat code, and those instructions are made to save
and restore the softfloat exception state accordingly.

Nothing is done about "denormal operand"; setting that (only for the
case when input denormals are *not* flushed to zero, the opposite of
the logic in the softfloat code for such an exception) will require
custom code for relevant instructions, or else architecture-specific
conditionals in the softfloat code for when to set such an exception
together with custom code for various SSE conversion and rounding
instructions that do not set that exception.

Nothing is done about trapping exceptions (for which there is minimal
and largely broken support in QEMU's emulation in the x87 case and no
support at all in the SSE case).

Signed-off-by: Joseph Myers <joseph@codesourcery.com>
---
 target/i386/cpu.h                         |   1 +
 target/i386/fpu_helper.c                  |  33 +
 target/i386/gdbstub.c                     |   1 +
 target/i386/helper.c                      |   1 +
 target/i386/helper.h                      |   1 +
 target/i386/ops_sse.h                     |  28 +-
 target/i386/translate.c                   |   1 +
 tests/tcg/i386/Makefile.target            |   4 +
 tests/tcg/i386/test-i386-sse-exceptions.c | 813 ++++++++++++++++++++++
 9 files changed, 871 insertions(+), 12 deletions(-)
 create mode 100644 tests/tcg/i386/test-i386-sse-exceptions.c

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 7d77efd9e4..06b2e3a5c6 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2143,6 +2143,7 @@ static inline bool cpu_vmx_maybe_enabled(CPUX86State *env)
 /* fpu_helper.c */
 void update_fp_status(CPUX86State *env);
 void update_mxcsr_status(CPUX86State *env);
+void update_mxcsr_from_sse_status(CPUX86State *env);
 
 static inline void cpu_set_mxcsr(CPUX86State *env, uint32_t mxcsr)
 {
diff --git a/target/i386/fpu_helper.c b/target/i386/fpu_helper.c
index 6590ce482f..bc79812fe3 100644
--- a/target/i386/fpu_helper.c
+++ b/target/i386/fpu_helper.c
@@ -1397,6 +1397,7 @@ static void do_xsave_fpu(CPUX86State *env, target_ulong ptr, uintptr_t ra)
 
 static void do_xsave_mxcsr(CPUX86State *env, target_ulong ptr, uintptr_t ra)
 {
+    update_mxcsr_from_sse_status(env);
     cpu_stl_data_ra(env, ptr + XO(legacy.mxcsr), env->mxcsr, ra);
     cpu_stl_data_ra(env, ptr + XO(legacy.mxcsr_mask), 0x0000ffff, ra);
 }
@@ -1826,6 +1827,14 @@ void update_mxcsr_status(CPUX86State *env)
     }
     set_float_rounding_mode(rnd_type, &env->sse_status);
 
+    /* Set exception flags.  */
+    set_float_exception_flags((mxcsr & FPUS_IE ? float_flag_invalid : 0) |
+                              (mxcsr & FPUS_ZE ? float_flag_divbyzero : 0) |
+                              (mxcsr & FPUS_OE ? float_flag_overflow : 0) |
+                              (mxcsr & FPUS_UE ? float_flag_underflow : 0) |
+                              (mxcsr & FPUS_PE ? float_flag_inexact : 0),
+                              &env->sse_status);
+
     /* set denormals are zero */
     set_flush_inputs_to_zero((mxcsr & SSE_DAZ) ? 1 : 0, &env->sse_status);
 
@@ -1833,6 +1842,30 @@ void update_mxcsr_status(CPUX86State *env)
     set_flush_to_zero((mxcsr & SSE_FZ) ? 1 : 0, &env->sse_status);
 }
 
+void update_mxcsr_from_sse_status(CPUX86State *env)
+{
+    uint8_t flags = get_float_exception_flags(&env->sse_status);
+    /*
+     * The MXCSR denormal flag has opposite semantics to
+     * float_flag_input_denormal (the softfloat code sets that flag
+     * only when flushing input denormals to zero, but SSE sets it
+     * only when not flushing them to zero), so is not converted
+     * here.
+     */
+    env->mxcsr |= ((flags & float_flag_invalid ? FPUS_IE : 0) |
+                   (flags & float_flag_divbyzero ? FPUS_ZE : 0) |
+                   (flags & float_flag_overflow ? FPUS_OE : 0) |
+                   (flags & float_flag_underflow ? FPUS_UE : 0) |
+                   (flags & float_flag_inexact ? FPUS_PE : 0) |
+                   (flags & float_flag_output_denormal ? FPUS_UE | FPUS_PE :
+                    0));
+}
+
+void helper_update_mxcsr(CPUX86State *env)
+{
+    update_mxcsr_from_sse_status(env);
+}
+
 void helper_ldmxcsr(CPUX86State *env, uint32_t val)
 {
     cpu_set_mxcsr(env, val);
diff --git a/target/i386/gdbstub.c b/target/i386/gdbstub.c
index b98a99500a..9ae43bda0f 100644
--- a/target/i386/gdbstub.c
+++ b/target/i386/gdbstub.c
@@ -184,6 +184,7 @@ int x86_cpu_gdb_read_register(CPUState *cs, GByteArray *mem_buf, int n)
             return gdb_get_reg32(mem_buf, 0); /* fop */
 
         case IDX_MXCSR_REG:
+            update_mxcsr_from_sse_status(env);
             return gdb_get_reg32(mem_buf, env->mxcsr);
 
         case IDX_CTL_CR0_REG:
diff --git a/target/i386/helper.c b/target/i386/helper.c
index c3a6e4fabe..fa2a1dcdda 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -544,6 +544,7 @@ void x86_cpu_dump_state(CPUState *cs, FILE *f, int flags)
         for(i = 0; i < 8; i++) {
             fptag |= ((!env->fptags[i]) << i);
         }
+        update_mxcsr_from_sse_status(env);
         qemu_fprintf(f, "FCW=%04x FSW=%04x [ST=%d] FTW=%02x MXCSR=%08x\n",
                      env->fpuc,
                      (env->fpus & ~0x3800) | (env->fpstt & 0x7) << 11,
diff --git a/target/i386/helper.h b/target/i386/helper.h
index 8f9e1905c3..c2ae2f7e61 100644
--- a/target/i386/helper.h
+++ b/target/i386/helper.h
@@ -207,6 +207,7 @@ DEF_HELPER_FLAGS_2(pext, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 /* MMX/SSE */
 
 DEF_HELPER_2(ldmxcsr, void, env, i32)
+DEF_HELPER_1(update_mxcsr, void, env)
 DEF_HELPER_1(enter_mmx, void, env)
 DEF_HELPER_1(emms, void, env)
 DEF_HELPER_3(movq, void, env, ptr, ptr)
diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 14f2b16abd..c7614f8b0b 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -843,6 +843,7 @@ int64_t helper_cvttsd2sq(CPUX86State *env, ZMMReg *s)
 
 void helper_rsqrtps(CPUX86State *env, ZMMReg *d, ZMMReg *s)
 {
+    uint8_t old_flags = get_float_exception_flags(&env->sse_status);
     d->ZMM_S(0) = float32_div(float32_one,
                               float32_sqrt(s->ZMM_S(0), &env->sse_status),
                               &env->sse_status);
@@ -855,26 +856,33 @@ void helper_rsqrtps(CPUX86State *env, ZMMReg *d, ZMMReg *s)
     d->ZMM_S(3) = float32_div(float32_one,
                               float32_sqrt(s->ZMM_S(3), &env->sse_status),
                               &env->sse_status);
+    set_float_exception_flags(old_flags, &env->sse_status);
 }
 
 void helper_rsqrtss(CPUX86State *env, ZMMReg *d, ZMMReg *s)
 {
+    uint8_t old_flags = get_float_exception_flags(&env->sse_status);
     d->ZMM_S(0) = float32_div(float32_one,
                               float32_sqrt(s->ZMM_S(0), &env->sse_status),
                               &env->sse_status);
+    set_float_exception_flags(old_flags, &env->sse_status);
 }
 
 void helper_rcpps(CPUX86State *env, ZMMReg *d, ZMMReg *s)
 {
+    uint8_t old_flags = get_float_exception_flags(&env->sse_status);
     d->ZMM_S(0) = float32_div(float32_one, s->ZMM_S(0), &env->sse_status);
     d->ZMM_S(1) = float32_div(float32_one, s->ZMM_S(1), &env->sse_status);
     d->ZMM_S(2) = float32_div(float32_one, s->ZMM_S(2), &env->sse_status);
     d->ZMM_S(3) = float32_div(float32_one, s->ZMM_S(3), &env->sse_status);
+    set_float_exception_flags(old_flags, &env->sse_status);
 }
 
 void helper_rcpss(CPUX86State *env, ZMMReg *d, ZMMReg *s)
 {
+    uint8_t old_flags = get_float_exception_flags(&env->sse_status);
     d->ZMM_S(0) = float32_div(float32_one, s->ZMM_S(0), &env->sse_status);
+    set_float_exception_flags(old_flags, &env->sse_status);
 }
 
 static inline uint64_t helper_extrq(uint64_t src, int shift, int len)
@@ -1764,6 +1772,7 @@ void glue(helper_phminposuw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 void glue(helper_roundps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
                                   uint32_t mode)
 {
+    uint8_t old_flags = get_float_exception_flags(&env->sse_status);
     signed char prev_rounding_mode;
 
     prev_rounding_mode = env->sse_status.float_rounding_mode;
@@ -1789,19 +1798,18 @@ void glue(helper_roundps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
     d->ZMM_S(2) = float32_round_to_int(s->ZMM_S(2), &env->sse_status);
     d->ZMM_S(3) = float32_round_to_int(s->ZMM_S(3), &env->sse_status);
 
-#if 0 /* TODO */
-    if (mode & (1 << 3)) {
+    if (mode & (1 << 3) && !(old_flags & float_flag_inexact)) {
         set_float_exception_flags(get_float_exception_flags(&env->sse_status) &
                                   ~float_flag_inexact,
                                   &env->sse_status);
     }
-#endif
     env->sse_status.float_rounding_mode = prev_rounding_mode;
 }
 
 void glue(helper_roundpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
                                   uint32_t mode)
 {
+    uint8_t old_flags = get_float_exception_flags(&env->sse_status);
     signed char prev_rounding_mode;
 
     prev_rounding_mode = env->sse_status.float_rounding_mode;
@@ -1825,19 +1833,18 @@ void glue(helper_roundpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
     d->ZMM_D(0) = float64_round_to_int(s->ZMM_D(0), &env->sse_status);
     d->ZMM_D(1) = float64_round_to_int(s->ZMM_D(1), &env->sse_status);
 
-#if 0 /* TODO */
-    if (mode & (1 << 3)) {
+    if (mode & (1 << 3) && !(old_flags & float_flag_inexact)) {
         set_float_exception_flags(get_float_exception_flags(&env->sse_status) &
                                   ~float_flag_inexact,
                                   &env->sse_status);
     }
-#endif
     env->sse_status.float_rounding_mode = prev_rounding_mode;
 }
 
 void glue(helper_roundss, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
                                   uint32_t mode)
 {
+    uint8_t old_flags = get_float_exception_flags(&env->sse_status);
     signed char prev_rounding_mode;
 
     prev_rounding_mode = env->sse_status.float_rounding_mode;
@@ -1860,19 +1867,18 @@ void glue(helper_roundss, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
 
     d->ZMM_S(0) = float32_round_to_int(s->ZMM_S(0), &env->sse_status);
 
-#if 0 /* TODO */
-    if (mode & (1 << 3)) {
+    if (mode & (1 << 3) && !(old_flags & float_flag_inexact)) {
         set_float_exception_flags(get_float_exception_flags(&env->sse_status) &
                                   ~float_flag_inexact,
                                   &env->sse_status);
     }
-#endif
     env->sse_status.float_rounding_mode = prev_rounding_mode;
 }
 
 void glue(helper_roundsd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
                                   uint32_t mode)
 {
+    uint8_t old_flags = get_float_exception_flags(&env->sse_status);
     signed char prev_rounding_mode;
 
     prev_rounding_mode = env->sse_status.float_rounding_mode;
@@ -1895,13 +1901,11 @@ void glue(helper_roundsd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
 
     d->ZMM_D(0) = float64_round_to_int(s->ZMM_D(0), &env->sse_status);
 
-#if 0 /* TODO */
-    if (mode & (1 << 3)) {
+    if (mode & (1 << 3) && !(old_flags & float_flag_inexact)) {
         set_float_exception_flags(get_float_exception_flags(&env->sse_status) &
                                   ~float_flag_inexact,
                                   &env->sse_status);
     }
-#endif
     env->sse_status.float_rounding_mode = prev_rounding_mode;
 }
 
diff --git a/target/i386/translate.c b/target/i386/translate.c
index 5e5dbb41b0..b3fea54411 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -8157,6 +8157,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
                 gen_exception(s, EXCP07_PREX, pc_start - s->cs_base);
                 break;
             }
+            gen_helper_update_mxcsr(cpu_env);
             gen_lea_modrm(env, s, modrm);
             tcg_gen_ld32u_tl(s->T0, cpu_env, offsetof(CPUX86State, mxcsr));
             gen_op_st_v(s, MO_32, s->T0, s->A0);
diff --git a/tests/tcg/i386/Makefile.target b/tests/tcg/i386/Makefile.target
index 1a6463a7dc..a66232a67d 100644
--- a/tests/tcg/i386/Makefile.target
+++ b/tests/tcg/i386/Makefile.target
@@ -10,6 +10,10 @@ ALL_X86_TESTS=$(I386_SRCS:.c=)
 SKIP_I386_TESTS=test-i386-ssse3
 X86_64_TESTS:=$(filter test-i386-ssse3, $(ALL_X86_TESTS))
 
+test-i386-sse-exceptions: CFLAGS += -msse4.1 -mfpmath=sse
+run-test-i386-sse-exceptions: QEMU_OPTS += -cpu max
+run-plugin-test-i386-sse-exceptions-%: QEMU_OPTS += -cpu max
+
 test-i386-pcmpistri: CFLAGS += -msse4.2
 run-test-i386-pcmpistri: QEMU_OPTS += -cpu max
 run-plugin-test-i386-pcmpistri-%: QEMU_OPTS += -cpu max
diff --git a/tests/tcg/i386/test-i386-sse-exceptions.c b/tests/tcg/i386/test-i386-sse-exceptions.c
new file mode 100644
index 0000000000..a104f46c11
--- /dev/null
+++ b/tests/tcg/i386/test-i386-sse-exceptions.c
@@ -0,0 +1,813 @@
+/* Test SSE exceptions.  */
+
+#include <float.h>
+#include <stdint.h>
+#include <stdio.h>
+
+volatile float f_res;
+volatile double d_res;
+
+volatile float f_snan = __builtin_nansf("");
+volatile float f_half = 0.5f;
+volatile float f_third = 1.0f / 3.0f;
+volatile float f_nan = __builtin_nanl("");
+volatile float f_inf = __builtin_inff();
+volatile float f_ninf = -__builtin_inff();
+volatile float f_one = 1.0f;
+volatile float f_two = 2.0f;
+volatile float f_zero = 0.0f;
+volatile float f_nzero = -0.0f;
+volatile float f_min = FLT_MIN;
+volatile float f_true_min = 0x1p-149f;
+volatile float f_max = FLT_MAX;
+volatile float f_nmax = -FLT_MAX;
+
+volatile double d_snan = __builtin_nans("");
+volatile double d_half = 0.5;
+volatile double d_third = 1.0 / 3.0;
+volatile double d_nan = __builtin_nan("");
+volatile double d_inf = __builtin_inf();
+volatile double d_ninf = -__builtin_inf();
+volatile double d_one = 1.0;
+volatile double d_two = 2.0;
+volatile double d_zero = 0.0;
+volatile double d_nzero = -0.0;
+volatile double d_min = DBL_MIN;
+volatile double d_true_min = 0x1p-1074;
+volatile double d_max = DBL_MAX;
+volatile double d_nmax = -DBL_MAX;
+
+volatile int32_t i32_max = INT32_MAX;
+
+#define IE (1 << 0)
+#define ZE (1 << 2)
+#define OE (1 << 3)
+#define UE (1 << 4)
+#define PE (1 << 5)
+#define EXC (IE | ZE | OE | UE | PE)
+
+uint32_t mxcsr_default = 0x1f80;
+uint32_t mxcsr_ftz = 0x9f80;
+
+int main(void)
+{
+    uint32_t mxcsr;
+    int32_t i32_res;
+    int ret = 0;
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    d_res = f_snan;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: widen float snan\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = d_min;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != (UE | PE)) {
+        printf("FAIL: narrow float underflow\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = d_max;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != (OE | PE)) {
+        printf("FAIL: narrow float overflow\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = d_third;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != PE) {
+        printf("FAIL: narrow float inexact\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = d_snan;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: narrow float snan\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("roundss $4, %0, %0" : "=x" (f_res) : "0" (f_min));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != PE) {
+        printf("FAIL: roundss min\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("roundss $12, %0, %0" : "=x" (f_res) : "0" (f_min));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != 0) {
+        printf("FAIL: roundss no-inexact min\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("roundss $4, %0, %0" : "=x" (f_res) : "0" (f_snan));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: roundss snan\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("roundss $12, %0, %0" : "=x" (f_res) : "0" (f_snan));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: roundss no-inexact snan\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("roundsd $4, %0, %0" : "=x" (d_res) : "0" (d_min));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != PE) {
+        printf("FAIL: roundsd min\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("roundsd $12, %0, %0" : "=x" (d_res) : "0" (d_min));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != 0) {
+        printf("FAIL: roundsd no-inexact min\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("roundsd $4, %0, %0" : "=x" (d_res) : "0" (d_snan));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: roundsd snan\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("roundsd $12, %0, %0" : "=x" (d_res) : "0" (d_snan));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: roundsd no-inexact snan\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("comiss %1, %0" : : "x" (f_nan), "x" (f_zero));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: comiss nan\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("ucomiss %1, %0" : : "x" (f_nan), "x" (f_zero));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != 0) {
+        printf("FAIL: ucomiss nan\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("ucomiss %1, %0" : : "x" (f_snan), "x" (f_zero));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: ucomiss snan\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("comisd %1, %0" : : "x" (d_nan), "x" (d_zero));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: comisd nan\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("ucomisd %1, %0" : : "x" (d_nan), "x" (d_zero));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != 0) {
+        printf("FAIL: ucomisd nan\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("ucomisd %1, %0" : : "x" (d_snan), "x" (d_zero));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: ucomisd snan\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = f_max + f_max;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != (OE | PE)) {
+        printf("FAIL: float add overflow\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = f_max + f_min;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != PE) {
+        printf("FAIL: float add inexact\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = f_inf + f_ninf;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: float add inf -inf\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = f_snan + f_third;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: float add snan\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_ftz));
+    f_res = f_true_min + f_true_min;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != (UE | PE)) {
+        printf("FAIL: float add FTZ underflow\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    d_res = d_max + d_max;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != (OE | PE)) {
+        printf("FAIL: double add overflow\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    d_res = d_max + d_min;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != PE) {
+        printf("FAIL: double add inexact\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    d_res = d_inf + d_ninf;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: double add inf -inf\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    d_res = d_snan + d_third;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: double add snan\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_ftz));
+    d_res = d_true_min + d_true_min;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != (UE | PE)) {
+        printf("FAIL: double add FTZ underflow\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = f_max - f_nmax;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != (OE | PE)) {
+        printf("FAIL: float sub overflow\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = f_max - f_min;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != PE) {
+        printf("FAIL: float sub inexact\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = f_inf - f_inf;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: float sub inf inf\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = f_snan - f_third;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: float sub snan\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_ftz));
+    f_res = f_min - f_true_min;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != (UE | PE)) {
+        printf("FAIL: float sub FTZ underflow\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    d_res = d_max - d_nmax;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != (OE | PE)) {
+        printf("FAIL: double sub overflow\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    d_res = d_max - d_min;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != PE) {
+        printf("FAIL: double sub inexact\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    d_res = d_inf - d_inf;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: double sub inf inf\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    d_res = d_snan - d_third;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: double sub snan\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_ftz));
+    d_res = d_min - d_true_min;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != (UE | PE)) {
+        printf("FAIL: double sub FTZ underflow\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = f_max * f_max;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != (OE | PE)) {
+        printf("FAIL: float mul overflow\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = f_third * f_third;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != PE) {
+        printf("FAIL: float mul inexact\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = f_min * f_min;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != (UE | PE)) {
+        printf("FAIL: float mul underflow\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = f_inf * f_zero;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: float mul inf 0\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = f_snan * f_third;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: float mul snan\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_ftz));
+    f_res = f_min * f_half;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != (UE | PE)) {
+        printf("FAIL: float mul FTZ underflow\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    d_res = d_max * d_max;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != (OE | PE)) {
+        printf("FAIL: double mul overflow\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    d_res = d_third * d_third;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != PE) {
+        printf("FAIL: double mul inexact\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    d_res = d_min * d_min;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != (UE | PE)) {
+        printf("FAIL: double mul underflow\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    d_res = d_inf * d_zero;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: double mul inf 0\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    d_res = d_snan * d_third;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: double mul snan\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_ftz));
+    d_res = d_min * d_half;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != (UE | PE)) {
+        printf("FAIL: double mul FTZ underflow\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = f_max / f_min;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != (OE | PE)) {
+        printf("FAIL: float div overflow\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = f_one / f_third;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != PE) {
+        printf("FAIL: float div inexact\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = f_min / f_max;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != (UE | PE)) {
+        printf("FAIL: float div underflow\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = f_one / f_zero;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != ZE) {
+        printf("FAIL: float div 1 0\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = f_inf / f_zero;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != 0) {
+        printf("FAIL: float div inf 0\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = f_nan / f_zero;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != 0) {
+        printf("FAIL: float div nan 0\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = f_zero / f_zero;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: float div 0 0\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = f_inf / f_inf;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: float div inf inf\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    f_res = f_snan / f_third;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: float div snan\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_ftz));
+    f_res = f_min / f_two;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != (UE | PE)) {
+        printf("FAIL: float div FTZ underflow\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    d_res = d_max / d_min;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != (OE | PE)) {
+        printf("FAIL: double div overflow\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    d_res = d_one / d_third;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != PE) {
+        printf("FAIL: double div inexact\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    d_res = d_min / d_max;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != (UE | PE)) {
+        printf("FAIL: double div underflow\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    d_res = d_one / d_zero;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != ZE) {
+        printf("FAIL: double div 1 0\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    d_res = d_inf / d_zero;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != 0) {
+        printf("FAIL: double div inf 0\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    d_res = d_nan / d_zero;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != 0) {
+        printf("FAIL: double div nan 0\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    d_res = d_zero / d_zero;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: double div 0 0\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    d_res = d_inf / d_inf;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: double div inf inf\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    d_res = d_snan / d_third;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: double div snan\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_ftz));
+    d_res = d_min / d_two;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != (UE | PE)) {
+        printf("FAIL: double div FTZ underflow\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("sqrtss %0, %0" : "=x" (f_res) : "0" (f_max));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != PE) {
+        printf("FAIL: sqrtss inexact\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("sqrtss %0, %0" : "=x" (f_res) : "0" (f_nmax));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: sqrtss -max\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("sqrtss %0, %0" : "=x" (f_res) : "0" (f_ninf));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: sqrtss -inf\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("sqrtss %0, %0" : "=x" (f_res) : "0" (f_snan));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: sqrtss snan\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("sqrtss %0, %0" : "=x" (f_res) : "0" (f_nzero));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != 0) {
+        printf("FAIL: sqrtss -0\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("sqrtss %0, %0" : "=x" (f_res) :
+                      "0" (-__builtin_nanf("")));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != 0) {
+        printf("FAIL: sqrtss -nan\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("sqrtsd %0, %0" : "=x" (d_res) : "0" (d_max));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != PE) {
+        printf("FAIL: sqrtsd inexact\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("sqrtsd %0, %0" : "=x" (d_res) : "0" (d_nmax));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: sqrtsd -max\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("sqrtsd %0, %0" : "=x" (d_res) : "0" (d_ninf));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: sqrtsd -inf\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("sqrtsd %0, %0" : "=x" (d_res) : "0" (d_snan));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: sqrtsd snan\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("sqrtsd %0, %0" : "=x" (d_res) : "0" (d_nzero));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != 0) {
+        printf("FAIL: sqrtsd -0\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("sqrtsd %0, %0" : "=x" (d_res) :
+                      "0" (-__builtin_nan("")));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != 0) {
+        printf("FAIL: sqrtsd -nan\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("maxss %1, %0" : : "x" (f_nan), "x" (f_zero));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: maxss nan\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("minss %1, %0" : : "x" (f_nan), "x" (f_zero));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: minss nan\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("maxsd %1, %0" : : "x" (d_nan), "x" (d_zero));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: maxsd nan\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("minsd %1, %0" : : "x" (d_nan), "x" (d_zero));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: minsd nan\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("cvtsi2ss %1, %0" : "=x" (f_res) : "m" (i32_max));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != PE) {
+        printf("FAIL: cvtsi2ss inexact\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("cvtsi2sd %1, %0" : "=x" (d_res) : "m" (i32_max));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != 0) {
+        printf("FAIL: cvtsi2sd exact\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("cvtss2si %1, %0" : "=r" (i32_res) : "x" (1.5f));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != PE) {
+        printf("FAIL: cvtss2si inexact\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("cvtss2si %1, %0" : "=r" (i32_res) : "x" (0x1p31f));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: cvtss2si 0x1p31\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("cvtss2si %1, %0" : "=r" (i32_res) : "x" (f_inf));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: cvtss2si inf\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("cvtsd2si %1, %0" : "=r" (i32_res) : "x" (1.5));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != PE) {
+        printf("FAIL: cvtsd2si inexact\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("cvtsd2si %1, %0" : "=r" (i32_res) : "x" (0x1p31));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: cvtsd2si 0x1p31\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("cvtsd2si %1, %0" : "=r" (i32_res) : "x" (d_inf));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: cvtsd2si inf\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("cvttss2si %1, %0" : "=r" (i32_res) : "x" (1.5f));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != PE) {
+        printf("FAIL: cvttss2si inexact\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("cvttss2si %1, %0" : "=r" (i32_res) : "x" (0x1p31f));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: cvttss2si 0x1p31\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("cvttss2si %1, %0" : "=r" (i32_res) : "x" (f_inf));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: cvttss2si inf\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("cvttsd2si %1, %0" : "=r" (i32_res) : "x" (1.5));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != PE) {
+        printf("FAIL: cvttsd2si inexact\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("cvttsd2si %1, %0" : "=r" (i32_res) : "x" (0x1p31));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: cvttsd2si 0x1p31\n");
+        ret = 1;
+    }
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("cvttsd2si %1, %0" : "=r" (i32_res) : "x" (d_inf));
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != IE) {
+        printf("FAIL: cvttsd2si inf\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("rcpss %0, %0" : "=x" (f_res) : "0" (f_snan));
+    f_res += f_one;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != 0) {
+        printf("FAIL: rcpss snan\n");
+        ret = 1;
+    }
+
+    __asm__ volatile ("ldmxcsr %0" : : "m" (mxcsr_default));
+    __asm__ volatile ("rsqrtss %0, %0" : "=x" (f_res) : "0" (f_snan));
+    f_res += f_one;
+    __asm__ volatile ("stmxcsr %0" : "=m" (mxcsr));
+    if ((mxcsr & EXC) != 0) {
+        printf("FAIL: rsqrtss snan\n");
+        ret = 1;
+    }
+
+    return ret;
+}
-- 
2.17.1


-- 
Joseph S. Myers
joseph@codesourcery.com


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/2] target/i386: SSE floating-point fixes
  2020-06-25 23:57 [PATCH 0/2] target/i386: SSE floating-point fixes Joseph Myers
  2020-06-25 23:57 ` [PATCH 1/2] target/i386: set SSE FTZ in correct floating-point state Joseph Myers
  2020-06-25 23:58 ` [PATCH 2/2] target/i386: fix IEEE SSE floating-point exception raising Joseph Myers
@ 2020-06-26  2:01 ` no-reply
  2020-06-26 11:00 ` Paolo Bonzini
  3 siblings, 0 replies; 5+ messages in thread
From: no-reply @ 2020-06-26  2:01 UTC (permalink / raw)
  To: joseph; +Cc: pbonzini, qemu-devel, ehabkost, rth

Patchew URL: https://patchew.org/QEMU/alpine.DEB.2.21.2006252355430.3832@digraph.polyomino.org.uk/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH 0/2] target/i386: SSE floating-point fixes
Type: series
Message-id: alpine.DEB.2.21.2006252355430.3832@digraph.polyomino.org.uk

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/alpine.DEB.2.21.2006252355430.3832@digraph.polyomino.org.uk -> patchew/alpine.DEB.2.21.2006252355430.3832@digraph.polyomino.org.uk
Switched to a new branch 'test'
6986974 target/i386: fix IEEE SSE floating-point exception raising
4effb74 target/i386: set SSE FTZ in correct floating-point state

=== OUTPUT BEGIN ===
1/2 Checking commit 4effb7446212 (target/i386: set SSE FTZ in correct floating-point state)
2/2 Checking commit 6986974af06e (target/i386: fix IEEE SSE floating-point exception raising)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#310: 
new file mode 100644

ERROR: Use of volatile is usually wrong, please add a comment
#321: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:7:
+volatile float f_res;

ERROR: Use of volatile is usually wrong, please add a comment
#322: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:8:
+volatile double d_res;

ERROR: Use of volatile is usually wrong, please add a comment
#324: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:10:
+volatile float f_snan = __builtin_nansf("");

ERROR: Use of volatile is usually wrong, please add a comment
#325: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:11:
+volatile float f_half = 0.5f;

ERROR: Use of volatile is usually wrong, please add a comment
#326: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:12:
+volatile float f_third = 1.0f / 3.0f;

ERROR: Use of volatile is usually wrong, please add a comment
#327: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:13:
+volatile float f_nan = __builtin_nanl("");

ERROR: Use of volatile is usually wrong, please add a comment
#328: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:14:
+volatile float f_inf = __builtin_inff();

ERROR: Use of volatile is usually wrong, please add a comment
#329: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:15:
+volatile float f_ninf = -__builtin_inff();

ERROR: Use of volatile is usually wrong, please add a comment
#330: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:16:
+volatile float f_one = 1.0f;

ERROR: Use of volatile is usually wrong, please add a comment
#331: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:17:
+volatile float f_two = 2.0f;

ERROR: Use of volatile is usually wrong, please add a comment
#332: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:18:
+volatile float f_zero = 0.0f;

ERROR: Use of volatile is usually wrong, please add a comment
#333: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:19:
+volatile float f_nzero = -0.0f;

ERROR: Use of volatile is usually wrong, please add a comment
#334: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:20:
+volatile float f_min = FLT_MIN;

ERROR: spaces required around that '-' (ctx:VxV)
#335: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:21:
+volatile float f_true_min = 0x1p-149f;
                                 ^

ERROR: Use of volatile is usually wrong, please add a comment
#335: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:21:
+volatile float f_true_min = 0x1p-149f;

ERROR: Use of volatile is usually wrong, please add a comment
#336: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:22:
+volatile float f_max = FLT_MAX;

ERROR: Use of volatile is usually wrong, please add a comment
#337: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:23:
+volatile float f_nmax = -FLT_MAX;

ERROR: Use of volatile is usually wrong, please add a comment
#339: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:25:
+volatile double d_snan = __builtin_nans("");

ERROR: Use of volatile is usually wrong, please add a comment
#340: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:26:
+volatile double d_half = 0.5;

ERROR: Use of volatile is usually wrong, please add a comment
#341: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:27:
+volatile double d_third = 1.0 / 3.0;

ERROR: Use of volatile is usually wrong, please add a comment
#342: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:28:
+volatile double d_nan = __builtin_nan("");

ERROR: Use of volatile is usually wrong, please add a comment
#343: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:29:
+volatile double d_inf = __builtin_inf();

ERROR: Use of volatile is usually wrong, please add a comment
#344: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:30:
+volatile double d_ninf = -__builtin_inf();

ERROR: Use of volatile is usually wrong, please add a comment
#345: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:31:
+volatile double d_one = 1.0;

ERROR: Use of volatile is usually wrong, please add a comment
#346: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:32:
+volatile double d_two = 2.0;

ERROR: Use of volatile is usually wrong, please add a comment
#347: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:33:
+volatile double d_zero = 0.0;

ERROR: Use of volatile is usually wrong, please add a comment
#348: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:34:
+volatile double d_nzero = -0.0;

ERROR: Use of volatile is usually wrong, please add a comment
#349: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:35:
+volatile double d_min = DBL_MIN;

ERROR: spaces required around that '-' (ctx:VxV)
#350: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:36:
+volatile double d_true_min = 0x1p-1074;
                                  ^

ERROR: Use of volatile is usually wrong, please add a comment
#350: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:36:
+volatile double d_true_min = 0x1p-1074;

ERROR: Use of volatile is usually wrong, please add a comment
#351: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:37:
+volatile double d_max = DBL_MAX;

ERROR: Use of volatile is usually wrong, please add a comment
#352: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:38:
+volatile double d_nmax = -DBL_MAX;

ERROR: Use of volatile is usually wrong, please add a comment
#354: FILE: tests/tcg/i386/test-i386-sse-exceptions.c:40:
+volatile int32_t i32_max = INT32_MAX;

total: 33 errors, 1 warnings, 1033 lines checked

Patch 2/2 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/alpine.DEB.2.21.2006252355430.3832@digraph.polyomino.org.uk/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/2] target/i386: SSE floating-point fixes
  2020-06-25 23:57 [PATCH 0/2] target/i386: SSE floating-point fixes Joseph Myers
                   ` (2 preceding siblings ...)
  2020-06-26  2:01 ` [PATCH 0/2] target/i386: SSE floating-point fixes no-reply
@ 2020-06-26 11:00 ` Paolo Bonzini
  3 siblings, 0 replies; 5+ messages in thread
From: Paolo Bonzini @ 2020-06-26 11:00 UTC (permalink / raw)
  To: Joseph Myers, qemu-devel, rth, ehabkost

On 26/06/20 01:57, Joseph Myers wrote:
> Fix some issues relating to SSE floating-point emulation.  The first
> patch fixes a problem with the handling of the FTZ bit that was found
> through the testcase written for the second patch.  Rather than
> writing a separate standalone test for that bug, it seemed sufficient
> for the testcase in the second patch to cover both patches.
> 
> The style checker will produce its usual inapplicable warnings about
> use of "volatile" in the testcase and about C99 hex float constants.
> 
> Joseph Myers (2):
>   target/i386: set SSE FTZ in correct floating-point state
>   target/i386: fix IEEE SSE floating-point exception raising
> 
>  target/i386/cpu.h                         |   1 +
>  target/i386/fpu_helper.c                  |  35 +-
>  target/i386/gdbstub.c                     |   1 +
>  target/i386/helper.c                      |   1 +
>  target/i386/helper.h                      |   1 +
>  target/i386/ops_sse.h                     |  28 +-
>  target/i386/translate.c                   |   1 +
>  tests/tcg/i386/Makefile.target            |   4 +
>  tests/tcg/i386/test-i386-sse-exceptions.c | 813 ++++++++++++++++++++++
>  9 files changed, 872 insertions(+), 13 deletions(-)
>  create mode 100644 tests/tcg/i386/test-i386-sse-exceptions.c
> 

Queued, thanks.

Paolo



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-06-26 11:02 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-06-25 23:57 [PATCH 0/2] target/i386: SSE floating-point fixes Joseph Myers
2020-06-25 23:57 ` [PATCH 1/2] target/i386: set SSE FTZ in correct floating-point state Joseph Myers
2020-06-25 23:58 ` [PATCH 2/2] target/i386: fix IEEE SSE floating-point exception raising Joseph Myers
2020-06-26  2:01 ` [PATCH 0/2] target/i386: SSE floating-point fixes no-reply
2020-06-26 11:00 ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).