[PATCH v2 0/4] target/s390x: Implement DIVIDE TO INTEGER

* [PATCH v2 0/4] target/s390x: Implement DIVIDE TO INTEGER
@ 2026-01-27 15:31 Ilya Leoshkevich
  2026-01-27 15:31 ` [PATCH v2 1/4] target/s390x: Dump Floating-Point-Control Register Ilya Leoshkevich
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Ilya Leoshkevich @ 2026-01-27 15:31 UTC
  To: Thomas Huth, Richard Henderson
  Cc: David Hildenbrand, qemu-s390x, qemu-devel, Ilya Leoshkevich

v1: https://lore.kernel.org/qemu-devel/20260121222116.713325-1-iii@linux.ibm.com/
v1 -> v2: Move the implementation to fpu/ and rewrite using
          FloatParts (Richard). I can't say I particularly like the way
          it looks, but at least most macros are gone and it survives
          fuzzing.
          Explain why we need -O2 for the test (Alex).
          New patch: s390_get_bfp_rounding_mode().
          Add a few comments with calculation examples to the test.

Hi,

This series implements the DIVIDE TO INTEGER instruction, which is
required to run LuaJIT.

Patch 1 is a debugging helper, patch 2 is a small refactoring, patch 3
is the implementation.

Since the instruction is quite complex, I've extensively tested it
using a libFuzzer-based harness [1] that compares emulation with native
execution at ~15k exec/s. The tests (patch 4) use data generated
this way.

Best regards,
Ilya

Ilya Leoshkevich (4):
  target/s390x: Dump Floating-Point-Control Register
  target/s390x: Extract s390_get_bfp_rounding_mode()
  target/s390x: Implement DIVIDE TO INTEGER
  tests/tcg/s390x: Test DIVIDE TO INTEGER

 fpu/softfloat.c                     | 158 ++++++++++++++++++
 include/fpu/softfloat.h             |  11 ++
 target/s390x/cpu-dump.c             |   1 +
 target/s390x/helper.h               |   1 +
 target/s390x/tcg/fpu_helper.c       | 118 ++++++++++----
 target/s390x/tcg/insn-data.h.inc    |   5 +-
 target/s390x/tcg/translate.c        |  26 +++
 tests/tcg/s390x/Makefile.target     |   5 +
 tests/tcg/s390x/divide-to-integer.c | 242 ++++++++++++++++++++++++++++
 9 files changed, 535 insertions(+), 32 deletions(-)
 create mode 100644 tests/tcg/s390x/divide-to-integer.c

-- 
2.52.0




* [PATCH v2 1/4] target/s390x: Dump Floating-Point-Control Register
  2026-01-27 15:31 [PATCH v2 0/4] target/s390x: Implement DIVIDE TO INTEGER Ilya Leoshkevich
@ 2026-01-27 15:31 ` Ilya Leoshkevich
  2026-01-27 15:31 ` [PATCH v2 2/4] target/s390x: Extract s390_get_bfp_rounding_mode() Ilya Leoshkevich
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Ilya Leoshkevich @ 2026-01-27 15:31 UTC
  To: Thomas Huth, Richard Henderson
  Cc: David Hildenbrand, qemu-s390x, qemu-devel, Ilya Leoshkevich,
	Alex Bennée

Knowing the value of this register is very useful for debugging
floating-point code.
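
As a rough illustration of what the dumped value contains, the sketch
below splits a 32-bit FPC value into its fields. The struct and function
names are hypothetical and not part of QEMU; the field positions assume
the usual FPC layout (masks in the top byte, flags in the next byte, DXC
in the third byte, rounding mode in the low three bits).

```c
#include <stdint.h>

/* Hypothetical helper type for illustration, not part of QEMU. */
struct sketch_fpc {
    unsigned masks;          /* bits 31..24: IEEE exception masks */
    unsigned flags;          /* bits 23..16: IEEE exception flags */
    unsigned dxc;            /* bits 15..8:  data exception code */
    unsigned rounding_mode;  /* bits 2..0:   BFP rounding mode */
};

/* Split a raw FPC value, as printed by the dump, into its fields. */
static struct sketch_fpc sketch_decode_fpc(uint32_t fpc)
{
    struct sketch_fpc f;

    f.masks = (fpc >> 24) & 0xff;
    f.flags = (fpc >> 16) & 0xff;
    f.dxc = (fpc >> 8) & 0xff;
    f.rounding_mode = fpc & 0x7;
    return f;
}
```

With this sketch, a dumped FPC=fc000c07 would decode to masks 0xfc, no
flags, DXC 0x0c and rounding mode 7.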

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 target/s390x/cpu-dump.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/s390x/cpu-dump.c b/target/s390x/cpu-dump.c
index 869d3a4ad54..5b852928031 100644
--- a/target/s390x/cpu-dump.c
+++ b/target/s390x/cpu-dump.c
@@ -63,6 +63,7 @@ void s390_cpu_dump_state(CPUState *cs, FILE *f, int flags)
                              (i % 4) == 3 ? '\n' : ' ');
             }
         }
+        qemu_fprintf(f, "FPC=%08" PRIx32 "\n", env->fpc);
     }
 
 #ifndef CONFIG_USER_ONLY
-- 
2.52.0




* [PATCH v2 2/4] target/s390x: Extract s390_get_bfp_rounding_mode()
  2026-01-27 15:31 [PATCH v2 0/4] target/s390x: Implement DIVIDE TO INTEGER Ilya Leoshkevich
  2026-01-27 15:31 ` [PATCH v2 1/4] target/s390x: Dump Floating-Point-Control Register Ilya Leoshkevich
@ 2026-01-27 15:31 ` Ilya Leoshkevich
  2026-01-28  5:25   ` Richard Henderson
  2026-01-27 15:31 ` [PATCH v2 3/4] target/s390x: Implement DIVIDE TO INTEGER Ilya Leoshkevich
  2026-01-27 15:31 ` [PATCH v2 4/4] tests/tcg/s390x: Test " Ilya Leoshkevich
  3 siblings, 1 reply; 11+ messages in thread
From: Ilya Leoshkevich @ 2026-01-27 15:31 UTC
  To: Thomas Huth, Richard Henderson
  Cc: David Hildenbrand, qemu-s390x, qemu-devel, Ilya Leoshkevich

For DIVIDE TO INTEGER it will be helpful to pass the final-quotient
rounding mode around explicitly rather than setting it in fpu_status
implicitly. To facilitate this, extract a function for converting the
mask to the rounding mode.
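
The idea can be sketched outside of QEMU as a pure mapping that returns
the mode instead of storing it anywhere. This is only an illustration:
the enum and function names below are hypothetical stand-ins for the
softfloat constants, and unlike the patch the sketch reports unsupported
masks through its return value rather than asserting.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the softfloat rounding-mode constants. */
enum sketch_round {
    SKETCH_ROUND_TIES_AWAY,
    SKETCH_ROUND_TO_ODD,
    SKETCH_ROUND_NEAREST_EVEN,
    SKETCH_ROUND_TO_ZERO,
    SKETCH_ROUND_UP,
    SKETCH_ROUND_DOWN,
};

/*
 * Convert a rounding-mode mask to a mode without modifying any state.
 * "current" is the mode that mask 0 selects. Returns false for masks
 * that have no mapping (2 and anything above 7).
 */
static bool sketch_mask_to_round(int mask, enum sketch_round current,
                                 enum sketch_round *out)
{
    switch (mask) {
    case 0: *out = current; return true;
    case 1: *out = SKETCH_ROUND_TIES_AWAY; return true;
    case 3: *out = SKETCH_ROUND_TO_ODD; return true;
    case 4: *out = SKETCH_ROUND_NEAREST_EVEN; return true;
    case 5: *out = SKETCH_ROUND_TO_ZERO; return true;
    case 6: *out = SKETCH_ROUND_UP; return true;
    case 7: *out = SKETCH_ROUND_DOWN; return true;
    default: return false;
    }
}
```

A caller that wants the old "swap" behaviour can store the returned mode
itself, while a caller such as DIVIDE TO INTEGER can simply keep it in a
local variable.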

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 target/s390x/tcg/fpu_helper.c | 62 +++++++++++++++++------------------
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/target/s390x/tcg/fpu_helper.c b/target/s390x/tcg/fpu_helper.c
index 1ba43715ac1..7a3ff501a46 100644
--- a/target/s390x/tcg/fpu_helper.c
+++ b/target/s390x/tcg/fpu_helper.c
@@ -56,6 +56,35 @@ uint8_t s390_softfloat_exc_to_ieee(unsigned int exc)
     return s390_exc;
 }
 
+static int s390_get_bfp_rounding_mode(CPUS390XState *env, int m3)
+{
+    switch (m3) {
+    case 0:
+        /* current mode */
+        return env->fpu_status.float_rounding_mode;
+    case 1:
+        /* round to nearest with ties away from 0 */
+        return float_round_ties_away;
+    case 3:
+        /* round to prepare for shorter precision */
+        return float_round_to_odd;
+    case 4:
+        /* round to nearest with ties to even */
+        return float_round_nearest_even;
+    case 5:
+        /* round to zero */
+        return float_round_to_zero;
+    case 6:
+        /* round to +inf */
+        return float_round_up;
+    case 7:
+        /* round to -inf */
+        return float_round_down;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 /* Should be called after any operation that may raise IEEE exceptions.  */
 static void handle_exceptions(CPUS390XState *env, bool XxC, uintptr_t retaddr)
 {
@@ -416,37 +445,8 @@ int s390_swap_bfp_rounding_mode(CPUS390XState *env, int m3)
 {
     int ret = env->fpu_status.float_rounding_mode;
 
-    switch (m3) {
-    case 0:
-        /* current mode */
-        break;
-    case 1:
-        /* round to nearest with ties away from 0 */
-        set_float_rounding_mode(float_round_ties_away, &env->fpu_status);
-        break;
-    case 3:
-        /* round to prepare for shorter precision */
-        set_float_rounding_mode(float_round_to_odd, &env->fpu_status);
-        break;
-    case 4:
-        /* round to nearest with ties to even */
-        set_float_rounding_mode(float_round_nearest_even, &env->fpu_status);
-        break;
-    case 5:
-        /* round to zero */
-        set_float_rounding_mode(float_round_to_zero, &env->fpu_status);
-        break;
-    case 6:
-        /* round to +inf */
-        set_float_rounding_mode(float_round_up, &env->fpu_status);
-        break;
-    case 7:
-        /* round to -inf */
-        set_float_rounding_mode(float_round_down, &env->fpu_status);
-        break;
-    default:
-        g_assert_not_reached();
-    }
+    set_float_rounding_mode(s390_get_bfp_rounding_mode(env, m3),
+                            &env->fpu_status);
     return ret;
 }
 
-- 
2.52.0




* [PATCH v2 3/4] target/s390x: Implement DIVIDE TO INTEGER
  2026-01-27 15:31 [PATCH v2 0/4] target/s390x: Implement DIVIDE TO INTEGER Ilya Leoshkevich
  2026-01-27 15:31 ` [PATCH v2 1/4] target/s390x: Dump Floating-Point-Control Register Ilya Leoshkevich
  2026-01-27 15:31 ` [PATCH v2 2/4] target/s390x: Extract s390_get_bfp_rounding_mode() Ilya Leoshkevich
@ 2026-01-27 15:31 ` Ilya Leoshkevich
  2026-01-28  5:50   ` Richard Henderson
  2026-01-27 15:31 ` [PATCH v2 4/4] tests/tcg/s390x: Test " Ilya Leoshkevich
  3 siblings, 1 reply; 11+ messages in thread
From: Ilya Leoshkevich @ 2026-01-27 15:31 UTC
  To: Thomas Huth, Richard Henderson
  Cc: David Hildenbrand, qemu-s390x, qemu-devel, Ilya Leoshkevich

DIVIDE TO INTEGER computes a floating-point remainder and is used by
LuaJIT, so add it to QEMU.

The instruction comes in two flavors: for floats and doubles, which are
very similar. Since it's also quite complex, copy-pasting the
implementation would result in barely maintainable code. Mitigate that
using macros. An alternative would be an .inc file, but this looks like
overkill.
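
For readers unfamiliar with the instruction, the relationship between
its two results can be sketched in plain C as n = round(a / b) and
r = a - n * b. The sketch below is a simplification under stated
assumptions: it only rounds the quotient toward zero, only handles
quotients that fit in a long long, and ignores NaNs, infinities, partial
results, condition codes and exceptions, all of which the actual
implementation has to handle. The names are hypothetical.

```c
/* Hypothetical result pair: remainder and integer quotient. */
struct sketch_div_result {
    double remainder;
    double quotient;
};

/*
 * Simplified model of DIVIDE TO INTEGER with the quotient rounded
 * toward zero. Assumes b is non-zero and a / b fits in a long long.
 */
static struct sketch_div_result sketch_divide_to_integer_rz(double a,
                                                            double b)
{
    struct sketch_div_result res;

    /* The cast truncates, i.e. rounds the quotient toward zero. */
    res.quotient = (double)(long long)(a / b);
    res.remainder = a - res.quotient * b;
    return res;
}
```

For example, a = 7.5 and b = 2.0 would give a quotient of 3 and a
remainder of 1.5 in this model.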

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 fpu/softfloat.c                  | 158 +++++++++++++++++++++++++++++++
 include/fpu/softfloat.h          |  11 +++
 target/s390x/helper.h            |   1 +
 target/s390x/tcg/fpu_helper.c    |  56 +++++++++++
 target/s390x/tcg/insn-data.h.inc |   5 +-
 target/s390x/tcg/translate.c     |  26 +++++
 6 files changed, 256 insertions(+), 1 deletion(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 8094358c2e4..178ea262057 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -5361,6 +5361,164 @@ floatx80 floatx80_round(floatx80 a, float_status *status)
     return floatx80_round_pack_canonical(&p, status);
 }
 
+static void parts_s390_precision_round_normal(FloatParts128 *p,
+                                              const FloatFmt *fmt)
+{
+    /* Use precision of the target format, but unbounded exponent range */
+    p->frac_hi &= ((1ULL << (fmt->frac_size + 1)) - 1) <<
+                  (63 - fmt->frac_size);
+    p->frac_lo = 0;
+}
+
+static void parts_s390_divide_to_integer(FloatParts64 *a, FloatParts64 *b,
+                                         int final_quotient_rounding_mode,
+                                         bool mask_underflow, bool mask_inexact,
+                                         const FloatFmt *fmt,
+                                         FloatParts64 *r, FloatParts64 *n,
+                                         uint32_t *cc, int *dxc,
+                                         float_status *status)
+{
+    /* POp table "Results: DIVIDE TO INTEGER (Part 1 of 2)" */
+    if ((float_cmask(a->cls) | float_cmask(b->cls)) & float_cmask_anynan) {
+        *r = *parts_pick_nan(a, b, status);
+        *n = *r;
+        *cc = 1;
+    } else if (a->cls == float_class_inf || b->cls == float_class_zero) {
+        parts_default_nan(r, status);
+        *n = *r;
+        *cc = 1;
+        status->float_exception_flags |= float_flag_invalid;
+    } else if (b->cls == float_class_inf) {
+        *r = *a;
+        n->cls = float_class_zero;
+        n->sign = a->sign ^ b->sign;
+        *cc = 0;
+    } else {
+        FloatParts128 a128, b128, *m128, m128_buf, n128, *q128, q128_buf,
+                      r128, *r128_precise;
+        int float_exception_flags = 0;
+        bool is_q128_smallish;
+        uint32_t r_flags;
+        int saved_flags;
+
+        /* Compute precise quotient */
+        parts_float_to_float_widen(&a128, a, status);
+        parts_float_to_float_widen(&b128, b, status);
+        q128_buf = a128;
+        q128 = parts_div(&q128_buf, &b128, status);
+
+        /* Final or partial case? */
+        is_q128_smallish = q128->exp < (fmt->frac_size + 1);
+
+        /*
+         * Final quotient is rounded using final-quotient-rounding method, and
+         * partial quotient is rounded toward zero.
+         *
+         * Rounding of partial quotient may be inexact. This is the whole point
+         * of distinguishing partial quotients, so ignore the exception.
+         */
+        n128 = *q128;
+        saved_flags = status->float_exception_flags;
+        parts_round_to_int(&n128,
+                           is_q128_smallish ? final_quotient_rounding_mode :
+                                              float_round_to_zero,
+                           0, status, &float128_params);
+        float_exception_flags = saved_flags;
+        parts_s390_precision_round_normal(&n128, fmt);
+
+        /* Compute precise remainder */
+        m128_buf = b128;
+        m128 = parts_mul(&m128_buf, &n128, status);
+        m128->sign = !m128->sign;
+        status->float_exception_flags = 0;
+        r128_precise = parts_addsub(m128, &a128, status, false);
+
+        /* Round remainder to the target format */
+        parts_float_to_float_narrow(r, r128_precise, status);
+        parts_uncanon(r, status, fmt);
+        r->frac &= (1ULL << fmt->frac_size) - 1;
+        parts_canonicalize(r, status, fmt);
+        parts_float_to_float_widen(&r128, r, status);
+        r_flags = status->float_exception_flags;
+
+        /* POp table "Results: DIVIDE TO INTEGER (Part 2 of 2)" */
+        if (is_q128_smallish) {
+            if (r128.cls != float_class_zero) {
+                if (r128.exp < 2 - (1 << (fmt->exp_size - 1))) {
+                    if (mask_underflow) {
+                        float_exception_flags |= float_flag_underflow;
+                        *dxc = 0x10;
+                        r128.exp += fmt->exp_re_bias;
+                    }
+                } else if (r_flags & float_flag_inexact) {
+                    float_exception_flags |= float_flag_inexact;
+                    if (mask_inexact) {
+                        bool saved_r128_sign, saved_r128_precise_sign;
+
+                        /*
+                         * Check whether remainder was truncated (rounded
+                         * toward zero) or incremented.
+                         */
+                        saved_r128_sign = r128.sign;
+                        saved_r128_precise_sign = r128_precise->sign;
+                        r128.sign = false;
+                        r128_precise->sign = false;
+                        if (parts_compare(&r128, r128_precise, status, true) <
+                            float_relation_equal) {
+                            *dxc = 0x8;
+                        } else {
+                            *dxc = 0xc;
+                        }
+                        r128.sign = saved_r128_sign;
+                        r128_precise->sign = saved_r128_precise_sign;
+                    }
+                }
+            }
+            *cc = 0;
+        } else if (n128.exp > (1 << (fmt->exp_size - 1)) - 1) {
+            n128.exp -= fmt->exp_re_bias;
+            *cc = r128.cls == float_class_zero ? 1 : 3;
+        } else {
+            *cc = r128.cls == float_class_zero ? 0 : 2;
+        }
+
+        /* Adjust signs of zero results */
+        parts_float_to_float_narrow(r, &r128, status);
+        if (r->cls == float_class_zero) {
+            r->sign = a->sign;
+        }
+        parts_float_to_float_narrow(n, &n128, status);
+        if (n->cls == float_class_zero) {
+            n->sign = a->sign ^ b->sign;
+        }
+
+        status->float_exception_flags = float_exception_flags;
+    }
+}
+
+#define DEFINE_S390_DIVIDE_TO_INTEGER(floatN)                                  \
+void floatN ## _s390_divide_to_integer(floatN a, floatN b,                     \
+                                       int final_quotient_rounding_mode,       \
+                                       bool mask_underflow, bool mask_inexact, \
+                                       floatN *r, floatN *n,                   \
+                                       uint32_t *cc, int *dxc,                 \
+                                       float_status *status)                   \
+{                                                                              \
+    FloatParts64 pa, pb, pr, pn;                                               \
+                                                                               \
+    floatN ## _unpack_canonical(&pa, a, status);                               \
+    floatN ## _unpack_canonical(&pb, b, status);                               \
+    parts_s390_divide_to_integer(&pa, &pb, final_quotient_rounding_mode,       \
+                                 mask_underflow, mask_inexact,                 \
+                                 &floatN ## _params,                           \
+                                 &pr, &pn, cc, dxc, status);                   \
+    *r = floatN ## _round_pack_canonical(&pr, status);                         \
+    *n = floatN ## _round_pack_canonical(&pn, status);                         \
+}
+
+DEFINE_S390_DIVIDE_TO_INTEGER(float32)
+DEFINE_S390_DIVIDE_TO_INTEGER(float64)
+
 static void __attribute__((constructor)) softfloat_init(void)
 {
     union_float64 ua, ub, uc, ur;
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index c18ab2cb609..66b0c47b5eb 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -1372,4 +1372,15 @@ static inline bool float128_unordered_quiet(float128 a, float128 b,
 *----------------------------------------------------------------------------*/
 float128 float128_default_nan(float_status *status);
 
+#define DECLARE_S390_DIVIDE_TO_INTEGER(floatN)                                 \
+void floatN ## _s390_divide_to_integer(floatN a, floatN b,                     \
+                                       int final_quotient_rounding_mode,       \
+                                       bool mask_underflow, bool mask_inexact, \
+                                       floatN *r, floatN *n,                   \
+                                       uint32_t *cc, int *dxc,                 \
+                                       float_status *status)
+DECLARE_S390_DIVIDE_TO_INTEGER(float32);
+DECLARE_S390_DIVIDE_TO_INTEGER(float64);
+
+
 #endif /* SOFTFLOAT_H */
diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 1a8a76abb98..6a7426fdac7 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -46,6 +46,7 @@ DEF_HELPER_FLAGS_3(sxb, TCG_CALL_NO_WG, i128, env, i128, i128)
 DEF_HELPER_FLAGS_3(deb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(ddb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(dxb, TCG_CALL_NO_WG, i128, env, i128, i128)
+DEF_HELPER_6(dib, void, env, i32, i32, i32, i32, i32)
 DEF_HELPER_FLAGS_3(meeb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(mdeb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(mdb, TCG_CALL_NO_WG, i64, env, i64, i64)
diff --git a/target/s390x/tcg/fpu_helper.c b/target/s390x/tcg/fpu_helper.c
index 7a3ff501a46..122994960a6 100644
--- a/target/s390x/tcg/fpu_helper.c
+++ b/target/s390x/tcg/fpu_helper.c
@@ -315,6 +315,62 @@ Int128 HELPER(dxb)(CPUS390XState *env, Int128 a, Int128 b)
     return RET128(ret);
 }
 
+void HELPER(dib)(CPUS390XState *env, uint32_t r1, uint32_t r2, uint32_t r3,
+                 uint32_t m4, uint32_t bits)
+{
+    int final_quotient_rounding_mode = s390_get_bfp_rounding_mode(env, m4);
+    bool mask_underflow = (env->fpc >> 24) & S390_IEEE_MASK_UNDERFLOW;
+    bool mask_inexact = (env->fpc >> 24) & S390_IEEE_MASK_INEXACT;
+    float32 a32, b32, n32, r32;
+    float64 a64, b64, n64, r64;
+    int dxc = -1;
+    uint32_t cc;
+
+    if (bits == 32) {
+        a32 = env->vregs[r1][0] >> 32;
+        b32 = env->vregs[r2][0] >> 32;
+
+        float32_s390_divide_to_integer(
+            a32, b32,
+            final_quotient_rounding_mode,
+            mask_underflow, mask_inexact,
+            &r32, &n32, &cc, &dxc, &env->fpu_status);
+    } else {
+        a64 = env->vregs[r1][0];
+        b64 = env->vregs[r2][0];
+
+        float64_s390_divide_to_integer(
+            a64, b64,
+            final_quotient_rounding_mode,
+            mask_underflow, mask_inexact,
+            &r64, &n64, &cc, &dxc, &env->fpu_status);
+    }
+
+    /* Flush the results if needed */
+    if ((env->fpu_status.float_exception_flags & float_flag_invalid) &&
+        ((env->fpc >> 24) & S390_IEEE_MASK_INVALID)) {
+        /* The action for invalid operation is "Suppress" */
+    } else {
+        /* The action for other exceptions is "Complete" */
+        if (bits == 32) {
+            env->vregs[r1][0] = deposit64(env->vregs[r1][0], 32, 32, r32);
+            env->vregs[r3][0] = deposit64(env->vregs[r3][0], 32, 32, n32);
+        } else {
+            env->vregs[r1][0] = r64;
+            env->vregs[r3][0] = n64;
+        }
+        env->cc_op = cc;
+    }
+
+    /* Raise an exception if needed */
+    if (dxc == -1) {
+        handle_exceptions(env, false, GETPC());
+    } else {
+        env->fpu_status.float_exception_flags = 0;
+        tcg_s390_data_exception(env, dxc, GETPC());
+    }
+}
+
 /* 32-bit FP multiplication */
 uint64_t HELPER(meeb)(CPUS390XState *env, uint64_t f1, uint64_t f2)
 {
diff --git a/target/s390x/tcg/insn-data.h.inc b/target/s390x/tcg/insn-data.h.inc
index baaafe922e9..0d5392eac54 100644
--- a/target/s390x/tcg/insn-data.h.inc
+++ b/target/s390x/tcg/insn-data.h.inc
@@ -9,7 +9,7 @@
  *  OPC  = (op << 8) | op2 where op is the major, op2 the minor opcode
  *  NAME = name of the opcode, used internally
  *  FMT  = format of the opcode (defined in insn-format.h.inc)
- *  FAC  = facility the opcode is available in (defined in DisasFacility)
+ *  FAC  = facility the opcode is available in (defined in translate.c)
  *  I1   = func in1_xx fills o->in1
  *  I2   = func in2_xx fills o->in2
  *  P    = func prep_xx initializes o->*out*
@@ -361,6 +361,9 @@
     C(0xb91d, DSGFR,   RRE,   Z,   r1p1, r2_32s, r1_P, 0, divs64, 0)
     C(0xe30d, DSG,     RXY_a, Z,   r1p1, m2_64, r1_P, 0, divs64, 0)
     C(0xe31d, DSGF,    RXY_a, Z,   r1p1, m2_32s, r1_P, 0, divs64, 0)
+/* DIVIDE TO INTEGER */
+    D(0xb35b, DIDBR,   RRF_b, Z,   0, 0, 0, 0, dib, 0, 64)
+    D(0xb353, DIEBR,   RRF_b, Z,   0, 0, 0, 0, dib, 0, 32)
 
 /* EXCLUSIVE OR */
     C(0x1700, XR,      RR_a,  Z,   r1, r2, new, r1_32, xor, nz32)
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 540c5a569c0..dee0e710f39 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -2283,6 +2283,32 @@ static DisasJumpType op_dxb(DisasContext *s, DisasOps *o)
     return DISAS_NEXT;
 }
 
+static DisasJumpType op_dib(DisasContext *s, DisasOps *o)
+{
+    const bool fpe = s390_has_feat(S390_FEAT_FLOATING_POINT_EXT);
+    uint8_t m4 = get_field(s, m4);
+
+    if (get_field(s, r1) == get_field(s, r2) ||
+        get_field(s, r1) == get_field(s, r3) ||
+        get_field(s, r2) == get_field(s, r3)) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    if (m4 == 2 || (!fpe && m4 == 3) || m4 > 7) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    gen_helper_dib(tcg_env, tcg_constant_i32(get_field(s, r1)),
+                   tcg_constant_i32(get_field(s, r2)),
+                   tcg_constant_i32(get_field(s, r3)), tcg_constant_i32(m4),
+                   tcg_constant_i32(s->insn->data));
+    set_cc_static(s);
+
+    return DISAS_NEXT;
+}
+
 static DisasJumpType op_ear(DisasContext *s, DisasOps *o)
 {
     int r2 = get_field(s, r2);
-- 
2.52.0




* [PATCH v2 4/4] tests/tcg/s390x: Test DIVIDE TO INTEGER
  2026-01-27 15:31 [PATCH v2 0/4] target/s390x: Implement DIVIDE TO INTEGER Ilya Leoshkevich
                   ` (2 preceding siblings ...)
  2026-01-27 15:31 ` [PATCH v2 3/4] target/s390x: Implement DIVIDE TO INTEGER Ilya Leoshkevich
@ 2026-01-27 15:31 ` Ilya Leoshkevich
  3 siblings, 0 replies; 11+ messages in thread
From: Ilya Leoshkevich @ 2026-01-27 15:31 UTC
  To: Thomas Huth, Richard Henderson
  Cc: David Hildenbrand, qemu-s390x, qemu-devel, Ilya Leoshkevich,
	Alex Bennée

Add a test to prevent regressions. Data is generated using a
libFuzzer-based fuzzer and hopefully covers all the important corner
cases.
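
Since the test vectors are raw bit patterns, a small conversion helper
can make them easier to read. The sketch below is not part of the test;
the function names are hypothetical, and it assumes that float is an
IEEE 754 binary32 value on the host.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float (assumes IEEE 754 binary32). */
static float sketch_bits_to_float(uint32_t bits)
{
    float f;

    memcpy(&f, &bits, sizeof(f));
    return f;
}

/* Reinterpret a float as its 32-bit pattern. */
static uint32_t sketch_float_to_bits(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof(bits));
    return bits;
}
```

Under that assumption, 0x43f00000 reads back as 480, 1.0 maps to
0x3f800000 and -0.0 maps to 0x80000000, which are values that appear in
the test data.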

Acked-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 tests/tcg/s390x/Makefile.target     |   5 +
 tests/tcg/s390x/divide-to-integer.c | 242 ++++++++++++++++++++++++++++
 2 files changed, 247 insertions(+)
 create mode 100644 tests/tcg/s390x/divide-to-integer.c

diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index da5fe71a407..0ca030ded01 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -49,14 +49,19 @@ TESTS+=cvd
 TESTS+=cvb
 TESTS+=ts
 TESTS+=ex-smc
+TESTS+=divide-to-integer
 
 cdsg: CFLAGS+=-pthread
 cdsg: LDFLAGS+=-pthread
 
+# The following tests contain inline assembly that requires inlining,
+# and thus cannot be built with -O0.
 rxsbg: CFLAGS+=-O2
+divide-to-integer: CFLAGS+=-O2
 
 cgebra: LDFLAGS+=-lm
 clgebr: LDFLAGS+=-lm
+divide-to-integer: LDFLAGS+=-lm
 
 include $(S390X_SRC)/pgm-specification.mak
 $(PGM_SPECIFICATION_TESTS): pgm-specification-user.o
diff --git a/tests/tcg/s390x/divide-to-integer.c b/tests/tcg/s390x/divide-to-integer.c
new file mode 100644
index 00000000000..10040c7058e
--- /dev/null
+++ b/tests/tcg/s390x/divide-to-integer.c
@@ -0,0 +1,242 @@
+/*
+ * Test DIEBR and DIDBR instructions.
+ *
+ * Most inputs were discovered by fuzzing and exercise various corner cases in
+ * the helpers.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <asm/ucontext.h>
+
+static void sigfpe_handler(int sig, siginfo_t *info, void *puc)
+{
+    struct ucontext *uc = puc;
+    unsigned short *xr_insn;
+    int r;
+
+    xr_insn = (unsigned short *)(uc->uc_mcontext.regs.psw.addr - 6);
+    r = *xr_insn & 0xf;
+    uc->uc_mcontext.regs.gprs[r] = sig;
+}
+
+#define DIVIDE_TO_INTEGER(name, floatN)                                        \
+static inline __attribute__((__always_inline__)) int                           \
+name(floatN *r1, floatN r2, floatN *r3, int m4, int *sig)                      \
+{                                                                              \
+    int cc;                                                                    \
+                                                                               \
+    asm(/* Make the initial CC predictable for suppression tests */            \
+        "xr %[sig],%[sig]\n"                                                   \
+        #name " %[r1],%[r3],%[r2],%[m4]\n"                                     \
+        "ipm %[cc]\n"                                                          \
+        "srl %[cc],28"                                                         \
+        /*                                                                     \
+         * Use earlyclobbers to prevent the compiler from reusing floating     \
+         * point registers. This instruction doesn't like it.                  \
+         */                                                                    \
+        : [r1] "+&f" (*r1), [r3] "+&f" (*r3), [sig] "=r" (*sig), [cc] "=d" (cc)\
+        : [r2] "f" (r2), [m4] "i" (m4)                                         \
+        : "cc");                                                               \
+                                                                               \
+    return cc;                                                                 \
+}
+
+DIVIDE_TO_INTEGER(diebr, float)
+DIVIDE_TO_INTEGER(didbr, double)
+
+#define TEST_DIVIDE_TO_INTEGER(name, intN, int_fmt, floatN, float_fmt)         \
+static inline __attribute__((__always_inline__)) int                           \
+test_ ## name(unsigned intN r1i, unsigned intN r2i, int m4, int fpc,           \
+              unsigned intN r1o, unsigned intN r3o, int cco, unsigned int fpco,\
+              int sigo)                                                        \
+{                                                                              \
+    union {                                                                    \
+        floatN f;                                                              \
+        unsigned intN i;                                                       \
+    } r1, r2, r3;                                                              \
+    int cc, err = 0, sig;                                                      \
+                                                                               \
+    r1.i = r1i;                                                                \
+    r2.i = r2i;                                                                \
+    r3.i = 0x12345678;                                                         \
+    printf("[ RUN      ] %" float_fmt "(0x%" int_fmt                           \
+           ") / %" float_fmt "(0x%" int_fmt ")\n", r1.f, r1.i, r2.f, r2.i);    \
+    asm volatile("sfpc %[fpc]" : : [fpc] "r" (fpc));                           \
+    cc = name(&r1.f, r2.f, &r3.f, m4, &sig);                                   \
+    asm volatile("stfpc %[fpc]" : [fpc] "=Q" (fpc));                           \
+    if (r1.i != r1o) {                                                         \
+        printf("[  FAILED  ] remainder 0x%" int_fmt                            \
+               " != expected 0x%" int_fmt "\n", r1.i, r1o);                    \
+        err += 1;                                                              \
+    }                                                                          \
+    if (r3.i != r3o) {                                                         \
+        printf("[  FAILED  ] quotient 0x%" int_fmt                             \
+               " != expected 0x%" int_fmt "\n", r3.i, r3o);                    \
+        err += 1;                                                              \
+    }                                                                          \
+    if (cc != cco) {                                                           \
+        printf("[  FAILED  ] cc %d != expected %d\n", cc, cco);                \
+        err += 1;                                                              \
+    }                                                                          \
+    if (fpc != fpco) {                                                         \
+        printf("[  FAILED  ] fpc 0x%x != expected 0x%x\n", fpc, fpco);         \
+        err += 1;                                                              \
+    }                                                                          \
+    if (sig != sigo) {                                                         \
+        printf("[  FAILED  ] signal 0x%x != expected 0x%x\n", sig, sigo);      \
+        err += 1;                                                              \
+    }                                                                          \
+                                                                               \
+    return err;                                                                \
+}
+
+TEST_DIVIDE_TO_INTEGER(diebr, int, "x", float, "f")
+TEST_DIVIDE_TO_INTEGER(didbr, long, "lx", double, "lf")
+
+int main(void)
+{
+    struct sigaction act = {
+        .sa_sigaction = sigfpe_handler,
+        .sa_flags = SA_SIGINFO,
+    };
+    int err = 0;
+
+    /* Set up SIG handler */
+    if (sigaction(SIGFPE, &act, NULL)) {
+        printf("[  FAILED  ] sigaction(SIGFPE) failed\n");
+        return EXIT_FAILURE;
+    }
+
+    /* 451 / 460 */
+    err += test_diebr(0x43e1f1f1, 0x43e61616, 7, 0,
+                      0x43e1f1f1, 0, 0, 0, 0);
+
+    /* 480 / 0 */
+    err += test_diebr(0x43f00000, 0, 0, 0,
+                      0x7fc00000, 0x7fc00000, 1, 0x800000, 0);
+
+    /* QNaN / QNaN */
+    err += test_diebr(0xffffffff, 0xffffffff, 0, 0,
+                      0xffffffff, 0xffffffff, 1, 0, 0);
+
+    /* -2.08E-8 / -2.08E-8 */
+    err += test_diebr(0xb2b2b2b2, 0xb2b2b2b2, 0, 0,
+                      0x80000000, 0x3f800000, 0, 0, 0);
+
+    /*
+     * Test partial remainder without quotient scaling (cc2).
+     *
+     * 4.62E-2 / -7.94E-11 = { r = 1.68E-10, n = -5.82E+8 }
+     */
+    err += test_diebr(0x3d3d3d3d, 0xaeaeaeae, 0, 0,
+                      0x2f38b8c0, 0xce0aaaab, 2, 0, 0);
+
+    /* 1.07E-31 / 2.19 */
+    err += test_diebr(0x0c0c0c0c, 0x400c0c0c, 6, 0,
+                      0xc00c0c0c, 0x3f800000, 0, 0x80000, 0);
+
+    /* 2.98E+29 / -5.7E-29 */
+    err += test_diebr(0x7070ffff, 0x90909090, 0, 0,
+                      0x6431c0c0, 0xbf5562aa, 3, 0, 0);
+
+    /*
+     * Test large, but representable quotient.
+     *
+     * a = -12040119/549755813888
+     * b = 1/38685626227668133590597632
+     * q = a / b = -847248053779631702016
+     * n = round(q, float32, to_odd) = q
+     * r_precise = a - b * n = -0
+     * r = round(r_precise, float32, nearest_even) = -0
+     */
+    err += test_diebr(0xb7b7b7b7, 0x15000000, 7, 0,
+                      0x80000000, 0xe237b7b7, 0, 0, 0);
+
+    /* 0 / 0 */
+    err += test_diebr(0, 0, 1, 0,
+                      0x7fc00000, 0x7fc00000, 1, 0x800000, 0);
+
+    /* 4.3E-33 / -2.08E-8 with SIGFPE */
+    err += test_diebr(0x09b2b2b2, 0xb2b2b2b2, 0, 0xfc000007,
+                      0xb2b2b2b1, 0xbf800000, 0, 0xfc000807, SIGFPE);
+
+    /*
+     * Test tiny remainder scaling when FPC Underflow Mask is set.
+     *
+     * 1.19E-39 / -1.28E-9 = { r = 1.19E-39 * 2^192, n = -0 }
+     */
+    err += test_diebr(0x000d0100, 0xb0b0b0b0, 6, 0xfc000000,
+                      0x5ed01000, 0x80000000, 0, 0xfc001000, SIGFPE);
+
+    /*
+     * Test "inexact and incremented" DXC.
+     *
+     * a = 53555504
+     * b = -520849213389117849600
+     * q = a / b = -3347219/32553075836819865600
+     * n = round(q, float32, to_odd) = -1
+     * r_precise = a - b * n = -520849213389064294096
+     * r = round(r_precise, float32, to_odd) = -520849213389117849600
+     * abs(r) - abs(r_precise) = 53555504
+     */
+    err += test_diebr(0x4c4c4c4c, 0xe1e1e1e1, 0, 0xfc000007,
+                      0xe1e1e1e1, 0xbf800000, 0, 0xfc000c07, SIGFPE);
+
+    /* 0 / 0 with SIGFPE */
+    err += test_diebr(0, 0, 0, 0xfc000007,
+                      0, 0x12345678, 0, 0xfc008007, SIGFPE);
+
+    /* 5.76E-16 / 5.39E+34 */
+    err += test_diebr(0x26262626, 0x79262626, 6, 0,
+                      0xf9262626, 0x3f800000, 0, 0x80000, 0);
+
+    /* -4.97E+17 / 2.03E-38 */
+    err += test_diebr(0xdcdcdcdc, 0x00dcdcdc, 7, 0xfc000000,
+                      0x80000000, 0xbb800000, 1, 0xfc000000, 0);
+
+    /* -1.23E+17 / SNaN */
+    err += test_diebr(0xdbdb240b, 0xffac73ff, 4, 0,
+                      0xffec73ff, 0xffec73ff, 1, 0x800000, 0);
+
+    /* 2.34E-38 / 3.27E-33 with SIGFPE */
+    err += test_diebr(0x00ff0987, 0x0987c6f6, 6, 0x08000000,
+                      0x8987c6b6, 0x3f800000, 0, 0x08000800, SIGFPE);
+
+    /* -5.93E+11 / -2.7E+4 */
+    err += test_diebr(0xd30a0040, 0xc6d30a00, 0, 0xc4000000,
+                      0xc74a4400, 0x4ba766c6, 2, 0xc4000000, 0);
+
+    /* 9.86E-32 / -inf */
+    err += test_diebr(0x0c000029, 0xff800000, 0, 0,
+                      0x0c000029, 0x80000000, 0, 0, 0);
+
+    /* QNaN / SNaN */
+    err += test_diebr(0xffff94ff, 0xff94ff24, 4, 7,
+                      0xffd4ff24, 0xffd4ff24, 1, 0x800007, 0);
+
+    /* 2.8E-43 / -inf */
+    err += test_diebr(0x000000c8, 0xff800000, 0, 0x7c000007,
+                      0x000000c8, 0x80000000, 0, 0x7c000007, 0);
+
+    /* -1.7E+38 / -inf */
+    err += test_diebr(0xff00003d, 0xff800000, 0, 0,
+                      0xff00003d, 0, 0, 0, 0);
+
+    /* 1.94E-304 / 1.94E-304 */
+    err += test_didbr(0x00e100e100e100e1, 0x00e100e100e100e1, 0, 1,
+                      0, 0x3ff0000000000000, 0, 1, 0);
+
+    /* 4.82E-299 / 5.29E-308 */
+    err += test_didbr(0x0200230200230200, 0x0023020023020023, 0, 0,
+                      0x8001a017d247b3f4, 0x41cb2aa05f000000, 0, 0, 0);
+
+    /* -1.38E-75 / -3.77E+208 */
+    err += test_didbr(0xb063eb3d63b063eb, 0xeb3d63b063eb3d63, 3, 0xe8000000,
+                      0x6b3d63b063eb3d63, 0x3ff0000000000000, 0, 0xe8000c00,
+                      SIGFPE);
+
+    return err ? EXIT_FAILURE : EXIT_SUCCESS;
+}
-- 
2.52.0




* Re: [PATCH v2 2/4] target/s390x: Extract s390_get_bfp_rounding_mode()
  2026-01-27 15:31 ` [PATCH v2 2/4] target/s390x: Extract s390_get_bfp_rounding_mode() Ilya Leoshkevich
@ 2026-01-28  5:25   ` Richard Henderson
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2026-01-28  5:25 UTC (permalink / raw)
  To: Ilya Leoshkevich, Thomas Huth; +Cc: David Hildenbrand, qemu-s390x, qemu-devel

On 1/28/26 02:31, Ilya Leoshkevich wrote:
> For DIVIDE TO INTEGER it will be helpful to pass final-quotient
> rounding mode around explicitly rather than setting it in fpu_status
> implicitly. To facilitate this, extract a function for converting the
> mask to the rounding mode.
> 
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> ---
>   target/s390x/tcg/fpu_helper.c | 62 +++++++++++++++++------------------
>   1 file changed, 31 insertions(+), 31 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~



* Re: [PATCH v2 3/4] target/s390x: Implement DIVIDE TO INTEGER
  2026-01-27 15:31 ` [PATCH v2 3/4] target/s390x: Implement DIVIDE TO INTEGER Ilya Leoshkevich
@ 2026-01-28  5:50   ` Richard Henderson
  2026-01-28 13:19     ` Ilya Leoshkevich
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Henderson @ 2026-01-28  5:50 UTC (permalink / raw)
  To: Ilya Leoshkevich, Thomas Huth; +Cc: David Hildenbrand, qemu-s390x, qemu-devel

On 1/28/26 02:31, Ilya Leoshkevich wrote:
> +        FloatParts128 a128, b128, *m128, m128_buf, n128, *q128, q128_buf,
> +                      r128, *r128_precise;
> +        int float_exception_flags = 0;
> +        bool is_q128_smallish;
> +        uint32_t r_flags;
> +        int saved_flags;
> +
> +        /* Compute precise quotient */
> +        parts_float_to_float_widen(&a128, a, status);
> +        parts_float_to_float_widen(&b128, b, status);
> +        q128_buf = a128;
> +        q128 = parts_div(&q128_buf, &b128, status);

Why do you need FloatParts128?
You can see an inexact result from float64 with just FloatParts64.
Cf. soft_f64_div.

> +
> +        /* Final or partial case? */
> +        is_q128_smallish = q128->exp < (fmt->frac_size + 1);
> +
> +        /*
> +         * Final quotient is rounded using final-quotient-rounding method, and
> +         * partial quotient is rounded toward zero.
> +         *
> +         * Rounding of partial quotient may be inexact. This is the whole point
> +         * of distinguishing partial quotients, so ignore the exception.
> +         */
> +        n128 = *q128;
> +        saved_flags = status->float_exception_flags;
> +        parts_round_to_int(&n128,
> +                           is_q128_smallish ? final_quotient_rounding_mode :
> +                                              float_round_to_zero,
> +                           0, status, &float128_params);

float128_params is definitely wrong.  The rounding is supposed to be to the target format.

> +        float_exception_flags = saved_flags;
> +        parts_s390_precision_round_normal(&n128, fmt);

parts_round_to_int_normal already takes a rounding mode and frac_size.
It also returns whether or not the rounding was exact.
This appears to be trying to reinvent the wheel.

There's still "... the two integers closest to this precise quotient cannot both be
represented exactly in the precision of the quotient ..." to contend with.  Is that your
"smallish" test?  If so, the comment could use some improvement.

> +
> +        /* Compute precise remainder */
> +        m128_buf = b128;
> +        m128 = parts_mul(&m128_buf, &n128, status);
> +        m128->sign = !m128->sign;
> +        status->float_exception_flags = 0;
> +        r128_precise = parts_addsub(m128, &a128, status, false);

Surely parts_muladd_scalbn (with scale 0).


r~



* Re: [PATCH v2 3/4] target/s390x: Implement DIVIDE TO INTEGER
  2026-01-28  5:50   ` Richard Henderson
@ 2026-01-28 13:19     ` Ilya Leoshkevich
  2026-01-28 20:38       ` Richard Henderson
  0 siblings, 1 reply; 11+ messages in thread
From: Ilya Leoshkevich @ 2026-01-28 13:19 UTC (permalink / raw)
  To: Richard Henderson, Thomas Huth; +Cc: David Hildenbrand, qemu-s390x, qemu-devel

On 1/28/26 06:50, Richard Henderson wrote:
> On 1/28/26 02:31, Ilya Leoshkevich wrote:
>> +        FloatParts128 a128, b128, *m128, m128_buf, n128, *q128, q128_buf,
>> +                      r128, *r128_precise;
>> +        int float_exception_flags = 0;
>> +        bool is_q128_smallish;
>> +        uint32_t r_flags;
>> +        int saved_flags;
>> +
>> +        /* Compute precise quotient */
>> +        parts_float_to_float_widen(&a128, a, status);
>> +        parts_float_to_float_widen(&b128, b, status);
>> +        q128_buf = a128;
>> +        q128 = parts_div(&q128_buf, &b128, status);
>
> Why do you need FloatParts128?
> You can see an inexact result from float64 with just FloatParts64.
> Cf. soft_f64_div.


I thought I needed this extra precision, but it turns out that in reality I
needed it to mask an issue with parts_round_to_int() - see below.


>> +
>> +        /* Final or partial case? */
>> +        is_q128_smallish = q128->exp < (fmt->frac_size + 1);
>> +
>> +        /*
>> +         * Final quotient is rounded using final-quotient-rounding method, and
>> +         * partial quotient is rounded toward zero.
>> +         *
>> +         * Rounding of partial quotient may be inexact. This is the whole point
>> +         * of distinguishing partial quotients, so ignore the exception.
>> +         */
>> +        n128 = *q128;
>> +        saved_flags = status->float_exception_flags;
>> +        parts_round_to_int(&n128,
>> +                           is_q128_smallish ? final_quotient_rounding_mode :
>> +                                              float_round_to_zero,
>> +                           0, status, &float128_params);
>
> float128_params is definitely wrong.  The rounding is supposed to be 
> to the target format.


Hmm, yes, nothing in this code is about float128, that can't be right.


With some testcases I hit this condition in parts_round_to_int_normal():

     if (a->exp >= frac_size) {
         /* All integral */
         return false;
     }

which makes it a no-op.

I think the code assumes that FloatParts have just been unpacked and all 
low fraction bits are zero, which is not the case for quotient here.

>> +        float_exception_flags = saved_flags;
>> +        parts_s390_precision_round_normal(&n128, fmt);
>
> parts_round_to_int_normal already takes a rounding mode and frac_size.
> It also returns whether or not the rounding was exact.
> This appears to be trying to reinvent the wheel.


Apparently I use this to paper over the above issue, which is not great.

I guess improving parts_round_to_int() to work with non-zero low 
fraction bits would be better.

What do you think?


> There's still "... the two integers closest to this precise quotient
> cannot both be represented exactly in the precision of the quotient
> ..." to contend with.  Is that your "smallish" test?  If so, the
> comment could use some improvement.


Ok.
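
For reference, the condition quoted above can be illustrated outside QEMU. The sketch below is plain C and assumes float32 with a 24-bit significand (frac_size + 1 = 24 in the patch's terms). The helper names `int_fits_float32` and `both_neighbours_fit` are hypothetical. It shows that once the quotient magnitude reaches 2^24, two consecutive integers can no longer both be represented, which appears to be the boundary the `q128->exp < (fmt->frac_size + 1)` test draws.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if the integer k survives a round trip
 * through float32 unchanged. Assumes k < 2^63 so the cast back is defined.
 */
static bool int_fits_float32(uint64_t k)
{
    return (uint64_t)(float)k == k;
}

/*
 * Hypothetical helper: true if both integers bracketing a quotient whose
 * magnitude lies in [k, k + 1] are exactly representable in float32.
 */
static bool both_neighbours_fit(uint64_t k)
{
    return int_fits_float32(k) && int_fits_float32(k + 1);
}
```

`both_neighbours_fit(16777215)` holds because 2^24 - 1 and 2^24 are both float32 values, while `both_neighbours_fit(16777216)` does not because 2^24 + 1 is not.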


>> +
>> +        /* Compute precise remainder */
>> +        m128_buf = b128;
>> +        m128 = parts_mul(&m128_buf, &n128, status);
>> +        m128->sign = !m128->sign;
>> +        status->float_exception_flags = 0;
>> +        r128_precise = parts_addsub(m128, &a128, status, false);
>
> Surely parts_muladd_scalbn (with scale 0).


This works and is much shorter, thanks.


> r~



* Re: [PATCH v2 3/4] target/s390x: Implement DIVIDE TO INTEGER
  2026-01-28 13:19     ` Ilya Leoshkevich
@ 2026-01-28 20:38       ` Richard Henderson
  2026-01-29 18:24         ` Ilya Leoshkevich
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Henderson @ 2026-01-28 20:38 UTC (permalink / raw)
  To: Ilya Leoshkevich, Thomas Huth; +Cc: David Hildenbrand, qemu-s390x, qemu-devel

On 1/29/26 00:19, Ilya Leoshkevich wrote:
> With some testcases I hit this condition in parts_round_to_int_normal():
> 
>      if (a->exp >= frac_size) {
>          /* All integral */
>          return false;
>      }
> 
> which makes it a no-op.
> 
> I think the code assumes that FloatParts have just been unpacked and all low fraction bits 
> are zero, which is not the case for quotient here.
> 
>>> +        float_exception_flags = saved_flags;
>>> +        parts_s390_precision_round_normal(&n128, fmt);
>>
>> parts_round_to_int_normal already takes a rounding mode and frac_size.
>> It also returns whether or not the rounding was exact.
>> This appears to be trying to reinvent the wheel.
> 
> 
> Apparently I use this to paper over the above issue, which is not great.
> 
> I guess improving parts_round_to_int() to work with non-zero low fraction bits would be 
> better.
> 
> What do you think?

Yes indeed.  I think (a->exp >= N) should be sufficient?


r~



* Re: [PATCH v2 3/4] target/s390x: Implement DIVIDE TO INTEGER
  2026-01-28 20:38       ` Richard Henderson
@ 2026-01-29 18:24         ` Ilya Leoshkevich
  2026-02-02  6:03           ` Richard Henderson
  0 siblings, 1 reply; 11+ messages in thread
From: Ilya Leoshkevich @ 2026-01-29 18:24 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Thomas Huth, David Hildenbrand, qemu-s390x, qemu-devel

On 2026-01-28 21:38, Richard Henderson wrote:
> On 1/29/26 00:19, Ilya Leoshkevich wrote:
>> With some testcases I hit this condition in 
>> parts_round_to_int_normal():
>> 
>>      if (a->exp >= frac_size) {
>>          /* All integral */
>>          return false;
>>      }
>> 
>> which makes it a no-op.
>> 
>> I think the code assumes that FloatParts have just been unpacked and 
>> all low fraction bits are zero, which is not the case for quotient 
>> here.
>> 
>>>> +        float_exception_flags = saved_flags;
>>>> +        parts_s390_precision_round_normal(&n128, fmt);
>>> 
>>> parts_round_to_int_normal already takes a rounding mode and 
>>> frac_size.
>>> It also returns whether or not the rounding was exact.
>>> This appears to be trying to reinvent the wheel.
>> 
>> 
>> Apparently I use this to paper over the above issue, which is not 
>> great.
>> 
>> I guess improving parts_round_to_int() to work with non-zero low 
>> fraction bits would be better.
>> 
>> What do you think?
> 
> Yes indeed.  I think (a->exp >= N) should be sufficient?

Seems like this does not work for really large quotients, because we 
still need to trim the fraction bits.
So I'm currently evaluating the following, which looks promising so far:

--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -1118,11 +1118,6 @@ static bool partsN(round_to_int_normal)(FloatPartsN *a, FloatRoundMode rmode,
          return true;
      }

-    if (a->exp >= frac_size) {
-        /* All integral */
-        return false;
-    }
-
      if (N > 64 && a->exp < N - 64) {
          /*
           * Rounding is not in the low word -- shift lsb to bit 2,
@@ -1133,7 +1128,7 @@ static bool partsN(round_to_int_normal)(FloatPartsN *a, FloatRoundMode rmode,
          frac_lsb = 1 << 2;
      } else {
          shift_adj = 0;
-        frac_lsb = DECOMPOSED_IMPLICIT_BIT >> (a->exp & 63);
+        frac_lsb = DECOMPOSED_IMPLICIT_BIT >> MIN(a->exp, frac_size);
      }
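
The effect of clamping the shift can be sketched in isolation. The code below is not the QEMU function: it models only a 64-bit fraction with the leading 1 in bit 63, handles only truncation toward zero, and the names `SKETCH_IMPLICIT_BIT` and `trunc_to_int_sketch` are invented for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors DECOMPOSED_IMPLICIT_BIT for a 64-bit fraction: leading 1 in bit 63. */
#define SKETCH_IMPLICIT_BIT (1ULL << 63)

/*
 * Hypothetical helper, round-toward-zero only.
 * The value is frac * 2^(exp - 63), with 0 <= exp <= 63.
 * Returns true if any non-zero bits were discarded.
 */
static bool trunc_to_int_sketch(uint64_t *frac, int exp, int frac_size)
{
    /* Clamp the shift so bits below the target format's lsb are never kept. */
    int shift = exp < frac_size ? exp : frac_size;
    uint64_t frac_lsb = SKETCH_IMPLICIT_BIT >> shift;
    uint64_t discarded = *frac & (frac_lsb - 1);

    *frac &= ~(frac_lsb - 1);
    return discarded != 0;
}
```

With frac_size = 23 and exp = 30, a fraction of `SKETCH_IMPLICIT_BIT | 1` is trimmed back to `SKETCH_IMPLICIT_BIT`. An early return on exp >= frac_size would have left the stray low bit in place, which is the no-op described earlier in this message.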

> r~



* Re: [PATCH v2 3/4] target/s390x: Implement DIVIDE TO INTEGER
  2026-01-29 18:24         ` Ilya Leoshkevich
@ 2026-02-02  6:03           ` Richard Henderson
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2026-02-02  6:03 UTC (permalink / raw)
  To: Ilya Leoshkevich; +Cc: Thomas Huth, David Hildenbrand, qemu-s390x, qemu-devel

On 1/30/26 04:24, Ilya Leoshkevich wrote:
> On 2026-01-28 21:38, Richard Henderson wrote:
>> On 1/29/26 00:19, Ilya Leoshkevich wrote:
>>> With some testcases I hit this condition in parts_round_to_int_normal():
>>>
>>>      if (a->exp >= frac_size) {
>>>          /* All integral */
>>>          return false;
>>>      }
>>>
>>> which makes it a no-op.
>>>
>>> I think the code assumes that FloatParts have just been unpacked and all low fraction 
>>> bits are zero, which is not the case for quotient here.
>>>
>>>>> +        float_exception_flags = saved_flags;
>>>>> +        parts_s390_precision_round_normal(&n128, fmt);
>>>>
>>>> parts_round_to_int_normal already takes a rounding mode and frac_size.
>>>> It also returns whether or not the rounding was exact.
>>>> This appears to be trying to reinvent the wheel.
>>>
>>>
>>> Apparently I use this to paper over the above issue, which is not great.
>>>
>>> I guess improving parts_round_to_int() to work with non-zero low fraction bits would be 
>>> better.
>>>
>>> What do you think?
>>
>> Yes indeed.  I think (a->exp >= N) should be sufficient?
> 
> Seems like this does not work for really large quotients, because we still need to trim 
> the fraction bits.
> So I'm currently evaluating the following, which looks promising so far:
> 
> --- a/fpu/softfloat-parts.c.inc
> +++ b/fpu/softfloat-parts.c.inc
> @@ -1118,11 +1118,6 @@ static bool partsN(round_to_int_normal)(FloatPartsN *a, FloatRoundMode rmode,
>           return true;
>       }
> 
> -    if (a->exp >= frac_size) {
> -        /* All integral */
> -        return false;
> -    }
> -
>       if (N > 64 && a->exp < N - 64) {
>           /*
>            * Rounding is not in the low word -- shift lsb to bit 2,
> @@ -1133,7 +1128,7 @@ static bool partsN(round_to_int_normal)(FloatPartsN *a, FloatRoundMode rmode,
>           frac_lsb = 1 << 2;
>       } else {
>           shift_adj = 0;
> -        frac_lsb = DECOMPOSED_IMPLICIT_BIT >> (a->exp & 63);
> +        frac_lsb = DECOMPOSED_IMPLICIT_BIT >> MIN(a->exp, frac_size);
>       }

Yep, that makes sense.

r~



Thread overview: 11+ messages
2026-01-27 15:31 [PATCH v2 0/4] target/s390x: Implement DIVIDE TO INTEGER Ilya Leoshkevich
2026-01-27 15:31 ` [PATCH v2 1/4] target/s390x: Dump Floating-Point-Control Register Ilya Leoshkevich
2026-01-27 15:31 ` [PATCH v2 2/4] target/s390x: Extract s390_get_bfp_rounding_mode() Ilya Leoshkevich
2026-01-28  5:25   ` Richard Henderson
2026-01-27 15:31 ` [PATCH v2 3/4] target/s390x: Implement DIVIDE TO INTEGER Ilya Leoshkevich
2026-01-28  5:50   ` Richard Henderson
2026-01-28 13:19     ` Ilya Leoshkevich
2026-01-28 20:38       ` Richard Henderson
2026-01-29 18:24         ` Ilya Leoshkevich
2026-02-02  6:03           ` Richard Henderson
2026-01-27 15:31 ` [PATCH v2 4/4] tests/tcg/s390x: Test " Ilya Leoshkevich
