From: Richard Henderson <richard.henderson@linaro.org>
To: "Philippe Mathieu-Daudé" <philmd@linaro.org>, qemu-devel@nongnu.org
Cc: mark.cave-ayland@ilande.co.uk
Subject: Re: [PATCH 05/22] target/sparc: Inline FNEG, FABS
Date: Fri, 2 Feb 2024 14:32:25 +1000 [thread overview]
Message-ID: <0670f7d7-6518-4c72-a3b8-bd38152bcf94@linaro.org> (raw)
In-Reply-To: <d26582c2-88e4-4813-bef6-0c12faca34b5@linaro.org>
On 1/30/24 18:40, Philippe Mathieu-Daudé wrote:
> Hi Richard,
>
> On 3/11/23 18:38, Richard Henderson wrote:
>> These are simple bit manipulation insns.
>> Begin using i128 for float128.
>> Implement FMOVq with do_qq.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>> target/sparc/helper.h | 6 ----
>> target/sparc/fop_helper.c | 34 ---------------------
>> target/sparc/translate.c | 62 +++++++++++++++++++--------------------
>> 3 files changed, 30 insertions(+), 72 deletions(-)
>
>
>> @@ -1239,13 +1235,13 @@ static void gen_op_fmovs(TCGv_i32 dst, TCGv_i32 src)
>> static void gen_op_fnegs(TCGv_i32 dst, TCGv_i32 src)
>> {
>> gen_op_clear_ieee_excp_and_FTT();
>> - gen_helper_fnegs(dst, src);
>> + tcg_gen_xori_i32(dst, src, 1u << 31);
>> }
>> static void gen_op_fabss(TCGv_i32 dst, TCGv_i32 src)
>> {
>> gen_op_clear_ieee_excp_and_FTT();
>> - gen_helper_fabss(dst, src);
>> + tcg_gen_andi_i32(dst, src, ~(1u << 31));
>> }
>> static void gen_op_fmovd(TCGv_i64 dst, TCGv_i64 src)
>> @@ -1257,13 +1253,33 @@ static void gen_op_fmovd(TCGv_i64 dst, TCGv_i64 src)
>> static void gen_op_fnegd(TCGv_i64 dst, TCGv_i64 src)
>> {
>> gen_op_clear_ieee_excp_and_FTT();
>> - gen_helper_fnegd(dst, src);
>> + tcg_gen_xori_i64(dst, src, 1ull << 63);
>> }
>> static void gen_op_fabsd(TCGv_i64 dst, TCGv_i64 src)
>> {
>> gen_op_clear_ieee_excp_and_FTT();
>> - gen_helper_fabsd(dst, src);
>> + tcg_gen_andi_i64(dst, src, ~(1ull << 63));
>> +}
>> +
>> +static void gen_op_fnegq(TCGv_i128 dst, TCGv_i128 src)
>> +{
>> + TCGv_i64 l = tcg_temp_new_i64();
>> + TCGv_i64 h = tcg_temp_new_i64();
>> +
>> + tcg_gen_extr_i128_i64(l, h, src);
>> + tcg_gen_xori_i64(h, h, 1ull << 63);
>> + tcg_gen_concat_i64_i128(dst, l, h);
>> +}
>> +
>> +static void gen_op_fabsq(TCGv_i128 dst, TCGv_i128 src)
>> +{
>> + TCGv_i64 l = tcg_temp_new_i64();
>> + TCGv_i64 h = tcg_temp_new_i64();
>> +
>> + tcg_gen_extr_i128_i64(l, h, src);
>> + tcg_gen_andi_i64(h, h, ~(1ull << 63));
>> + tcg_gen_concat_i64_i128(dst, l, h);
>> }
>
> Why not extract these as generic TCG FPU helpers?
The representation of floating-point registers varies wildly between targets. Sparc would
be the only one to (a) have float128 and (b) represent them in TCGv_i128.
Even considering float32, is the representation TCGv_i32 or TCGv_i64?
Should the result be nan-boxed (riscv and loongarch)?
We already provide tcg_gen_xori_i{32,64}, so, really that's enough for any target.
> $ git grep -wE 'float...?_(chs|abs)' target/
> target/arm/tcg/helper-a64.c:214: a = float16_chs(a);
> target/arm/tcg/helper-a64.c:229: a = float32_chs(a);
> target/arm/tcg/helper-a64.c:244: a = float64_chs(a);
> target/arm/tcg/helper-a64.c:259: a = float16_chs(a);
> target/arm/tcg/helper-a64.c:274: a = float32_chs(a);
> target/arm/tcg/helper-a64.c:289: a = float64_chs(a);
> target/arm/tcg/helper-a64.c:632: float16 f0 = float16_abs(a);
> target/arm/tcg/helper-a64.c:633: float16 f1 = float16_abs(b);
> target/arm/tcg/helper-a64.c:642: float16 f0 = float16_abs(a);
> target/arm/tcg/helper-a64.c:643: float16 f1 = float16_abs(b);
> target/arm/tcg/mve_helper.c:2840: return float16_abs(float16_sub(a, b, s));
> target/arm/tcg/mve_helper.c:2845: return float32_abs(float32_sub(a, b, s));
> target/arm/tcg/mve_helper.c:2854: return float16_maxnum(float16_abs(a), float16_abs(b),
> s);
> target/arm/tcg/mve_helper.c:2859: return float32_maxnum(float32_abs(a), float32_abs(b),
> s);
> target/arm/tcg/mve_helper.c:2864: return float16_minnum(float16_abs(a), float16_abs(b),
> s);
> target/arm/tcg/mve_helper.c:2869: return float32_minnum(float32_abs(a), float32_abs(b),
> s);
> target/arm/tcg/neon_helper.c:1513: float32 f0 = float32_abs(make_float32(a));
> target/arm/tcg/neon_helper.c:1514: float32 f1 = float32_abs(make_float32(b));
> target/arm/tcg/neon_helper.c:1521: float32 f0 = float32_abs(make_float32(a));
> target/arm/tcg/neon_helper.c:1522: float32 f1 = float32_abs(make_float32(b));
> target/arm/tcg/neon_helper.c:1529: float64 f0 = float64_abs(make_float64(a));
> target/arm/tcg/neon_helper.c:1530: float64 f1 = float64_abs(make_float64(b));
> target/arm/tcg/neon_helper.c:1537: float64 f0 = float64_abs(make_float64(a));
> target/arm/tcg/neon_helper.c:1538: float64 f1 = float64_abs(make_float64(b));
> target/arm/tcg/sve_helper.c:4227:DO_REDUCE(sve_fmaxv_h, float16, H1_2, max,
> float16_chs(float16_infinity))
> target/arm/tcg/sve_helper.c:4228:DO_REDUCE(sve_fmaxv_s, float32, H1_4, max,
> float32_chs(float32_infinity))
> target/arm/tcg/sve_helper.c:4229:DO_REDUCE(sve_fmaxv_d, float64, H1_8, max,
> float64_chs(float64_infinity))
> target/arm/tcg/sve_helper.c:4345: return float16_abs(float16_sub(a, b, s));
> target/arm/tcg/sve_helper.c:4350: return float32_abs(float32_sub(a, b, s));
> target/arm/tcg/sve_helper.c:4355: return float64_abs(float64_sub(a, b, s));
> target/arm/tcg/sve_helper.c:4997: mm = float16_abs(mm);
> target/arm/tcg/sve_helper.c:5019: mm = float32_abs(mm);
> target/arm/tcg/sve_helper.c:5045: mm = float64_abs(mm);
> target/arm/tcg/sve_helper.c:5062: float16 neg_real = float16_chs(neg_imag);
> target/arm/tcg/sve_helper.c:5094: float32 neg_real = float32_chs(neg_imag);
> target/arm/tcg/sve_helper.c:5126: float64 neg_real = float64_chs(neg_imag);
> target/arm/tcg/vec_helper.c:996: return -float16_le(float16_abs(op2), float16_abs(op1),
> stat);
> target/arm/tcg/vec_helper.c:1001: return -float32_le(float32_abs(op2),
> float32_abs(op1), stat);
> target/arm/tcg/vec_helper.c:1006: return -float16_lt(float16_abs(op2),
> float16_abs(op1), stat);
> target/arm/tcg/vec_helper.c:1011: return -float32_lt(float32_abs(op2),
> float32_abs(op1), stat);
> target/arm/tcg/vec_helper.c:1124: return float16_abs(float16_sub(op1, op2, stat));
> target/arm/tcg/vec_helper.c:1129: return float32_abs(float32_sub(op1, op2, stat));
> target/arm/tcg/vec_helper.c:1304: return float16_muladd(float16_chs(op1), op2, dest, 0,
> stat);
> target/arm/tcg/vec_helper.c:1310: return float32_muladd(float32_chs(op1), op2, dest, 0,
> stat);
> target/arm/vfp_helper.c:286: return float16_chs(a);
> target/arm/vfp_helper.c:291: return float32_chs(a);
> target/arm/vfp_helper.c:296: return float64_chs(a);
> target/arm/vfp_helper.c:301: return float16_abs(a);
> target/arm/vfp_helper.c:306: return float32_abs(a);
> target/arm/vfp_helper.c:311: return float64_abs(a);
> target/arm/vfp_helper.c:688: } else if (float16_abs(f16) < (1 << 8)) {
> target/arm/vfp_helper.c:738: } else if (float32_abs(f32) < (1ULL << 21)) {
> target/arm/vfp_helper.c:1133: if (value == float64_chs(float64_zero)) {
> target/i386/tcg/fpu_helper.c:591: ST0 = floatx80_chs(ST0);
> target/i386/tcg/fpu_helper.c:596: ST0 = floatx80_abs(ST0);
> target/i386/tcg/fpu_helper.c:781: tmp = floatx80_chs(tmp);
> target/i386/tcg/fpu_helper.c:1739: ST0 = floatx80_div(floatx80_chs(floatx80_one),
> floatx80_zero,
> target/i386/tcg/fpu_helper.c:2104: ST1 = floatx80_chs(ST1);
> target/i386/tcg/fpu_helper.c:2119: ST1 = floatx80_chs(ST0);
> target/i386/tcg/fpu_helper.c:2135: ST1 = floatx80_chs(ST1);
> target/i386/tcg/fpu_helper.c:2140: ST1 = floatx80_chs(floatx80_zero);
> target/i386/tcg/fpu_helper.c:2276: floatx80_chs(floatx80_zero) :
> target/i386/tcg/fpu_helper.c:2285: floatx80_chs(floatx80_infinity) :
> target/m68k/fpu_helper.c:212: res->d = floatx80_round(floatx80_abs(val->d),
> &env->fp_status);
> target/m68k/fpu_helper.c:218: res->d = floatx80_round(floatx80_abs(val->d),
> &env->fp_status);
> target/m68k/fpu_helper.c:225: res->d = floatx80_round(floatx80_abs(val->d),
> &env->fp_status);
> target/m68k/fpu_helper.c:231: res->d = floatx80_round(floatx80_chs(val->d),
> &env->fp_status);
> target/m68k/fpu_helper.c:237: res->d = floatx80_round(floatx80_chs(val->d),
> &env->fp_status);
> target/m68k/fpu_helper.c:244: res->d = floatx80_round(floatx80_chs(val->d),
> &env->fp_status);
> target/m68k/fpu_helper.c:557: quotient = floatx80_to_int32(floatx80_abs(fp_quot.d),
> &env->fp_status);
> target/m68k/softfloat.c:2714: fp0 = floatx80_abs(a); /* Y = |X| */
> target/m68k/softfloat.c:2734: fp0 = floatx80_abs(a); /* Y = |X| */
> target/mips/tcg/fpu_helper.c:977: return float64_abs(fdt0);
> target/mips/tcg/fpu_helper.c:982: return float32_abs(fst0);
> target/mips/tcg/fpu_helper.c:990: wt0 = float32_abs(fdt0 & 0XFFFFFFFF);
> target/mips/tcg/fpu_helper.c:991: wth0 = float32_abs(fdt0 >> 32);
> target/mips/tcg/fpu_helper.c:997: return float64_chs(fdt0);
> target/mips/tcg/fpu_helper.c:1002: return float32_chs(fst0);
> target/mips/tcg/fpu_helper.c:1010: wt0 = float32_chs(fdt0 & 0XFFFFFFFF);
> target/mips/tcg/fpu_helper.c:1011: wth0 = float32_chs(fdt0 >> 32);
> target/mips/tcg/fpu_helper.c:1365: fdt2 = float64_chs(float64_sub(fdt2, float64_one,
> target/mips/tcg/fpu_helper.c:1374: fst2 = float32_chs(float32_sub(fst2, float32_one,
> target/mips/tcg/fpu_helper.c:1389: fstl2 = float32_chs(float32_sub(fstl2, float32_one,
> target/mips/tcg/fpu_helper.c:1391: fsth2 = float32_chs(float32_sub(fsth2, float32_one,
> target/mips/tcg/fpu_helper.c:1401: fdt2 = float64_chs(float64_div(fdt2, FLOAT_TWO64,
> target/mips/tcg/fpu_helper.c:1411: fst2 = float32_chs(float32_div(fst2, FLOAT_TWO32,
> target/mips/tcg/fpu_helper.c:1428: fstl2 = float32_chs(float32_div(fstl2, FLOAT_TWO32,
> target/mips/tcg/fpu_helper.c:1430: fsth2 = float32_chs(float32_div(fsth2, FLOAT_TWO32,
> target/mips/tcg/fpu_helper.c:1633: fst0 = float64_chs(fst0);
> target/mips/tcg/fpu_helper.c:1644: fst0 = float32_chs(fst0);
> target/mips/tcg/fpu_helper.c:1662: fstl0 = float32_chs(fstl0);
> target/mips/tcg/fpu_helper.c:1665: fsth0 = float32_chs(fsth0);
> target/mips/tcg/fpu_helper.c:1676: fst0 = float64_chs(fst0);
> target/mips/tcg/fpu_helper.c:1687: fst0 = float32_chs(fst0);
> target/mips/tcg/fpu_helper.c:1705: fstl0 = float32_chs(fstl0);
> target/mips/tcg/fpu_helper.c:1708: fsth0 = float32_chs(fsth0);
> target/mips/tcg/fpu_helper.c:1781: fdt0 = float64_abs(fdt0); \
> target/mips/tcg/fpu_helper.c:1782: fdt1 = float64_abs(fdt1); \
> target/mips/tcg/fpu_helper.c:1860: fst0 = float32_abs(fst0); \
> target/mips/tcg/fpu_helper.c:1861: fst1 = float32_abs(fst1); \
> target/mips/tcg/fpu_helper.c:1950: fst0 = float32_abs(fdt0 &
> 0XFFFFFFFF); \
> target/mips/tcg/fpu_helper.c:1951: fsth0 = float32_abs(fdt0 >> 32);
> \
> target/mips/tcg/fpu_helper.c:1952: fst1 = float32_abs(fdt1 &
> 0XFFFFFFFF); \
> target/mips/tcg/fpu_helper.c:1953: fsth1 = float32_abs(fdt1 >> 32);
> \
> target/ppc/fpu_helper.c:44: return float32_chs(a);
> target/ppc/int_helper.c:694: float32 bneg = float32_chs(b->f32[i]);
> target/s390x/tcg/vec_fpu_helper.c:922: a = float32_abs(a);
> target/s390x/tcg/vec_fpu_helper.c:923: b = float32_abs(b);
> target/s390x/tcg/vec_fpu_helper.c:984: a = float64_abs(a);
> target/s390x/tcg/vec_fpu_helper.c:985: b = float64_abs(b);
> target/s390x/tcg/vec_fpu_helper.c:1042: a = float128_abs(a);
> target/s390x/tcg/vec_fpu_helper.c:1043: b = float128_abs(b);
> target/sparc/fop_helper.c:119: return float32_chs(src);
> target/sparc/fop_helper.c:125: return float64_chs(src);
> target/sparc/fop_helper.c:130: QT0 = float128_chs(QT1);
> target/sparc/fop_helper.c:234: return float32_abs(src);
> target/sparc/fop_helper.c:240: return float64_abs(src);
> target/sparc/fop_helper.c:245: QT0 = float128_abs(QT1);
> target/xtensa/fpu_helper.c:126: return float64_abs(v);
> target/xtensa/fpu_helper.c:131: return float32_abs(v);
> target/xtensa/fpu_helper.c:136: return float64_chs(v);
> target/xtensa/fpu_helper.c:141: return float32_chs(v);
With only a few exceptions, most of of these results are part of a larger out-of-line
operation.
r~
next prev parent reply other threads:[~2024-02-02 4:32 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-03 17:38 [PATCH 00/22] target/sparc: floating-point cleanup Richard Henderson
2023-11-03 17:38 ` [PATCH 01/22] target/sparc: Use tcg_gen_qemu_{ld, st}_i128 for ASI_M_BCOPY Richard Henderson
2023-11-03 17:38 ` [PATCH 02/22] target/sparc: Use tcg_gen_qemu_{ld, st}_i128 for ASI_M_BFILL Richard Henderson
2023-11-03 17:38 ` [PATCH 03/22] target/sparc: Remove gen_dest_fpr_F Richard Henderson
2024-01-30 7:54 ` Philippe Mathieu-Daudé
2023-11-03 17:38 ` [PATCH 04/22] target/sparc: Introduce gen_{load,store}_fpr_Q Richard Henderson
2024-01-30 7:56 ` Philippe Mathieu-Daudé
2023-11-03 17:38 ` [PATCH 05/22] target/sparc: Inline FNEG, FABS Richard Henderson
2024-01-30 8:04 ` Philippe Mathieu-Daudé
2024-01-30 8:40 ` Philippe Mathieu-Daudé
2024-02-02 4:32 ` Richard Henderson [this message]
2023-11-03 17:38 ` [PATCH 06/22] target/sparc: Use i128 for FSQRTq Richard Henderson
2023-11-03 17:38 ` [PATCH 07/22] target/sparc: Use i128 for FADDq, FSUBq, FMULq, FDIVq Richard Henderson
2023-11-03 17:38 ` [PATCH 08/22] target/sparc: Use i128 for FqTOs, FqTOi Richard Henderson
2023-11-03 17:38 ` [PATCH 09/22] target/sparc: Use i128 for FqTOd, FqTOx Richard Henderson
2023-11-03 17:38 ` [PATCH 10/22] target/sparc: Use i128 for FCMPq, FCMPEq Richard Henderson
2023-11-03 17:38 ` [PATCH 11/22] target/sparc: Use i128 for FsTOq, FiTOq Richard Henderson
2023-11-03 17:38 ` [PATCH 12/22] target/sparc: Use i128 for FdTOq, FxTOq Richard Henderson
2023-11-03 17:38 ` [PATCH 13/22] target/sparc: Use i128 for Fdmulq Richard Henderson
2023-11-03 17:38 ` [PATCH 14/22] target/sparc: Remove qt0, qt1 temporaries Richard Henderson
2023-11-03 17:38 ` [PATCH 15/22] target/sparc: Introduce cpu_get_fsr, cpu_put_fsr Richard Henderson
2024-01-30 8:10 ` Philippe Mathieu-Daudé
2023-11-03 17:38 ` [PATCH 16/22] target/split: Split ver from env->fsr Richard Henderson
2024-01-30 8:11 ` Philippe Mathieu-Daudé
2024-01-30 12:56 ` BALATON Zoltan
2023-11-03 17:38 ` [PATCH 17/22] target/sparc: Clear cexc and ftt in do_check_ieee_exceptions Richard Henderson
2023-11-03 17:38 ` [PATCH 18/22] target/sparc: Merge check_ieee_exceptions with FPop helpers Richard Henderson
2023-11-03 17:38 ` [PATCH 19/22] target/sparc: Split cexc and ftt from env->fsr Richard Henderson
2023-11-03 17:38 ` [PATCH 20/22] target/sparc: Remove cpu_fsr Richard Henderson
2023-11-03 17:38 ` [PATCH 21/22] target/sparc: Split fcc out of env->fsr Richard Henderson
2023-11-03 17:38 ` [PATCH 22/22] target/sparc: Remove FSR_FTT_NMASK, FSR_FTT_CEXC_NMASK Richard Henderson
2024-01-28 6:49 ` [PATCH 00/22] target/sparc: floating-point cleanup Richard Henderson
2024-01-29 21:50 ` Mark Cave-Ayland
2024-01-31 21:49 ` Mark Cave-Ayland
2024-02-02 5:36 ` Richard Henderson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0670f7d7-6518-4c72-a3b8-bd38152bcf94@linaro.org \
--to=richard.henderson@linaro.org \
--cc=mark.cave-ayland@ilande.co.uk \
--cc=philmd@linaro.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).