[Qemu-devel] [PATCH v1 14/14] hostfloat: support float32_to_float64

From: "Emilio G. Cota" <cota@braap.org>
To: qemu-devel@nongnu.org
Cc: "Aurelien Jarno" <aurelien@aurel32.net>,
	"Peter Maydell" <peter.maydell@linaro.org>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Laurent Vivier" <laurent@vivier.eu>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Mark Cave-Ayland" <mark.cave-ayland@ilande.co.uk>
Subject: [Qemu-devel] [PATCH v1 14/14] hostfloat: support float32_to_float64
Date: Wed, 21 Mar 2018 16:11:49 -0400
Message-ID: <1521663109-32262-15-git-send-email-cota@braap.org>
In-Reply-To: <1521663109-32262-1-git-send-email-cota@braap.org>

Performance improvement for SPEC06fp for the last few commits:

  [Plot: qemu-aarch64 SPEC06fp (test set) speedup over QEMU f6d81cdec8,
   per benchmark plus geomean.
   Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
   Error bars: 95% confidence interval]
  png: https://imgur.com/5BErNz7

That is, a final geomean speedup of 2.21X.

The floating point workloads from nbench show similar improvements:

  [Plot: qemu-aarch64 NBench score; higher is better.
   Workloads: FOURIER, NEURAL NET, LU DECOMPOSITION, plus gmean.
   Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz]
  png: https://imgur.com/KjLHumh

That is, a ~2.6X speedup. [The error bars here are the standard deviation of
only a few measurements, which explains the noisy results.]

Results for the i386 target are very similar; the only major
difference is that they're much more sensitive to the multiplication
optimization, since the i386 target does not currently use floatX_muladd
(aka fma).

Below are the x86_64 SPEC06fp results. Note that they come from a
development branch, so the bars do not map one-to-one to the patches in
this series, and the final numbers might differ slightly from those you'd
get with these patches.

  [Plot: qemu-x86_64 SPEC06fp (train set) speedup over QEMU f6d81cdec8,
   per benchmark plus geomean.
   Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
   Error bars: 95% confidence interval]
  png: https://imgur.com/MfvTb3H

Two points are worth mentioning:

- Special-casing 0-inputs for multiplication pays off handsomely (the same
  thing happens for FMA for targets that use it). I was surprised to
  see that some benchmarks (e.g. GemsFDTD) compute >99% of their
  multiplications with at least one operand being zero (and this is
  without flush-to-zero!).

- Avoiding comparisons via the host FPU (i.e. using soft_t ## _is_normal()
  instead of glibc's isnormal()) gives a small speedup.
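
To make the two points above concrete, here is a minimal standalone sketch.
It is not the code from this series: the names f32_bits_is_normal,
f32_bits_is_zero and f32_bits_mul are hypothetical, the functions operate on
raw binary32 bit patterns, and the fall-through is a plain host multiply with
no exception-flag handling. The sketch only illustrates (a) classifying
operands with integer bit tests rather than host FPU comparisons, and
(b) returning a signed zero early when one operand is zero and the other is
zero or normal.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: a binary32 value is normal when its 8-bit exponent
 * field is neither all zeros (zero/denormal) nor all ones (inf/NaN).
 * Only integer operations are used; the host FPU is not involved. */
static bool f32_bits_is_normal(uint32_t bits)
{
    uint32_t exp = (bits >> 23) & 0xff;
    return exp != 0 && exp != 0xff;
}

/* Hypothetical helper: +0 and -0 differ only in the sign bit. */
static bool f32_bits_is_zero(uint32_t bits)
{
    return (bits & 0x7fffffffu) == 0;
}

/* Hypothetical multiply with a zero fast path. */
static uint32_t f32_bits_mul(uint32_t a, uint32_t b)
{
    bool a_zero = f32_bits_is_zero(a);
    bool b_zero = f32_bits_is_zero(b);

    /* Zero times {zero, normal} is an exact zero whose sign is the XOR of
     * the operand signs. Zero times inf/NaN/denormal is left to the
     * general path below. */
    if ((a_zero && (b_zero || f32_bits_is_normal(b))) ||
        (b_zero && f32_bits_is_normal(a))) {
        return (a ^ b) & 0x80000000u;
    }

    /* Placeholder for everything else: a bare host multiply. A real
     * implementation would also have to deal with exception flags and
     * with results that overflow or underflow. */
    float fa, fb, fr;
    uint32_t r;
    memcpy(&fa, &a, sizeof(fa));
    memcpy(&fb, &b, sizeof(fb));
    fr = fa * fb;
    memcpy(&r, &fr, sizeof(r));
    return r;
}
```

For example, f32_bits_mul(0x80000000, 0x40000000), i.e. -0 times 2.0, takes
the early return and yields 0x80000000 (-0) without touching the FPU.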

Finally, here are the same results with native execution time as the
baseline, plotting the slowdown instead of the speedup.
The slowdown of SPEC06fp w.r.t. native drops from ~21X to ~10X:

  [Plot: qemu-x86_64 SPEC06fp (train set) slowdown over native (lower is
   better), per benchmark plus geomean.
   Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
   Error bars: 95% confidence interval]
  png: https://imgur.com/iTmVkJL

All PNGs referenced above can be found here: https://imgur.com/a/YSxxR

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
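
A standalone sketch of the three-way dispatch in the fpu/hostfloat.c hunk
below may help readers who want to try it outside the QEMU tree. The names
f32_to_f64_bits and slow_f32_to_f64_bits are hypothetical, the functions work
on raw bit patterns instead of QEMU's float32/float64 types, and the slow
path is only a stand-in for soft_float32_to_float64() (it does a host
conversion and raises no flags). The sketch also swaps the pointer casts used
in the patch for memcpy, since such casts are only safe when the compiler is
told not to assume strict aliasing. The fast path is sound because every
normal binary32 value is exactly representable as a normal binary64 value,
so the widening never rounds.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the softfloat fallback (denormals, infinities, NaNs).
 * Assumption: a real fallback would raise exception flags and apply
 * target-specific NaN rules; this placeholder does neither. */
static uint64_t slow_f32_to_f64_bits(uint32_t a)
{
    float f;
    double d;
    uint64_t r;

    memcpy(&f, &a, sizeof(f));
    d = f;
    memcpy(&r, &d, sizeof(r));
    return r;
}

/* Hypothetical bit-pattern version of the conversion in the patch. */
static uint64_t f32_to_f64_bits(uint32_t a)
{
    uint32_t exp = (a >> 23) & 0xff;

    if (exp != 0 && exp != 0xff) {
        /* Normal input: the host conversion is exact. */
        float f;
        double d;
        uint64_t r;

        memcpy(&f, &a, sizeof(f));
        d = f;
        memcpy(&r, &d, sizeof(r));
        return r;
    }
    if ((a & 0x7fffffffu) == 0) {
        /* Zero input: return a zero carrying the same sign bit. */
        return (uint64_t)(a >> 31) << 63;
    }
    return slow_f32_to_f64_bits(a);
}
```

For example, f32_to_f64_bits(0x3fc00000), i.e. 1.5f, takes the normal path
and returns 0x3ff8000000000000, while 0x80000000 (-0) returns
0x8000000000000000 via the zero branch.
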
 include/fpu/hostfloat.h |  2 ++
 include/fpu/softfloat.h |  2 +-
 fpu/hostfloat.c         | 14 ++++++++++++++
 fpu/softfloat.c         |  2 +-
 4 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/fpu/hostfloat.h b/include/fpu/hostfloat.h
index aa555f6..79e9b6c 100644
--- a/include/fpu/hostfloat.h
+++ b/include/fpu/hostfloat.h
@@ -29,4 +29,6 @@ float64 float64_sqrt(float64 a, float_status *status);
 int float64_compare(float64 a, float64 b, float_status *s);
 int float64_compare_quiet(float64 a, float64 b, float_status *s);
 
+float64 float32_to_float64(float32, float_status *status);
+
 #endif /* HOSTFLOAT_H */
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index cb57942..b0a4d75 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -334,7 +334,7 @@ int64_t float32_to_int64(float32, float_status *status);
 uint64_t float32_to_uint64(float32, float_status *status);
 uint64_t float32_to_uint64_round_to_zero(float32, float_status *status);
 int64_t float32_to_int64_round_to_zero(float32, float_status *status);
-float64 float32_to_float64(float32, float_status *status);
+float64 soft_float32_to_float64(float32, float_status *status);
 floatx80 float32_to_floatx80(float32, float_status *status);
 float128 float32_to_float128(float32, float_status *status);
 
diff --git a/fpu/hostfloat.c b/fpu/hostfloat.c
index 139e419..b635839 100644
--- a/fpu/hostfloat.c
+++ b/fpu/hostfloat.c
@@ -326,3 +326,17 @@ GEN_FPU_SQRT(float64_sqrt, float64, double, sqrt)
 GEN_FPU_COMPARE(float32_compare, float32, float)
 GEN_FPU_COMPARE(float64_compare, float64, double)
 #undef GEN_FPU_COMPARE
+
+float64 float32_to_float64(float32 a, float_status *status)
+{
+    if (likely(float32_is_normal(a))) {
+        float f = *(float *)&a;
+        double r = f;
+
+        return *(float64 *)&r;
+    } else if (float32_is_zero(a)) {
+        return float64_set_sign(float64_zero, float32_is_neg(a));
+    } else {
+        return soft_float32_to_float64(a, status);
+    }
+}
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 1a32216..cf8d6ec 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -3149,7 +3149,7 @@ float128 uint64_to_float128(uint64_t a, float_status *status)
 | Arithmetic.
 *----------------------------------------------------------------------------*/
 
-float64 float32_to_float64(float32 a, float_status *status)
+float64 soft_float32_to_float64(float32 a, float_status *status)
 {
     flag aSign;
     int aExp;
-- 
2.7.4
