Re: [Qemu-devel] [PATCH v1 07/14] fpu: introduce hostfloat

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: "Alex Bennée" <alex.bennee@linaro.org>
To: "Emilio G. Cota" <cota@braap.org>
Cc: qemu-devel@nongnu.org, Aurelien Jarno <aurelien@aurel32.net>,
	Peter Maydell <peter.maydell@linaro.org>,
	Laurent Vivier <laurent@vivier.eu>,
	Richard Henderson <richard.henderson@linaro.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Subject: Re: [Qemu-devel] [PATCH v1 07/14] fpu: introduce hostfloat
Date: Tue, 27 Mar 2018 12:49:48 +0100	[thread overview]
Message-ID: <87po3p91r7.fsf@linaro.org> (raw)
In-Reply-To: <1521663109-32262-8-git-send-email-cota@braap.org>


Emilio G. Cota <cota@braap.org> writes:

> The appended paves the way for leveraging the host FPU for a subset
> of guest FP operations. For most guest workloads (e.g. FP flags
> aren't ever cleared, inexact occurs often and rounding is set to the
> default [to nearest]) this will yield sizable performance speedups.
>
> The approach followed here avoids checking the FP exception flags register.
> See the comment at the top of hostfloat.c for details.
>
> This assumes that QEMU is running on an IEEE754-compliant FPU and
> that the rounding is set to the default (to nearest). The
> implementation-dependent specifics of the FPU should not matter; things
> like tininess detection and snan representation are still dealt with in
> soft-fp. However, this approach will break on most hosts if we compile
> QEMU with flags such as -ffast-math. We control the flags so this should
> be easy to enforce though.

The thing I would avoid is generating is any x87 instructions as we can
get weird effects if the compiler ever decides to stash a signalling NaN
in an x87 register.

Anyway perhaps -fno-fast-math should be explicit when building fpu/* code?

>
> The licensing in softfloat.h is complicated at best, so to keep things
> simple I'm adding this as a separate, GPL'ed file.

I don't think we need to worry about this. It's fine to add GPL only
stuff to softfloat.c and since the re-factoring (or before really) we
"own" this code and are unlikely to upstream anything.

My preference would be to include this all in softfloat.c unless there
is a very good reason not to.

>
> This patch just adds some boilerplate code; subsequent patches add
> operations, one per commit to ease bisection.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  Makefile.target           |  2 +-
>  include/fpu/hostfloat.h   | 14 +++++++
>  include/fpu/softfloat.h   |  1 +
>  fpu/hostfloat.c           | 96 +++++++++++++++++++++++++++++++++++++++++++++++
>  target/m68k/Makefile.objs |  2 +-
>  tests/fp-test/Makefile    |  2 +-
>  6 files changed, 114 insertions(+), 3 deletions(-)
>  create mode 100644 include/fpu/hostfloat.h
>  create mode 100644 fpu/hostfloat.c
>
> diff --git a/Makefile.target b/Makefile.target
> index 6549481..efcdfb9 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -97,7 +97,7 @@ obj-$(CONFIG_TCG) += tcg/tcg.o tcg/tcg-op.o tcg/tcg-op-vec.o tcg/tcg-op-gvec.o
>  obj-$(CONFIG_TCG) += tcg/tcg-common.o tcg/optimize.o
>  obj-$(CONFIG_TCG_INTERPRETER) += tcg/tci.o
>  obj-$(CONFIG_TCG_INTERPRETER) += disas/tci.o
> -obj-y += fpu/softfloat.o
> +obj-y += fpu/softfloat.o fpu/hostfloat.o
>  obj-y += target/$(TARGET_BASE_ARCH)/
>  obj-y += disas.o
>  obj-$(call notempty,$(TARGET_XML_FILES)) += gdbstub-xml.o
> diff --git a/include/fpu/hostfloat.h b/include/fpu/hostfloat.h
> new file mode 100644
> index 0000000..b01291b
> --- /dev/null
> +++ b/include/fpu/hostfloat.h
> @@ -0,0 +1,14 @@
> +/*
> + * Copyright (C) 2018, Emilio G. Cota <cota@braap.org>
> + *
> + * License: GNU GPL, version 2 or later.
> + *   See the COPYING file in the top-level directory.
> + */
> +#ifndef HOSTFLOAT_H
> +#define HOSTFLOAT_H
> +
> +#ifndef SOFTFLOAT_H
> +#error fpu/hostfloat.h must only be included from softfloat.h
> +#endif
> +
> +#endif /* HOSTFLOAT_H */
> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index 8fb44a8..8963b68 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -95,6 +95,7 @@ enum {
>  };
>
>  #include "fpu/softfloat-types.h"
> +#include "fpu/hostfloat.h"
>
>  static inline void set_float_detect_tininess(int val, float_status *status)
>  {
> diff --git a/fpu/hostfloat.c b/fpu/hostfloat.c
> new file mode 100644
> index 0000000..cab0341
> --- /dev/null
> +++ b/fpu/hostfloat.c
> @@ -0,0 +1,96 @@
> +/*
> + * hostfloat.c - FP primitives that use the host's FPU whenever possible.
> + *
> + * Copyright (C) 2018, Emilio G. Cota <cota@braap.org>
> + *
> + * License: GNU GPL, version 2 or later.
> + *   See the COPYING file in the top-level directory.
> + *
> + * Fast emulation of guest FP instructions is challenging for two reasons.
> + * First, FP instruction semantics are similar but not identical, particularly
> + * when handling NaNs. Second, emulating at reasonable speed the guest FP
> + * exception flags is not trivial: reading the host's flags register with a
> + * feclearexcept & fetestexcept pair is slow [slightly slower than soft-fp],
> + * and trapping on every FP exception is not fast nor pleasant to work with.
> + *
> + * This module leverages the host FPU for a subset of the operations. To
> + * do this it follows the main idea presented in this paper:
> + *
> + * Guo, Yu-Chuan, et al. "Translating the ARM Neon and VFP instructions in a
> + * binary translator." Software: Practice and Experience 46.12 (2016):1591-1615.
> + *
> + * The idea is thus to leverage the host FPU to (1) compute FP operations
> + * and (2) identify whether FP exceptions occurred while avoiding
> + * expensive exception flag register accesses.
> + *
> + * An important optimization shown in the paper is that given that exception
> + * flags are rarely cleared by the guest, we can avoid recomputing some flags.
> + * This is particularly useful for the inexact flag, which is very frequently
> + * raised in floating-point workloads.
> + *
> + * We optimize the code further by deferring to soft-fp whenever FP
> + * exception detection might get hairy. Fortunately this is not common.
> + */
> +#include <math.h>
> +
> +#include "qemu/osdep.h"
> +#include "fpu/softfloat.h"
> +
> +#define GEN_TYPE_CONV(name, to_t, from_t)       \
> +    static inline to_t name(from_t a)           \
> +    {                                           \
> +        to_t r = *(to_t *)&a;                   \
> +        return r;                               \
> +    }
> +
> +GEN_TYPE_CONV(float32_to_float, float, float32)
> +GEN_TYPE_CONV(float64_to_double, double, float64)
> +GEN_TYPE_CONV(float_to_float32, float32, float)
> +GEN_TYPE_CONV(double_to_float64, float64, double)
> +#undef GEN_TYPE_CONV
> +
> +#define GEN_INPUT_FLUSH(soft_t)                                         \
> +    static inline __attribute__((always_inline)) void                   \
> +    soft_t ## _input_flush__nocheck(soft_t *a, float_status *s)         \
> +    {                                                                   \
> +        if (unlikely(soft_t ## _is_denormal(*a))) {                     \
> +            *a = soft_t ## _set_sign(soft_t ## _zero,                   \
> +                                     soft_t ## _is_neg(*a));            \
> +            s->float_exception_flags |= float_flag_input_denormal;      \
> +        }                                                               \
> +    }                                                                   \
> +                                                                        \
> +    static inline __attribute__((always_inline)) void                   \
> +    soft_t ## _input_flush1(soft_t *a, float_status *s)                 \
> +    {                                                                   \
> +        if (likely(!s->flush_inputs_to_zero)) {                         \
> +            return;                                                     \
> +        }                                                               \
> +        soft_t ## _input_flush__nocheck(a, s);                          \
> +    }                                                                   \
> +                                                                        \
> +    static inline __attribute__((always_inline)) void                   \
> +    soft_t ## _input_flush2(soft_t *a, soft_t *b, float_status *s)      \
> +    {                                                                   \
> +        if (likely(!s->flush_inputs_to_zero)) {                         \
> +            return;                                                     \
> +        }                                                               \
> +        soft_t ## _input_flush__nocheck(a, s);                          \
> +        soft_t ## _input_flush__nocheck(b, s);                          \
> +    }                                                                   \
> +                                                                        \
> +    static inline __attribute__((always_inline)) void                   \
> +    soft_t ## _input_flush3(soft_t *a, soft_t *b, soft_t *c,            \
> +                            float_status *s)                            \
> +    {                                                                   \
> +        if (likely(!s->flush_inputs_to_zero)) {                         \
> +            return;                                                     \
> +        }                                                               \
> +        soft_t ## _input_flush__nocheck(a, s);                          \
> +        soft_t ## _input_flush__nocheck(b, s);                          \
> +        soft_t ## _input_flush__nocheck(c, s);                          \
> +    }
> +
> +GEN_INPUT_FLUSH(float32)
> +GEN_INPUT_FLUSH(float64)

Having spent time getting rid of a bunch of macro expansions I'm wary of
adding more in. However for these I guess it's kind of marginal.

> +#undef GEN_INPUT_FLUSH
> diff --git a/target/m68k/Makefile.objs b/target/m68k/Makefile.objs
> index ac61948..2868b11 100644
> --- a/target/m68k/Makefile.objs
> +++ b/target/m68k/Makefile.objs
> @@ -1,5 +1,5 @@
>  obj-y += m68k-semi.o
>  obj-y += translate.o op_helper.o helper.o cpu.o
> -obj-y += fpu_helper.o softfloat.o
> +obj-y += fpu_helper.o softfloat.o hostfloat.o
>  obj-y += gdbstub.o
>  obj-$(CONFIG_SOFTMMU) += monitor.o
> diff --git a/tests/fp-test/Makefile b/tests/fp-test/Makefile
> index 703434f..187cfcc 100644
> --- a/tests/fp-test/Makefile
> +++ b/tests/fp-test/Makefile
> @@ -28,7 +28,7 @@ ibm:
>  $(WHITELIST_FILES):
>  	wget -nv -O $@ http://www.cs.columbia.edu/~cota/qemu/fpbench-$@
>
> -fp-test$(EXESUF): fp-test.o softfloat.o
> +fp-test$(EXESUF): fp-test.o softfloat.o hostfloat.o
>
>  clean:
>  	rm -f *.o *.d $(OBJS)


--
Alex Bennée

next prev parent reply	other threads:[~2018-03-27 11:49 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-21 20:11 [Qemu-devel] [PATCH v1 00/14] fp-test + hostfloat Emilio G. Cota
2018-03-21 20:11 ` [Qemu-devel] [PATCH v1 01/14] tests: add fp-bench, a collection of simple floating-point microbenchmarks Emilio G. Cota
2018-03-27  8:45   ` Alex Bennée
2018-03-27 17:21     ` Emilio G. Cota
2018-03-21 20:11 ` [Qemu-devel] [PATCH v1 02/14] tests: add fp-test, a floating point test suite Emilio G. Cota
2018-03-27 10:13   ` Alex Bennée
2018-03-27 18:00     ` Emilio G. Cota
2018-03-28  9:51       ` Alex Bennée
2018-03-28 15:36         ` Emilio G. Cota
2018-03-21 20:11 ` [Qemu-devel] [PATCH v1 03/14] softfloat: fix {min, max}nummag for same-abs-value inputs Emilio G. Cota
2018-03-27 10:15   ` Alex Bennée
2018-03-27 10:15   ` Alex Bennée
2018-03-21 20:11 ` [Qemu-devel] [PATCH v1 04/14] fp-test: add muladd variants Emilio G. Cota
2018-03-27 11:33   ` Alex Bennée
2018-03-27 18:03     ` Emilio G. Cota
2018-03-21 20:11 ` [Qemu-devel] [PATCH v1 05/14] softfloat: add float32_is_normal and float64_is_normal Emilio G. Cota
2018-03-27 11:34   ` Alex Bennée
2018-03-27 18:05     ` Emilio G. Cota
2018-03-21 20:11 ` [Qemu-devel] [PATCH v1 06/14] softfloat: add float32_is_denormal and float64_is_denormal Emilio G. Cota
2018-03-27 11:35   ` Alex Bennée
2018-03-21 20:11 ` [Qemu-devel] [PATCH v1 07/14] fpu: introduce hostfloat Emilio G. Cota
2018-03-21 20:41   ` Laurent Vivier
2018-03-21 21:45     ` Emilio G. Cota
2018-03-27 11:49   ` Alex Bennée [this message]
2018-03-27 18:16     ` Emilio G. Cota
2018-03-21 20:11 ` [Qemu-devel] [PATCH v1 08/14] hostfloat: support float32/64 addition and subtraction Emilio G. Cota
2018-03-22  5:05   ` Richard Henderson
2018-03-22  5:57     ` Emilio G. Cota
2018-03-22  6:41       ` Richard Henderson
2018-03-22 15:08         ` Emilio G. Cota
2018-03-22 15:12           ` Laurent Vivier
2018-03-22 19:57         ` Emilio G. Cota
2018-03-27 11:41           ` Alex Bennée
2018-03-27 18:08             ` Emilio G. Cota
2018-03-21 20:11 ` [Qemu-devel] [PATCH v1 09/14] hostfloat: support float32/64 multiplication Emilio G. Cota
2018-03-21 20:11 ` [Qemu-devel] [PATCH v1 10/14] hostfloat: support float32/64 division Emilio G. Cota
2018-03-21 20:11 ` [Qemu-devel] [PATCH v1 11/14] hostfloat: support float32/64 fused multiply-add Emilio G. Cota
2018-03-21 20:11 ` [Qemu-devel] [PATCH v1 12/14] hostfloat: support float32/64 square root Emilio G. Cota
2018-03-22  1:29   ` Alex Bennée
2018-03-22  4:02     ` Emilio G. Cota
2018-03-21 20:11 ` [Qemu-devel] [PATCH v1 13/14] hostfloat: support float32/64 comparison Emilio G. Cota
2018-03-21 20:11 ` [Qemu-devel] [PATCH v1 14/14] hostfloat: support float32_to_float64 Emilio G. Cota
2018-03-21 20:36 ` [Qemu-devel] [PATCH v1 00/14] fp-test + hostfloat no-reply
2018-03-22  5:02 ` no-reply
2018-03-22  8:56 ` Alex Bennée
2018-03-22 15:28   ` Emilio G. Cota

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87po3p91r7.fsf@linaro.org \
    --to=alex.bennee@linaro.org \
    --cc=aurelien@aurel32.net \
    --cc=cota@braap.org \
    --cc=laurent@vivier.eu \
    --cc=mark.cave-ayland@ilande.co.uk \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).