* [Qemu-devel] [PULL 01/15] contrib: add a basic gitdm config
2018-12-14 13:54 [Qemu-devel] [PULL 00/15] Hardfloat + softfloat maintainers update and gitdm Alex Bennée
@ 2018-12-14 13:54 ` Alex Bennée
2018-12-14 20:55 ` Alex Bennée
2018-12-14 13:54 ` [Qemu-devel] [PULL 02/15] MAINTAINERS: update status of FPU emulation Alex Bennée
` (14 subsequent siblings)
15 siblings, 1 reply; 19+ messages in thread
From: Alex Bennée @ 2018-12-14 13:54 UTC (permalink / raw)
To: peter.maydell; +Cc: qemu-devel, Alex Bennée
This is a QEMU specific version of a gitdm config for generating
reports on the contributor base of the project. I've added enough
group maps and domain aliases to ensure the current top ten is as
reflective as it can be. As of this commit running:
git log --numstat --since "Last Year" | gitdm -n -l 10
Reports:
Top changeset contributors by employer
Red Hat 3172 (44.3%)
Linaro 1153 (16.1%)
(None) 549 (7.7%)
IBM 348 (4.9%)
Academics (various) 170 (2.4%)
Virtuozzo 168 (2.3%)
Wave Computing 118 (1.6%)
Xilinx 102 (1.4%)
Igalia 93 (1.3%)
Cadence Design Systems 88 (1.2%)
Top lines changed by employer
Red Hat 144092 (28.1%)
Cadence Design Systems 126554 (24.6%)
Linaro 77480 (15.1%)
Wave Computing 33134 (6.5%)
SiFive 14392 (2.8%)
IBM 12219 (2.4%)
(None) 11948 (2.3%)
Academics (various) 10447 (2.0%)
Virtuozzo 10445 (2.0%)
CodeWeavers 9179 (1.8%)
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Aleksandar Markovic <amarkovic@wavecomp.com>
diff --git a/contrib/gitdm/aliases b/contrib/gitdm/aliases
new file mode 100644
index 0000000000..07fd3391a5
--- /dev/null
+++ b/contrib/gitdm/aliases
@@ -0,0 +1,27 @@
+#
+# This is the email aliases file, mapping secondary addresses
+# onto a single, canonical address. Duplicates some info from .mailmap
+#
+
+# weird commits
+balrog@c046a42c-6fe2-441c-8c8c-71466251a162 balrogg@gmail.com
+aliguori@c046a42c-6fe2-441c-8c8c-71466251a162 anthony@codemonkey.ws
+aurel32@c046a42c-6fe2-441c-8c8c-71466251a162 aurelien@aurel32.net
+blueswir1@c046a42c-6fe2-441c-8c8c-71466251a162 blauwirbel@gmail.com
+edgar_igl@c046a42c-6fe2-441c-8c8c-71466251a162 edgar.iglesias@gmail.com
+bellard@c046a42c-6fe2-441c-8c8c-71466251a162 fabrice@bellard.org
+j_mayer@c046a42c-6fe2-441c-8c8c-71466251a162 l_indien@magic.fr
+pbrook@c046a42c-6fe2-441c-8c8c-71466251a162 paul@codesourcery.com
+ths@c046a42c-6fe2-441c-8c8c-71466251a162 ths@networkno.de
+malc@c046a42c-6fe2-441c-8c8c-71466251a162 av1474@comtv.ru
+
+# There is also a:
+# (no author) <(no author)@c046a42c-6fe2-441c-8c8c-71466251a162>
+# for the cvs2svn initialization commit e63c3dc74bf.
+
+# Next, translate a few commits where mailman rewrote the From: line due
+# to strict SPF, although we prefer to avoid adding more entries like that.
+"Ed Swierk via Qemu-devel" eswierk@skyportsystems.com
+"Ian McKellar via Qemu-devel" ianloic@google.com
+"Julia Suvorova via Qemu-devel" jusual@mail.ru
+"Justin Terry (VM) via Qemu-devel" juterry@microsoft.com
diff --git a/contrib/gitdm/domain-map b/contrib/gitdm/domain-map
new file mode 100644
index 0000000000..8cbbcfe93d
--- /dev/null
+++ b/contrib/gitdm/domain-map
@@ -0,0 +1,19 @@
+#
+# QEMU gitdm domain-map
+#
+# This maps email domains to nice easy to read company names
+#
+
+amd.com AMD
+greensocs.com GreenSocs
+ibm.com IBM
+igalia.com Igalia
+linaro.org Linaro
+oracle.com Oracle
+redhat.com Red Hat
+siemens.com Siemens
+sifive.com SiFive
+suse.de SUSE
+virtuozzo.com Virtuozzo
+wdc.com Western Digital
+xilinx.com Xilinx
diff --git a/contrib/gitdm/filetypes.txt b/contrib/gitdm/filetypes.txt
new file mode 100644
index 0000000000..15d6f803b9
--- /dev/null
+++ b/contrib/gitdm/filetypes.txt
@@ -0,0 +1,146 @@
+# -*- coding:utf-8 -*-
+# Copyright (C) 2006 Libresoft
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU Library General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+#
+# Authors : Gregorio Robles <grex@gsyc.escet.urjc.es>
+# Authors : Germán Póo-Caamaño <gpoo@gnome.org>
+#
+# This QEMU version is a cut-down version of what originally shipped
+# in the gitdm sample-config directory.
+#
+# This file contains associations parameters regarding filetypes
+# (documentation, develompent, multimedia, images...)
+#
+# format:
+# filetype <type> <regex> [<comment>]
+#
+# Order:
+# The list should keep an order, so filetypes can be counted properly.
+# ie. we want ltmain.sh -> 'build' instead of 'code'.
+#
+# If there is an filetype which is not in order but has values, it will
+# be added at the end.
+#
+order build,tests,code,documentation,devel-doc,blobs
+
+#
+#
+# Code files (headers and the like included
+# (most common languages first
+#
+filetype code \.c$ # C
+filetype code \.inc.c$ # C
+filetype code \.C$ # C++
+filetype code \.cpp$ # C++
+filetype code \.c\+\+$ # C++
+filetype code \.cxx$ # C++
+filetype code \.cc$ # C++
+filetype code \.h$ # C or C++ header
+filetype code \.hh$ # C++ header
+filetype code \.hpp$ # C++ header
+filetype code \.hxx$ # C++ header
+filetype code \.sh$ # Shell
+filetype code \.pl$ # Perl
+filetype code \.py$ # Python
+filetype code \.s$ # Assembly
+filetype code \.S$ # Assembly
+filetype code \.asm$ # Assembly
+filetype code \.awk$ # awk
+filetype code ^common$ # script fragements
+filetype code ^common.*$ # script fragements
+filetype code (qom|qmp)-\w+$ # python script fragments
+
+#
+# Interface/api files
+#
+filetype interface \.json$ # json
+filetype interface \.hx$ # documented options
+
+#
+# Test related blobs (unfortunately we can't filter out test code)
+#
+filetype tests \.hex$
+filetype tests \d{2,3}$ # test data 00-999
+filetype tests ^[A-Z]{4}$ # ACPI test data
+filetype tests ^[A-Z]{4}\.*$ # ACPI test data
+filetype tests \.out$
+filetype tests \.out\.nocache$
+filetype tests \.err$
+filetype tests \.exit$ # bad-if-FOO.exit etc
+filetype tests \.decode$
+filetype tests \.yml$ # travis/shippable config
+
+#
+# Development documentation files (for hacking generally)
+#
+filetype devel-doc ^readme.*$
+filetype devel-doc ^changelog.*
+filetype devel-doc ^hacking.*$
+filetype devel-doc ^licen(s|c)e.*$
+filetype devel-doc ^copying.*$
+filetype devel-doc ^MAINTAINERS$
+filetype devel-doc ^BSD-2-Clause$
+filetype devel-doc ^BSD-3-Clause$
+filetype devel-doc ^GPL-2.0$
+filetype devel-doc \.txt$
+filetype devel-doc \.rst$
+filetype devel-doc \.texi$
+filetype devel-doc \.pod$
+
+#
+# Building, compiling, and configuration admin files
+#
+filetype build configure.*$
+filetype build Makefile$
+filetype build Makefile\.*$
+filetype build config$
+filetype build conf$
+filetype build \.cfg$
+filetype build \.mk$
+filetype build \.mak$
+filetype build \.docker$
+filetype build \.pre$
+filetype build ^.gitignore$
+filetype build ^.gitmodules$
+filetype build ^.gitpublish$
+filetype build ^.mailmap$
+filetype build ^.dir-locals.el$
+filetype build ^.editorconfig$
+filetype build ^.exrc$
+filetype build ^.gdbinit$
+filetype build \.cocci$ # Coccinelle semantic patches
+
+#
+# Misc blobs
+#
+filetype blobs \.bin$
+filetype blobs \.dtb$
+filetype blobs \.dts$
+filetype blobs \.rom$
+filetype blobs \.img$
+filetype blobs \.ndrv$
+filetype blobs \.bmp$
+filetype blobs \.svg$
+filetype blobs ^pi_10.com$
+
+
+#
+# Documentation files
+#
+filetype documentation \.html$
+filetype documentation \.txt$
+filetype documentation \.texi$
+filetype documentation \.po$ # translation files
diff --git a/contrib/gitdm/group-map-cadence b/contrib/gitdm/group-map-cadence
new file mode 100644
index 0000000000..ab97dd2fc3
--- /dev/null
+++ b/contrib/gitdm/group-map-cadence
@@ -0,0 +1,3 @@
+# Cadence Design Systems
+
+jcmvbkbc@gmail.com
diff --git a/contrib/gitdm/group-map-codeweavers b/contrib/gitdm/group-map-codeweavers
new file mode 100644
index 0000000000..c4803489e2
--- /dev/null
+++ b/contrib/gitdm/group-map-codeweavers
@@ -0,0 +1 @@
+sergio.g.delreal@gmail.com
diff --git a/contrib/gitdm/group-map-ibm b/contrib/gitdm/group-map-ibm
new file mode 100644
index 0000000000..b66db5f4a8
--- /dev/null
+++ b/contrib/gitdm/group-map-ibm
@@ -0,0 +1,6 @@
+#
+# Some IBM contributors submit via another domain
+#
+
+clg@kaod.org
+groug@kaod.org
diff --git a/contrib/gitdm/group-map-redhat b/contrib/gitdm/group-map-redhat
new file mode 100644
index 0000000000..6d05c6b54f
--- /dev/null
+++ b/contrib/gitdm/group-map-redhat
@@ -0,0 +1,7 @@
+#
+# Red Hat contributors using non-corporate email
+#
+
+david@gibson.dropbear.id.au
+laurent@vivier.eu
+pjp@fedoraproject.org
diff --git a/contrib/gitdm/group-map-wavecomp b/contrib/gitdm/group-map-wavecomp
new file mode 100644
index 0000000000..c571a52c65
--- /dev/null
+++ b/contrib/gitdm/group-map-wavecomp
@@ -0,0 +1,18 @@
+#
+# Wave Computing acquired MIPS in June 2018. Also, from February 2013
+# to October 2017, MIPS was owned by Imagination Technologies.
+#
+
+aleksandar.markovic@imgtec.com
+aleksandar.markovic@mips.com
+amarkovic@wavecomp.com
+arikalo@wavecomp.com
+dnikolic@wavecomp.com
+james.hogan@mips.com
+matthew.fortune@mips.com
+paul.burton@imgtec.com
+pburton@wavecomp.com
+smarkovic@wavecomp.com
+yongbok.kim@imgtec.com
+yongbok.kim@mips.com
+ysu@wavecomp.com
diff --git a/gitdm.config b/gitdm.config
new file mode 100644
index 0000000000..7472d4b8be
--- /dev/null
+++ b/gitdm.config
@@ -0,0 +1,50 @@
+#
+# This is the gitdm configuration file for QEMU.
+#
+# It is to be used with LWN's git dataminer tool for generating
+# reports about development activity in the QEMU repo. The LWN gitdm
+# tool can be found at:
+#
+# git://git.lwn.net/gitdm.git
+#
+# A run to generate a report for the last year of activity would be
+#
+# git log --numstat --since "Last Year" | gitdm -n -l 10
+#
+
+# EmailAliases lets us cope with developers who use more
+# than one address or have changed addresses. This duplicates some of
+# the information in the existing .mailmap but in a slightly different
+# form.
+#
+EmailAliases contrib/gitdm/aliases
+
+#
+# EmailMap does the main work of mapping addresses onto
+# employers.
+#
+EmailMap contrib/gitdm/domain-map
+
+#
+# Use GroupMap to map a file full of addresses to the
+# same employer. This is used for people that don't post from easily
+# identifiable corporate emails.
+#
+
+GroupMap contrib/gitdm/group-map-redhat Red Hat
+GroupMap contrib/gitdm/group-map-wavecomp Wave Computing
+GroupMap contrib/gitdm/group-map-cadence Cadence Design Systems
+GroupMap contrib/gitdm/group-map-codeweavers CodeWeavers
+GroupMap contrib/gitdm/group-map-ibm IBM
+
+# Also group together our prolific individual contributors
+# and those working under academic auspices
+GroupMap contrib/gitdm/group-map-individuals (None)
+GroupMap contrib/gitdm/group-map-academics Academics (various)
+
+#
+#
+# Use FileTypeMap to map a file types to file names using regular
+# regular expressions.
+#
+FileTypeMap contrib/gitdm/filetypes.txt
--
2.17.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PULL 01/15] contrib: add a basic gitdm config
2018-12-14 13:54 ` [Qemu-devel] [PULL 01/15] contrib: add a basic gitdm config Alex Bennée
@ 2018-12-14 20:55 ` Alex Bennée
0 siblings, 0 replies; 19+ messages in thread
From: Alex Bennée @ 2018-12-14 20:55 UTC (permalink / raw)
To: peter.maydell; +Cc: qemu-devel
Alex Bennée <alex.bennee@linaro.org> writes:
> This is a QEMU specific version of a gitdm config for generating
> reports on the contributor base of the project. I've added enough
> group maps and domain aliases to ensure the current top ten is as
> reflective as it can be. As of this commit running:
<snip>
> +# Also group together our prolific individual contributors
> +# and those working under academic auspices
> +GroupMap contrib/gitdm/group-map-individuals (None)
> +GroupMap contrib/gitdm/group-map-academics Academics (various)
Have sent v4 for these two files. I'll make a new PR on Monday.
> +
> +#
> +#
> +# Use FileTypeMap to map a file types to file names using regular
> +# regular expressions.
> +#
> +FileTypeMap contrib/gitdm/filetypes.txt
--
Alex Bennée
^ permalink raw reply [flat|nested] 19+ messages in thread
* [Qemu-devel] [PULL 02/15] MAINTAINERS: update status of FPU emulation
2018-12-14 13:54 [Qemu-devel] [PULL 00/15] Hardfloat + softfloat maintainers update and gitdm Alex Bennée
2018-12-14 13:54 ` [Qemu-devel] [PULL 01/15] contrib: add a basic gitdm config Alex Bennée
@ 2018-12-14 13:54 ` Alex Bennée
2018-12-14 13:54 ` [Qemu-devel] [PULL 03/15] fp-test: pick TARGET_ARM to get its specialization Alex Bennée
` (13 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Alex Bennée @ 2018-12-14 13:54 UTC (permalink / raw)
To: peter.maydell; +Cc: qemu-devel, Alex Bennée
Given I've spent a fair amount of time around this code now I'm
putting myself forward as a maintainer. Also given that the code has
been extensively re-written and has testing and new incoming features
it is probably more than just Odd Fixes.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
diff --git a/MAINTAINERS b/MAINTAINERS
index 83c127f0d6..d676c73f88 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -127,9 +127,11 @@ F: include/sysemu/cpus.h
FPU emulation
M: Aurelien Jarno <aurelien@aurel32.net>
M: Peter Maydell <peter.maydell@linaro.org>
-S: Odd Fixes
+M: Alex Bennée <alex.bennee@linaro.org>
+S: Maintained
F: fpu/
F: include/fpu/
+F: tests/fp/
Alpha
M: Richard Henderson <rth@twiddle.net>
--
2.17.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [Qemu-devel] [PULL 03/15] fp-test: pick TARGET_ARM to get its specialization
2018-12-14 13:54 [Qemu-devel] [PULL 00/15] Hardfloat + softfloat maintainers update and gitdm Alex Bennée
2018-12-14 13:54 ` [Qemu-devel] [PULL 01/15] contrib: add a basic gitdm config Alex Bennée
2018-12-14 13:54 ` [Qemu-devel] [PULL 02/15] MAINTAINERS: update status of FPU emulation Alex Bennée
@ 2018-12-14 13:54 ` Alex Bennée
2018-12-14 13:54 ` [Qemu-devel] [PULL 04/15] softfloat: add float{32, 64}_is_{de, }normal Alex Bennée
` (12 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Alex Bennée @ 2018-12-14 13:54 UTC (permalink / raw)
To: peter.maydell
Cc: qemu-devel, Emilio G. Cota, Alex Bennée, Aurelien Jarno
From: "Emilio G. Cota" <cota@braap.org>
This gets rid of the muladd errors due to not raising the invalid flag.
- Before:
Errors found in f64_mulAdd, rounding near_even, tininess before rounding:
+000.0000000000000 +7FF.0000000000000 +7FF.FFFFFFFFFFFFF
=> +7FF.FFFFFFFFFFFFF ..... expected -7FF.FFFFFFFFFFFFF v....
[...]
- After:
In 6133248 tests, no errors found in f64_mulAdd, rounding near_even, tininess before rounding.
[...]
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
diff --git a/tests/fp/Makefile b/tests/fp/Makefile
index d649a5a1db..49cdcd1bd2 100644
--- a/tests/fp/Makefile
+++ b/tests/fp/Makefile
@@ -29,6 +29,9 @@ QEMU_INCLUDES += -I$(TF_SOURCE_DIR)
# work around TARGET_* poisoning
QEMU_CFLAGS += -DHW_POISON_H
+# define a target to match testfloat's implementation-defined choices, such as
+# whether to raise the invalid flag when dealing with NaNs in muladd.
+QEMU_CFLAGS += -DTARGET_ARM
# capstone has a platform.h file that clashes with softfloat's
QEMU_CFLAGS := $(filter-out %capstone, $(QEMU_CFLAGS))
--
2.17.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [Qemu-devel] [PULL 04/15] softfloat: add float{32, 64}_is_{de, }normal
2018-12-14 13:54 [Qemu-devel] [PULL 00/15] Hardfloat + softfloat maintainers update and gitdm Alex Bennée
` (2 preceding siblings ...)
2018-12-14 13:54 ` [Qemu-devel] [PULL 03/15] fp-test: pick TARGET_ARM to get its specialization Alex Bennée
@ 2018-12-14 13:54 ` Alex Bennée
2018-12-14 13:54 ` [Qemu-devel] [PULL 05/15] target/tricore: use float32_is_denormal Alex Bennée
` (11 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Alex Bennée @ 2018-12-14 13:54 UTC (permalink / raw)
To: peter.maydell
Cc: qemu-devel, Emilio G. Cota, Alex Bennée, Aurelien Jarno
From: "Emilio G. Cota" <cota@braap.org>
This paves the way for upcoming work.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 8fd9f9bbae..9eeccd88a5 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -464,6 +464,16 @@ static inline int float32_is_zero_or_denormal(float32 a)
return (float32_val(a) & 0x7f800000) == 0;
}
+static inline bool float32_is_normal(float32 a)
+{
+ return ((float32_val(a) + 0x00800000) & 0x7fffffff) >= 0x01000000;
+}
+
+static inline bool float32_is_denormal(float32 a)
+{
+ return float32_is_zero_or_denormal(a) && !float32_is_zero(a);
+}
+
static inline float32 float32_set_sign(float32 a, int sign)
{
return make_float32((float32_val(a) & 0x7fffffff) | (sign << 31));
@@ -605,6 +615,16 @@ static inline int float64_is_zero_or_denormal(float64 a)
return (float64_val(a) & 0x7ff0000000000000LL) == 0;
}
+static inline bool float64_is_normal(float64 a)
+{
+ return ((float64_val(a) + (1ULL << 52)) & -1ULL >> 1) >= 1ULL << 53;
+}
+
+static inline bool float64_is_denormal(float64 a)
+{
+ return float64_is_zero_or_denormal(a) && !float64_is_zero(a);
+}
+
static inline float64 float64_set_sign(float64 a, int sign)
{
return make_float64((float64_val(a) & 0x7fffffffffffffffULL)
--
2.17.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [Qemu-devel] [PULL 05/15] target/tricore: use float32_is_denormal
2018-12-14 13:54 [Qemu-devel] [PULL 00/15] Hardfloat + softfloat maintainers update and gitdm Alex Bennée
` (3 preceding siblings ...)
2018-12-14 13:54 ` [Qemu-devel] [PULL 04/15] softfloat: add float{32, 64}_is_{de, }normal Alex Bennée
@ 2018-12-14 13:54 ` Alex Bennée
2018-12-14 13:54 ` [Qemu-devel] [PULL 06/15] softfloat: rename canonicalize to sf_canonicalize Alex Bennée
` (10 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Alex Bennée @ 2018-12-14 13:54 UTC (permalink / raw)
To: peter.maydell
Cc: qemu-devel, Emilio G. Cota, Alex Bennée, Bastian Koppelmann
From: "Emilio G. Cota" <cota@braap.org>
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
diff --git a/target/tricore/fpu_helper.c b/target/tricore/fpu_helper.c
index df162902d6..31df462e4a 100644
--- a/target/tricore/fpu_helper.c
+++ b/target/tricore/fpu_helper.c
@@ -44,11 +44,6 @@ static inline uint8_t f_get_excp_flags(CPUTriCoreState *env)
| float_flag_inexact);
}
-static inline bool f_is_denormal(float32 arg)
-{
- return float32_is_zero_or_denormal(arg) && !float32_is_zero(arg);
-}
-
static inline float32 f_maddsub_nan_result(float32 arg1, float32 arg2,
float32 arg3, float32 result,
uint32_t muladd_negate_c)
@@ -260,8 +255,8 @@ uint32_t helper_fcmp(CPUTriCoreState *env, uint32_t r1, uint32_t r2)
set_flush_inputs_to_zero(0, &env->fp_status);
result = 1 << (float32_compare_quiet(arg1, arg2, &env->fp_status) + 1);
- result |= f_is_denormal(arg1) << 4;
- result |= f_is_denormal(arg2) << 5;
+ result |= float32_is_denormal(arg1) << 4;
+ result |= float32_is_denormal(arg2) << 5;
flags = f_get_excp_flags(env);
if (flags) {
--
2.17.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [Qemu-devel] [PULL 06/15] softfloat: rename canonicalize to sf_canonicalize
2018-12-14 13:54 [Qemu-devel] [PULL 00/15] Hardfloat + softfloat maintainers update and gitdm Alex Bennée
` (4 preceding siblings ...)
2018-12-14 13:54 ` [Qemu-devel] [PULL 05/15] target/tricore: use float32_is_denormal Alex Bennée
@ 2018-12-14 13:54 ` Alex Bennée
2018-12-14 13:54 ` [Qemu-devel] [PULL 07/15] softfloat: add float{32, 64}_is_zero_or_normal Alex Bennée
` (9 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Alex Bennée @ 2018-12-14 13:54 UTC (permalink / raw)
To: peter.maydell
Cc: qemu-devel, Emilio G. Cota, Alex Bennée, Aurelien Jarno
From: "Emilio G. Cota" <cota@braap.org>
glibc >= 2.25 defines canonicalize in commit eaf5ad0
(Add canonicalize, canonicalizef, canonicalizel., 2016-10-26).
Given that we'll be including <math.h> soon, prepare
for this by prefixing our canonicalize() with sf_ to avoid
clashing with the libc's canonicalize().
Reported-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Tested-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index e1eef954e6..ecdc00c633 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -336,8 +336,8 @@ static inline float64 float64_pack_raw(FloatParts p)
#include "softfloat-specialize.h"
/* Canonicalize EXP and FRAC, setting CLS. */
-static FloatParts canonicalize(FloatParts part, const FloatFmt *parm,
- float_status *status)
+static FloatParts sf_canonicalize(FloatParts part, const FloatFmt *parm,
+ float_status *status)
{
if (part.exp == parm->exp_max && !parm->arm_althp) {
if (part.frac == 0) {
@@ -513,7 +513,7 @@ static FloatParts round_canonical(FloatParts p, float_status *s,
static FloatParts float16a_unpack_canonical(float16 f, float_status *s,
const FloatFmt *params)
{
- return canonicalize(float16_unpack_raw(f), params, s);
+ return sf_canonicalize(float16_unpack_raw(f), params, s);
}
static FloatParts float16_unpack_canonical(float16 f, float_status *s)
@@ -534,7 +534,7 @@ static float16 float16_round_pack_canonical(FloatParts p, float_status *s)
static FloatParts float32_unpack_canonical(float32 f, float_status *s)
{
- return canonicalize(float32_unpack_raw(f), &float32_params, s);
+ return sf_canonicalize(float32_unpack_raw(f), &float32_params, s);
}
static float32 float32_round_pack_canonical(FloatParts p, float_status *s)
@@ -544,7 +544,7 @@ static float32 float32_round_pack_canonical(FloatParts p, float_status *s)
static FloatParts float64_unpack_canonical(float64 f, float_status *s)
{
- return canonicalize(float64_unpack_raw(f), &float64_params, s);
+ return sf_canonicalize(float64_unpack_raw(f), &float64_params, s);
}
static float64 float64_round_pack_canonical(FloatParts p, float_status *s)
--
2.17.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [Qemu-devel] [PULL 07/15] softfloat: add float{32, 64}_is_zero_or_normal
2018-12-14 13:54 [Qemu-devel] [PULL 00/15] Hardfloat + softfloat maintainers update and gitdm Alex Bennée
` (5 preceding siblings ...)
2018-12-14 13:54 ` [Qemu-devel] [PULL 06/15] softfloat: rename canonicalize to sf_canonicalize Alex Bennée
@ 2018-12-14 13:54 ` Alex Bennée
2018-12-14 13:54 ` [Qemu-devel] [PULL 08/15] tests/fp: add fp-bench Alex Bennée
` (8 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Alex Bennée @ 2018-12-14 13:54 UTC (permalink / raw)
To: peter.maydell
Cc: qemu-devel, Emilio G. Cota, Alex Bennée, Aurelien Jarno
From: "Emilio G. Cota" <cota@braap.org>
These will gain some users very soon.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 9eeccd88a5..38a5e99cf3 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -474,6 +474,11 @@ static inline bool float32_is_denormal(float32 a)
return float32_is_zero_or_denormal(a) && !float32_is_zero(a);
}
+static inline bool float32_is_zero_or_normal(float32 a)
+{
+ return float32_is_normal(a) || float32_is_zero(a);
+}
+
static inline float32 float32_set_sign(float32 a, int sign)
{
return make_float32((float32_val(a) & 0x7fffffff) | (sign << 31));
@@ -625,6 +630,11 @@ static inline bool float64_is_denormal(float64 a)
return float64_is_zero_or_denormal(a) && !float64_is_zero(a);
}
+static inline bool float64_is_zero_or_normal(float64 a)
+{
+ return float64_is_normal(a) || float64_is_zero(a);
+}
+
static inline float64 float64_set_sign(float64 a, int sign)
{
return make_float64((float64_val(a) & 0x7fffffffffffffffULL)
--
2.17.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [Qemu-devel] [PULL 08/15] tests/fp: add fp-bench
2018-12-14 13:54 [Qemu-devel] [PULL 00/15] Hardfloat + softfloat maintainers update and gitdm Alex Bennée
` (6 preceding siblings ...)
2018-12-14 13:54 ` [Qemu-devel] [PULL 07/15] softfloat: add float{32, 64}_is_zero_or_normal Alex Bennée
@ 2018-12-14 13:54 ` Alex Bennée
2018-12-14 13:54 ` [Qemu-devel] [PULL 09/15] fpu: introduce hardfloat Alex Bennée
` (7 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Alex Bennée @ 2018-12-14 13:54 UTC (permalink / raw)
To: peter.maydell
Cc: qemu-devel, Emilio G. Cota, Alex Bennée, Aurelien Jarno
From: "Emilio G. Cota" <cota@braap.org>
These microbenchmarks will allow us to measure the performance impact of
FP emulation optimizations. Note that we can measure both directly the impact
on the softfloat functions (with "-t soft"), or the impact on an
emulated workload (call with "-t host" and run under qemu user-mode).
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
diff --git a/tests/fp/.gitignore b/tests/fp/.gitignore
index 8d45d18ac4..704fd42992 100644
--- a/tests/fp/.gitignore
+++ b/tests/fp/.gitignore
@@ -1 +1,2 @@
fp-test
+fp-bench
diff --git a/tests/fp/Makefile b/tests/fp/Makefile
index 49cdcd1bd2..5019dcdca0 100644
--- a/tests/fp/Makefile
+++ b/tests/fp/Makefile
@@ -553,7 +553,7 @@ TF_OBJS_LIB += $(TF_OBJS_WRITECASE)
TF_OBJS_LIB += testLoops_common.o
TF_OBJS_LIB += $(TF_OBJS_TEST)
-BINARIES := fp-test$(EXESUF)
+BINARIES := fp-test$(EXESUF) fp-bench$(EXESUF)
# everything depends on config-host.h because platform.h includes it
all: $(BUILD_DIR)/config-host.h
@@ -590,10 +590,13 @@ $(TF_OBJS_LIB) slowfloat.o: %.o: $(TF_SOURCE_DIR)/%.c
libtestfloat.a: $(TF_OBJS_LIB)
+fp-bench$(EXESUF): fp-bench.o $(QEMU_SOFTFLOAT_OBJ) $(LIBQEMUUTIL)
+
clean:
rm -f *.o *.d $(BINARIES)
rm -f *.gcno *.gcda *.gcov
rm -f fp-test$(EXESUF)
+ rm -f fp-bench$(EXESUF)
rm -f libsoftfloat.a
rm -f libtestfloat.a
diff --git a/tests/fp/fp-bench.c b/tests/fp/fp-bench.c
new file mode 100644
index 0000000000..f5bc5edebf
--- /dev/null
+++ b/tests/fp/fp-bench.c
@@ -0,0 +1,630 @@
+/*
+ * fp-bench.c - A collection of simple floating point microbenchmarks.
+ *
+ * Copyright (C) 2018, Emilio G. Cota <cota@braap.org>
+ *
+ * License: GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef HW_POISON_H
+#error Must define HW_POISON_H to work around TARGET_* poisoning
+#endif
+
+#include "qemu/osdep.h"
+#include <math.h>
+#include <fenv.h>
+#include "qemu/timer.h"
+#include "fpu/softfloat.h"
+
+/* amortize the computation of random inputs */
+#define OPS_PER_ITER 50000
+
+#define MAX_OPERANDS 3
+
+#define SEED_A 0xdeadfacedeadface
+#define SEED_B 0xbadc0feebadc0fee
+#define SEED_C 0xbeefdeadbeefdead
+
+enum op {
+ OP_ADD,
+ OP_SUB,
+ OP_MUL,
+ OP_DIV,
+ OP_FMA,
+ OP_SQRT,
+ OP_CMP,
+ OP_MAX_NR,
+};
+
+static const char * const op_names[] = {
+ [OP_ADD] = "add",
+ [OP_SUB] = "sub",
+ [OP_MUL] = "mul",
+ [OP_DIV] = "div",
+ [OP_FMA] = "mulAdd",
+ [OP_SQRT] = "sqrt",
+ [OP_CMP] = "cmp",
+ [OP_MAX_NR] = NULL,
+};
+
+enum precision {
+ PREC_SINGLE,
+ PREC_DOUBLE,
+ PREC_FLOAT32,
+ PREC_FLOAT64,
+ PREC_MAX_NR,
+};
+
+enum rounding {
+ ROUND_EVEN,
+ ROUND_ZERO,
+ ROUND_DOWN,
+ ROUND_UP,
+ ROUND_TIEAWAY,
+ N_ROUND_MODES,
+};
+
+static const char * const round_names[] = {
+ [ROUND_EVEN] = "even",
+ [ROUND_ZERO] = "zero",
+ [ROUND_DOWN] = "down",
+ [ROUND_UP] = "up",
+ [ROUND_TIEAWAY] = "tieaway",
+};
+
+enum tester {
+ TESTER_SOFT,
+ TESTER_HOST,
+ TESTER_MAX_NR,
+};
+
+static const char * const tester_names[] = {
+ [TESTER_SOFT] = "soft",
+ [TESTER_HOST] = "host",
+ [TESTER_MAX_NR] = NULL,
+};
+
+union fp {
+ float f;
+ double d;
+ float32 f32;
+ float64 f64;
+ uint64_t u64;
+};
+
+struct op_state;
+
+typedef float (*float_func_t)(const struct op_state *s);
+typedef double (*double_func_t)(const struct op_state *s);
+
+union fp_func {
+ float_func_t float_func;
+ double_func_t double_func;
+};
+
+typedef void (*bench_func_t)(void);
+
+struct op_desc {
+ const char * const name;
+};
+
+#define DEFAULT_DURATION_SECS 1
+
+static uint64_t random_ops[MAX_OPERANDS] = {
+ SEED_A, SEED_B, SEED_C,
+};
+static float_status soft_status;
+static enum precision precision;
+static enum op operation;
+static enum tester tester;
+static uint64_t n_completed_ops;
+static unsigned int duration = DEFAULT_DURATION_SECS;
+static int64_t ns_elapsed;
+/* disable optimizations with volatile */
+static volatile union fp res;
+
+/*
+ * From: https://en.wikipedia.org/wiki/Xorshift
+ * This is faster than rand_r(), and gives us a wider range (RAND_MAX is only
+ * guaranteed to be >= INT_MAX).
+ */
+static uint64_t xorshift64star(uint64_t x)
+{
+ x ^= x >> 12; /* a */
+ x ^= x << 25; /* b */
+ x ^= x >> 27; /* c */
+ return x * UINT64_C(2685821657736338717);
+}
+
+static void update_random_ops(int n_ops, enum precision prec)
+{
+ int i;
+
+ for (i = 0; i < n_ops; i++) {
+ uint64_t r = random_ops[i];
+
+ if (prec == PREC_SINGLE || PREC_FLOAT32) {
+ do {
+ r = xorshift64star(r);
+ } while (!float32_is_normal(r));
+ } else if (prec == PREC_DOUBLE || PREC_FLOAT64) {
+ do {
+ r = xorshift64star(r);
+ } while (!float64_is_normal(r));
+ } else {
+ g_assert_not_reached();
+ }
+ random_ops[i] = r;
+ }
+}
+
+static void fill_random(union fp *ops, int n_ops, enum precision prec,
+ bool no_neg)
+{
+ int i;
+
+ for (i = 0; i < n_ops; i++) {
+ switch (prec) {
+ case PREC_SINGLE:
+ case PREC_FLOAT32:
+ ops[i].f32 = make_float32(random_ops[i]);
+ if (no_neg && float32_is_neg(ops[i].f32)) {
+ ops[i].f32 = float32_chs(ops[i].f32);
+ }
+ /* raise the exponent to limit the frequency of denormal results */
+ ops[i].f32 |= 0x40000000;
+ break;
+ case PREC_DOUBLE:
+ case PREC_FLOAT64:
+ ops[i].f64 = make_float64(random_ops[i]);
+ if (no_neg && float64_is_neg(ops[i].f64)) {
+ ops[i].f64 = float64_chs(ops[i].f64);
+ }
+ /* raise the exponent to limit the frequency of denormal results */
+ ops[i].f64 |= LIT64(0x4000000000000000);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ }
+}
+
+/*
+ * The main benchmark function. Instead of (ab)using macros, we rely
+ * on the compiler to unfold this at compile-time.
+ */
+static void bench(enum precision prec, enum op op, int n_ops, bool no_neg)
+{
+ int64_t tf = get_clock() + duration * 1000000000LL;
+
+ while (get_clock() < tf) {
+ union fp ops[MAX_OPERANDS];
+ int64_t t0;
+ int i;
+
+ update_random_ops(n_ops, prec);
+ switch (prec) {
+ case PREC_SINGLE:
+ fill_random(ops, n_ops, prec, no_neg);
+ t0 = get_clock();
+ for (i = 0; i < OPS_PER_ITER; i++) {
+ float a = ops[0].f;
+ float b = ops[1].f;
+ float c = ops[2].f;
+
+ switch (op) {
+ case OP_ADD:
+ res.f = a + b;
+ break;
+ case OP_SUB:
+ res.f = a - b;
+ break;
+ case OP_MUL:
+ res.f = a * b;
+ break;
+ case OP_DIV:
+ res.f = a / b;
+ break;
+ case OP_FMA:
+ res.f = fmaf(a, b, c);
+ break;
+ case OP_SQRT:
+ res.f = sqrtf(a);
+ break;
+ case OP_CMP:
+ res.u64 = isgreater(a, b);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ }
+ break;
+ case PREC_DOUBLE:
+ fill_random(ops, n_ops, prec, no_neg);
+ t0 = get_clock();
+ for (i = 0; i < OPS_PER_ITER; i++) {
+ double a = ops[0].d;
+ double b = ops[1].d;
+ double c = ops[2].d;
+
+ switch (op) {
+ case OP_ADD:
+ res.d = a + b;
+ break;
+ case OP_SUB:
+ res.d = a - b;
+ break;
+ case OP_MUL:
+ res.d = a * b;
+ break;
+ case OP_DIV:
+ res.d = a / b;
+ break;
+ case OP_FMA:
+ res.d = fma(a, b, c);
+ break;
+ case OP_SQRT:
+ res.d = sqrt(a);
+ break;
+ case OP_CMP:
+ res.u64 = isgreater(a, b);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ }
+ break;
+ case PREC_FLOAT32:
+ fill_random(ops, n_ops, prec, no_neg);
+ t0 = get_clock();
+ for (i = 0; i < OPS_PER_ITER; i++) {
+ float32 a = ops[0].f32;
+ float32 b = ops[1].f32;
+ float32 c = ops[2].f32;
+
+ switch (op) {
+ case OP_ADD:
+ res.f32 = float32_add(a, b, &soft_status);
+ break;
+ case OP_SUB:
+ res.f32 = float32_sub(a, b, &soft_status);
+ break;
+ case OP_MUL:
+ res.f = float32_mul(a, b, &soft_status);
+ break;
+ case OP_DIV:
+ res.f32 = float32_div(a, b, &soft_status);
+ break;
+ case OP_FMA:
+ res.f32 = float32_muladd(a, b, c, 0, &soft_status);
+ break;
+ case OP_SQRT:
+ res.f32 = float32_sqrt(a, &soft_status);
+ break;
+ case OP_CMP:
+ res.u64 = float32_compare_quiet(a, b, &soft_status);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ }
+ break;
+ case PREC_FLOAT64:
+ fill_random(ops, n_ops, prec, no_neg);
+ t0 = get_clock();
+ for (i = 0; i < OPS_PER_ITER; i++) {
+ float64 a = ops[0].f64;
+ float64 b = ops[1].f64;
+ float64 c = ops[2].f64;
+
+ switch (op) {
+ case OP_ADD:
+ res.f64 = float64_add(a, b, &soft_status);
+ break;
+ case OP_SUB:
+ res.f64 = float64_sub(a, b, &soft_status);
+ break;
+ case OP_MUL:
+ res.f = float64_mul(a, b, &soft_status);
+ break;
+ case OP_DIV:
+ res.f64 = float64_div(a, b, &soft_status);
+ break;
+ case OP_FMA:
+ res.f64 = float64_muladd(a, b, c, 0, &soft_status);
+ break;
+ case OP_SQRT:
+ res.f64 = float64_sqrt(a, &soft_status);
+ break;
+ case OP_CMP:
+ res.u64 = float64_compare_quiet(a, b, &soft_status);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ }
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ ns_elapsed += get_clock() - t0;
+ n_completed_ops += OPS_PER_ITER;
+ }
+}
+
+#define GEN_BENCH(name, type, prec, op, n_ops) \
+ static void __attribute__((flatten)) name(void) \
+ { \
+ bench(prec, op, n_ops, false); \
+ }
+
+#define GEN_BENCH_NO_NEG(name, type, prec, op, n_ops) \
+ static void __attribute__((flatten)) name(void) \
+ { \
+ bench(prec, op, n_ops, true); \
+ }
+
+#define GEN_BENCH_ALL_TYPES(opname, op, n_ops) \
+ GEN_BENCH(bench_ ## opname ## _float, float, PREC_SINGLE, op, n_ops) \
+ GEN_BENCH(bench_ ## opname ## _double, double, PREC_DOUBLE, op, n_ops) \
+ GEN_BENCH(bench_ ## opname ## _float32, float32, PREC_FLOAT32, op, n_ops) \
+ GEN_BENCH(bench_ ## opname ## _float64, float64, PREC_FLOAT64, op, n_ops)
+
+GEN_BENCH_ALL_TYPES(add, OP_ADD, 2)
+GEN_BENCH_ALL_TYPES(sub, OP_SUB, 2)
+GEN_BENCH_ALL_TYPES(mul, OP_MUL, 2)
+GEN_BENCH_ALL_TYPES(div, OP_DIV, 2)
+GEN_BENCH_ALL_TYPES(fma, OP_FMA, 3)
+GEN_BENCH_ALL_TYPES(cmp, OP_CMP, 2)
+#undef GEN_BENCH_ALL_TYPES
+
+#define GEN_BENCH_ALL_TYPES_NO_NEG(name, op, n) \
+ GEN_BENCH_NO_NEG(bench_ ## name ## _float, float, PREC_SINGLE, op, n) \
+ GEN_BENCH_NO_NEG(bench_ ## name ## _double, double, PREC_DOUBLE, op, n) \
+ GEN_BENCH_NO_NEG(bench_ ## name ## _float32, float32, PREC_FLOAT32, op, n) \
+ GEN_BENCH_NO_NEG(bench_ ## name ## _float64, float64, PREC_FLOAT64, op, n)
+
+GEN_BENCH_ALL_TYPES_NO_NEG(sqrt, OP_SQRT, 1)
+#undef GEN_BENCH_ALL_TYPES_NO_NEG
+
+#undef GEN_BENCH_NO_NEG
+#undef GEN_BENCH
+
+#define GEN_BENCH_FUNCS(opname, op) \
+ [op] = { \
+ [PREC_SINGLE] = bench_ ## opname ## _float, \
+ [PREC_DOUBLE] = bench_ ## opname ## _double, \
+ [PREC_FLOAT32] = bench_ ## opname ## _float32, \
+ [PREC_FLOAT64] = bench_ ## opname ## _float64, \
+ }
+
+static const bench_func_t bench_funcs[OP_MAX_NR][PREC_MAX_NR] = {
+ GEN_BENCH_FUNCS(add, OP_ADD),
+ GEN_BENCH_FUNCS(sub, OP_SUB),
+ GEN_BENCH_FUNCS(mul, OP_MUL),
+ GEN_BENCH_FUNCS(div, OP_DIV),
+ GEN_BENCH_FUNCS(fma, OP_FMA),
+ GEN_BENCH_FUNCS(sqrt, OP_SQRT),
+ GEN_BENCH_FUNCS(cmp, OP_CMP),
+};
+
+#undef GEN_BENCH_FUNCS
+
+static void run_bench(void)
+{
+ bench_func_t f;
+
+ f = bench_funcs[operation][precision];
+ g_assert(f);
+ f();
+}
+
+/* @arr must be NULL-terminated */
+static int find_name(const char * const *arr, const char *name)
+{
+ int i;
+
+ for (i = 0; arr[i] != NULL; i++) {
+ if (strcmp(name, arr[i]) == 0) {
+ return i;
+ }
+ }
+ return -1;
+}
+
+static void usage_complete(int argc, char *argv[])
+{
+ gchar *op_list = g_strjoinv(", ", (gchar **)op_names);
+ gchar *tester_list = g_strjoinv(", ", (gchar **)tester_names);
+
+ fprintf(stderr, "Usage: %s [options]\n", argv[0]);
+ fprintf(stderr, "options:\n");
+ fprintf(stderr, " -d = duration, in seconds. Default: %d\n",
+ DEFAULT_DURATION_SECS);
+ fprintf(stderr, " -h = show this help message.\n");
+ fprintf(stderr, " -o = floating point operation (%s). Default: %s\n",
+ op_list, op_names[0]);
+ fprintf(stderr, " -p = floating point precision (single, double). "
+ "Default: single\n");
+ fprintf(stderr, " -r = rounding mode (even, zero, down, up, tieaway). "
+ "Default: even\n");
+ fprintf(stderr, " -t = tester (%s). Default: %s\n",
+ tester_list, tester_names[0]);
+ fprintf(stderr, " -z = flush inputs to zero (soft tester only). "
+ "Default: disabled\n");
+ fprintf(stderr, " -Z = flush output to zero (soft tester only). "
+ "Default: disabled\n");
+
+ g_free(tester_list);
+ g_free(op_list);
+}
+
+static int round_name_to_mode(const char *name)
+{
+ int i;
+
+ for (i = 0; i < N_ROUND_MODES; i++) {
+ if (!strcmp(round_names[i], name)) {
+ return i;
+ }
+ }
+ return -1;
+}
+
+static void QEMU_NORETURN die_host_rounding(enum rounding rounding)
+{
+ fprintf(stderr, "fatal: '%s' rounding not supported on this host\n",
+ round_names[rounding]);
+ exit(EXIT_FAILURE);
+}
+
+static void set_host_precision(enum rounding rounding)
+{
+ int rhost;
+
+ switch (rounding) {
+ case ROUND_EVEN:
+ rhost = FE_TONEAREST;
+ break;
+ case ROUND_ZERO:
+ rhost = FE_TOWARDZERO;
+ break;
+ case ROUND_DOWN:
+ rhost = FE_DOWNWARD;
+ break;
+ case ROUND_UP:
+ rhost = FE_UPWARD;
+ break;
+ case ROUND_TIEAWAY:
+ die_host_rounding(rounding);
+ return;
+ default:
+ g_assert_not_reached();
+ }
+
+ if (fesetround(rhost)) {
+ die_host_rounding(rounding);
+ }
+}
+
+static void set_soft_precision(enum rounding rounding)
+{
+ signed char mode;
+
+ switch (rounding) {
+ case ROUND_EVEN:
+ mode = float_round_nearest_even;
+ break;
+ case ROUND_ZERO:
+ mode = float_round_to_zero;
+ break;
+ case ROUND_DOWN:
+ mode = float_round_down;
+ break;
+ case ROUND_UP:
+ mode = float_round_up;
+ break;
+ case ROUND_TIEAWAY:
+ mode = float_round_ties_away;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ soft_status.float_rounding_mode = mode;
+}
+
+static void parse_args(int argc, char *argv[])
+{
+ int c;
+ int val;
+ int rounding = ROUND_EVEN;
+
+ for (;;) {
+ c = getopt(argc, argv, "d:ho:p:r:t:zZ");
+ if (c < 0) {
+ break;
+ }
+ switch (c) {
+ case 'd':
+ duration = atoi(optarg);
+ break;
+ case 'h':
+ usage_complete(argc, argv);
+ exit(EXIT_SUCCESS);
+ case 'o':
+ val = find_name(op_names, optarg);
+ if (val < 0) {
+ fprintf(stderr, "Unsupported op '%s'\n", optarg);
+ exit(EXIT_FAILURE);
+ }
+ operation = val;
+ break;
+ case 'p':
+ if (!strcmp(optarg, "single")) {
+ precision = PREC_SINGLE;
+ } else if (!strcmp(optarg, "double")) {
+ precision = PREC_DOUBLE;
+ } else {
+ fprintf(stderr, "Unsupported precision '%s'\n", optarg);
+ exit(EXIT_FAILURE);
+ }
+ break;
+ case 'r':
+ rounding = round_name_to_mode(optarg);
+ if (rounding < 0) {
+ fprintf(stderr, "fatal: invalid rounding mode '%s'\n", optarg);
+ exit(EXIT_FAILURE);
+ }
+ break;
+ case 't':
+ val = find_name(tester_names, optarg);
+ if (val < 0) {
+ fprintf(stderr, "Unsupported tester '%s'\n", optarg);
+ exit(EXIT_FAILURE);
+ }
+ tester = val;
+ break;
+ case 'z':
+ soft_status.flush_inputs_to_zero = 1;
+ break;
+ case 'Z':
+ soft_status.flush_to_zero = 1;
+ break;
+ }
+ }
+
+ /* set precision and rounding mode based on the tester */
+ switch (tester) {
+ case TESTER_HOST:
+ set_host_precision(rounding);
+ break;
+ case TESTER_SOFT:
+ set_soft_precision(rounding);
+ switch (precision) {
+ case PREC_SINGLE:
+ precision = PREC_FLOAT32;
+ break;
+ case PREC_DOUBLE:
+ precision = PREC_FLOAT64;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ break;
+ default:
+ g_assert_not_reached();
+ }
+}
+
+static void pr_stats(void)
+{
+ printf("%.2f MFlops\n", (double)n_completed_ops / ns_elapsed * 1e3);
+}
+
+int main(int argc, char *argv[])
+{
+ parse_args(argc, argv);
+ run_bench();
+ pr_stats();
+ return 0;
+}
--
2.17.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [Qemu-devel] [PULL 09/15] fpu: introduce hardfloat
2018-12-14 13:54 [Qemu-devel] [PULL 00/15] Hardfloat + softfloat maintainers update and gitdm Alex Bennée
` (7 preceding siblings ...)
2018-12-14 13:54 ` [Qemu-devel] [PULL 08/15] tests/fp: add fp-bench Alex Bennée
@ 2018-12-14 13:54 ` Alex Bennée
2018-12-14 13:54 ` [Qemu-devel] [PULL 10/15] hardfloat: implement float32/64 addition and subtraction Alex Bennée
` (6 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Alex Bennée @ 2018-12-14 13:54 UTC (permalink / raw)
To: peter.maydell
Cc: qemu-devel, Emilio G. Cota, Alex Bennée, Aurelien Jarno
From: "Emilio G. Cota" <cota@braap.org>
The appended paves the way for leveraging the host FPU for a subset
of guest FP operations. For most guest workloads (e.g. FP flags
aren't ever cleared, inexact occurs often and rounding is set to the
default [to nearest]) this will yield sizable performance speedups.
The approach followed here avoids checking the FP exception flags register.
See the added comment for details.
This assumes that QEMU is running on an IEEE754-compliant FPU and
that the rounding is set to the default (to nearest). The
implementation-dependent specifics of the FPU should not matter; things
like tininess detection and snan representation are still dealt with in
soft-fp. However, this approach will break on most hosts if we compile
QEMU with flags that break IEEE compatibility. There is no way to detect
all of these flags at compilation time, but at least we check for
-ffast-math (which defines __FAST_MATH__) and disable hardfloat
(plus emit a #warning) when it is set.
This patch just adds common code. Some operations will be migrated
to hardfloat in subsequent patches to ease bisection.
Note: some architectures (at least PPC, there might be others) clear
the status flags passed to softfloat before most FP operations. This
precludes the use of hardfloat, so to avoid introducing a performance
regression for those targets, we add a flag to disable hardfloat.
In the long run though it would be good to fix the targets so that
at least the inexact flag passed to softfloat is indeed sticky.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index ecdc00c633..468c4554d5 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -83,6 +83,7 @@ this code that are retained.
* target-dependent and needs the TARGET_* macros.
*/
#include "qemu/osdep.h"
+#include <math.h>
#include "qemu/bitops.h"
#include "fpu/softfloat.h"
@@ -95,6 +96,324 @@ this code that are retained.
*----------------------------------------------------------------------------*/
#include "fpu/softfloat-macros.h"
+/*
+ * Hardfloat
+ *
+ * Fast emulation of guest FP instructions is challenging for two reasons.
+ * First, FP instruction semantics are similar but not identical, particularly
+ * when handling NaNs. Second, emulating at reasonable speed the guest FP
+ * exception flags is not trivial: reading the host's flags register with a
+ * feclearexcept & fetestexcept pair is slow [slightly slower than soft-fp],
+ * and trapping on every FP exception is not fast nor pleasant to work with.
+ *
+ * We address these challenges by leveraging the host FPU for a subset of the
+ * operations. To do this we expand on the idea presented in this paper:
+ *
+ * Guo, Yu-Chuan, et al. "Translating the ARM Neon and VFP instructions in a
+ * binary translator." Software: Practice and Experience 46.12 (2016):1591-1615.
+ *
+ * The idea is thus to leverage the host FPU to (1) compute FP operations
+ * and (2) identify whether FP exceptions occurred while avoiding
+ * expensive exception flag register accesses.
+ *
+ * An important optimization shown in the paper is that given that exception
+ * flags are rarely cleared by the guest, we can avoid recomputing some flags.
+ * This is particularly useful for the inexact flag, which is very frequently
+ * raised in floating-point workloads.
+ *
+ * We optimize the code further by deferring to soft-fp whenever FP exception
+ * detection might get hairy. Two examples: (1) when at least one operand is
+ * denormal/inf/NaN; (2) when operands are not guaranteed to lead to a 0 result
+ * and the result is < the minimum normal.
+ */
+#define GEN_INPUT_FLUSH__NOCHECK(name, soft_t) \
+ static inline void name(soft_t *a, float_status *s) \
+ { \
+ if (unlikely(soft_t ## _is_denormal(*a))) { \
+ *a = soft_t ## _set_sign(soft_t ## _zero, \
+ soft_t ## _is_neg(*a)); \
+ s->float_exception_flags |= float_flag_input_denormal; \
+ } \
+ }
+
+GEN_INPUT_FLUSH__NOCHECK(float32_input_flush__nocheck, float32)
+GEN_INPUT_FLUSH__NOCHECK(float64_input_flush__nocheck, float64)
+#undef GEN_INPUT_FLUSH__NOCHECK
+
+#define GEN_INPUT_FLUSH1(name, soft_t) \
+ static inline void name(soft_t *a, float_status *s) \
+ { \
+ if (likely(!s->flush_inputs_to_zero)) { \
+ return; \
+ } \
+ soft_t ## _input_flush__nocheck(a, s); \
+ }
+
+GEN_INPUT_FLUSH1(float32_input_flush1, float32)
+GEN_INPUT_FLUSH1(float64_input_flush1, float64)
+#undef GEN_INPUT_FLUSH1
+
+#define GEN_INPUT_FLUSH2(name, soft_t) \
+ static inline void name(soft_t *a, soft_t *b, float_status *s) \
+ { \
+ if (likely(!s->flush_inputs_to_zero)) { \
+ return; \
+ } \
+ soft_t ## _input_flush__nocheck(a, s); \
+ soft_t ## _input_flush__nocheck(b, s); \
+ }
+
+GEN_INPUT_FLUSH2(float32_input_flush2, float32)
+GEN_INPUT_FLUSH2(float64_input_flush2, float64)
+#undef GEN_INPUT_FLUSH2
+
+#define GEN_INPUT_FLUSH3(name, soft_t) \
+ static inline void name(soft_t *a, soft_t *b, soft_t *c, float_status *s) \
+ { \
+ if (likely(!s->flush_inputs_to_zero)) { \
+ return; \
+ } \
+ soft_t ## _input_flush__nocheck(a, s); \
+ soft_t ## _input_flush__nocheck(b, s); \
+ soft_t ## _input_flush__nocheck(c, s); \
+ }
+
+GEN_INPUT_FLUSH3(float32_input_flush3, float32)
+GEN_INPUT_FLUSH3(float64_input_flush3, float64)
+#undef GEN_INPUT_FLUSH3
+
+/*
+ * Choose whether to use fpclassify or float32/64_* primitives in the generated
+ * hardfloat functions. Each combination of number of inputs and float size
+ * gets its own value.
+ */
+#if defined(__x86_64__)
+# define QEMU_HARDFLOAT_1F32_USE_FP 0
+# define QEMU_HARDFLOAT_1F64_USE_FP 1
+# define QEMU_HARDFLOAT_2F32_USE_FP 0
+# define QEMU_HARDFLOAT_2F64_USE_FP 1
+# define QEMU_HARDFLOAT_3F32_USE_FP 0
+# define QEMU_HARDFLOAT_3F64_USE_FP 1
+#else
+# define QEMU_HARDFLOAT_1F32_USE_FP 0
+# define QEMU_HARDFLOAT_1F64_USE_FP 0
+# define QEMU_HARDFLOAT_2F32_USE_FP 0
+# define QEMU_HARDFLOAT_2F64_USE_FP 0
+# define QEMU_HARDFLOAT_3F32_USE_FP 0
+# define QEMU_HARDFLOAT_3F64_USE_FP 0
+#endif
+
+/*
+ * QEMU_HARDFLOAT_USE_ISINF chooses whether to use isinf() over
+ * float{32,64}_is_infinity when !USE_FP.
+ * On x86_64/aarch64, using the former over the latter can yield a ~6% speedup.
+ * On power64 however, using isinf() reduces fp-bench performance by up to 50%.
+ */
+#if defined(__x86_64__) || defined(__aarch64__)
+# define QEMU_HARDFLOAT_USE_ISINF 1
+#else
+# define QEMU_HARDFLOAT_USE_ISINF 0
+#endif
+
+/*
+ * Some targets clear the FP flags before most FP operations. This prevents
+ * the use of hardfloat, since hardfloat relies on the inexact flag being
+ * already set.
+ */
+#if defined(TARGET_PPC) || defined(__FAST_MATH__)
+# if defined(__FAST_MATH__)
+# warning disabling hardfloat due to -ffast-math: hardfloat requires an exact \
+ IEEE implementation
+# endif
+# define QEMU_NO_HARDFLOAT 1
+# define QEMU_SOFTFLOAT_ATTR QEMU_FLATTEN
+#else
+# define QEMU_NO_HARDFLOAT 0
+# define QEMU_SOFTFLOAT_ATTR QEMU_FLATTEN __attribute__((noinline))
+#endif
+
+static inline bool can_use_fpu(const float_status *s)
+{
+ if (QEMU_NO_HARDFLOAT) {
+ return false;
+ }
+ return likely(s->float_exception_flags & float_flag_inexact &&
+ s->float_rounding_mode == float_round_nearest_even);
+}
+
+/*
+ * Hardfloat generation functions. Each operation can have two flavors:
+ * either using softfloat primitives (e.g. float32_is_zero_or_normal) for
+ * most condition checks, or native ones (e.g. fpclassify).
+ *
+ * The flavor is chosen by the callers. Instead of using macros, we rely on the
+ * compiler to propagate constants and inline everything into the callers.
+ *
+ * We only generate functions for operations with two inputs, since only
+ * these are common enough to justify consolidating them into common code.
+ */
+
+typedef union {
+ float32 s;
+ float h;
+} union_float32;
+
+typedef union {
+ float64 s;
+ double h;
+} union_float64;
+
+typedef bool (*f32_check_fn)(union_float32 a, union_float32 b);
+typedef bool (*f64_check_fn)(union_float64 a, union_float64 b);
+
+typedef float32 (*soft_f32_op2_fn)(float32 a, float32 b, float_status *s);
+typedef float64 (*soft_f64_op2_fn)(float64 a, float64 b, float_status *s);
+typedef float (*hard_f32_op2_fn)(float a, float b);
+typedef double (*hard_f64_op2_fn)(double a, double b);
+
+/* 2-input is-zero-or-normal */
+static inline bool f32_is_zon2(union_float32 a, union_float32 b)
+{
+ if (QEMU_HARDFLOAT_2F32_USE_FP) {
+ /*
+ * Not using a temp variable for consecutive fpclassify calls ends up
+ * generating faster code.
+ */
+ return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
+ (fpclassify(b.h) == FP_NORMAL || fpclassify(b.h) == FP_ZERO);
+ }
+ return float32_is_zero_or_normal(a.s) &&
+ float32_is_zero_or_normal(b.s);
+}
+
+static inline bool f64_is_zon2(union_float64 a, union_float64 b)
+{
+ if (QEMU_HARDFLOAT_2F64_USE_FP) {
+ return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
+ (fpclassify(b.h) == FP_NORMAL || fpclassify(b.h) == FP_ZERO);
+ }
+ return float64_is_zero_or_normal(a.s) &&
+ float64_is_zero_or_normal(b.s);
+}
+
+/* 3-input is-zero-or-normal */
+static inline
+bool f32_is_zon3(union_float32 a, union_float32 b, union_float32 c)
+{
+ if (QEMU_HARDFLOAT_3F32_USE_FP) {
+ return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
+ (fpclassify(b.h) == FP_NORMAL || fpclassify(b.h) == FP_ZERO) &&
+ (fpclassify(c.h) == FP_NORMAL || fpclassify(c.h) == FP_ZERO);
+ }
+ return float32_is_zero_or_normal(a.s) &&
+ float32_is_zero_or_normal(b.s) &&
+ float32_is_zero_or_normal(c.s);
+}
+
+static inline
+bool f64_is_zon3(union_float64 a, union_float64 b, union_float64 c)
+{
+ if (QEMU_HARDFLOAT_3F64_USE_FP) {
+ return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
+ (fpclassify(b.h) == FP_NORMAL || fpclassify(b.h) == FP_ZERO) &&
+ (fpclassify(c.h) == FP_NORMAL || fpclassify(c.h) == FP_ZERO);
+ }
+ return float64_is_zero_or_normal(a.s) &&
+ float64_is_zero_or_normal(b.s) &&
+ float64_is_zero_or_normal(c.s);
+}
+
+static inline bool f32_is_inf(union_float32 a)
+{
+ if (QEMU_HARDFLOAT_USE_ISINF) {
+ return isinf(a.h);
+ }
+ return float32_is_infinity(a.s);
+}
+
+static inline bool f64_is_inf(union_float64 a)
+{
+ if (QEMU_HARDFLOAT_USE_ISINF) {
+ return isinf(a.h);
+ }
+ return float64_is_infinity(a.s);
+}
+
+/* Note: @fast_test and @post can be NULL */
+static inline float32
+float32_gen2(float32 xa, float32 xb, float_status *s,
+ hard_f32_op2_fn hard, soft_f32_op2_fn soft,
+ f32_check_fn pre, f32_check_fn post,
+ f32_check_fn fast_test, soft_f32_op2_fn fast_op)
+{
+ union_float32 ua, ub, ur;
+
+ ua.s = xa;
+ ub.s = xb;
+
+ if (unlikely(!can_use_fpu(s))) {
+ goto soft;
+ }
+
+ float32_input_flush2(&ua.s, &ub.s, s);
+ if (unlikely(!pre(ua, ub))) {
+ goto soft;
+ }
+ if (fast_test && fast_test(ua, ub)) {
+ return fast_op(ua.s, ub.s, s);
+ }
+
+ ur.h = hard(ua.h, ub.h);
+ if (unlikely(f32_is_inf(ur))) {
+ s->float_exception_flags |= float_flag_overflow;
+ } else if (unlikely(fabsf(ur.h) <= FLT_MIN)) {
+ if (post == NULL || post(ua, ub)) {
+ goto soft;
+ }
+ }
+ return ur.s;
+
+ soft:
+ return soft(ua.s, ub.s, s);
+}
+
+static inline float64
+float64_gen2(float64 xa, float64 xb, float_status *s,
+ hard_f64_op2_fn hard, soft_f64_op2_fn soft,
+ f64_check_fn pre, f64_check_fn post,
+ f64_check_fn fast_test, soft_f64_op2_fn fast_op)
+{
+ union_float64 ua, ub, ur;
+
+ ua.s = xa;
+ ub.s = xb;
+
+ if (unlikely(!can_use_fpu(s))) {
+ goto soft;
+ }
+
+ float64_input_flush2(&ua.s, &ub.s, s);
+ if (unlikely(!pre(ua, ub))) {
+ goto soft;
+ }
+ if (fast_test && fast_test(ua, ub)) {
+ return fast_op(ua.s, ub.s, s);
+ }
+
+ ur.h = hard(ua.h, ub.h);
+ if (unlikely(f64_is_inf(ur))) {
+ s->float_exception_flags |= float_flag_overflow;
+ } else if (unlikely(fabs(ur.h) <= DBL_MIN)) {
+ if (post == NULL || post(ua, ub)) {
+ goto soft;
+ }
+ }
+ return ur.s;
+
+ soft:
+ return soft(ua.s, ub.s, s);
+}
+
/*----------------------------------------------------------------------------
| Returns the fraction bits of the half-precision floating-point value `a'.
*----------------------------------------------------------------------------*/
--
2.17.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [Qemu-devel] [PULL 10/15] hardfloat: implement float32/64 addition and subtraction
2018-12-14 13:54 [Qemu-devel] [PULL 00/15] Hardfloat + softfloat maintainers update and gitdm Alex Bennée
` (8 preceding siblings ...)
2018-12-14 13:54 ` [Qemu-devel] [PULL 09/15] fpu: introduce hardfloat Alex Bennée
@ 2018-12-14 13:54 ` Alex Bennée
2018-12-14 13:54 ` [Qemu-devel] [PULL 11/15] hardfloat: implement float32/64 multiplication Alex Bennée
` (5 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Alex Bennée @ 2018-12-14 13:54 UTC (permalink / raw)
To: peter.maydell
Cc: qemu-devel, Emilio G. Cota, Alex Bennée, Aurelien Jarno
From: "Emilio G. Cota" <cota@braap.org>
Performance results (single and double precision) for fp-bench:
1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
add-single: 135.07 MFlops
add-double: 131.60 MFlops
sub-single: 130.04 MFlops
sub-double: 133.01 MFlops
- after:
add-single: 443.04 MFlops
add-double: 301.95 MFlops
sub-single: 411.36 MFlops
sub-double: 293.15 MFlops
2. ARM Aarch64 A57 @ 2.4GHz
- before:
add-single: 44.79 MFlops
add-double: 49.20 MFlops
sub-single: 44.55 MFlops
sub-double: 49.06 MFlops
- after:
add-single: 93.28 MFlops
add-double: 88.27 MFlops
sub-single: 91.47 MFlops
sub-double: 88.27 MFlops
3. IBM POWER8E @ 2.1 GHz
- before:
add-single: 72.59 MFlops
add-double: 72.27 MFlops
sub-single: 75.33 MFlops
sub-double: 70.54 MFlops
- after:
add-single: 112.95 MFlops
add-double: 201.11 MFlops
sub-single: 116.80 MFlops
sub-double: 188.72 MFlops
Note that the IBM and ARM machines benefit from having
HARDFLOAT_2F{32,64}_USE_FP set to 0. Otherwise their performance
can suffer significantly:
- IBM Power8:
add-single: [1] 54.94 vs [0] 116.37 MFlops
add-double: [1] 58.92 vs [0] 201.44 MFlops
- Aarch64 A57:
add-single: [1] 80.72 vs [0] 93.24 MFlops
add-double: [1] 82.10 vs [0] 88.18 MFlops
On the Intel machine, having 2F64 set to 1 pays off, but it
doesn't for 2F32:
- Intel i7-6700K:
add-single: [1] 285.79 vs [0] 426.70 MFlops
add-double: [1] 302.15 vs [0] 278.82 MFlops
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 468c4554d5..08afa861a7 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1054,49 +1054,128 @@ float16 QEMU_FLATTEN float16_add(float16 a, float16 b, float_status *status)
return float16_round_pack_canonical(pr, status);
}
-float32 QEMU_FLATTEN float32_add(float32 a, float32 b, float_status *status)
+float16 QEMU_FLATTEN float16_sub(float16 a, float16 b, float_status *status)
+{
+ FloatParts pa = float16_unpack_canonical(a, status);
+ FloatParts pb = float16_unpack_canonical(b, status);
+ FloatParts pr = addsub_floats(pa, pb, true, status);
+
+ return float16_round_pack_canonical(pr, status);
+}
+
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_f32_addsub(float32 a, float32 b, bool subtract, float_status *status)
{
FloatParts pa = float32_unpack_canonical(a, status);
FloatParts pb = float32_unpack_canonical(b, status);
- FloatParts pr = addsub_floats(pa, pb, false, status);
+ FloatParts pr = addsub_floats(pa, pb, subtract, status);
return float32_round_pack_canonical(pr, status);
}
-float64 QEMU_FLATTEN float64_add(float64 a, float64 b, float_status *status)
+static inline float32 soft_f32_add(float32 a, float32 b, float_status *status)
+{
+ return soft_f32_addsub(a, b, false, status);
+}
+
+static inline float32 soft_f32_sub(float32 a, float32 b, float_status *status)
+{
+ return soft_f32_addsub(a, b, true, status);
+}
+
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_f64_addsub(float64 a, float64 b, bool subtract, float_status *status)
{
FloatParts pa = float64_unpack_canonical(a, status);
FloatParts pb = float64_unpack_canonical(b, status);
- FloatParts pr = addsub_floats(pa, pb, false, status);
+ FloatParts pr = addsub_floats(pa, pb, subtract, status);
return float64_round_pack_canonical(pr, status);
}
-float16 QEMU_FLATTEN float16_sub(float16 a, float16 b, float_status *status)
+static inline float64 soft_f64_add(float64 a, float64 b, float_status *status)
{
- FloatParts pa = float16_unpack_canonical(a, status);
- FloatParts pb = float16_unpack_canonical(b, status);
- FloatParts pr = addsub_floats(pa, pb, true, status);
+ return soft_f64_addsub(a, b, false, status);
+}
- return float16_round_pack_canonical(pr, status);
+static inline float64 soft_f64_sub(float64 a, float64 b, float_status *status)
+{
+ return soft_f64_addsub(a, b, true, status);
}
-float32 QEMU_FLATTEN float32_sub(float32 a, float32 b, float_status *status)
+static float hard_f32_add(float a, float b)
{
- FloatParts pa = float32_unpack_canonical(a, status);
- FloatParts pb = float32_unpack_canonical(b, status);
- FloatParts pr = addsub_floats(pa, pb, true, status);
+ return a + b;
+}
- return float32_round_pack_canonical(pr, status);
+static float hard_f32_sub(float a, float b)
+{
+ return a - b;
}
-float64 QEMU_FLATTEN float64_sub(float64 a, float64 b, float_status *status)
+static double hard_f64_add(double a, double b)
{
- FloatParts pa = float64_unpack_canonical(a, status);
- FloatParts pb = float64_unpack_canonical(b, status);
- FloatParts pr = addsub_floats(pa, pb, true, status);
+ return a + b;
+}
- return float64_round_pack_canonical(pr, status);
+static double hard_f64_sub(double a, double b)
+{
+ return a - b;
+}
+
+static bool f32_addsub_post(union_float32 a, union_float32 b)
+{
+ if (QEMU_HARDFLOAT_2F32_USE_FP) {
+ return !(fpclassify(a.h) == FP_ZERO && fpclassify(b.h) == FP_ZERO);
+ }
+ return !(float32_is_zero(a.s) && float32_is_zero(b.s));
+}
+
+static bool f64_addsub_post(union_float64 a, union_float64 b)
+{
+ if (QEMU_HARDFLOAT_2F64_USE_FP) {
+ return !(fpclassify(a.h) == FP_ZERO && fpclassify(b.h) == FP_ZERO);
+ } else {
+ return !(float64_is_zero(a.s) && float64_is_zero(b.s));
+ }
+}
+
+static float32 float32_addsub(float32 a, float32 b, float_status *s,
+ hard_f32_op2_fn hard, soft_f32_op2_fn soft)
+{
+ return float32_gen2(a, b, s, hard, soft,
+ f32_is_zon2, f32_addsub_post, NULL, NULL);
+}
+
+static float64 float64_addsub(float64 a, float64 b, float_status *s,
+ hard_f64_op2_fn hard, soft_f64_op2_fn soft)
+{
+ return float64_gen2(a, b, s, hard, soft,
+ f64_is_zon2, f64_addsub_post, NULL, NULL);
+}
+
+float32 QEMU_FLATTEN
+float32_add(float32 a, float32 b, float_status *s)
+{
+ return float32_addsub(a, b, s, hard_f32_add, soft_f32_add);
+}
+
+float32 QEMU_FLATTEN
+float32_sub(float32 a, float32 b, float_status *s)
+{
+ return float32_addsub(a, b, s, hard_f32_sub, soft_f32_sub);
+}
+
+float64 QEMU_FLATTEN
+float64_add(float64 a, float64 b, float_status *s)
+{
+ return float64_addsub(a, b, s, hard_f64_add, soft_f64_add);
+}
+
+float64 QEMU_FLATTEN
+float64_sub(float64 a, float64 b, float_status *s)
+{
+ return float64_addsub(a, b, s, hard_f64_sub, soft_f64_sub);
}
/*
--
2.17.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [Qemu-devel] [PULL 11/15] hardfloat: implement float32/64 multiplication
2018-12-14 13:54 [Qemu-devel] [PULL 00/15] Hardfloat + softfloat maintainers update and gitdm Alex Bennée
` (9 preceding siblings ...)
2018-12-14 13:54 ` [Qemu-devel] [PULL 10/15] hardfloat: implement float32/64 addition and subtraction Alex Bennée
@ 2018-12-14 13:54 ` Alex Bennée
2018-12-14 13:54 ` [Qemu-devel] [PULL 12/15] hardfloat: implement float32/64 division Alex Bennée
` (4 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Alex Bennée @ 2018-12-14 13:54 UTC (permalink / raw)
To: peter.maydell
Cc: qemu-devel, Emilio G. Cota, Alex Bennée, Aurelien Jarno
From: "Emilio G. Cota" <cota@braap.org>
Performance results for fp-bench:
1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
mul-single: 126.91 MFlops
mul-double: 118.28 MFlops
- after:
mul-single: 258.02 MFlops
mul-double: 197.96 MFlops
2. ARM Aarch64 A57 @ 2.4GHz
- before:
mul-single: 37.42 MFlops
mul-double: 38.77 MFlops
- after:
mul-single: 73.41 MFlops
mul-double: 76.93 MFlops
3. IBM POWER8E @ 2.1 GHz
- before:
mul-single: 58.40 MFlops
mul-double: 59.33 MFlops
- after:
mul-single: 60.25 MFlops
mul-double: 94.79 MFlops
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 08afa861a7..dad2ce2d9f 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1236,7 +1236,8 @@ float16 QEMU_FLATTEN float16_mul(float16 a, float16 b, float_status *status)
return float16_round_pack_canonical(pr, status);
}
-float32 QEMU_FLATTEN float32_mul(float32 a, float32 b, float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_f32_mul(float32 a, float32 b, float_status *status)
{
FloatParts pa = float32_unpack_canonical(a, status);
FloatParts pb = float32_unpack_canonical(b, status);
@@ -1245,7 +1246,8 @@ float32 QEMU_FLATTEN float32_mul(float32 a, float32 b, float_status *status)
return float32_round_pack_canonical(pr, status);
}
-float64 QEMU_FLATTEN float64_mul(float64 a, float64 b, float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_f64_mul(float64 a, float64 b, float_status *status)
{
FloatParts pa = float64_unpack_canonical(a, status);
FloatParts pb = float64_unpack_canonical(b, status);
@@ -1254,6 +1256,54 @@ float64 QEMU_FLATTEN float64_mul(float64 a, float64 b, float_status *status)
return float64_round_pack_canonical(pr, status);
}
+static float hard_f32_mul(float a, float b)
+{
+ return a * b;
+}
+
+static double hard_f64_mul(double a, double b)
+{
+ return a * b;
+}
+
+static bool f32_mul_fast_test(union_float32 a, union_float32 b)
+{
+ return float32_is_zero(a.s) || float32_is_zero(b.s);
+}
+
+static bool f64_mul_fast_test(union_float64 a, union_float64 b)
+{
+ return float64_is_zero(a.s) || float64_is_zero(b.s);
+}
+
+static float32 f32_mul_fast_op(float32 a, float32 b, float_status *s)
+{
+ bool signbit = float32_is_neg(a) ^ float32_is_neg(b);
+
+ return float32_set_sign(float32_zero, signbit);
+}
+
+static float64 f64_mul_fast_op(float64 a, float64 b, float_status *s)
+{
+ bool signbit = float64_is_neg(a) ^ float64_is_neg(b);
+
+ return float64_set_sign(float64_zero, signbit);
+}
+
+float32 QEMU_FLATTEN
+float32_mul(float32 a, float32 b, float_status *s)
+{
+ return float32_gen2(a, b, s, hard_f32_mul, soft_f32_mul,
+ f32_is_zon2, NULL, f32_mul_fast_test, f32_mul_fast_op);
+}
+
+float64 QEMU_FLATTEN
+float64_mul(float64 a, float64 b, float_status *s)
+{
+ return float64_gen2(a, b, s, hard_f64_mul, soft_f64_mul,
+ f64_is_zon2, NULL, f64_mul_fast_test, f64_mul_fast_op);
+}
+
/*
* Returns the result of multiplying the floating-point values `a' and
* `b' then adding 'c', with no intermediate rounding step after the
--
2.17.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [Qemu-devel] [PULL 12/15] hardfloat: implement float32/64 division
2018-12-14 13:54 [Qemu-devel] [PULL 00/15] Hardfloat + softfloat maintainers update and gitdm Alex Bennée
` (10 preceding siblings ...)
2018-12-14 13:54 ` [Qemu-devel] [PULL 11/15] hardfloat: implement float32/64 multiplication Alex Bennée
@ 2018-12-14 13:54 ` Alex Bennée
2018-12-14 13:54 ` [Qemu-devel] [PULL 13/15] hardfloat: implement float32/64 fused multiply-add Alex Bennée
` (3 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Alex Bennée @ 2018-12-14 13:54 UTC (permalink / raw)
To: peter.maydell
Cc: qemu-devel, Emilio G. Cota, Alex Bennée, Aurelien Jarno
From: "Emilio G. Cota" <cota@braap.org>
Performance results for fp-bench:
1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
div-single: 34.84 MFlops
div-double: 34.04 MFlops
- after:
div-single: 275.23 MFlops
div-double: 216.38 MFlops
2. ARM Aarch64 A57 @ 2.4GHz
- before:
div-single: 9.33 MFlops
div-double: 9.30 MFlops
- after:
div-single: 51.55 MFlops
div-double: 15.09 MFlops
3. IBM POWER8E @ 2.1 GHz
- before:
div-single: 25.65 MFlops
div-double: 24.91 MFlops
- after:
div-single: 96.83 MFlops
div-double: 31.01 MFlops
Here setting 2FP64_USE_FP to 1 pays off for x86_64:
[1] 215.97 vs [0] 62.15 MFlops
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index dad2ce2d9f..82294458fe 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1628,7 +1628,8 @@ float16 float16_div(float16 a, float16 b, float_status *status)
return float16_round_pack_canonical(pr, status);
}
-float32 float32_div(float32 a, float32 b, float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_f32_div(float32 a, float32 b, float_status *status)
{
FloatParts pa = float32_unpack_canonical(a, status);
FloatParts pb = float32_unpack_canonical(b, status);
@@ -1637,7 +1638,8 @@ float32 float32_div(float32 a, float32 b, float_status *status)
return float32_round_pack_canonical(pr, status);
}
-float64 float64_div(float64 a, float64 b, float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_f64_div(float64 a, float64 b, float_status *status)
{
FloatParts pa = float64_unpack_canonical(a, status);
FloatParts pb = float64_unpack_canonical(b, status);
@@ -1646,6 +1648,64 @@ float64 float64_div(float64 a, float64 b, float_status *status)
return float64_round_pack_canonical(pr, status);
}
+static float hard_f32_div(float a, float b)
+{
+ return a / b;
+}
+
+static double hard_f64_div(double a, double b)
+{
+ return a / b;
+}
+
+static bool f32_div_pre(union_float32 a, union_float32 b)
+{
+ if (QEMU_HARDFLOAT_2F32_USE_FP) {
+ return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
+ fpclassify(b.h) == FP_NORMAL;
+ }
+ return float32_is_zero_or_normal(a.s) && float32_is_normal(b.s);
+}
+
+static bool f64_div_pre(union_float64 a, union_float64 b)
+{
+ if (QEMU_HARDFLOAT_2F64_USE_FP) {
+ return (fpclassify(a.h) == FP_NORMAL || fpclassify(a.h) == FP_ZERO) &&
+ fpclassify(b.h) == FP_NORMAL;
+ }
+ return float64_is_zero_or_normal(a.s) && float64_is_normal(b.s);
+}
+
+static bool f32_div_post(union_float32 a, union_float32 b)
+{
+ if (QEMU_HARDFLOAT_2F32_USE_FP) {
+ return fpclassify(a.h) != FP_ZERO;
+ }
+ return !float32_is_zero(a.s);
+}
+
+static bool f64_div_post(union_float64 a, union_float64 b)
+{
+ if (QEMU_HARDFLOAT_2F64_USE_FP) {
+ return fpclassify(a.h) != FP_ZERO;
+ }
+ return !float64_is_zero(a.s);
+}
+
+float32 QEMU_FLATTEN
+float32_div(float32 a, float32 b, float_status *s)
+{
+ return float32_gen2(a, b, s, hard_f32_div, soft_f32_div,
+ f32_div_pre, f32_div_post, NULL, NULL);
+}
+
+float64 QEMU_FLATTEN
+float64_div(float64 a, float64 b, float_status *s)
+{
+ return float64_gen2(a, b, s, hard_f64_div, soft_f64_div,
+ f64_div_pre, f64_div_post, NULL, NULL);
+}
+
/*
* Float to Float conversions
*
--
2.17.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [Qemu-devel] [PULL 13/15] hardfloat: implement float32/64 fused multiply-add
2018-12-14 13:54 [Qemu-devel] [PULL 00/15] Hardfloat + softfloat maintainers update and gitdm Alex Bennée
` (11 preceding siblings ...)
2018-12-14 13:54 ` [Qemu-devel] [PULL 12/15] hardfloat: implement float32/64 division Alex Bennée
@ 2018-12-14 13:54 ` Alex Bennée
2018-12-14 13:54 ` [Qemu-devel] [PULL 14/15] hardfloat: implement float32/64 square root Alex Bennée
` (2 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Alex Bennée @ 2018-12-14 13:54 UTC (permalink / raw)
To: peter.maydell
Cc: qemu-devel, Emilio G. Cota, Alex Bennée, Aurelien Jarno
From: "Emilio G. Cota" <cota@braap.org>
Performance results for fp-bench:
1. Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
fma-single: 74.73 MFlops
fma-double: 74.54 MFlops
- after:
fma-single: 203.37 MFlops
fma-double: 169.37 MFlops
2. ARM Aarch64 A57 @ 2.4GHz
- before:
fma-single: 23.24 MFlops
fma-double: 23.70 MFlops
- after:
fma-single: 66.14 MFlops
fma-double: 63.10 MFlops
3. IBM POWER8E @ 2.1 GHz
- before:
fma-single: 37.26 MFlops
fma-double: 37.29 MFlops
- after:
fma-single: 48.90 MFlops
fma-double: 59.51 MFlops
Here having 3FP64 set to 1 pays off for x86_64:
[1] 170.15 vs [0] 153.12 MFlops
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 82294458fe..7554d63495 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1518,8 +1518,9 @@ float16 QEMU_FLATTEN float16_muladd(float16 a, float16 b, float16 c,
return float16_round_pack_canonical(pr, status);
}
-float32 QEMU_FLATTEN float32_muladd(float32 a, float32 b, float32 c,
- int flags, float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_f32_muladd(float32 a, float32 b, float32 c, int flags,
+ float_status *status)
{
FloatParts pa = float32_unpack_canonical(a, status);
FloatParts pb = float32_unpack_canonical(b, status);
@@ -1529,8 +1530,9 @@ float32 QEMU_FLATTEN float32_muladd(float32 a, float32 b, float32 c,
return float32_round_pack_canonical(pr, status);
}
-float64 QEMU_FLATTEN float64_muladd(float64 a, float64 b, float64 c,
- int flags, float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_f64_muladd(float64 a, float64 b, float64 c, int flags,
+ float_status *status)
{
FloatParts pa = float64_unpack_canonical(a, status);
FloatParts pb = float64_unpack_canonical(b, status);
@@ -1540,6 +1542,128 @@ float64 QEMU_FLATTEN float64_muladd(float64 a, float64 b, float64 c,
return float64_round_pack_canonical(pr, status);
}
+float32 QEMU_FLATTEN
+float32_muladd(float32 xa, float32 xb, float32 xc, int flags, float_status *s)
+{
+ union_float32 ua, ub, uc, ur;
+
+ ua.s = xa;
+ ub.s = xb;
+ uc.s = xc;
+
+ if (unlikely(!can_use_fpu(s))) {
+ goto soft;
+ }
+ if (unlikely(flags & float_muladd_halve_result)) {
+ goto soft;
+ }
+
+ float32_input_flush3(&ua.s, &ub.s, &uc.s, s);
+ if (unlikely(!f32_is_zon3(ua, ub, uc))) {
+ goto soft;
+ }
+ /*
+ * When (a || b) == 0, there's no need to check for under/over flow,
+ * since we know the addend is (normal || 0) and the product is 0.
+ */
+ if (float32_is_zero(ua.s) || float32_is_zero(ub.s)) {
+ union_float32 up;
+ bool prod_sign;
+
+ prod_sign = float32_is_neg(ua.s) ^ float32_is_neg(ub.s);
+ prod_sign ^= !!(flags & float_muladd_negate_product);
+ up.s = float32_set_sign(float32_zero, prod_sign);
+
+ if (flags & float_muladd_negate_c) {
+ uc.h = -uc.h;
+ }
+ ur.h = up.h + uc.h;
+ } else {
+ if (flags & float_muladd_negate_product) {
+ ua.h = -ua.h;
+ }
+ if (flags & float_muladd_negate_c) {
+ uc.h = -uc.h;
+ }
+
+ ur.h = fmaf(ua.h, ub.h, uc.h);
+
+ if (unlikely(f32_is_inf(ur))) {
+ s->float_exception_flags |= float_flag_overflow;
+ } else if (unlikely(fabsf(ur.h) <= FLT_MIN)) {
+ goto soft;
+ }
+ }
+ if (flags & float_muladd_negate_result) {
+ return float32_chs(ur.s);
+ }
+ return ur.s;
+
+ soft:
+ return soft_f32_muladd(ua.s, ub.s, uc.s, flags, s);
+}
+
+float64 QEMU_FLATTEN
+float64_muladd(float64 xa, float64 xb, float64 xc, int flags, float_status *s)
+{
+ union_float64 ua, ub, uc, ur;
+
+ ua.s = xa;
+ ub.s = xb;
+ uc.s = xc;
+
+ if (unlikely(!can_use_fpu(s))) {
+ goto soft;
+ }
+ if (unlikely(flags & float_muladd_halve_result)) {
+ goto soft;
+ }
+
+ float64_input_flush3(&ua.s, &ub.s, &uc.s, s);
+ if (unlikely(!f64_is_zon3(ua, ub, uc))) {
+ goto soft;
+ }
+ /*
+ * When (a || b) == 0, there's no need to check for under/over flow,
+ * since we know the addend is (normal || 0) and the product is 0.
+ */
+ if (float64_is_zero(ua.s) || float64_is_zero(ub.s)) {
+ union_float64 up;
+ bool prod_sign;
+
+ prod_sign = float64_is_neg(ua.s) ^ float64_is_neg(ub.s);
+ prod_sign ^= !!(flags & float_muladd_negate_product);
+ up.s = float64_set_sign(float64_zero, prod_sign);
+
+ if (flags & float_muladd_negate_c) {
+ uc.h = -uc.h;
+ }
+ ur.h = up.h + uc.h;
+ } else {
+ if (flags & float_muladd_negate_product) {
+ ua.h = -ua.h;
+ }
+ if (flags & float_muladd_negate_c) {
+ uc.h = -uc.h;
+ }
+
+ ur.h = fma(ua.h, ub.h, uc.h);
+
+ if (unlikely(f64_is_inf(ur))) {
+ s->float_exception_flags |= float_flag_overflow;
+ } else if (unlikely(fabs(ur.h) <= FLT_MIN)) {
+ goto soft;
+ }
+ }
+ if (flags & float_muladd_negate_result) {
+ return float64_chs(ur.s);
+ }
+ return ur.s;
+
+ soft:
+ return soft_f64_muladd(ua.s, ub.s, uc.s, flags, s);
+}
+
/*
* Returns the result of dividing the floating-point value `a' by the
* corresponding value `b'. The operation is performed according to
--
2.17.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [Qemu-devel] [PULL 14/15] hardfloat: implement float32/64 square root
2018-12-14 13:54 [Qemu-devel] [PULL 00/15] Hardfloat + softfloat maintainers update and gitdm Alex Bennée
` (12 preceding siblings ...)
2018-12-14 13:54 ` [Qemu-devel] [PULL 13/15] hardfloat: implement float32/64 fused multiply-add Alex Bennée
@ 2018-12-14 13:54 ` Alex Bennée
2018-12-14 13:54 ` [Qemu-devel] [PULL 15/15] hardfloat: implement float32/64 comparison Alex Bennée
2018-12-16 21:51 ` [Qemu-devel] [PULL 00/15] Hardfloat + softfloat maintainers update and gitdm Peter Maydell
15 siblings, 0 replies; 19+ messages in thread
From: Alex Bennée @ 2018-12-14 13:54 UTC (permalink / raw)
To: peter.maydell
Cc: qemu-devel, Emilio G. Cota, Alex Bennée, Aurelien Jarno
From: "Emilio G. Cota" <cota@braap.org>
Performance results for fp-bench:
Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
sqrt-single: 42.30 MFlops
sqrt-double: 22.97 MFlops
- after:
sqrt-single: 311.42 MFlops
sqrt-double: 311.08 MFlops
Here USE_FP makes a huge difference for f64's, with throughput
going from ~200 MFlops to ~300 MFlops.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 7554d63495..fbd66fd8dc 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -3044,20 +3044,76 @@ float16 QEMU_FLATTEN float16_sqrt(float16 a, float_status *status)
return float16_round_pack_canonical(pr, status);
}
-float32 QEMU_FLATTEN float32_sqrt(float32 a, float_status *status)
+static float32 QEMU_SOFTFLOAT_ATTR
+soft_f32_sqrt(float32 a, float_status *status)
{
FloatParts pa = float32_unpack_canonical(a, status);
FloatParts pr = sqrt_float(pa, status, &float32_params);
return float32_round_pack_canonical(pr, status);
}
-float64 QEMU_FLATTEN float64_sqrt(float64 a, float_status *status)
+static float64 QEMU_SOFTFLOAT_ATTR
+soft_f64_sqrt(float64 a, float_status *status)
{
FloatParts pa = float64_unpack_canonical(a, status);
FloatParts pr = sqrt_float(pa, status, &float64_params);
return float64_round_pack_canonical(pr, status);
}
+float32 QEMU_FLATTEN float32_sqrt(float32 xa, float_status *s)
+{
+ union_float32 ua, ur;
+
+ ua.s = xa;
+ if (unlikely(!can_use_fpu(s))) {
+ goto soft;
+ }
+
+ float32_input_flush1(&ua.s, s);
+ if (QEMU_HARDFLOAT_1F32_USE_FP) {
+ if (unlikely(!(fpclassify(ua.h) == FP_NORMAL ||
+ fpclassify(ua.h) == FP_ZERO) ||
+ signbit(ua.h))) {
+ goto soft;
+ }
+ } else if (unlikely(!float32_is_zero_or_normal(ua.s) ||
+ float32_is_neg(ua.s))) {
+ goto soft;
+ }
+ ur.h = sqrtf(ua.h);
+ return ur.s;
+
+ soft:
+ return soft_f32_sqrt(ua.s, s);
+}
+
+float64 QEMU_FLATTEN float64_sqrt(float64 xa, float_status *s)
+{
+ union_float64 ua, ur;
+
+ ua.s = xa;
+ if (unlikely(!can_use_fpu(s))) {
+ goto soft;
+ }
+
+ float64_input_flush1(&ua.s, s);
+ if (QEMU_HARDFLOAT_1F64_USE_FP) {
+ if (unlikely(!(fpclassify(ua.h) == FP_NORMAL ||
+ fpclassify(ua.h) == FP_ZERO) ||
+ signbit(ua.h))) {
+ goto soft;
+ }
+ } else if (unlikely(!float64_is_zero_or_normal(ua.s) ||
+ float64_is_neg(ua.s))) {
+ goto soft;
+ }
+ ur.h = sqrt(ua.h);
+ return ur.s;
+
+ soft:
+ return soft_f64_sqrt(ua.s, s);
+}
+
/*----------------------------------------------------------------------------
| The pattern for a default generated NaN.
*----------------------------------------------------------------------------*/
--
2.17.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [Qemu-devel] [PULL 15/15] hardfloat: implement float32/64 comparison
2018-12-14 13:54 [Qemu-devel] [PULL 00/15] Hardfloat + softfloat maintainers update and gitdm Alex Bennée
` (13 preceding siblings ...)
2018-12-14 13:54 ` [Qemu-devel] [PULL 14/15] hardfloat: implement float32/64 square root Alex Bennée
@ 2018-12-14 13:54 ` Alex Bennée
2018-12-16 21:51 ` [Qemu-devel] [PULL 00/15] Hardfloat + softfloat maintainers update and gitdm Peter Maydell
15 siblings, 0 replies; 19+ messages in thread
From: Alex Bennée @ 2018-12-14 13:54 UTC (permalink / raw)
To: peter.maydell
Cc: qemu-devel, Emilio G. Cota, Alex Bennée, Aurelien Jarno
From: "Emilio G. Cota" <cota@braap.org>
Performance results for fp-bench:
Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
cmp-single: 110.98 MFlops
cmp-double: 107.12 MFlops
- after:
cmp-single: 506.28 MFlops
cmp-double: 524.77 MFlops
Note that flattening both eq and eq_signaling versions
would give us extra performance (695v506, 615v524 Mflops
for single/double, respectively) but this would emit two
essentially identical functions for each eq/signaling pair,
which is a waste.
Aggregate performance improvement for the last few patches:
[ all charts in png: https://imgur.com/a/4yV8p ]
1. Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
qemu-aarch64 NBench score; higher is better
Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
16 +-+-----------+-------------+----===-------+---===-------+-----------+-+
14 +-+..........................@@@&&.=.......@@@&&.=...................+-+
12 +-+..........................@.@.&.=.......@.@.&.=.....+befor=== +-+
10 +-+..........................@.@.&.=.......@.@.&.=.....+ad@@&& = +-+
8 +-+.......................$$$%.@.&.=.......@.@.&.=.....+ @@u& = +-+
6 +-+............@@@&&=+***##.$%.@.&.=***##$$%+@.&.=..###$$%%@i& = +-+
4 +-+.......###$%%.@.&=.*.*.#.$%.@.&.=*.*.#.$%.@.&.=+**.#+$ +@m& = +-+
2 +-+.....***.#$.%.@.&=.*.*.#.$%.@.&.=*.*.#.$%.@.&.=.**.#+$+sqr& = +-+
0 +-+-----***##$%%@@&&=-***##$$%@@&&==***##$$%@@&&==-**##$$%+cmp==-----+-+
FOURIER NEURAL NELU DECOMPOSITION gmean
qemu-aarch64 SPEC06fp (test set) speedup over QEMU 4c2c1015905
Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
error bars: 95% confidence interval
4.5 +-+---+-----+----+-----+-----+-&---+-----+----+-----+-----+-----+----+-----+-----+-----+-----+----+-----+---+-+
4 +-+..........................+@@+...........................................................................+-+
3.5 +-+..............%%@&.........@@..............%%@&............................................+++dsub +-+
2.5 +-+....&&+.......%%@&.......+%%@..+%%&+..@@&+.%%@&....................................+%%&+.+%@&++%%@& +-+
2 +-+..+%%&..+%@&+.%%@&...+++..%%@...%%&.+$$@&..%%@&..%%@&.......+%%&+.%%@&+......+%%@&.+%%&++$$@&++d%@& %%@&+-+
1.5 +-+**#$%&**#$@&**#%@&**$%@**#$%@**#$%&**#$@&**$%@&*#$%@**#$%@**#$%&**#%@&**$%@&*#$%@**#$%&**#$@&*+f%@&**$%@&+-+
0.5 +-+**#$%&**#$@&**#%@&**$%@**#$%@**#$%&**#$@&**$%@&*#$%@**#$%@**#$%&**#%@&**$%@&*#$%@**#$%&**#$@&+sqr@&**$%@&+-+
0 +-+**#$%&**#$@&**#%@&**$%@**#$%@**#$%&**#$@&**$%@&*#$%@**#$%@**#$%&**#%@&**$%@&*#$%@**#$%&**#$@&*+cmp&**$%@&+-+
410.bw416.gam433.434.z435.436.cac437.lesli444.447.de450.so453454.ca459.GemsF465.tont470.lb4482.sphinxgeomean
2. Host: ARM Aarch64 A57 @ 2.4GHz
qemu-aarch64 NBench score; higher is better
Host: Applied Micro X-Gene, Aarch64 A57 @ 2.4 GHz
5 +-+-----------+-------------+-------------+-------------+-----------+-+
4.5 +-+........................................@@@&==...................+-+
3 4 +-+..........................@@@&==........@.@&.=.....+before +-+
3 +-+..........................@.@&.=........@.@&.=.....+ad@@@&== +-+
2.5 +-+.....................##$$%%.@&.=........@.@&.=.....+ @m@& = +-+
2 +-+............@@@&==.***#.$.%.@&.=.***#$$%%.@&.=.***#$$%%d@& = +-+
1.5 +-+.....***#$$%%.@&.=.*.*#.$.%.@&.=.*.*#.$.%.@&.=.*.*#+$ +f@& = +-+
0.5 +-+.....*.*#.$.%.@&.=.*.*#.$.%.@&.=.*.*#.$.%.@&.=.*.*#+$+sqr& = +-+
0 +-+-----***#$$%%@@&==-***#$$%%@@&==-***#$$%%@@&==-***#$$%+cmp==-----+-+
FOURIER NEURAL NLU DECOMPOSITION gmean
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index fbd66fd8dc..59eac97d10 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -2903,28 +2903,109 @@ static int compare_floats(FloatParts a, FloatParts b, bool is_quiet,
}
}
-#define COMPARE(sz) \
-int float ## sz ## _compare(float ## sz a, float ## sz b, \
- float_status *s) \
+#define COMPARE(name, attr, sz) \
+static int attr \
+name(float ## sz a, float ## sz b, bool is_quiet, float_status *s) \
{ \
FloatParts pa = float ## sz ## _unpack_canonical(a, s); \
FloatParts pb = float ## sz ## _unpack_canonical(b, s); \
- return compare_floats(pa, pb, false, s); \
-} \
-int float ## sz ## _compare_quiet(float ## sz a, float ## sz b, \
- float_status *s) \
-{ \
- FloatParts pa = float ## sz ## _unpack_canonical(a, s); \
- FloatParts pb = float ## sz ## _unpack_canonical(b, s); \
- return compare_floats(pa, pb, true, s); \
+ return compare_floats(pa, pb, is_quiet, s); \
}
-COMPARE(16)
-COMPARE(32)
-COMPARE(64)
+COMPARE(soft_f16_compare, QEMU_FLATTEN, 16)
+COMPARE(soft_f32_compare, QEMU_SOFTFLOAT_ATTR, 32)
+COMPARE(soft_f64_compare, QEMU_SOFTFLOAT_ATTR, 64)
#undef COMPARE
+int float16_compare(float16 a, float16 b, float_status *s)
+{
+ return soft_f16_compare(a, b, false, s);
+}
+
+int float16_compare_quiet(float16 a, float16 b, float_status *s)
+{
+ return soft_f16_compare(a, b, true, s);
+}
+
+static int QEMU_FLATTEN
+f32_compare(float32 xa, float32 xb, bool is_quiet, float_status *s)
+{
+ union_float32 ua, ub;
+
+ ua.s = xa;
+ ub.s = xb;
+
+ if (QEMU_NO_HARDFLOAT) {
+ goto soft;
+ }
+
+ float32_input_flush2(&ua.s, &ub.s, s);
+ if (isgreaterequal(ua.h, ub.h)) {
+ if (isgreater(ua.h, ub.h)) {
+ return float_relation_greater;
+ }
+ return float_relation_equal;
+ }
+ if (likely(isless(ua.h, ub.h))) {
+ return float_relation_less;
+ }
+ /* The only condition remaining is unordered.
+ * Fall through to set flags.
+ */
+ soft:
+ return soft_f32_compare(ua.s, ub.s, is_quiet, s);
+}
+
+int float32_compare(float32 a, float32 b, float_status *s)
+{
+ return f32_compare(a, b, false, s);
+}
+
+int float32_compare_quiet(float32 a, float32 b, float_status *s)
+{
+ return f32_compare(a, b, true, s);
+}
+
+static int QEMU_FLATTEN
+f64_compare(float64 xa, float64 xb, bool is_quiet, float_status *s)
+{
+ union_float64 ua, ub;
+
+ ua.s = xa;
+ ub.s = xb;
+
+ if (QEMU_NO_HARDFLOAT) {
+ goto soft;
+ }
+
+ float64_input_flush2(&ua.s, &ub.s, s);
+ if (isgreaterequal(ua.h, ub.h)) {
+ if (isgreater(ua.h, ub.h)) {
+ return float_relation_greater;
+ }
+ return float_relation_equal;
+ }
+ if (likely(isless(ua.h, ub.h))) {
+ return float_relation_less;
+ }
+ /* The only condition remaining is unordered.
+ * Fall through to set flags.
+ */
+ soft:
+ return soft_f64_compare(ua.s, ub.s, is_quiet, s);
+}
+
+int float64_compare(float64 a, float64 b, float_status *s)
+{
+ return f64_compare(a, b, false, s);
+}
+
+int float64_compare_quiet(float64 a, float64 b, float_status *s)
+{
+ return f64_compare(a, b, true, s);
+}
+
/* Multiply A by 2 raised to the power N. */
static FloatParts scalbn_decomposed(FloatParts a, int n, float_status *s)
{
--
2.17.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PULL 00/15] Hardfloat + softfloat maintainers update and gitdm
2018-12-14 13:54 [Qemu-devel] [PULL 00/15] Hardfloat + softfloat maintainers update and gitdm Alex Bennée
` (14 preceding siblings ...)
2018-12-14 13:54 ` [Qemu-devel] [PULL 15/15] hardfloat: implement float32/64 comparison Alex Bennée
@ 2018-12-16 21:51 ` Peter Maydell
2018-12-17 9:29 ` Alex Bennée
15 siblings, 1 reply; 19+ messages in thread
From: Peter Maydell @ 2018-12-16 21:51 UTC (permalink / raw)
To: Alex Bennée; +Cc: QEMU Developers
On Fri, 14 Dec 2018 at 13:54, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> The following changes since commit d8d5fefd8657d4f7b380b3a1533340434b5b9def:
>
> Merge remote-tracking branch 'remotes/berrange/tags/qcrypto-next-pull-request' into staging (2018-12-13 18:45:18 +0000)
>
> are available in the Git repository at:
>
> https://github.com/stsquad/qemu.git tags/pull-hardfloat-and-gitdm-141218-1
>
> for you to fetch changes up to 26c535d8522296399d43ebf39c109f6d8628c096:
>
> hardfloat: implement float32/64 comparison (2018-12-14 11:43:00 +0000)
>
> ----------------------------------------------------------------
> Hardfloat + maintainers and gitdm
>
> ----------------------------------------------------------------
Hi Alex -- this doesn't appear to be signed with the GPG key
I have on record for you...
thanks
-- PMM
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PULL 00/15] Hardfloat + softfloat maintainers update and gitdm
2018-12-16 21:51 ` [Qemu-devel] [PULL 00/15] Hardfloat + softfloat maintainers update and gitdm Peter Maydell
@ 2018-12-17 9:29 ` Alex Bennée
0 siblings, 0 replies; 19+ messages in thread
From: Alex Bennée @ 2018-12-17 9:29 UTC (permalink / raw)
To: Peter Maydell; +Cc: QEMU Developers
Peter Maydell <peter.maydell@linaro.org> writes:
> On Fri, 14 Dec 2018 at 13:54, Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>> The following changes since commit d8d5fefd8657d4f7b380b3a1533340434b5b9def:
>>
>> Merge remote-tracking branch 'remotes/berrange/tags/qcrypto-next-pull-request' into staging (2018-12-13 18:45:18 +0000)
>>
>> are available in the Git repository at:
>>
>> https://github.com/stsquad/qemu.git tags/pull-hardfloat-and-gitdm-141218-1
>>
>> for you to fetch changes up to 26c535d8522296399d43ebf39c109f6d8628c096:
>>
>> hardfloat: implement float32/64 comparison (2018-12-14 11:43:00 +0000)
>>
>> ----------------------------------------------------------------
>> Hardfloat + maintainers and gitdm
>>
>> ----------------------------------------------------------------
>
> Hi Alex -- this doesn't appear to be signed with the GPG key
> I have on record for you...
Hmm.. I did create a sub-key for one of my other machines - I wonder if
that has taken precedence....
>
> thanks
> -- PMM
--
Alex Bennée
^ permalink raw reply [flat|nested] 19+ messages in thread