From mboxrd@z Thu Jan 1 00:00:00 1970
References: <20180206164815.10084-1-alex.bennee@linaro.org> <579a7106-ecdb-984e-97b5-bd23d0625156@vivier.eu>
From: Alex Bennée
In-reply-to: <579a7106-ecdb-984e-97b5-bd23d0625156@vivier.eu>
Date: Tue, 20 Feb 2018 12:43:00 +0000
Message-ID: <87mv03al2j.fsf@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Subject: Re: [Qemu-devel] [PATCH v4 00/22] re-factor softfloat and add fp16 functions
To: Laurent Vivier
Cc: Peter Maydell, Richard Henderson, bharata@linux.vnet.ibm.com, Andrew Dutcher, QEMU Developers

Laurent Vivier writes:

> On 13/02/2018 at 16:51, Peter Maydell wrote:
>> On 6 February 2018 at 16:47, Alex Bennée wrote:
>>> Hi,
>>>
>>> The main change is applying the __attribute__((flatten)) to some of
>>> the public functions that show up in Emilio's dbt-benchmark. This
>>> seems to be a cleaner solution than squashing inlines higher up the
>>> chain and still leaves the chance for re-use for the less widely used
>>> functions.
>>> The results are an improvement over v3 by some margin:
>>>
>>> [ASCII bar chart: NBench score (higher is better), y-axis 0 to 5,
>>> for FOURIER, NEURAL NET, LU DECOMPOSITION and gmean. Series:
>>> system-2.5, master, softfloat-v3 and a fourth softfloat series
>>> whose label is obscured in the plot.]
>>>
>>> Slightly easier to read PNG:
>>>
>>> https://i.imgur.com/XEeL0bC.png
>>>
>>> I think it's pretty ready for a merge. Shall I submit a pull myself or
>>> does it make sense going via someone else? According to MAINTAINERS
>>> Peter and Aurelien are responsible for this code...
>>
>> I had some nits but I think the best thing to do is if you fix those
>> and then just send a pull request for this.
>
> Just to be sure no one has missed that:
>
> https://bellard.org/libbf/
>
> I'm wondering if it can help for this work.

I did have a brief look through to get a sense of how it works.
The first thing it is missing, however, is half-precision. It only seems
to deal in 32 and 64 bit floats. The code is also fairly sparse in its
commenting.

The main approach seems to be somewhere between rth's glibc macro fest
and what we have now. It makes extensive use of every QEMU developer's
favourite glue macro to instantiate code from a "template". This allows
some better usage of size-appropriate types in each instantiation,
whereas we just do most things at the highest precision.

However I think it also suffers from the same problem as SoftFloat3, in
that it is not an upstream project, so it is just another lump of code
to import into our code base. Based on that I favour our re-factor more
as I think it is easier to follow and hopefully will be easier to
maintain.

I think we can address the inefficiencies in our mul/div code by passing
FloatFmt in and letting the compiler deal with it in each flattened
implementation. I prototyped mul:

  http://ix.io/MYw

However, unless we are super worried about these inefficiencies, I'm
proposing we merge what we have and deal with these in a later round.

>
> Thanks,
> Laurent

--
Alex Bennée
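
For readers following the thread, here is a minimal sketch of the general idea of passing a format description into one shared helper and letting the compiler specialise it inside each flattened public function. It is not the prototype linked above, nor code from the series: FmtSketch, PartsSketch, unpack_raw and the unpack16/32/64 wrappers are hypothetical names chosen for illustration, and __attribute__((flatten)) assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical format descriptor (illustrative, not QEMU's FloatFmt). */
typedef struct {
    int exp_bits;   /* width of the exponent field */
    int frac_bits;  /* width of the fraction field */
} FmtSketch;

/* Hypothetical decomposed value: raw fields, no normalisation. */
typedef struct {
    uint64_t frac;  /* fraction field as stored */
    int exp;        /* biased exponent as stored */
    int sign;       /* 1 if the sign bit is set */
} PartsSketch;

static const FmtSketch fmt16 = { 5, 10 };
static const FmtSketch fmt32 = { 8, 23 };
static const FmtSketch fmt64 = { 11, 52 };

/* One generic helper, written once against the descriptor. */
static inline PartsSketch unpack_raw(const FmtSketch *fmt, uint64_t raw)
{
    PartsSketch p;

    assert(fmt->exp_bits + fmt->frac_bits < 64);
    p.frac = raw & ((UINT64_C(1) << fmt->frac_bits) - 1);
    p.exp = (int)((raw >> fmt->frac_bits) &
                  ((UINT64_C(1) << fmt->exp_bits) - 1));
    p.sign = (int)((raw >> (fmt->frac_bits + fmt->exp_bits)) & 1);
    return p;
}

/*
 * Public entry points. With flatten, the compiler is asked to inline
 * the helper into each wrapper, where the descriptor is a known
 * constant, so shifts and masks can be folded per format.
 */
__attribute__((flatten))
PartsSketch unpack16(uint16_t raw)
{
    return unpack_raw(&fmt16, raw);
}

__attribute__((flatten))
PartsSketch unpack32(uint32_t raw)
{
    return unpack_raw(&fmt32, raw);
}

__attribute__((flatten))
PartsSketch unpack64(uint64_t raw)
{
    return unpack_raw(&fmt64, raw);
}
```

The trade-off in this sketch is that the helper is written once at the widest width (uint64_t), and any per-format specialisation is left to the optimiser rather than to macro-instantiated templates. Whether that actually recovers the cost for mul/div would need measuring.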