[Qemu-devel] How to add my implementation of the fmadds instruction to QEMU

qemu-devel.nongnu.org archive mirror

* [Qemu-devel] How to add my implementation of the fmadds instruction to QEMU
@ 2016-09-27  1:05 G 3
  2016-09-27  3:41 ` [Qemu-devel] [Qemu-ppc] " David Gibson
  2016-09-27 11:43 ` [Qemu-devel] " Peter Maydell
  0 siblings, 2 replies; 18+ messages in thread
From: G 3 @ 2016-09-27  1:05 UTC (permalink / raw)
  To: list@suse.de:PowerPC list:PowerPC, qemu-devel qemu-devel

I made my own experimental implementation of the fmadds instruction
that I would like to add to QEMU. How would I do this?

My implementation would probably look like this:

void fmadds(float *frD, float frA, float frC, float frB)
{
	*frD = frA * frC + frB;
}


I then want to see if this implementation will make things faster.
This code will test my implementation:

#include <stdio.h>
#include <time.h>

/*
     fmadds basically does this frD = frA * frC + frB
*/

int main(void) {
    const long iteration_count = 100000000;
    long iter;                          /* integer loop counter */
    double frD, frA, frB, frC;          /* PowerPC FPRs hold double-format values */
    clock_t start_time, end_time;

    frA = 10;
    frB = 5;
    frC = 2;

    start_time = clock();
    for (iter = 0; iter < iteration_count; iter++)
    {
        /* PowerPC-only: frD = frA * frC + frB, rounded to single precision */
        asm volatile("fmadds %0, %1, %2, %3"
                     : "=f" (frD)
                     : "f" (frA), "f" (frC), "f" (frB));
    }
    end_time = clock();
    printf("frD:%f frA:%f frB:%f frC:%f\n", frD, frA, frB, frC);
    printf("Time elapsed: %0.2f seconds\n",
           (double)(end_time - start_time) / CLOCKS_PER_SEC);

    return 0;
}

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] How to add my implementation of the fmadds instruction to QEMU
  2016-09-27  1:05 [Qemu-devel] How to add my implementation of the fmadds instruction to QEMU G 3
@ 2016-09-27  3:41 ` David Gibson
  2016-09-27 11:43 ` [Qemu-devel] " Peter Maydell
  1 sibling, 0 replies; 18+ messages in thread
From: David Gibson @ 2016-09-27  3:41 UTC (permalink / raw)
  To: G 3; +Cc: list@suse.de:PowerPC list:PowerPC, qemu-devel qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1783 bytes --]

On Mon, Sep 26, 2016 at 09:05:22PM -0400, G 3 wrote:
> I made my own experimental implementation of the fmadds instruction that I
> would like to add to QEMU. How would I do this?
> 
> My implementation would probably look like this:
> 
> void fmadds(float *frD, float frA, float frC, float frB)
> {
> 	*frD = frA * frC + frB;
> }

So... using a helper, essentially?

You'd need to submit a patch adding the new implementation, with a
commit message which made the case for replacing the existing
implementation with yours.  So, you'd need data to suggest both that
your version generates correct results, and that it is faster or
otherwise better than the existing one.

> 
> 
> I then want to see if this implementation will make things faster. This code
> will test my implementation:
> 
> #include <stdio.h>
> #include <time.h>
> 
> /*
>     fmadds basically does this frD = frA * frC + frB
> */
> 
> int main (int argc, const char * argv[]) {
>     const int iteration_count = 100000000;
>     double iter, frD, frA, frB, frC;
>     clock_t start_time, end_time;
> 
>     frA = 10;
>     frB = 5;
>     frC = 2;
> 
>     start_time = clock();
>     for(iter = 0; iter < iteration_count; iter++)
>     {
>         asm volatile("fmadds %0, %1, %2, %3" : "=f" (frD) : "f" (frA), "f"
> (frC), "f" (frB));
>     }
>     end_time = clock();
>     printf("frD:%f frA:%f frB:%f frC:%f\n", frD, frA, frB, frC);
>     printf("Time elapsed: %0.2f seconds\n", (float)(end_time - start_time) /
> CLOCKS_PER_SEC);
> 
>     return 0;
> }
> 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

* Re: [Qemu-devel] How to add my implementation of the fmadds instruction to QEMU
  2016-09-27  1:05 [Qemu-devel] How to add my implementation of the fmadds instruction to QEMU G 3
  2016-09-27  3:41 ` [Qemu-devel] [Qemu-ppc] " David Gibson
@ 2016-09-27 11:43 ` Peter Maydell
  2016-09-27 14:33   ` G 3
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Maydell @ 2016-09-27 11:43 UTC (permalink / raw)
  To: G 3; +Cc: list@suse.de:PowerPC list:PowerPC, qemu-devel qemu-devel

On 26 September 2016 at 18:05, G 3 <programmingkidx@gmail.com> wrote:
> I made my own experimental implementation of the fmadds instruction that I
> would like to add to QEMU. How would I do this?
>
> My implementation would probably look like this:
>
> void fmadds(float *frD, float frA, float frC, float frB)
> {
>         *frD = frA * frC + frB;
> }

This isn't portable, because different host CPUs can have
different implementations of floating point arithmetic
with subtle differences (notably in corner cases like
subnormals and also related to the floating point exception
flags). This is why we use the softfloat library in fpu/,
which (although slow) is guaranteed to give the right results.
We used to have a version of the fpu functions which
used a "just do a C float or double operation" approach, but we
removed it many years ago for this reason.

In particular, for fmadds, it is important that there
is no intermediate rounding done between the multiply
and the addition. This means that you need to effectively
do the multiply and the addition at a higher precision than
the input arguments, so simple multiplication and addition
of floats will give you wrong answers.

thanks
-- PMM

* Re: [Qemu-devel] How to add my implementation of the fmadds instruction to QEMU
  2016-09-27 11:43 ` [Qemu-devel] " Peter Maydell
@ 2016-09-27 14:33   ` G 3
  2016-09-27 15:21     ` Peter Maydell
  2016-09-27 16:16     ` Eric Blake
  0 siblings, 2 replies; 18+ messages in thread
From: G 3 @ 2016-09-27 14:33 UTC (permalink / raw)
  To: Peter Maydell; +Cc: list@suse.de:PowerPC list:PowerPC, qemu-devel qemu-devel


On Sep 27, 2016, at 7:43 AM, Peter Maydell wrote:

> On 26 September 2016 at 18:05, G 3 <programmingkidx@gmail.com> wrote:
>> I made my own experimental implementation of the fmadds  
>> instruction that I
>> would like to add to QEMU. How would I do this?
>>
>> My implementation would probably look like this:
>>
>> void fmadds(float *frD, float frA, float frC, float frB)
>> {
>>         *frD = frA * frC + frB;
>> }
>
> This isn't portable, because different host CPUs can have
> different implementations of floating point arithmetic
> with subtle differences (notably in corner cases like
> subnormals and also related to the floating point exception
> flags). This is why we use the softfloat library in fpu/,
> which (although slow) is guaranteed to give the right results.
> We did use to have a version of the fpu functions which
> used a "just do a C float or double operation", but we
> removed it many years ago for this reason.
>
> In particular, for fmadds, it is important that there
> is no intermediate rounding done between the multiply
> and the addition. This means that you need to effectively
> do the multiply and the addition at a higher precision than
> the input arguments, so simple multiplication and addition
> of floats will give you wrong answers.

It sounds like I should change my argument types to double.

I still want to try implementing this function. I'm thinking of rewriting
the helper_fmadd() function in target-ppc/fpu_helper.c. Does that sound
correct?

* Re: [Qemu-devel] How to add my implementation of the fmadds instruction to QEMU
  2016-09-27 14:33   ` G 3
@ 2016-09-27 15:21     ` Peter Maydell
  2016-09-27 16:16     ` Eric Blake
  1 sibling, 0 replies; 18+ messages in thread
From: Peter Maydell @ 2016-09-27 15:21 UTC (permalink / raw)
  To: G 3; +Cc: list@suse.de:PowerPC list:PowerPC, qemu-devel qemu-devel

On 27 September 2016 at 07:33, G 3 <programmingkidx@gmail.com> wrote:
>
> On Sep 27, 2016, at 7:43 AM, Peter Maydell wrote:
>> In particular, for fmadds, it is important that there
>> is no intermediate rounding done between the multiply
>> and the addition. This means that you need to effectively
>> do the multiply and the addition at a higher precision than
>> the input arguments, so simple multiplication and addition
>> of floats will give you wrong answers.
>
>
> It sounds like I should change my argument types to double.

That will fix only a very tiny part of the problem.
You will also need to get the floating point exception flags right,
along with the handling of subnormal numbers and various other
IEEE implementation specifics.

> I still want to try implementing this function. I'm thinking
> rewriting the helper_fmadd() function in target-ppc/fpu_helper.c.
> Does that sound correct?

That's the place that would need to be reimplemented.
I recommend doing very thorough testing by feeding the
instruction a lot of randomly selected input values and
comparing the output value and the output floating point
status/exception flags against the current implementation
and/or real hardware.

I really think you'll find that this is just impossible
to implement correctly with the host compiler's floating
point C expressions, though.

thanks
-- PMM

* Re: [Qemu-devel] How to add my implementation of the fmadds instruction to QEMU
  2016-09-27 14:33   ` G 3
  2016-09-27 15:21     ` Peter Maydell
@ 2016-09-27 16:16     ` Eric Blake
  2016-09-27 16:51       ` G 3
  1 sibling, 1 reply; 18+ messages in thread
From: Eric Blake @ 2016-09-27 16:16 UTC (permalink / raw)
  To: G 3, Peter Maydell
  Cc: list@suse.de:PowerPC list:PowerPC, qemu-devel qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1222 bytes --]

On 09/27/2016 09:33 AM, G 3 wrote:
>>> void fmadds(float *frD, float frA, float frC, float frB)
>>> {
>>>         *frD = frA * frC + frB;
>>> }

> 
> It sounds like I should change my argument types to double.

Insufficient.  The whole reason that fmadds exists is that there are
provable cases where two operations that both round are GUARANTEED to
get the wrong answer when compared to a single operation, regardless of
the precisions involved.  Widening from float to double does NOT
eliminate the double-rounding problem.

> 
> I still want to try implementing this function. I'm thinking rewriting the
> helper_fmadd() function in target-ppc/fpu_helper.c. Does that
> sound correct?

I seriously doubt you would be able to write a correct implementation,
if you aren't even aware of the double-rounding reasons why fmadds was
added to the IEEE floating point specification in the first place.  Your
idea that you would be able to speed things up is probably a premature
optimization, given that you have no realistic clue how hard it is to
CORRECTLY implement fused-multiply-add.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

* Re: [Qemu-devel] How to add my implementation of the fmadds instruction to QEMU
  2016-09-27 16:16     ` Eric Blake
@ 2016-09-27 16:51       ` G 3
  2016-09-27 16:58         ` Peter Maydell
  0 siblings, 1 reply; 18+ messages in thread
From: G 3 @ 2016-09-27 16:51 UTC (permalink / raw)
  To: Eric Blake
  Cc: Peter Maydell, list@suse.de:PowerPC list:PowerPC,
	qemu-devel qemu-devel

On Sep 27, 2016, at 12:16 PM, Eric Blake wrote:

> On 09/27/2016 09:33 AM, G 3 wrote:
>>>> void fmadds(float *frD, float frA, float frC, float frB)
>>>> {
>>>>         *frD = frA * frC + frB;
>>>> }
>
>>
>> It sounds like I should change my argument types to double.
>
> Insufficient.  The whole reason that fmadds exists is that there are
> provably cases where two operations that both round are GUARANTEED to
> get the wrong answer when compared to a single operation,  
> regardless of
> the precisions involved.  Widening from float to double does NOT
> eliminate the double-rounding problem.
>
>>
>> I still want to try implementing this function. I'm thinking  
>> rewriting the
>> helper_fmadd() function in target-ppc/fpu_helper.c. Does that
>> sound correct?
>
> I seriously doubt you would be able to write a correct implementation,
> if you aren't even aware of the double-rounding reasons why fmadds was
> added to the IEEE floating point specification in the first place.   
> Your
> idea that you would be able to speed things up is probably a premature
> optimization, given that you have no realistic clue how hard it is to
> CORRECTLY implement fused-multiply-add.

The problem with your reasoning is you assume this instruction has to be
100% correctly implemented. That every single "corner-case" has to be
accounted for. I have only just begun my research into the floating point
instructions so of course I'm not going to know everything initially. I plan
on experimenting and learning along the way.

My ultimate goal is to make sound play correctly on a PowerPC Mac OS
guest. The source code to Apple's audio kernel extensions indicates explicit
use of certain floating-point instructions. The current theory is that audio
playback doesn't work because the floating point unit is too slow. So if I
implemented a floating point instruction such as fmadds that was optimized
for speed, then I could make sound play better than it does now. Accounting
for every single corner case may sound like the right thing to do, but it may
actually be causing more harm than good. It takes CPU time to handle the
corner cases. These corner cases are not even guaranteed to appear during
execution time. I'm hoping that by implementing a scaled-down version of the
fmadds instruction, audio playback may actually work.

Maybe some time down the road a command-line switch could be added
that allows the user to decide which is more important: speed or accuracy?

* Re: [Qemu-devel] How to add my implementation of the fmadds instruction to QEMU
  2016-09-27 16:51       ` G 3
@ 2016-09-27 16:58         ` Peter Maydell
  2016-09-29  4:17           ` [Qemu-devel] [Qemu-ppc] " David Gibson
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Maydell @ 2016-09-27 16:58 UTC (permalink / raw)
  To: G 3; +Cc: Eric Blake, list@suse.de:PowerPC list:PowerPC,
	qemu-devel qemu-devel

On 27 September 2016 at 09:51, G 3 <programmingkidx@gmail.com> wrote:
> The problem with your reasoning is you assume this instruction has to be
> 100% correctly implemented. That every single "corner-case" has to be
> accounted for.

For upstream QEMU we've already made this design decision --
emulation accuracy comes first, and speed is secondary.
That's why we implement this the way we do.

thanks
-- PMM

* Re: [Qemu-devel] [Qemu-ppc] How to add my implementation of the fmadds instruction to QEMU
  2016-09-27 16:58         ` Peter Maydell
@ 2016-09-29  4:17           ` David Gibson
  2016-09-29 15:20             ` Programmingkid
  2016-09-29 15:41             ` Peter Maydell
  0 siblings, 2 replies; 18+ messages in thread
From: David Gibson @ 2016-09-29  4:17 UTC (permalink / raw)
  To: Peter Maydell
  Cc: G 3, list@suse.de:PowerPC list:PowerPC, Eric Blake,
	qemu-devel qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1337 bytes --]

On Tue, Sep 27, 2016 at 09:58:02AM -0700, Peter Maydell wrote:
> On 27 September 2016 at 09:51, G 3 <programmingkidx@gmail.com> wrote:
> > The problem with your reasoning is you assume this instruction has to be
> > 100% correctly implemented. That every single "corner-case" has to be
> > accounted for.
> 
> For upstream QEMU we've already made this design decision --
> emulation accuracy comes first, and speed is secondary.
> That's why we implement this the way we do.

I think there is a way you could get both speed and accuracy, but it's
a huge project:

You'd need to add full float awareness to TCG - so floating point TCG
values and floating point operations as TCG micro-ops, defined
according to IEEE semantics.  Then you'd need to rewrite the TCG
frontends in terms of those new ops, at least for target CPUs close
enough to IEEE semantics for that to work.  And you'd need to rewrite
the TCG backends to implement those fp ops in terms of host cpu fp
instructions .. at least when the host has fp behaviour close enough
to IEEE to make that work, with a fallback to soft float when that's
not the case.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

* Re: [Qemu-devel] [Qemu-ppc] How to add my implementation of the fmadds instruction to QEMU
  2016-09-29  4:17           ` [Qemu-devel] [Qemu-ppc] " David Gibson
@ 2016-09-29 15:20             ` Programmingkid
  2016-09-29 18:19               ` Alex Bennée
  2016-09-29 15:41             ` Peter Maydell
  1 sibling, 1 reply; 18+ messages in thread
From: Programmingkid @ 2016-09-29 15:20 UTC (permalink / raw)
  To: David Gibson
  Cc: list@suse.de:PowerPC list:PowerPC, qemu-devel qemu-devel,
	Peter Maydell, Eric Blake


On Sep 29, 2016, at 12:17 AM, David Gibson wrote:

> On Tue, Sep 27, 2016 at 09:58:02AM -0700, Peter Maydell wrote:
>> On 27 September 2016 at 09:51, G 3 <programmingkidx@gmail.com> wrote:
>>> The problem with your reasoning is you assume this instruction has to be
>>> 100% correctly implemented. That every single "corner-case" has to be
>>> accounted for.
>> 
>> For upstream QEMU we've already made this design decision --
>> emulation accuracy comes first, and speed is secondary.
>> That's why we implement this the way we do.
> 
> I think there is a way you could get both speed and accuracy, but it's
> a huge project:
> 
> You'd need to add full float awareness to TCG - so floating point TCG
> values and floating point operations as TCG micro-ops, defined
> according to IEEE semantics.  Then you'd need to rewrite the TCG
> frontends in terms of those new ops, at least for target CPUs close
> enough to IEEE semantics for that to work.  And you'd need to rewrite
> the TCG backends to implement those fp ops in terms of host cpu fp
> instructions .. at least when the host has fp behaviour close enough
> to IEEE to make that work, with a fallback to soft float when that's
> not the case.

Interesting idea. Do you think we would see a large enough increase in speed
to justify this project?

* Re: [Qemu-devel] [Qemu-ppc] How to add my implementation of the fmadds instruction to QEMU
  2016-09-29  4:17           ` [Qemu-devel] [Qemu-ppc] " David Gibson
  2016-09-29 15:20             ` Programmingkid
@ 2016-09-29 15:41             ` Peter Maydell
  2016-09-29 16:55               ` Programmingkid
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Maydell @ 2016-09-29 15:41 UTC (permalink / raw)
  To: David Gibson
  Cc: G 3, list@suse.de:PowerPC list:PowerPC, Eric Blake,
	qemu-devel qemu-devel

On 28 September 2016 at 21:17, David Gibson <david@gibson.dropbear.id.au> wrote:
> I think there is a way you could get both speed and accuracy, but it's
> a huge project:
>
> You'd need to add full float awareness to TCG - so floating point TCG
> values and floating point operations as TCG micro-ops, defined
> according to IEEE semantics.  Then you'd need to rewrite the TCG
> frontends in terms of those new ops, at least for target CPUs close
> enough to IEEE semantics for that to work.  And you'd need to rewrite
> the TCG backends to implement those fp ops in terms of host cpu fp
> instructions .. at least when the host has fp behaviour close enough
> to IEEE to make that work, with a fallback to soft float when that's
> not the case.

Also even if you have float support in both frontend and backend
you still need to fall back to fully-emulated for the runtime
corner cases (like where tininess before/after rounding makes a
difference or where you need to care about minutiae of the
floating point exception flags, etc). It's not impossible
but it is a very large amount of technically complicated work.

thanks
-- PMM

* Re: [Qemu-devel] [Qemu-ppc] How to add my implementation of the fmadds instruction to QEMU
  2016-09-29 15:41             ` Peter Maydell
@ 2016-09-29 16:55               ` Programmingkid
  2016-09-30  0:39                 ` David Gibson
  0 siblings, 1 reply; 18+ messages in thread
From: Programmingkid @ 2016-09-29 16:55 UTC (permalink / raw)
  To: Peter Maydell
  Cc: David Gibson, list@suse.de:PowerPC list:PowerPC, Eric Blake,
	qemu-devel qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1486 bytes --]


On Sep 29, 2016, at 11:41 AM, Peter Maydell wrote:

> On 28 September 2016 at 21:17, David Gibson <david@gibson.dropbear.id.au> wrote:
>> I think there is a way you could get both speed and accuracy, but it's
>> a huge project:
>> 
>> You'd need to add full float awareness to TCG - so floating point TCG
>> values and floating point operations as TCG micro-ops, defined
>> according to IEEE semantics.  Then you'd need to rewrite the TCG
>> frontends in terms of those new ops, at least for target CPUs close
>> enough to IEEE semantics for that to work.  And you'd need to rewrite
>> the TCG backends to implement those fp ops in terms of host cpu fp
>> instructions .. at least when the host has fp behaviour close enough
>> to IEEE to make that work, with a fallback to soft float when that's
>> not the case.
> 
> Also even if you have float support in both frontend and backend
> you still need to fall back to fully-emulated for the runtime
> corner cases (like where tininess before/after rounding makes a
> difference or where you need to care about minutiae of the
> floating point exception flags, etc). It's not impossible
> but it is a very large amount of technically complicated work.


This project sounds like it should have its own web page. Maybe even
its own Google Summer of Code entry. I created a mindmap of
this project. The picture is attached to this email. This is
just a start. Please let me know what should be added or changed.

[-- Attachment #2: floating point mindmap.png --]
[-- Type: image/png, Size: 12027 bytes --]

* Re: [Qemu-devel] [Qemu-ppc] How to add my implementation of the fmadds instruction to QEMU
  2016-09-29 15:20             ` Programmingkid
@ 2016-09-29 18:19               ` Alex Bennée
  2016-09-29 21:52                 ` Programmingkid
  0 siblings, 1 reply; 18+ messages in thread
From: Alex Bennée @ 2016-09-29 18:19 UTC (permalink / raw)
  To: Programmingkid
  Cc: David Gibson, Peter Maydell, list@suse.de:PowerPC list:PowerPC,
	qemu-devel qemu-devel


Programmingkid <programmingkidx@gmail.com> writes:

> On Sep 29, 2016, at 12:17 AM, David Gibson wrote:
>
>> On Tue, Sep 27, 2016 at 09:58:02AM -0700, Peter Maydell wrote:
>>> On 27 September 2016 at 09:51, G 3 <programmingkidx@gmail.com> wrote:
>>>> The problem with your reasoning is you assume this instruction has to be
>>>> 100% correctly implemented. That every single "corner-case" has to be
>>>> accounted for.
>>>
>>> For upstream QEMU we've already made this design decision --
>>> emulation accuracy comes first, and speed is secondary.
>>> That's why we implement this the way we do.
>>
>> I think there is a way you could get both speed and accuracy, but it's
>> a huge project:
>>
>> You'd need to add full float awareness to TCG - so floating point TCG
>> values and floating point operations as TCG micro-ops, defined
>> according to IEEE semantics.  Then you'd need to rewrite the TCG
>> frontends in terms of those new ops, at least for target CPUs close
>> enough to IEEE semantics for that to work.  And you'd need to rewrite
>> the TCG backends to implement those fp ops in terms of host cpu fp
>> instructions .. at least when the host has fp behaviour close enough
>> to IEEE to make that work, with a fallback to soft float when that's
>> not the case.
>
> Interesting idea. Do you think we would see a large enough increase in speed
> to justify this project?

It really depends on workload. If you want to run lots of encoding/audio
workloads in TCG guests it is certainly something that could be
improved.

As others have pointed out however it is a fairly big project.

--
Alex Bennée

* Re: [Qemu-devel] [Qemu-ppc] How to add my implementation of the fmadds instruction to QEMU
  2016-09-29 18:19               ` Alex Bennée
@ 2016-09-29 21:52                 ` Programmingkid
  2016-09-29 22:36                   ` Alex Bennée
  0 siblings, 1 reply; 18+ messages in thread
From: Programmingkid @ 2016-09-29 21:52 UTC (permalink / raw)
  To: Alex Bennée
  Cc: David Gibson, Peter Maydell, list@suse.de:PowerPC list:PowerPC,
	qemu-devel qemu-devel


On Sep 29, 2016, at 2:19 PM, Alex Bennée wrote:

> 
> Programmingkid <programmingkidx@gmail.com> writes:
> 
>> On Sep 29, 2016, at 12:17 AM, David Gibson wrote:
>> 
>>> On Tue, Sep 27, 2016 at 09:58:02AM -0700, Peter Maydell wrote:
>>>> On 27 September 2016 at 09:51, G 3 <programmingkidx@gmail.com> wrote:
>>>>> The problem with your reasoning is you assume this instruction has to be
>>>>> 100% correctly implemented. That every single "corner-case" has to be
>>>>> accounted for.
>>>> 
>>>> For upstream QEMU we've already made this design decision --
>>>> emulation accuracy comes first, and speed is secondary.
>>>> That's why we implement this the way we do.
>>> 
>>> I think there is a way you could get both speed and accuracy, but it's
>>> a huge project:
>>> 
>>> You'd need to add full float awareness to TCG - so floating point TCG
>>> values and floating point operations as TCG micro-ops, defined
>>> according to IEEE semantics.  Then you'd need to rewrite the TCG
>>> frontends in terms of those new ops, at least for target CPUs close
>>> enough to IEEE semantics for that to work.  And you'd need to rewrite
>>> the TCG backends to implement those fp ops in terms of host cpu fp
>>> instructions .. at least when the host has fp behaviour close enough
>>> to IEEE to make that work, with a fallback to soft float when that's
>>> not the case.
>> 
>> Interesting idea. Do you think we would see a large enough increase in speed
>> to justify this project?
> 
> It really depends on workload. If you want to run lots of encoding/audio
> workloads in TCG guests it is certainly something that could be
> improved.
> 
> As others have pointed out however it is a fairly big project.
> 
> --
> Alex Bennée

Alex Bennée? I was just watching your KVM video about MTTCG! Small world. 

I so want audio to play correctly in a PowerPC Mac OS guest. So this
project might be necessary. 

If it is a fairly big project, then I will need to map it out some more.
I've made a mind map of what I know so far. It is attached to this email.
Let me know if you can think of anything to add.

http://i.imgur.com/MYkiKGx.png

* Re: [Qemu-devel] [Qemu-ppc] How to add my implementation of the fmadds instruction to QEMU
  2016-09-29 21:52                 ` Programmingkid
@ 2016-09-29 22:36                   ` Alex Bennée
  2016-09-29 22:39                     ` Programmingkid
  0 siblings, 1 reply; 18+ messages in thread
From: Alex Bennée @ 2016-09-29 22:36 UTC (permalink / raw)
  To: Programmingkid
  Cc: David Gibson, Peter Maydell, list@suse.de:PowerPC list:PowerPC,
	qemu-devel qemu-devel


Programmingkid <programmingkidx@gmail.com> writes:

> On Sep 29, 2016, at 2:19 PM, Alex Bennée wrote:
>
>>
>> Programmingkid <programmingkidx@gmail.com> writes:
>>
>>> On Sep 29, 2016, at 12:17 AM, David Gibson wrote:
>>>
>>>> On Tue, Sep 27, 2016 at 09:58:02AM -0700, Peter Maydell wrote:
>>>>> On 27 September 2016 at 09:51, G 3 <programmingkidx@gmail.com> wrote:
>>>>>> The problem with your reasoning is you assume this instruction has to be
>>>>>> 100% correctly implemented. That every single "corner-case" has to be
>>>>>> accounted for.
>>>>>
>>>>> For upstream QEMU we've already made this design decision --
>>>>> emulation accuracy comes first, and speed is secondary.
>>>>> That's why we implement this the way we do.
>>>>
>>>> I think there is a way you could get both speed and accuracy, but it's
>>>> a huge project:
>>>>
>>>> You'd need to add full float awareness to TCG - so floating point TCG
>>>> values and floating point operations as TCG micro-ops, defined
>>>> according to IEEE semantics.  Then you'd need to rewrite the TCG
>>>> frontends in terms of those new ops, at least for target CPUs close
>>>> enough to IEEE semantics for that to work.  And you'd need to rewrite
>>>> the TCG backends to implement those fp ops in terms of host cpu fp
>>>> instructions .. at least when the host has fp behaviour close enough
>>>> to IEEE to make that work, with a fallback to soft float when that's
>>>> not the case.
>>>
>>> Interesting idea. Do you think we would see a large enough increase in speed
>>> to justify this project?
>>
>> It really depends on workload. If you want to run lots of encoding/audio
>> workloads in TCG guests it is certainly something that could be
>> improved.
>>
>> As others have pointed out however it is a fairly big project.
>>
>> --
>> Alex Bennée
>
> Alex Bennée? I was just watching your KVM video about MTTCG! Small world.
>
> I so want audio to play correctly in a PowerPC Mac OS guest. So this
> project might be necessary.
>
> If it is a fairly big project, then I will need to map it out some more.
> I've made a mind map of what I know so far. It is attached to this email.
> Let me know if you can think of anything to add.
>
> http://i.imgur.com/MYkiKGx.png

While I appreciate your target is PPC, I think if you are going to
suggest any core floating point TCGOps you will need to survey the
behaviour of the FPUs on all (or at least the most common) TCG targets
and go for instructions that behave the same across a broad range of
targets.

I think if we were to introduce this into the code base we would need to
have a decent range of test cases. I'm talking about making sure we
exercise the whole range of behaviour:

  - min/max rounding behaviour
  - handling of denormalisation
  - signalling and non-signalling NaN behaviour
  - exception generation

Testing is going to be very important for confidence.

--
Alex Bennée

* Re: [Qemu-devel] [Qemu-ppc] How to add my implementation of the fmadds instruction to QEMU
  2016-09-29 22:36                   ` Alex Bennée
@ 2016-09-29 22:39                     ` Programmingkid
  0 siblings, 0 replies; 18+ messages in thread
From: Programmingkid @ 2016-09-29 22:39 UTC (permalink / raw)
  To: Alex Bennée
  Cc: David Gibson, Peter Maydell, list@suse.de:PowerPC list:PowerPC,
	qemu-devel qemu-devel


On Sep 29, 2016, at 6:36 PM, Alex Bennée wrote:

> 
> Programmingkid <programmingkidx@gmail.com> writes:
> 
>> On Sep 29, 2016, at 2:19 PM, Alex Bennée wrote:
>> 
>>> 
>>> Programmingkid <programmingkidx@gmail.com> writes:
>>> 
>>>> On Sep 29, 2016, at 12:17 AM, David Gibson wrote:
>>>> 
>>>>> On Tue, Sep 27, 2016 at 09:58:02AM -0700, Peter Maydell wrote:
>>>>>> On 27 September 2016 at 09:51, G 3 <programmingkidx@gmail.com> wrote:
>>>>>>> The problem with your reasoning is you assume this instruction has to be
>>>>>>> 100% correctly implemented. That every single "corner-case" has to be
>>>>>>> accounted for.
>>>>>> 
>>>>>> For upstream QEMU we've already made this design decision --
>>>>>> emulation accuracy comes first, and speed is secondary.
>>>>>> That's why we implement this the way we do.
>>>>> 
>>>>> I think there is a way you could get both speed and accuracy, but it's
>>>>> a huge project:
>>>>> 
>>>>> You'd need to add full float awareness to TCG - so floating point TCG
>>>>> values and floating point operations as TCG micro-ops, defined
>>>>> according to IEEE semantics.  Then you'd need to rewrite the TCG
>>>>> frontends in terms of those new ops, at least for target CPUs close
>>>>> enough to IEEE semantics for that to work.  And you'd need to rewrite
>>>>> the TCG backends to implement those fp ops in terms of host cpu fp
>>>>> instructions .. at least when the host has fp behaviour close enough
>>>>> to IEEE to make that work, with a fallback to soft float when that's
>>>>> not the case.
>>>> 
>>>> Interesting idea. Do you think we would see a large enough increase in speed
>>>> to justify this project?
>>> 
>>> It really depends on workload. If you want to run lots of encoding/audio
>>> workloads in TCG guests it is certainly something that could be
>>> improved.
>>> 
>>> As others have pointed out however it is a fairly big project.
>>> 
>>> --
>>> Alex Bennée
>> 
>> Alex Bennée? I was just watching your KVM video about MTTCG! Small world.
>> 
>> I so want audio to play correctly in a PowerPC Mac OS guest. So this
>> project might be necessary.
>> 
>> If it is a fairly big project, then I will need to map it out some more.
>> I've made a mind map of what I know so far. It is attached to this email.
>> Let me know if you can think of anything to add.
>> 
>> http://i.imgur.com/MYkiKGx.png
> 
> While I appreciate your target is PPC, I think if you are going to
> suggest any core floating point TCGOps, you will need to survey the
> behaviour of the FPUs on all (or at least the most common) TCG targets
> and go for instructions that behave the same across a broad range of
> targets.
> 
> I think if we were to introduce this into the code base, we would need to
> have a decent range of test cases. I'm talking about making sure we
> exercise the whole range of behaviour:
> 
>  - min/max rounding behaviour
>  - handling of denormalisation
>  - signalling and non-signalling NaN behaviour
>  - exception generation
> 
> Testing is going to be very important for confidence.

Thank you very much for this list. I will see what I can come up with.

* Re: [Qemu-devel] [Qemu-ppc] How to add my implementation of the fmadds instruction to QEMU
  2016-09-29 16:55               ` Programmingkid
@ 2016-09-30  0:39                 ` David Gibson
  2016-09-30  0:44                   ` Programmingkid
  0 siblings, 1 reply; 18+ messages in thread
From: David Gibson @ 2016-09-30  0:39 UTC
  To: Programmingkid
  Cc: Peter Maydell, list@suse.de:PowerPC list:PowerPC, Eric Blake,
	qemu-devel qemu-devel

On Thu, Sep 29, 2016 at 12:55:23PM -0400, Programmingkid wrote:
> 
> On Sep 29, 2016, at 11:41 AM, Peter Maydell wrote:
> 
> > On 28 September 2016 at 21:17, David Gibson <david@gibson.dropbear.id.au> wrote:
> >> I think there is a way you could get both speed and accuracy, but it's
> >> a huge project:
> >> 
> >> You'd need to add full float awareness to TCG - so floating point TCG
> >> values and floating point operations as tcp micro-ops, defined
> >> according to IEEE semantics.  Then you'd need to rewrite the TCG
> >> frontends in terms of those new ops, at least for target CPUs close
> >> enough to IEEE semantics for that to work.  And you'd need to rewrite
> >> the TCG backends to implement those fp ops in terms of host cpu fp
> >> instructions .. at least when the host has fp behaviour close enough
> >> to IEEE to make that work, with a fallback to soft float when that's
> >> not the case.
> > 
> > Also even if you have float support in both frontend and backend
> > you still need to fall back to fully-emulated for the runtime
> > corner cases (like where tininess before/after rounding makes a
> > difference or where you need to care about minutiae of the
> > floating point exception flags, etc). It's not impossible
> > but it is a very large amount of technically complicated work.
> 
> 
> This project sounds like it should have its own web page. Maybe even
> its own Google Summer of Code entry. I created a mindmap of
> this project. The picture is attached to this email. This is
> just a start. Please let me know what should be added or changed.

TBH, I think this is rather bigger than a GSoC project.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [Qemu-ppc] How to add my implementation of the fmadds instruction to QEMU
  2016-09-30  0:39                 ` David Gibson
@ 2016-09-30  0:44                   ` Programmingkid
  0 siblings, 0 replies; 18+ messages in thread
From: Programmingkid @ 2016-09-30  0:44 UTC
  To: David Gibson
  Cc: Peter Maydell, list@suse.de:PowerPC list:PowerPC, Eric Blake,
	qemu-devel qemu-devel


On Sep 29, 2016, at 8:39 PM, David Gibson wrote:

> On Thu, Sep 29, 2016 at 12:55:23PM -0400, Programmingkid wrote:
>> 
>> On Sep 29, 2016, at 11:41 AM, Peter Maydell wrote:
>> 
>>> On 28 September 2016 at 21:17, David Gibson <david@gibson.dropbear.id.au> wrote:
>>>> I think there is a way you could get both speed and accuracy, but it's
>>>> a huge project:
>>>> 
>>>> You'd need to add full float awareness to TCG - so floating point TCG
>>>> values and floating point operations as TCG micro-ops, defined
>>>> according to IEEE semantics.  Then you'd need to rewrite the TCG
>>>> frontends in terms of those new ops, at least for target CPUs close
>>>> enough to IEEE semantics for that to work.  And you'd need to rewrite
>>>> the TCG backends to implement those fp ops in terms of host cpu fp
>>>> instructions .. at least when the host has fp behaviour close enough
>>>> to IEEE to make that work, with a fallback to soft float when that's
>>>> not the case.
>>> 
>>> Also even if you have float support in both frontend and backend
>>> you still need to fall back to fully-emulated for the runtime
>>> corner cases (like where tininess before/after rounding makes a
>>> difference or where you need to care about minutiae of the
>>> floating point exception flags, etc). It's not impossible
>>> but it is a very large amount of technically complicated work.
>> 
>> 
>> This project sounds like it should have its own web page. Maybe even
>> its own Google Summer of Code entry. I created a mindmap of
>> this project. The picture is attached to this email. This is
>> just a start. Please let me know what should be added or changed.
> 
> TBH, I think this is rather bigger than a GSoC project.

If it is really big, then it should be broken down into easier steps.
Any idea what those steps could be?
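Peter's point (quoted above) about tininess detected before versus after rounding can be made concrete with a small sketch. The helpers below are hypothetical, with illustrative names that are not QEMU or libm APIs. They assume the exact result is available as a double, which holds for the example value but not for arbitrary fused multiply-add results, and that the host rounds to nearest. IEEE 754 permits either detection rule, which is one reason a host FPU cannot always stand in for the guest's.

```c
/* Hypothetical helpers: is an exact result "tiny" for float under the two
 * tininess-detection rules IEEE 754 allows? Assumes the exact value is
 * held exactly in a double and the host rounds to nearest. */
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Tiny before rounding: the exact value is nonzero and below FLT_MIN. */
bool tiny_before_rounding(double exact)
{
    return exact != 0.0 && fabs(exact) < FLT_MIN;
}

/* Tiny after rounding: rounding to float's 24-bit precision as if the
 * exponent range were unbounded gives a magnitude below FLT_MIN. */
bool tiny_after_rounding(double exact)
{
    if (exact == 0.0 || fabs(exact) >= 1.0) {
        return false;   /* far from the underflow threshold */
    }
    /* Scaling by 2^100 (exact in double) moves values near FLT_MIN into
     * float's normal range, so the conversion rounds to 24 bits; scaling
     * the rounded value back down is exact again. */
    float scaled = (float)ldexp(exact, 100);
    return fabs(ldexp((double)scaled, -100)) < FLT_MIN;
}

/* With default exception handling, underflow is signalled only when the
 * result is both tiny and inexact. */
bool underflow_signalled(double exact, bool detect_before_rounding)
{
    bool tiny = detect_before_rounding ? tiny_before_rounding(exact)
                                       : tiny_after_rounding(exact);
    return tiny && (double)(float)exact != exact;
}
```

Take `ldexp(1.0 - 0x1p-25, -126)`, which lies just below FLT_MIN and rounds up to it. Here `underflow_signalled(x, true)` is true but `underflow_signalled(x, false)` is false. The same operation would set the underflow flag on a before-rounding target and not on an after-rounding one, so an emulator has to hand cases like this to its software path.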

end of thread

Thread overview: 18+ messages
2016-09-27  1:05 [Qemu-devel] How to add my implementation of the fmadds instruction to QEMU G 3
2016-09-27  3:41 ` [Qemu-devel] [Qemu-ppc] " David Gibson
2016-09-27 11:43 ` [Qemu-devel] " Peter Maydell
2016-09-27 14:33   ` G 3
2016-09-27 15:21     ` Peter Maydell
2016-09-27 16:16     ` Eric Blake
2016-09-27 16:51       ` G 3
2016-09-27 16:58         ` Peter Maydell
2016-09-29  4:17           ` [Qemu-devel] [Qemu-ppc] " David Gibson
2016-09-29 15:20             ` Programmingkid
2016-09-29 18:19               ` Alex Bennée
2016-09-29 21:52                 ` Programmingkid
2016-09-29 22:36                   ` Alex Bennée
2016-09-29 22:39                     ` Programmingkid
2016-09-29 15:41             ` Peter Maydell
2016-09-29 16:55               ` Programmingkid
2016-09-30  0:39                 ` David Gibson
2016-09-30  0:44                   ` Programmingkid
