Re: About hardfloat in ppc

From: "Alex Bennée" <alex.bennee@linaro.org>
To: BALATON Zoltan <balaton@eik.bme.hu>
Cc: Richard Henderson <richard.henderson@linaro.org>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Programmingkid <programmingkidx@gmail.com>,
	luoyonggang@gmail.com,
	"qemu-ppc@nongnu.org" <qemu-ppc@nongnu.org>,
	Howard Spoelstra <hsp.cat7@gmail.com>,
	Dino Papararo <skizzato73@msn.com>
Subject: Re: About hardfloat in ppc
Date: Fri, 01 May 2020 15:01:24 +0100
Message-ID: <87y2qbiwjv.fsf@linaro.org>
In-Reply-To: <alpine.BSF.2.22.395.2005011517360.62443@zero.eik.bme.hu>


BALATON Zoltan <balaton@eik.bme.hu> writes:

> On Fri, 1 May 2020, Alex Bennée wrote:
>> 罗勇刚(Yonggang Luo) <luoyonggang@gmail.com> writes:
>>> On Fri, May 1, 2020 at 7:58 PM BALATON Zoltan <balaton@eik.bme.hu> wrote:
>>>> On Fri, 1 May 2020, 罗勇刚(Yonggang Luo) wrote:
>>>>> That's what I suggested,
>>>>> We preserve a  float computing cache
>>>>> typedef struct FpRecord {
>>>>>  uint8_t op;
>>>>>  float32 A;
>>>>>  float32 B;
>>>>> }  FpRecord;
>>>>> FpRecord fp_cache[1024];
>>>>> int fp_cache_length;
>>>>> uint32_t fp_exceptions;
>>>>>
>>>>> 1. For each new fp operation we push it to the fp_cache.
>>>>> 2. Once we read fp_exceptions, we re-compute
>>>>> fp_exceptions by re-running the FpRecord sequence
>>>>> and clear fp_cache_length.
>>>>
>>>> Why do you need to store more than the last fp op? The cumulative bits can
>>>> be tracked like it's done for other targets by not clearing fp_status then
>>>> you can read it from there. Only the non-sticky FI bit needs to be
>>>> computed but that's only determined by the last op so it's enough to
>>>> remember that and run that with softfloat (or even hardfloat after
>>>> clearing status but softfloat may be faster for this) to get the bits for
>>>> last op when status is read.
>>>>
>>> Yep, storing only the last fp op is also an option. Do you mean that we
>>> store the last fp op
>>> and calculate it when necessary?  I am thinking about a general fp
>>> optimization method that suits
>>> all targets.
>>
>> I think that's getting a little ahead of yourself. Let's prove the
>> technique is valuable for PPC (given it has the most to gain). We can
>> always generalise later if it's worthwhile.
>>
>> Rather than creating a new structure I would suggest creating three new
>> TCG globals (op, inA, inB) and refactoring the front-end code so that
>> each FP op loads them.
>
> So that's basically wherever you see helper_reset_fpstatus() in
> target/ppc we would need to replace it with saving op and args to
> globals? Or just repurpose this helper to do that. This is called
> before every fp op but not before sub ops within vector ops. Is that
> correct? Probably it is, as vector ops are a single op but how do we
> detect changes in flags by sub ops for those? These might have some
> existing bugs I think.

I'll defer to the PPC front end experts on this. I'm not familiar with
how it all goes together at all.
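
Independent of where the hook ends up in target/ppc, the idea itself can
be shown in a few lines of standalone C. This is only a sketch under
stated assumptions, not QEMU code: FpOp, LastFpOp, fp_exec() and
fp_last_inexact() are hypothetical names, the struct stands in for the
three proposed globals, and the slow path uses plain double arithmetic
where the real thing would call softfloat. It assumes IEEE float/double
and the default round-to-nearest mode.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the proposed globals (op, inA, inB). */
typedef enum { FP_OP_NONE, FP_OP_ADD, FP_OP_MUL } FpOp;

typedef struct {
    FpOp  op;
    float a, b;
} LastFpOp;

static LastFpOp last = { FP_OP_NONE, 0.0f, 0.0f };

/* x - x is NaN for infinities and NaNs, 0 for every finite value. */
static bool is_finite(float x) { return x - x == 0.0f; }

/* Fast path: compute on the host FPU and only remember the inputs. */
float fp_exec(FpOp op, float a, float b)
{
    assert(op == FP_OP_ADD || op == FP_OP_MUL);
    last.op = op;
    last.a = a;
    last.b = b;
    return op == FP_OP_ADD ? a + b : a * b;
}

/*
 * Slow path, meant to run only when the guest reads the status register:
 * recompute the non-sticky "inexact" bit for the last op in software.
 */
bool fp_last_inexact(void)
{
    float a = last.a, b = last.b;

    if (last.op == FP_OP_NONE || !is_finite(a) || !is_finite(b)) {
        return false;   /* inf/NaN operands never yield a rounded result */
    }
    if (last.op == FP_OP_MUL) {
        /* 24 + 24 significand bits fit in a double, so p is exact. */
        double p = (double)a * (double)b;
        if (p > FLT_MAX || p < -FLT_MAX) {
            return true;                 /* out of float range: inexact */
        }
        return (double)(float)p != p;    /* did rounding to float change it? */
    }
    /* FP_OP_ADD: with |big| >= |small|, (s - big) is exactly representable,
     * so the sum was exact iff that difference equals the other operand. */
    float big = a, small = b;
    if ((a < 0 ? -a : a) < (b < 0 ? -b : b)) {
        big = b;
        small = a;
    }
    float s = big + small;
    if (!is_finite(s)) {
        return true;                     /* overflow implies inexact */
    }
    return (double)s - (double)big != (double)small;
}
```

The point of the split is that fp_exec() stays cheap on every op, while
the cost of fp_last_inexact() is only paid on a status read. The sticky
bits are deliberately left out here; as discussed up-thread they can
simply accumulate in fp_status.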

>
>> The TCG optimizer should pick up aliased loads
>> and automatically eliminate the dead ones. We might need some new
>> machinery for the TCG to avoid spilling the values over potentially
>> faulting loads/stores but that is likely a phase 2 problem.
>
> I have no idea how to do this or even where to look. Some more
> detailed explanation may be needed here.

Don't worry about it now. Let's worry about it when we see how often
faulting instructions are interleaved with fp ops.

>
>> Next you will want to find places that care about the per-op bits of
>> cpu_fpscr and call a helper with the new globals to re-run the
>> computation and feed the values in.
>
> So the code that cares about these bits is in the guest, thus we would
> need to compute them when we detect the guest accessing them. Detecting
> when individual bits are accessed might be difficult, so at first
> we could check whether the fpscr is read and recompute the FI bit
> before returning the value. You previously said this might happen when
> fpscr is read or when generating exceptions, but I'm not sure where
> exactly these are done for ppc. (I'd expect to have mffpscr but there
> seem to be various other ops instead that access parts of fpscr, which
> are found in target/ppc/fp-impl.inc.c:567, so this would need studying
> the PPC docs to understand how the guest can access the FI bit of the
> fpscr reg.)
>
>> That would give you a reasonable working prototype to start doing some
>> measurements of overhead and if it makes a difference.
>>
>>>
>>>>
>>>>> 3. If we clear fp_exceptions, then we set fp_cache_length to 0 and
>>>>> clear fp_exceptions.
>>>>> 4. If the fp_cache is full, then we re-compute
>>>>> fp_exceptions by re-running the FpRecord sequence.
>>>>
>>>> All this cache management and more than one element seems unnecessary to
>>>> me although I may be missing something.
>>>>
>>>>> Now the key point is how to track reads and writes of the FPSCR register.
>>>>> The current code is
>>>>>    cpu_fpscr = tcg_global_mem_new(cpu_env,
>>>>>                                   offsetof(CPUPPCState, fpscr), "fpscr");
>>>>
>>>> Maybe you could search where the value is read which should be the places
>>>> where we need to handle it but changes may be needed to make a clear API
>>>> for this between target/ppc, TCG and softfloat which likely does not
>>>> exist yet.
>>
>> Once the per-op calculation is fixed in the PPC front-end I think the
>> only change needed is to remove the #if defined(TARGET_PPC) in
>> softfloat.c - it's only really there because it avoids the overhead of
>> checking flags which we always know to be clear in its case.
>
> That's the theory but I've found that removing that define currently
> makes general fp ops slower but vector ops faster so I think there may
> be some bugs that would need to be found and fixed. So testing with
> some proper test suite might be needed.

You might want to do what Laurent did and hack up a testfloat with
"system" implementations:

  https://github.com/vivier/m68k-testfloat/blob/master/testfloat/M68K-Linux-GCC/systfloat.c

It would be nice to plumb that sort of support into our existing
testfloat fork in the code base (tests/fp) but I suspect getting an
out-of-tree fork building and running first would be the quickest way
forward.

>
> Regards,
> BALATON Zoltan


-- 
Alex Bennée
