From: "Alex Bennée" <alex.bennee@linaro.org>
To: BALATON Zoltan <balaton@eik.bme.hu>
Cc: Richard Henderson <richard.henderson@linaro.org>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Programmingkid <programmingkidx@gmail.com>,
	luoyonggang@gmail.com,
	"qemu-ppc@nongnu.org" <qemu-ppc@nongnu.org>,
	Howard Spoelstra <hsp.cat7@gmail.com>,
	Dino Papararo <skizzato73@msn.com>
Subject: Re: About hardfloat in ppc
Date: Fri, 01 May 2020 15:01:24 +0100
Message-ID: <87y2qbiwjv.fsf@linaro.org>
In-Reply-To: <alpine.BSF.2.22.395.2005011517360.62443@zero.eik.bme.hu>


BALATON Zoltan <balaton@eik.bme.hu> writes:

> On Fri, 1 May 2020, Alex Bennée wrote:
>> 罗勇刚(Yonggang Luo) <luoyonggang@gmail.com> writes:
>>> On Fri, May 1, 2020 at 7:58 PM BALATON Zoltan <balaton@eik.bme.hu> wrote:
>>>> On Fri, 1 May 2020, 罗勇刚(Yonggang Luo) wrote:
>>>>> That's what I suggested.
>>>>> We keep a cache of floating-point operations:
>>>>> typedef struct FpRecord {
>>>>>  uint8_t op;
>>>>>  float32 A;
>>>>>  float32 B;
>>>>> }  FpRecord;
>>>>> FpRecord fp_cache[1024];
>>>>> int fp_cache_length;
>>>>> uint32_t fp_exceptions;
>>>>>
>>>>> 1. For each new fp operation, we push a record onto fp_cache.
>>>>> 2. When fp_exceptions is read, we recompute it by re-running
>>>>> the recorded FpRecord sequence, and then reset fp_cache_length
>>>>> to 0.
>>>>
>>>> Why do you need to store more than the last fp op? The cumulative bits can
>>>> be tracked as is done for other targets, by not clearing fp_status; then
>>>> you can read them from there. Only the non-sticky FI bit needs to be
>>>> computed, but that is determined only by the last op, so it's enough to
>>>> remember that op and re-run it with softfloat (or even hardfloat after
>>>> clearing the status, but softfloat may be faster for this) to get the bits
>>>> for the last op when the status is read.
>>>>
>>> Yes, storing only the last fp op is also an option. Do you mean that we
>>> store the last fp op and calculate its flags only when necessary? I am
>>> thinking about a general fp optimization method that suits all targets.
>>
>> I think that's getting a little ahead of yourself. Let's prove the
>> technique is valuable for PPC (given it has the most to gain). We can
>> always generalise later if it's worthwhile.
>>
>> Rather than creating a new structure I would suggest creating 3 new tcg
>> globals (op, inA, inB) and re-factoring the front-end code so each FP op
>> loads the TCG globals.
>
> So that's basically wherever you see helper_reset_fpstatus() in
> target/ppc we would need to replace it with saving op and args to
> globals? Or just repurpose this helper to do that. This is called
> before every fp op but not before sub ops within vector ops. Is that
> correct? Probably it is, as vector ops are a single op but how do we
> detect changes in flags by sub ops for those? These might have some
> existing bugs I think.

I'll defer to the PPC front end experts on this. I'm not familiar with
how it all goes together at all.

>
>> The TCG optimizer should pick up aliased loads
>> and automatically eliminate the dead ones. We might need some new
>> machinery for the TCG to avoid spilling the values over potentially
>> faulting loads/stores but that is likely a phase 2 problem.
>
> I have no idea how to do this or even where to look. Some more
> detailed explanation may be needed here.

Don't worry about it now. Let's worry about it when we see how often
faulting instructions are interleaved with fp ops.

>
>> Next you will want to find places that care about the per-op bits of
>> cpu_fpscr and call a helper with the new globals to re-run the
>> computation and feed the values in.
>
> So the code that cares about these bits is in the guest, so we would
> need to compute them when we detect the guest accessing them. Detecting
> when the individual bits are accessed might be difficult, so at first
> we could just check whether the fpscr is read and recompute the FI bit
> before returning the value. You previously said this might be when
> fpscr is read or when generating exceptions, but I'm not sure where exactly
> these are done for ppc. (I'd expect to have mffpscr, but there seem to
> be various other ops instead that access parts of fpscr, which are
> found in target/ppc/fp-impl.inc.c:567, so this would need studying the
> PPC docs to understand how the guest can access the FI bit of the fpscr
> reg.)
>
>> That would give you a reasonable working prototype to start doing some
>> measurements of overhead and if it makes a difference.
>>
>>>
>>>>
>>>>> 3. If fp_exceptions is cleared, we set fp_cache_length to 0 and
>>>>> clear fp_exceptions.
>>>>> 4. If fp_cache is full, we recompute fp_exceptions by re-running
>>>>> the FpRecord sequence.
>>>>
>>>> All this cache management and more than one element seems unnecessary to
>>>> me although I may be missing something.
>>>>
>>>>> Now the key point is how to track reads and writes of the FPSCR register.
>>>>> The current code is:
>>>>>    cpu_fpscr = tcg_global_mem_new(cpu_env,
>>>>>                                   offsetof(CPUPPCState, fpscr), "fpscr");
>>>>
>>>> Maybe you could search where the value is read which should be the places
>>>> where we need to handle it but changes may be needed to make a clear API
>>>> for this between target/ppc, TCG and softfloat which likely does not
>>>> exist yet.
>>
>> Once the per-op calculation is fixed in the PPC front-end I think the
>> only change needed is to remove the #if defined(TARGET_PPC) in
>> softfloat.c - it's only really there because it avoids the overhead of
>> checking flags which we always know to be clear in its case.
>
> That's the theory but I've found that removing that define currently
> makes general fp ops slower but vector ops faster so I think there may
> be some bugs that would need to be found and fixed. So testing with
> some proper test suite might be needed.
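
To make the flag check mentioned above concrete: the hardfloat fast path
is, as far as I understand it, only taken when the inexact flag is
already set and the rounding mode is nearest-even, because then the host
FPU cannot change the visible flags. The model below is a simplified
stand-in with hypothetical names (fp_status, can_use_host_fpu,
count_fast_path), not the softfloat.c definitions. It shows why a front
end that clears the flags before every op never reaches the fast path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins; hypothetical names, not QEMU's definitions. */
enum { FLAG_INEXACT = 1u << 0 };
enum round_mode { ROUND_NEAREST_EVEN, ROUND_TO_ZERO };

struct fp_status {
    uint8_t flags;              /* accumulated (sticky) exception flags */
    enum round_mode rounding;
};

/* The host FPU is only safe to use if it cannot alter the visible
 * flags: inexact must already be set and rounding must be the default. */
bool can_use_host_fpu(const struct fp_status *s)
{
    return (s->flags & FLAG_INEXACT) && s->rounding == ROUND_NEAREST_EVEN;
}

/* Count how many of nops operations would take the fast path.
 * reset_each_op models a front end that clears the flags before every
 * op; the slow path is assumed (pessimistically) to raise inexact. */
int count_fast_path(struct fp_status *s, int nops, bool reset_each_op)
{
    int fast = 0;

    assert(nops >= 0);
    for (int i = 0; i < nops; i++) {
        if (reset_each_op) {
            s->flags = 0;
        }
        if (can_use_host_fpu(s)) {
            fast++;
        } else {
            s->flags |= FLAG_INEXACT;   /* slow path sets the flag */
        }
    }
    return fast;
}
```

With sticky flags, ten ops give nine fast-path hits (only the first op
goes the slow way); with a reset before each op, none do.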

You might want to do what Laurent did and hack up a testfloat with
"system" implementations:

  https://github.com/vivier/m68k-testfloat/blob/master/testfloat/M68K-Linux-GCC/systfloat.c

It would be nice to plumb that sort of support into our existing
testfloat fork in the code base (tests/fp) but I suspect getting an
out-of-tree fork building and running first would be the quickest way
forward.

>
> Regards,
> BALATON Zoltan


-- 
Alex Bennée

