From: "Alex Bennée" <alex.bennee@linaro.org>
To: BALATON Zoltan <balaton@eik.bme.hu>
Cc: Richard Henderson <richard.henderson@linaro.org>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
Programmingkid <programmingkidx@gmail.com>,
luoyonggang@gmail.com,
"qemu-ppc@nongnu.org" <qemu-ppc@nongnu.org>,
Howard Spoelstra <hsp.cat7@gmail.com>,
Dino Papararo <skizzato73@msn.com>
Subject: Re: About hardfloat in ppc
Date: Fri, 01 May 2020 15:01:24 +0100
Message-ID: <87y2qbiwjv.fsf@linaro.org>
In-Reply-To: <alpine.BSF.2.22.395.2005011517360.62443@zero.eik.bme.hu>
BALATON Zoltan <balaton@eik.bme.hu> writes:
> On Fri, 1 May 2020, Alex Bennée wrote:
>> 罗勇刚(Yonggang Luo) <luoyonggang@gmail.com> writes:
>>> On Fri, May 1, 2020 at 7:58 PM BALATON Zoltan <balaton@eik.bme.hu> wrote:
>>>> On Fri, 1 May 2020, 罗勇刚(Yonggang Luo) wrote:
>>>>> That's what I suggested:
>>>>> we keep a cache of pending floating-point operations.
>>>>> typedef struct FpRecord {
>>>>>     uint8_t op;
>>>>>     float32 A;
>>>>>     float32 B;
>>>>> } FpRecord;
>>>>> FpRecord fp_cache[1024];
>>>>> int fp_cache_length;
>>>>> uint32_t fp_exceptions;
>>>>>
>>>>> 1. For each new fp operation, we push it to fp_cache.
>>>>> 2. Once fp_exceptions is read, we re-compute fp_exceptions by
>>>>> re-running the recorded FpRecord sequence, and reset
>>>>> fp_cache_length to 0.
>>>>
>>>> Why do you need to store more than the last fp op? The cumulative bits
>>>> can be tracked, as is done for other targets, by not clearing fp_status;
>>>> then you can read them from there. Only the non-sticky FI bit needs to
>>>> be computed, but that is determined by the last op alone, so it's enough
>>>> to remember that op and run it with softfloat (or even hardfloat after
>>>> clearing status, but softfloat may be faster for this) to get the bits
>>>> for the last op when status is read.
>>>>
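To make the last-op idea concrete, here is a minimal stand-alone sketch in plain C. It is not QEMU code: `fp_model`, `fp_exec`, `fp_read_status`, the op codes and the `MODEL_XX`/`MODEL_FI` bit values are all made up for illustration, and `op_is_inexact` is only a small stand-in for re-running the op with softfloat. It assumes finite float32 operands, round-to-nearest, and no overflow or underflow. The sticky bit is only ever set, the last op and its inputs are remembered, and the non-sticky bit is computed when the status is read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names and bit values, chosen for this sketch only. */
enum fp_op { FP_OP_NONE, FP_OP_ADD, FP_OP_MUL };
#define MODEL_XX 0x1u   /* sticky: some op since the last clear was inexact */
#define MODEL_FI 0x2u   /* non-sticky: the most recent op was inexact */

struct fp_model {
    bool sticky_inexact;    /* cumulative bit, never cleared by an op */
    enum fp_op last_op;     /* the only operation we remember */
    float last_a, last_b;   /* its inputs */
};

/* Stand-in for softfloat: true when "a op b" had to be rounded. */
static bool op_is_inexact(enum fp_op op, float a, float b)
{
    if (op == FP_OP_MUL) {
        /* The product of two float32 values is exact in double. */
        double exact = (double)a * (double)b;
        return (double)(float)exact != exact;
    }
    /* Knuth's TwoSum: a + b == s + err exactly in round-to-nearest. */
    volatile float s = a + b;
    volatile float bb = s - a;
    volatile float aa = s - bb;
    volatile float err = (a - aa) + (b - bb);
    return err != 0.0f;
}

/* Per-op path: record the op; compute flags only if the sticky bit is unset. */
static float fp_exec(struct fp_model *m, enum fp_op op, float a, float b)
{
    assert(op == FP_OP_ADD || op == FP_OP_MUL);
    m->last_op = op;
    m->last_a = a;
    m->last_b = b;
    if (!m->sticky_inexact && op_is_inexact(op, a, b)) {
        m->sticky_inexact = true;
    }
    return op == FP_OP_ADD ? a + b : a * b;
}

/* Status read: the non-sticky bit is recomputed from the remembered op. */
static uint32_t fp_read_status(const struct fp_model *m)
{
    uint32_t v = m->sticky_inexact ? MODEL_XX : 0;
    if (m->last_op != FP_OP_NONE &&
        op_is_inexact(m->last_op, m->last_a, m->last_b)) {
        v |= MODEL_FI;
    }
    return v;
}
```

Once the sticky bit is set, `fp_exec` no longer computes any flags; only a status read pays for the recomputation, and only for one op.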
>>> Yep, storing only the last fp op is also an option. Do you mean that we
>>> store the last fp op
>>> and calculate it when necessary? I am thinking about a general fp
>>> optimization method that suits
>>> all targets.
>>
>> I think that's getting a little ahead of yourself. Let's prove the
>> technique is valuable for PPC (given it has the most to gain). We can
>> always generalise later if it's worthwhile.
>>
>> Rather than creating a new structure I would suggest creating 3 new tcg
>> globals (op, inA, inB) and re-factoring the front-end code so each FP op
>> loads the TCG globals.
>
> So that's basically wherever you see helper_reset_fpstatus() in
> target/ppc we would need to replace it with saving op and args to
> globals? Or just repurpose this helper to do that. This is called
> before every fp op but not before sub ops within vector ops. Is that
> correct? Probably it is, as vector ops are a single op but how do we
> detect changes in flags by sub ops for those? These might have some
> existing bugs I think.

I'll defer to the PPC front end experts on this. I'm not familiar with
how it all goes together at all.

>
>> The TCG optimizer should pick up aliased loads
>> and automatically eliminate the dead ones. We might need some new
>> machinery for the TCG to avoid spilling the values over potentially
>> faulting loads/stores but that is likely a phase 2 problem.
>
> I have no idea how to do this or even where to look. Some more
> detailed explanation may be needed here.

Don't worry about it now. Let's worry about it when we see how often
faulting instructions are interleaved with fp ops.

>
>> Next you will want to find places that care about the per-op bits of
>> cpu_fpscr and call a helper with the new globals to re-run the
>> computation and feed the values in.
>
> So the code that cares about these bits is in the guest, so we would
> need to compute them when we detect the guest accessing them. Detecting
> when the individual bits are accessed might be difficult, so at first
> we could just check whether the fpscr is read and recompute the FI bit
> before returning the value. You previously said this might happen when
> fpscr is read or when generating exceptions, but I'm not sure where
> exactly these are done for ppc. (I'd expect to have mffpscr but there
> seem to be various other ops instead that access parts of fpscr, which
> are found in target/ppc/fp-impl.inc.c:567, so this would need studying
> the PPC docs to understand how the guest can access the FI bit of the
> fpscr reg.)
>
>> That would give you a reasonable working prototype to start doing some
>> measurements of overhead and if it makes a difference.
>>
>>>
>>>>
>>>>> 3. When fp_exceptions is cleared, we set fp_cache_length to 0 and
>>>>> clear fp_exceptions.
>>>>> 4. If fp_cache is full, we re-compute fp_exceptions by
>>>>> re-running the recorded FpRecord sequence.
>>>>
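For reference, steps 1 to 4 of the cache proposal can be sketched as stand-alone C. This is an illustration under assumptions, not QEMU code: host `float` stands in for `float32`, the `FpCache` wrapper, the `fp_cache_*` function names, the op codes and the `FP_EXC_INEXACT` bit are hypothetical, and `record_is_inexact` is a simplified stand-in for replaying a record through softfloat (finite operands, round-to-nearest, add and multiply only).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FP_CACHE_SIZE 1024
#define FP_EXC_INEXACT 0x1u       /* hypothetical flag bit */
enum { FP_OP_ADD, FP_OP_MUL };    /* hypothetical op codes */

typedef struct FpRecord {
    uint8_t op;
    float A;    /* float32 in the proposal; host float in this sketch */
    float B;
} FpRecord;

typedef struct FpCache {
    FpRecord rec[FP_CACHE_SIZE];
    int length;
    uint32_t exceptions;
} FpCache;

/* Stand-in for softfloat: true when the recorded op had to be rounded. */
static bool record_is_inexact(const FpRecord *r)
{
    if (r->op == FP_OP_MUL) {
        /* The product of two float32 values is exact in double. */
        double exact = (double)r->A * (double)r->B;
        return (double)(float)exact != exact;
    }
    /* Knuth's TwoSum: A + B == s + err exactly in round-to-nearest. */
    volatile float s = r->A + r->B;
    volatile float bb = s - r->A;
    volatile float aa = s - bb;
    volatile float err = (r->A - aa) + (r->B - bb);
    return err != 0.0f;
}

/* Steps 2 and 4: replay all pending records, then empty the cache. */
static void fp_cache_flush(FpCache *c)
{
    for (int i = 0; i < c->length; i++) {
        if (record_is_inexact(&c->rec[i])) {
            c->exceptions |= FP_EXC_INEXACT;
        }
    }
    c->length = 0;
}

/* Step 1: queue the op. Step 4: flush first when the cache is full. */
static void fp_cache_push(FpCache *c, uint8_t op, float a, float b)
{
    if (c->length == FP_CACHE_SIZE) {
        fp_cache_flush(c);
    }
    assert(c->length < FP_CACHE_SIZE);
    c->rec[c->length++] = (FpRecord){ op, a, b };
}

/* Step 2: a read forces the pending records to be replayed. */
static uint32_t fp_cache_read_exceptions(FpCache *c)
{
    fp_cache_flush(c);
    return c->exceptions;
}

/* Step 3: a clear drops pending records along with the flags. */
static void fp_cache_clear_exceptions(FpCache *c)
{
    c->length = 0;
    c->exceptions = 0;
}
```

Because the flags here are purely cumulative, replaying every queued record yields the same result as evaluating each op eagerly; the queue only moves the cost to the read.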
>>>> All this cache management and more than one element seems unnecessary to
>>>> me although I may be missing something.
>>>>
>>>>> Now the key point is how to track reads and writes of the FPSCR
>>>>> register. The current code is:
>>>>> cpu_fpscr = tcg_global_mem_new(cpu_env,
>>>>>                                offsetof(CPUPPCState, fpscr), "fpscr");
>>>>
>>>> Maybe you could search where the value is read which should be the places
>>>> where we need to handle it but changes may be needed to make a clear API
>>>> for this between target/ppc, TCG and softfloat which likely does not
>>>> exist yet.
>>
>> Once the per-op calculation is fixed in the PPC front-end I think the
>> only change needed is to remove the #if defined(TARGET_PPC) in
>> softfloat.c - it's only really there because it avoids the overhead of
>> checking flags which we always know to be clear in its case.
>
> That's the theory but I've found that removing that define currently
> makes general fp ops slower but vector ops faster so I think there may
> be some bugs that would need to be found and fixed. So testing with
> some proper test suite might be needed.

You might want to do what Laurent did and hack up a testfloat with
"system" implementations:

https://github.com/vivier/m68k-testfloat/blob/master/testfloat/M68K-Linux-GCC/systfloat.c

It would be nice to plumb that sort of support into our existing
testfloat fork in the code base (tests/fp) but I suspect getting an
out-of-tree fork building and running first would be the quickest way
forward.
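
As a rough illustration of the shape of such a test (not taken from the linked file or from tests/fp), the sketch below compares an implementation under test against a reference, checking both the result bits and the flags. All names here (`fp_result`, `mul_fn`, `ref_mul`, `sys_mul_no_flags`, `count_mismatches`, `FLAG_INEXACT`) are hypothetical. The "system" function is only a placeholder: a real one would execute the guest instruction and read back the guest status register, while this one multiplies on the host and deliberately reports no flags, so the harness has something to catch. Operands are assumed to be finite and in range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FLAG_INEXACT 0x1u   /* hypothetical flag encoding */

typedef struct {
    float value;
    unsigned flags;
} fp_result;

typedef fp_result (*mul_fn)(float a, float b);

/* Reference: the product of two float32 values is exact in double, so a
 * single conversion back to float is correctly rounded and reveals
 * whether rounding happened. */
static fp_result ref_mul(float a, float b)
{
    double exact = (double)a * (double)b;
    fp_result r = { (float)exact, 0 };
    if ((double)r.value != exact) {
        r.flags |= FLAG_INEXACT;
    }
    return r;
}

/* Placeholder "system" implementation that never reports inexact. */
static fp_result sys_mul_no_flags(float a, float b)
{
    volatile float v = a * b;
    fp_result r = { v, 0 };
    return r;
}

/* Compare bit patterns so that -0.0 and +0.0 are told apart. */
static int same_bits(float x, float y)
{
    uint32_t xb, yb;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&xb, &x, sizeof xb);
    memcpy(&yb, &y, sizeof yb);
    return xb == yb;
}

/* Count the test cases where result bits or flags differ. */
static size_t count_mismatches(mul_fn impl, mul_fn ref,
                               const float (*cases)[2], size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        fp_result got = impl(cases[i][0], cases[i][1]);
        fp_result want = ref(cases[i][0], cases[i][1]);
        if (!same_bits(got.value, want.value) || got.flags != want.flags) {
            bad++;
        }
    }
    return bad;
}
```

An implementation that gets the value right but drops the inexact flag is reported as a mismatch, which is the class of bug this discussion is concerned with.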
>
> Regards,
> BALATON Zoltan
--
Alex Bennée