From: David Gibson <david@gibson.dropbear.id.au>
To: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Cc: qemu-ppc@nongnu.org, rth@twiddle.net, qemu-devel@nongnu.org,
benh@kernel.crashing.org
Subject: Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation
Date: Wed, 21 Sep 2016 11:57:50 +1000
Message-ID: <20160921015750.GQ20488@umbus>
In-Reply-To: <87lgymqyz8.fsf@abhimanyu.i-did-not-set--mail-host-address--so-tickle-me>
On Tue, Sep 20, 2016 at 10:40:03PM +0530, Nikunj A Dadhania wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
>
> > On Mon, Sep 19, 2016 at 04:06:40PM +0530, Nikunj A Dadhania wrote:
> >> David Gibson <david@gibson.dropbear.id.au> writes:
> >> > On Mon, Sep 19, 2016 at 04:19:34PM +1000, David Gibson wrote:
> >> >> On Fri, Sep 16, 2016 at 04:21:48PM +0530, Nikunj A Dadhania wrote:
> >> >> > diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
> >> >> > index eee6052..df278df 100644
> >> >> > --- a/target-ppc/translate/vsx-impl.inc.c
> >> >> > +++ b/target-ppc/translate/vsx-impl.inc.c
> >> >> > @@ -75,7 +75,6 @@ static void gen_lxvdsx(DisasContext *ctx)
> >> >> >  static void gen_lxvw4x(DisasContext *ctx)
> >> >> >  {
> >> >> >      TCGv EA;
> >> >> > -    TCGv_i64 tmp;
> >> >> >      TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
> >> >> >      TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
> >> >> >      if (unlikely(!ctx->vsx_enabled)) {
> >> >> > @@ -84,22 +83,14 @@ static void gen_lxvw4x(DisasContext *ctx)
> >> >> >      }
> >> >> >      gen_set_access_type(ctx, ACCESS_INT);
> >> >> >      EA = tcg_temp_new();
> >> >> > -    tmp = tcg_temp_new_i64();
> >> >> >
> >> >> >      gen_addr_reg_index(ctx, EA);
> >> >> > -    gen_qemu_ld32u_i64(ctx, tmp, EA);
> >> >> > -    tcg_gen_addi_tl(EA, EA, 4);
> >> >> > -    gen_qemu_ld32u_i64(ctx, xth, EA);
> >> >> > -    tcg_gen_deposit_i64(xth, xth, tmp, 32, 32);
> >> >> > -
> >> >> > -    tcg_gen_addi_tl(EA, EA, 4);
> >> >> > -    gen_qemu_ld32u_i64(ctx, tmp, EA);
> >> >> > -    tcg_gen_addi_tl(EA, EA, 4);
> >> >> > -    gen_qemu_ld32u_i64(ctx, xtl, EA);
> >> >> > -    tcg_gen_deposit_i64(xtl, xtl, tmp, 32, 32);
> >> >> > -
> >> >> > +    tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
> >> >> > +    gen_helper_deposit32x2(xth, xth);
> >> >> > +    tcg_gen_addi_tl(EA, EA, 8);
> >> >> > +    tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
> >> >> > +    gen_helper_deposit32x2(xtl, xtl);
> >> >
> >> > ..and I think this is wrong for BE mode. The deposit32x2 will get the
> >> > words in the right order, but the bytes within each word will be wrong
> >> > because of the LE mode load on a BE setup.
> >>
> >> Since lxvw4x/stxvw4x are available on POWER8, I tried running my test
> >> code on BE and LE Fedora24 VMs. The TCG results match the POWER8 hardware.
> >> The order within the word is not changed. Snippet of the test code at
> >> the end of email. Can share full code if needed (maybe will do it in
> >> kvm-unit-test)
> >
> > Ugh.. now I'm confused. I would not have expected the results you've
> > seen from these tests. But I still can't understand *how* the
> > emulation could be correct: IIUC MO_LEQ would mean it loads the 8
> > bytes as a single 64-bit LE integer.
>
> For both cases, LE and BE, we do an LE read ...
.. and I can't see how that can be right for the BE case.
> > Which should be the same as
> > loading one 32-bit LE integer into the low half of the target
> > register, then a 32-bit LE integer into the high half of the target
> > register.
>
> .. The 64-bit integer read is not the same in these cases. The input
> itself would be in the byte order of the format.
>
> Input rb32[]: 00010203 20212223 30313233 40414243
>
> LE:
> helper_deposit32x2: 2021222300010203
> helper_deposit32x2: 4041424330313233
>
> BE
> helper_deposit32x2: 2322212003020100
> helper_deposit32x2: 4342414033323130
Sorry.. I can't really follow the above, because I'm not sure if
you're displaying the bytes within each word in significance order, or
increasing-address order.
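
To pin down what I mean by the two orderings, here is a minimal
stand-alone model of my reasoning, not the QEMU code itself. It lays the
two words out in guest memory in increasing-address order, does a 64-bit
LE load (what MO_LEQ asks for), then swaps the two 32-bit halves. The
half swap is an assumption about what helper_deposit32x2 does, since its
body isn't quoted here, and all function names below are made up for the
sketch.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit little-endian load: the byte at the lowest address ends up
 * as the least significant byte of the result. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* ASSUMPTION: stand-in for helper_deposit32x2, taken to swap the two
 * 32-bit halves of a 64-bit value without touching the bytes inside. */
static uint64_t swap_words(uint64_t v)
{
    return (v << 32) | (v >> 32);
}

/* Store one 32-bit word into memory in the guest's byte order. */
static void store_word(uint8_t *p, uint32_t w, int big_endian)
{
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 24 - 8 * i : 8 * i;
        p[i] = (uint8_t)(w >> shift);
    }
}

/* Model of the first half of the patched lxvw4x: the guest has stored
 * w0 at EA and w1 at EA+4; return what would land in xth. */
static uint64_t model_xth(uint32_t w0, uint32_t w1, int big_endian)
{
    uint8_t mem[8];
    store_word(mem, w0, big_endian);
    store_word(mem + 4, w1, big_endian);
    return swap_words(load_le64(mem));
}
```

With w0 = 0x00010203 and w1 = 0x20212223, this model gives
0x0001020320212223 for an LE guest, which is what I'd expect in xth, but
0x0302010023222120 for a BE guest: the words are in the right order and
each one is byte-reversed.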
> >
> > As I said above, the deposit32x2 will swap the order of the two ints,
> > but it won't byteswap the individual int32s which should have been BE
> > in memory.
> >
> > Can you find the flaw in my reasoning?
>
> One anomaly that I see in BE code generation: it also generates a
> stxvw4x after lxvw4x. I am not sure why.
Ah... see, I'm wondering if it's using the stxvw4x to store back to the
union which you then get the results from. If so, that could explain
the results: the bug I suspect in lxvw4x would be cancelled out by the
corresponding bug in stxvw4x, which is exactly why I'd prefer the
approach to testing mentioned below.
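
A rough model of that cancellation, under the same assumptions as my
reasoning above: the load is an LE 64-bit read followed by a swap of the
32-bit halves, and the store is the mirror image (swap, then LE 64-bit
write). The names are invented for the sketch and this is not the actual
helper code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: stand-in for helper_deposit32x2 (swap the 32-bit halves). */
static uint64_t swap_words(uint64_t v)
{
    return (v << 32) | (v >> 32);
}

/* 64-bit little-endian load from a byte array. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* 64-bit little-endian store to a byte array. */
static void store_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Modelled half of lxvw4x: LE load, then swap halves. */
static uint64_t model_load(const uint8_t *mem)
{
    return swap_words(load_le64(mem));
}

/* Modelled half of stxvw4x: swap halves, then LE store. */
static void model_store(uint8_t *mem, uint64_t reg)
{
    store_le64(mem, swap_words(reg));
}

/* Returns 1 if a load followed by a store reproduces the input bytes. */
static int round_trip_matches(const uint8_t in[8])
{
    uint8_t out[8];
    model_store(out, model_load(in));
    return memcmp(in, out, 8) == 0;
}
```

Whatever 8 bytes you feed in, round_trip_matches() returns 1, because
the store exactly undoes the load. So a test that only compares memory
before and after the pair can't tell whether the value sitting in the
register in between was the architecturally correct one.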
>
> >>>>>>>>>>>>>>>> BE BE BE >>>>>>>>>>>>>>
> Input rb32[]: 00010203 20212223 30313233 40414243
>
> gen_lxvw4x: called
> helper_deposit32x2: 2322212003020100
> helper_deposit32x2: 4342414033323130
> gen_stxvw4x: called
> helper_deposit32x2: 0302010023222120
> helper_deposit32x2: 3332313043424140
> Output VRT32: 00010203 20212223 30313233 40414243
>
> >> vsx.h:
> >> ======
> >> #define U32_SIZE (sizeof(__vector uint32_t) / sizeof(uint32_t))
> >>
> >> typedef union {
> >>     __vector uint32_t v;
> >>     uint32_t a[U32_SIZE];
> >> } vuint32_t;
> >
> > I am a little suspicious that whatever the compiler does to convert
> > the vector to an array via this union might be undoing a byte reverse.
> >
> > I'd be more confident if you used VSX instructions to extract and
> > store separately one of the 32-bit subwords of the vector.
>
> I will try to figure out those instructions.
Ok, thanks.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson