From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:52404)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1bmWzT-0001NQ-1l
    for qemu-devel@nongnu.org; Tue, 20 Sep 2016 22:10:28 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from )
    id 1bmWzP-0008Iv-P1
    for qemu-devel@nongnu.org; Tue, 20 Sep 2016 22:10:26 -0400
Date: Wed, 21 Sep 2016 11:57:50 +1000
From: David Gibson
Message-ID: <20160921015750.GQ20488@umbus>
References: <1474023111-11992-1-git-send-email-nikunj@linux.vnet.ibm.com>
    <1474023111-11992-3-git-send-email-nikunj@linux.vnet.ibm.com>
    <20160919061934.GC20488@umbus>
    <20160919065037.GF20488@umbus>
    <87wpi8kwg7.fsf@abhimanyu.i-did-not-set--mail-host-address--so-tickle-me>
    <20160920043429.GI20488@umbus>
    <87lgymqyz8.fsf@abhimanyu.i-did-not-set--mail-host-address--so-tickle-me>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
    protocol="application/pgp-signature"; boundary="YPYi+6JBnn8IjLOH"
Content-Disposition: inline
In-Reply-To: <87lgymqyz8.fsf@abhimanyu.i-did-not-set--mail-host-address--so-tickle-me>
Subject: Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Nikunj A Dadhania
Cc: qemu-ppc@nongnu.org, rth@twiddle.net, qemu-devel@nongnu.org,
    benh@kernel.crashing.org

--YPYi+6JBnn8IjLOH
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Sep 20, 2016 at 10:40:03PM +0530, Nikunj A Dadhania wrote:
> David Gibson writes:
>
> > [ Unknown signature status ]
> > On Mon, Sep 19, 2016 at 04:06:40PM +0530, Nikunj A Dadhania wrote:
> >> David Gibson writes:
> >> > [ Unknown signature status ]
> >> > On Mon, Sep 19, 2016 at 04:19:34PM +1000, David Gibson wrote:
> >> >> On Fri, Sep 16, 2016 at 04:21:48PM +0530, Nikunj A Dadhania wrote:
> >> >> > diff --git a/target-ppc/translate/vsx-impl.inc.c b/target-ppc/translate/vsx-impl.inc.c
> >> >> > index eee6052..df278df 100644
> >> >> > --- a/target-ppc/translate/vsx-impl.inc.c
> >> >> > +++ b/target-ppc/translate/vsx-impl.inc.c
> >> >> > @@ -75,7 +75,6 @@ static void gen_lxvdsx(DisasContext *ctx)
> >> >> >  static void gen_lxvw4x(DisasContext *ctx)
> >> >> >  {
> >> >> >      TCGv EA;
> >> >> > -    TCGv_i64 tmp;
> >> >> >      TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
> >> >> >      TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
> >> >> >      if (unlikely(!ctx->vsx_enabled)) {
> >> >> > @@ -84,22 +83,14 @@ static void gen_lxvw4x(DisasContext *ctx)
> >> >> >      }
> >> >> >      gen_set_access_type(ctx, ACCESS_INT);
> >> >> >      EA = tcg_temp_new();
> >> >> > -    tmp = tcg_temp_new_i64();
> >> >> >
> >> >> >      gen_addr_reg_index(ctx, EA);
> >> >> > -    gen_qemu_ld32u_i64(ctx, tmp, EA);
> >> >> > -    tcg_gen_addi_tl(EA, EA, 4);
> >> >> > -    gen_qemu_ld32u_i64(ctx, xth, EA);
> >> >> > -    tcg_gen_deposit_i64(xth, xth, tmp, 32, 32);
> >> >> > -
> >> >> > -    tcg_gen_addi_tl(EA, EA, 4);
> >> >> > -    gen_qemu_ld32u_i64(ctx, tmp, EA);
> >> >> > -    tcg_gen_addi_tl(EA, EA, 4);
> >> >> > -    gen_qemu_ld32u_i64(ctx, xtl, EA);
> >> >> > -    tcg_gen_deposit_i64(xtl, xtl, tmp, 32, 32);
> >> >> > -
> >> >> > +    tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
> >> >> > +    gen_helper_deposit32x2(xth, xth);
> >> >> > +    tcg_gen_addi_tl(EA, EA, 8);
> >> >> > +    tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
> >> >> > +    gen_helper_deposit32x2(xtl, xtl);
> >> >
> >> > ..and I think this is wrong for BE mode.  The deposit32x2 will get the
> >> > words in the right order, but the bytes within each word will be wrong
> >> > because of the LE mode load on a BE setup.
> >>
> >> Since lxvw4x/stxvw4x is available on POWER8. I tried running my test
> >> code on BE and LE Fedora24 VM. TCG Results match the POWER8 hardware.
> >> The order within the word is not changed. Snippet of the test code at
> >> the end of email.
> >> Can share full code if needed (maybe will do it in
> >> kvm-unit-test)
> >
> > Ugh.. now I'm confused.  I would not have expected the results you've
> > seen from these tests.  But I still can't understand *how* the
> > emulation could be correct: IIUC MO_LEQ would mean it loads the 8
> > bytes as a single 64-bit LE integer.
>
> For both the case LE/BE we do a LE read ...

.. and I can't see how that can be right for the BE case.

> > Which should be the same as
> > loading one 32-bit LE integer into the low half of the target
> > register, then a 32-bit LE integer into the high half of the target
> > register.
>
> .. The 64-bit integer read is not same in these cases. The input itself
> would be in the order of the format.
>
> Input rb32[]: 00010203 20212223 30313233 40414243
>
> LE:
> helper_deposit32x2: 2021222300010203
> helper_deposit32x2: 4041424330313233
>
> BE
> helper_deposit32x2: 2322212003020100
> helper_deposit32x2: 4342414033323130

Sorry.. I can't really follow the above, because I'm not sure if
you're displaying the bytes within each word in significance order, or
increasing-address order.

> >
> > As I said above, the deposit32x2 will swap the order of the two ints,
> > but it won't byteswap the individual int32s which should have been BE
> > in memory.
> >
> > Can you find the flaw in my reasoning?
>
> One anomaly that I see in BE code generation: it also generates a
> stxvw4x after lxvw4x. I am not sure why.

Ah... see I'm wondering if it's using the stxvw4x to store back to the
union which you then get the results from.  If that's so it could
explain the results, since the bug I suspect is in lxvw4x would be
cancelled out by the corresponding bug in stxvw4x, which is exactly
why I'd prefer the approach to testing mentioned below.
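
For illustration only, the suspected cancellation can be modelled on the
host in plain C.  This is a rough sketch, not QEMU code: every function
name in it (load_le64, store_le64, swap_halves, model_lxvw4x_dword,
model_stxvw4x_dword, round_trip_matches) is made up for this example.
It assumes deposit32x2 does nothing more than swap the two 32-bit halves
of a doubleword, as described in this thread, and that the store side is
the mirror image of the load side (swap the halves, then do an LE
64-bit store).

```c
#include <stdint.h>
#include <string.h>

/* Read 8 bytes as one little-endian 64-bit integer (models an MO_LEQ load). */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Write a 64-bit integer as 8 little-endian bytes (models an MO_LEQ store). */
static void store_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/* ASSUMPTION: stands in for deposit32x2, taken to swap the two 32-bit halves. */
static uint64_t swap_halves(uint64_t v)
{
    return (v << 32) | (v >> 32);
}

/* Hypothetical model of one doubleword of the patched lxvw4x. */
static uint64_t model_lxvw4x_dword(const uint8_t *mem)
{
    return swap_halves(load_le64(mem));
}

/* Hypothetical model of one doubleword of the matching stxvw4x. */
static void model_stxvw4x_dword(uint8_t *mem, uint64_t reg)
{
    store_le64(mem, swap_halves(reg));
}

/* Returns 1 if load followed by store reproduces the original 8 bytes. */
static int round_trip_matches(const uint8_t *mem)
{
    uint8_t out[8];
    model_stxvw4x_dword(out, model_lxvw4x_dword(mem));
    return memcmp(mem, out, sizeof(out)) == 0;
}
```

With the first eight bytes of rb32[] as a BE guest would lay them out
(00 01 02 03 20 21 22 23), model_lxvw4x_dword() gives
0x0302010023222120, not the 0x0001020320212223 I would expect a BE
lxvw4x to produce, yet round_trip_matches() still returns 1.  So under
these assumptions a test that only loads and then stores cannot tell the
two apart.
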
>
> >>>>>>>>>>>>>>>> BE BE BE >>>>>>>>>>>>>>
> Input rb32[]: 00010203 20212223 30313233 40414243
>
> gen_lxvw4x: called
> helper_deposit32x2: 2322212003020100
> helper_deposit32x2: 4342414033323130
> gen_stxvw4x: called
> helper_deposit32x2: 0302010023222120
> helper_deposit32x2: 3332313043424140
> Output VRT32: 00010203 20212223 30313233 40414243
>
> >> vsx.h:
> >> ======
> >> #define U32_SIZE (sizeof(__vector uint32_t) / sizeof(uint32_t))
> >>
> >> typedef union {
> >>     __vector uint32_t v;
> >>     uint32_t a[U32_SIZE];
> >> } vuint32_t;
> >
> > I am a little suspicious that whatever the compiler does to convert
> > the vector to an array via this union might be undoing a byte reverse.
> >
> > I'd be more confident if you used VSX instructions to extract and
> > store separately one of the 32-bit subwords of the vector.
>
> I will try to figure those instructions.

Ok, thanks.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

--YPYi+6JBnn8IjLOH--