From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:47643) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cEBXd-00038P-0A for qemu-devel@nongnu.org; Tue, 06 Dec 2016 03:56:02 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cEBXZ-0002KB-UN for qemu-devel@nongnu.org; Tue, 06 Dec 2016 03:56:01 -0500 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:52457 helo=mx0a-001b2d01.pphosted.com) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1cEBXZ-0002Jc-Od for qemu-devel@nongnu.org; Tue, 06 Dec 2016 03:55:57 -0500 Received: from pps.filterd (m0098416.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.17/8.16.0.17) with SMTP id uB68rkjG146161 for ; Tue, 6 Dec 2016 03:55:53 -0500 Received: from e28smtp06.in.ibm.com (e28smtp06.in.ibm.com [125.16.236.6]) by mx0b-001b2d01.pphosted.com with ESMTP id 2756tfnj2n-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Tue, 06 Dec 2016 03:55:53 -0500 Received: from localhost by e28smtp06.in.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 6 Dec 2016 14:25:50 +0530 Date: Tue, 6 Dec 2016 14:25:41 +0530 From: Bharata B Rao Reply-To: bharata@linux.vnet.ibm.com References: <1480937130-24561-1-git-send-email-nikunj@linux.vnet.ibm.com> <1480937130-24561-14-git-send-email-nikunj@linux.vnet.ibm.com> <20161206041122.GP32366@umbus.fritz.box> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20161206041122.GP32366@umbus.fritz.box> Message-Id: <20161206085541.GD4211@in.ibm.com> Subject: Re: [Qemu-devel] [PATCH 13/13] target-ppc: Add xxperm and xxpermr instructions List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: David Gibson Cc: Nikunj A Dadhania , qemu-ppc@nongnu.org, rth@twiddle.net, qemu-devel@nongnu.org On Tue, Dec 06, 2016 at 03:11:22PM +1100, David Gibson wrote: > On Mon, Dec 05, 2016 at 04:55:30PM +0530, Nikunj A Dadhania wrote: > > From: Bharata B Rao > > > > xxperm: VSX Vector Permute > > xxpermr: VSX Vector Permute Right-indexed > > > > Signed-off-by: Bharata B Rao > > Signed-off-by: Nikunj A Dadhania > > --- > > target-ppc/fpu_helper.c | 50 +++++++++++++++++++++++++++++++++++++ > > target-ppc/helper.h | 2 ++ > > target-ppc/translate/vsx-impl.inc.c | 2 ++ > > target-ppc/translate/vsx-ops.inc.c | 2 ++ > > 4 files changed, 56 insertions(+) > > > > diff --git a/target-ppc/fpu_helper.c b/target-ppc/fpu_helper.c > > index 3b867cf..be552c7 100644 > > --- a/target-ppc/fpu_helper.c > > +++ b/target-ppc/fpu_helper.c > > @@ -2869,3 +2869,53 @@ uint64_t helper_xsrsp(CPUPPCState *env, uint64_t xb) > > float_check_status(env); > > return xt; > > } > > + > > +static void vsr_copy_256(ppc_vsr_t *xa, ppc_vsr_t *xt, int8_t *src) > > +{ > > +#if defined(HOST_WORDS_BIGENDIAN) > > + memcpy(src, xa, sizeof(*xa)); > > + memcpy(src + 16, xt, sizeof(*xt)); > > +#else > > + memcpy(src, xt, sizeof(*xt)); > > + memcpy(src + 16, xa, sizeof(*xa)); > > Is this right? I thought the order of the bytes within each word > varied with the host endianness as well. Since we are already working with 2 16 byte vectors xa and xb here, I thought we don't have to worry about order of bytes within each vector, but instead can construct the 32 byte vector as above based on host endianness. > > > +#endif > > +} > > + > > +static int8_t vsr_get_byte(int8_t *src, int bound, int idx) > > +{ > > + if (idx >= bound) { > > + return 0xFF; > > + } > > AFAICT you don't need this check. For both xxperm and xxpermr you're > already masking the index to 5 bits, so it can't exceed 31. Was thinking of making it a generic API and hence had that boundary check but yes, no point for the check in the context of this instruction. > > > +#if defined(HOST_WORDS_BIGENDIAN) > > + return src[idx]; > > +#else > > + return src[bound - 1 - idx]; > > +#endif > > +} > > + > > +#define VSX_XXPERM(op, indexed) \ > > +void helper_##op(CPUPPCState *env, uint32_t opcode) \ > > +{ \ > > + ppc_vsr_t xt, xa, pcv; \ > > + int i, idx; \ > > + int8_t src[32]; \ > > + \ > > + getVSR(xA(opcode), &xa, env); \ > > + getVSR(xT(opcode), &xt, env); \ > > + getVSR(xB(opcode), &pcv, env); \ > > + \ > > + vsr_copy_256(&xa, &xt, src); \ > > You have a double copy here AFAICT - first from the actual env > structure to xt and xa, then to the src array. That seems like it > would be good to avoid. > > It seems like it would nice in any case to avoid even the one copy. > You'd need a temporary for the output of course and to copy that, but > you should be able to combine indexed with host endianness to > translate each index to retrieve directly from the VSR values in env. I am not sure it would be good to retrieve byte values directly from env as getVSR nicely abstracts out from which fields (env->[fpr, vsr, avr] the data is fetched based on the register specified in the opcode. I can reduce one copy though by not constructing a 32 byte vector (src) but instead retrieving the bytes directly from xa and xt based on the index. Regards, Bharata.