From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4DED3109.1070902@twiddle.net>
Date: Mon, 06 Jun 2011 14:56:57 -0500
From: Richard Henderson
References: <1307370348-28400-1-git-send-email-pbonzini@redhat.com> <1307370348-28400-4-git-send-email-pbonzini@redhat.com>
In-Reply-To: <1307370348-28400-4-git-send-email-pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v2 3/3] move unaligned memory access functions to bswap.h
To: Paolo Bonzini
Cc: qemu-devel@nongnu.org

Patches 1-3:

Reviewed-by: Richard Henderson

That said,

On 06/06/2011 09:25 AM, Paolo Bonzini wrote:
> +/* conservative code for little endian unaligned accesses */
> +static inline int lduw_le_p(const void *ptr)
> +{
> +#ifdef _ARCH_PPC
> +    int val;
> +    __asm__ __volatile__ ("lhbrx %0,0,%1" : "=r" (val) : "r" (ptr));
> +    return val;
> +#else
> +    const uint8_t *p = ptr;
> +    return p[0] | (p[1] << 8);
> +#endif
> +}

Can we add a patch 4/3 that removes this sort of hard-coded assembly stuff in favour of generic gcc code?  E.g.
static inline int lduw_le_p(const void *ptr)
{
    struct pack { uint16_t x __attribute__((packed)); };
    const struct pack *p = ptr;
    uint16_t ret;

    ret = p->x;
    ret = le_bswap(ret, 16);
    return ret;
}

One could fairly well macroize all 6 instances.

The compiler knows that ppc and i386 can do the unaligned loads.  The
compiler *should* be able to match the bswap_N patterns, and thus also
fold the unaligned load with the bswap for ppc.

I can confirm that gcc head does in fact do the right thing for
ppc32/64 here, as well as for i386/x86_64.  I don't have previous
versions of gcc checked out atm...

I haven't checked, but this *ought* to enable the load-asi bswap
instruction for sparcv9.  I also haven't checked what happens for a
target like sparcv8 that lacks both unaligned load and bswap, to see
that we don't simply double the number of shifts.  However, I'd be
tempted to file that as a gcc missed-optimization bug.

r~