From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:50986)
    by lists.gnu.org with esmtp (Exim 4.71)
    (envelope-from )
    id 1VujGP-0006Gf-MI
    for qemu-devel@nongnu.org; Sun, 22 Dec 2013 08:40:19 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from )
    id 1VujGJ-00081C-NT
    for qemu-devel@nongnu.org; Sun, 22 Dec 2013 08:40:13 -0500
Received: from mx1.redhat.com ([209.132.183.28]:50666)
    by eggs.gnu.org with esmtp (Exim 4.71)
    (envelope-from )
    id 1VujGJ-0007t0-Gs
    for qemu-devel@nongnu.org; Sun, 22 Dec 2013 08:40:07 -0500
Message-ID: <52B6EBA4.6050803@redhat.com>
Date: Sun, 22 Dec 2013 14:39:48 +0100
From: Paolo Bonzini
MIME-Version: 1.0
References: <1387580412-5828-1-git-send-email-rth@twiddle.net>
    <52B5A0D5.4080005@redhat.com>
    <20131222122450.GB4326@ohm.rr44.fr>
In-Reply-To: <20131222122450.GB4326@ohm.rr44.fr>
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH] tcg-i386: Use MOVBE if available
To: Aurelien Jarno
Cc: qemu-devel@nongnu.org, Richard Henderson

On 22/12/2013 13:24, Aurelien Jarno wrote:
> On Sat, Dec 21, 2013 at 03:08:21PM +0100, Paolo Bonzini wrote:
>> On 21/12/2013 00:00, Richard Henderson wrote:
>>> +    if (real_bswap && have_movbe) {
>>> +        tcg_out_modrm_offset(s, OPC_MOVBE_GyMy + P_DATA16 + seg,
>>> +                             datalo, base, ofs);
>>> +        tcg_out_ext16u(s, datalo, datalo);
>>
>> Do partial register stalls still exist on Atom and Haswell?  I don't
>> remember exactly what you had to do to prevent them, but IIRC you first
>> moved zero to the register and then overwrote the low 16 bits.
>
> Note that for unsigned 16-bit load you can do either movzw + bswap or
> movbe + movzw.

Yeah, I was asking if xor + movbe would be faster.  Benchmarking could
tell, but anyway xor + movbe is likely the smallest code you can produce.

Paolo
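
For reference, the three sequences discussed in this thread (movzw + bswap, movbe + movzw, xor + movbe) can be compared with a small portable C model. This is only a sketch of the register semantics, not code from the patch: the helper names (mem_read16_le, swap16, write_low16, the ld16u_be_* functions, check_sequences_agree) are made up for illustration, and it assumes that a 16-bit register write leaves the upper 16 bits of the destination untouched. It says nothing about speed or stalls, which is what benchmarking would have to answer.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 16-bit memory read as a little-endian host performs it. */
static uint16_t mem_read16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Swap the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Model of a 16-bit register write: only the low 16 bits change. */
static uint32_t write_low16(uint32_t reg, uint16_t v)
{
    return (reg & 0xffff0000u) | v;
}

/* movzw + bswap: zero-extending load, then swap the low two bytes. */
static uint32_t ld16u_be_movzw_bswap(const uint8_t *p, uint32_t stale)
{
    (void)stale;                          /* movzw overwrites the whole register */
    uint32_t reg = mem_read16_le(p);
    return write_low16(reg, swap16((uint16_t)reg));
}

/* movbe + movzw: 16-bit byte-swapping load, then zero-extend. */
static uint32_t ld16u_be_movbe_movzw(const uint8_t *p, uint32_t stale)
{
    uint32_t reg = write_low16(stale, swap16(mem_read16_le(p)));
    return reg & 0xffffu;                 /* clears the stale upper half */
}

/* xor + movbe: clear the register first, then 16-bit byte-swapping load. */
static uint32_t ld16u_be_xor_movbe(const uint8_t *p, uint32_t stale)
{
    uint32_t reg = stale;
    reg ^= reg;                           /* register is now zero */
    return write_low16(reg, swap16(mem_read16_le(p)));
}

/* All three must yield the zero-extended big-endian value at p. */
static void check_sequences_agree(const uint8_t *p, uint32_t stale)
{
    uint32_t expect = ((uint32_t)p[0] << 8) | p[1];
    assert(ld16u_be_movzw_bswap(p, stale) == expect);
    assert(ld16u_be_movbe_movzw(p, stale) == expect);
    assert(ld16u_be_xor_movbe(p, stale) == expect);
}
```

Calling check_sequences_agree() on a two-byte buffer with an arbitrary stale register value exercises all three paths. In this model the stale upper bits only matter for the movbe variants, which is why each of them needs either the preceding xor or the trailing movzw.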