From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:58718) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WFWz3-00050i-G4 for qemu-devel@nongnu.org; Mon, 17 Feb 2014 17:48:25 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WFWyt-0004dr-K9 for qemu-devel@nongnu.org; Mon, 17 Feb 2014 17:48:17 -0500
Received: from mail-qc0-x235.google.com ([2607:f8b0:400d:c01::235]:65000) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WFWyt-0004dB-GK for qemu-devel@nongnu.org; Mon, 17 Feb 2014 17:48:07 -0500
Received: by mail-qc0-f181.google.com with SMTP id e9so24268211qcy.26 for ; Mon, 17 Feb 2014 14:48:06 -0800 (PST)
Sender: Richard Henderson
Message-ID: <53023262.6030402@twiddle.net>
Date: Mon, 17 Feb 2014 10:01:38 -0600
From: Richard Henderson
MIME-Version: 1.0
References: <1391179418-13422-1-git-send-email-rth@twiddle.net> <1391179418-13422-6-git-send-email-rth@twiddle.net> <5300C968.9040609@redhat.com>
In-Reply-To: <5300C968.9040609@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 5/5] tcg/i386: Use SHLX/SHRX/SARX instructions
To: Paolo Bonzini , qemu-devel@nongnu.org
Cc: aurelien@aurel32.net

On 02/16/2014 08:21 AM, Paolo Bonzini wrote:
> On 31/01/2014 15:43, Richard Henderson wrote:
>> +    gen_shift_maybe_vex:
>> +        if (have_bmi2 && !const_args[2]) {
>> +            tcg_out_vex_modrm(s, vexop + rexw, args[0], args[2], args[1]);
>> +            break;
>> +        }
>> +        /* FALLTHRU */
>
> What if args[2] happens to be ECX?

I ran some measurements and, as I expected, this basically never happens. For 64-bit, I never saw it occur. For 32-bit, 1/800 of all shifts used ecx.

For 64-bit, the use of shlx et al is always a size win. The mov and shift, including their rex prefixes, are 3 bytes each (6 bytes total), while the shlx is 5 bytes.
For 32-bit, things are more complicated. The mov and shift are 2 bytes each, so the use of shlx is by itself a 1-byte size penalty. However, avoiding the mov sometimes results in fewer spills, and thus fewer bytes overall. The net effect I see is the barest fraction (< 0.01%) of a size decrease across all TBs.

r~
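For reference, the per-instruction size arithmetic above can be written out as a small sketch. The byte counts are the ones quoted in this message, not measured here, and the function name shlx_size_delta is hypothetical (it is not part of the patch). It only covers the isolated instruction sizes; the spill effects mentioned for 32-bit are not modelled.

```c
#include <stdbool.h>

/* Instruction lengths as quoted in the message (assumed, not measured). */
enum {
    MOV_SHIFT_PAIR_64 = 3 + 3, /* mov + shift, each carrying a REX prefix */
    MOV_SHIFT_PAIR_32 = 2 + 2, /* mov + shift, no REX prefix */
    SHLX_LEN          = 5      /* one VEX-encoded shlx/shrx/sarx */
};

/*
 * Hypothetical helper: bytes saved by emitting shlx instead of the
 * mov + shift pair. Positive means shlx is smaller, negative means
 * it is larger.
 */
int shlx_size_delta(bool is_64bit)
{
    int legacy = is_64bit ? MOV_SHIFT_PAIR_64 : MOV_SHIFT_PAIR_32;
    return legacy - SHLX_LEN;
}
```

With these numbers, shlx_size_delta(true) gives a 1-byte saving for 64-bit and shlx_size_delta(false) gives a 1-byte penalty for 32-bit, which is why the 32-bit result depends on the reduced spilling.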