From: Aurelien Jarno
Date: Thu, 20 May 2010 16:04:28 +0200
To: Richard Henderson
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 03/22] tcg-i386: Tidy ext8u and ext16u operations.
Message-ID: <20100520140428.GA1950@volta.aurel32.net>
In-Reply-To: <20100520133908.GC18828@hall.aurel32.net>
References: <20100519064713.GC25432@ohm.aurel32.net> <4BF42E7F.60008@twiddle.net> <20100520133908.GC18828@hall.aurel32.net>

On Thu, May 20, 2010 at 03:39:08PM +0200, Aurelien Jarno wrote:
> On Wed, May 19, 2010 at 11:31:27AM -0700, Richard Henderson wrote:
> > On 05/18/2010 11:47 PM, Aurelien Jarno wrote:
> > > The reg allocator is able to issue a move if needed, so the only
> > > improvement this patch brings is for doing an ext8u on both "q"
> > > registers.
> > >
> > > OTOH the reg allocator knows about this situation and will try to
> > > avoid it during allocation. Cheating on the reg allocator might have
> > > some unwanted effects, especially after your patch "Allocate
> > > call-saved registers first".
> > > I am thinking of the scenario where the value is in memory (which is
> > > likely to be the case given the limited number of registers): it will
> > > likely be loaded into an "r" register (those now have top priority),
> > > and then ext8u will be called, which will issue "mov" + "and"
> > > instructions instead of a "movzbl" instruction.
> >
> > The case I was concerned with is that if we have a value allocated
> > to, say, %esi, and we need to do an ext8u, then the register allocator
> > has been told that it must move the value to a "q" register in order
> > to perform the movzbl. In this case, the new code will simply emit the
> > andl.
> >
> > I.e. the real problem is that we've told the register allocator one
> > way that the extend can be implemented, but not every way.
> >
> > > All of that is purely theoretical. Do you know how it behaves in
> > > practice?
> >
> > Picking the i386 target, since it seems to use more extensions than
> > any other target, from linux-user-test -d op_opt,out_asm i386/ls:
> >
> > There are 176 instances of ext8u.
> > Of those, 83 instances are in-place, i.e. "ext8u_i32 tmp0,tmp0".
> >
> > I examined the first two dozen appearances in the output assembly.
> >
> > There are several instances of the value being in an "r" register:
> >
> >     shr_i32 tmp1,edx,tmp13
> >     ext8u_i32 tmp1,tmp1
> >     =>
> >     0x601c5468:  shr    $0x8,%edi
> >     0x601c546b:  and    $0xff,%edi
> >
> > All of the instances that I looked at that were not in-place happened
> > to already be using a "q" register, usually %ebx. I assume that's
> > because we place %ebx as the first allocation register, and that's
> > just how things happen to work out once we've flushed the registers
> > before the qemu_ld.
> >
> >     qemu_ld8u tmp0,tmp2,$0xffffffff
> >     ext8u_i32 tmp13,tmp0
> >     =>
> >     0x601c82f9:  movzbl (%esi),%ebx
> >     0x601c82fc:  movzbl %bl,%ebx
> >
>
> Have you tried comparing the generated code before and after your
> patch?
> I expect a few cases where your patch has some drawbacks, so I don't
> know if there is a net gain in the size of the translated code.
>

I have done a quick test on /bin/ls:

       | instr  |  size  |
       +--------+--------+
before | 101305 | 344770 |
after  | 101258 | 344829 |

In short, a small gain in the number of instructions, and a small loss
in the size of the translated code.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net