Date: Fri, 20 May 2011 21:37:41 +0200
From: Aurelien Jarno
To: Richard Henderson
Cc: mj.mccormack@samsung.com, qemu-devel@nongnu.org, zhur@ispras.ru, Kirill Batuzov
Subject: Re: [Qemu-devel] [PATCH 0/6] Implement constant folding and copy propagation in TCG
Message-ID: <20110520193741.GC27170@hall.aurel32.net>
In-Reply-To: <4DD6A9F9.7040805@twiddle.net>
References: <4DD6A9F9.7040805@twiddle.net>

On Fri, May 20, 2011 at 10:50:49AM -0700, Richard Henderson wrote:
> On 05/20/2011 05:39 AM, Kirill Batuzov wrote:
> > This series implements some basic machine-independent optimizations. They
> > simplify code and allow liveness analysis do it's work better.
> >
> > Suppose we have following ARM code:
> >
> >     movw r12, #0xb6db
> >     movt r12, #0xdb6d
> >
> > In TCG before optimizations we'll have:
> >
> >     movi_i32 tmp8,$0xb6db
> >     mov_i32 r12,tmp8
> >     mov_i32 tmp8,r12
> >     ext16u_i32 tmp8,tmp8
> >     movi_i32 tmp9,$0xdb6d0000
> >     or_i32 tmp8,tmp8,tmp9
> >     mov_i32 r12,tmp8
> >
> > And after optimizations we'll have this:
> >
> >     movi_i32 r12,$0xdb6db6db
> >
> > Here are performance evaluation results on SPEC CPU2000 integer tests in
> > user-mode emulation on x86_64 host. There were 5 runs of each test on
> > reference data set. The tables below show runtime in seconds for all these
> > runs.
>
> I totally agree that this sort of optimization is needed in TCG. Essentially
> all RISC guests have the same problem. When emulating one RISC upon another,
> the problem may be exacerbated. E.g. Sparc on PPC -- sparc will use a 21/11
> bit split of the constant, ppc will use a 16/16 split of the constant, which
> results in 3 insns in the generated code where 2 would do.
>
> You should be aware of prior work in this area by Aurelien Jarno:
>
>   git://git.aurel32.net/qemu.git tcg-optimizations
>
> Given that's now 2 years old, and doesn't seem to be progressing, I hope your
> patch series can get things going again...

I basically stopped working on constant propagation because, while the TCG
code looked nicer, the resulting code was always slower.

Since the discussion about TCG_AREG0, I have started working again on
register allocation (see the first patch series I sent about that). I hope
to have something ready by the end of the weekend.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net
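
For readers following the thread, here is a toy model of what happens to the quoted movw/movt example. It is only a sketch, not QEMU's optimizer and not code from either patch series: the tuple encoding of ops and the helper names fold() and drop_dead() are made up for illustration, and it assumes that only the guest register r12 is live at the end of the block.

```python
# Toy model of the quoted TCG example; NOT QEMU's actual optimizer.
# Ops are (name, dst, *args); args are temp names (str) or constants (int).
MASK32 = 0xFFFFFFFF

OPS = [
    ("movi_i32", "tmp8", 0xB6DB),
    ("mov_i32", "r12", "tmp8"),
    ("mov_i32", "tmp8", "r12"),
    ("ext16u_i32", "tmp8", "tmp8"),
    ("movi_i32", "tmp9", 0xDB6D0000),
    ("or_i32", "tmp8", "tmp8", "tmp9"),
    ("mov_i32", "r12", "tmp8"),
]

def fold(ops):
    """Forward pass: propagate known constants and fold ops on them."""
    const = {}  # temp name -> known 32-bit value
    out = []
    for op, dst, *args in ops:
        # Replace each temp whose value is known by that value.
        vals = [const.get(a, a) if isinstance(a, str) else a for a in args]
        res = None
        if all(isinstance(v, int) for v in vals):
            if op in ("movi_i32", "mov_i32"):
                res = vals[0] & MASK32
            elif op == "ext16u_i32":
                res = vals[0] & 0xFFFF
            elif op == "or_i32":
                res = (vals[0] | vals[1]) & MASK32
        if res is None:
            const.pop(dst, None)          # dst no longer a known constant
            out.append((op, dst, *args))
        else:
            const[dst] = res
            out.append(("movi_i32", dst, res))
    return out

def drop_dead(ops, live_out):
    """Backward liveness pass: drop ops whose result is never read."""
    live = set(live_out)
    kept = []
    for op, dst, *args in reversed(ops):
        if dst not in live:
            continue                      # value is overwritten or unused
        live.discard(dst)
        live.update(a for a in args if isinstance(a, str))
        kept.append((op, dst, *args))
    return kept[::-1]

result = drop_dead(fold(OPS), live_out={"r12"})
```

After folding, every op in the block is a movi_i32, and the liveness pass then keeps only the last write to r12. Whether this kind of cleanup makes the generated host code faster is a separate question, which is the point made in the reply above.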