From: Paolo Bonzini
To: Aurelien Jarno
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v2 00/26] tcg: rework liveness analysis and register allocator
Date: Wed, 10 Oct 2012 09:49:53 +0200
Message-ID: <507528A1.3050200@redhat.com>
In-Reply-To: <20121010074207.GA7444@ohm.aurel32.net>
References: <1349812584-19551-1-git-send-email-aurelien@aurel32.net> <50751CDF.7000300@redhat.com> <20121010074207.GA7444@ohm.aurel32.net>

On 10/10/2012 09:42, Aurelien Jarno wrote:
> On Wed, Oct 10, 2012 at 08:59:43AM +0200, Paolo Bonzini wrote:
>> On 09/10/2012 21:55, Aurelien Jarno wrote:
>>> This patch series reworks the liveness analysis and register allocator
>>> in order to generate more optimized code, by avoiding a lot of move
>>> instructions. I have measured a 9% performance improvement in user mode
>>> and 4% in system mode.
>>>
>>> The idea behind this patch series is to free registers as soon as the
>>> temps are not used anymore instead of waiting for a basic block end or
>>> an op with side effects.
>>
>> Would it make any sense to express the saves as real TCG ops? This
>> would have a couple of advantages:
>
> It depends what you mean by that. Spills are decided more or less at the
> last moment (no free registers available, clobbered registers in a
> function call).

I'm not talking about spills; only saves of dead globals and local temps.
These can be computed before the optimizer runs, right?

> If it's about inserting them in the TCG stream, as it is done at the
> last step, i.e. after copy propagation and dead code elimination, it's not
> really useful anymore.
>
>> - more copy propagation and dead code elimination. Something like this:
>>
>>     mov_i64 cc_dst,rax
>>
>> right now is compiled as follows:
>>
>>     0x5555557ac37a:  mov    %rbp,(%r14)        # spill rax
>>     0x5555557ac381:  mov    (%r14),%rbp        # load rax from memory
>>     0x5555557ac38f:  mov    %rbp,0x98(%r14)    # spill cc_dst to memory
>
> I am surprised by this kind of code, and I think there's a bug somewhere
> in TCG. With the current TCG code, given rax is not dead, it should be
> spilled only after the move of cc_dst to memory, and thus the second line
> is not supposed to be emitted. With this patch series applied the second
> line should simply be removed.

Note that the above was without your series.

>> - constant propagation using constraints. This would let tcg-i386 use
>> effectively the mov $imm,(addr) instruction for spills of known-constant
>> values.
>
> This is indeed something quite frustrating and even more when the
> same immediate value is loaded multiple times. One way to do that would
> be to provide an optional tcg_out_st_immediate().

Yes, that would be simple.

Paolo