Date: Sun, 29 Mar 2009 16:57:49 +0200
From: Aurelien Jarno
Subject: Re: [Qemu-devel] [PATCH 0/4] target-ppc: create TCG slots for registers based on CPU
Message-ID: <20090329145749.GD12026@hall.aurel32.net>
References: <1238275817-9758-1-git-send-email-froydnj@codesourcery.com>
 <20090328225443.GL20944@hall.aurel32.net>
 <20090329001834.GK7336@codesourcery.com>
 <20090329133453.GA12026@hall.aurel32.net>
 <20090329144250.GC12026@hall.aurel32.net>
In-Reply-To: <20090329144250.GC12026@hall.aurel32.net>
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: qemu-devel@nongnu.org

On Sun, Mar 29, 2009 at 04:42:50PM +0200, Aurelien Jarno wrote:
> On Sun, Mar 29, 2009 at 03:34:53PM +0200, Aurelien Jarno wrote:
> > On Sat, Mar 28, 2009 at 05:18:34PM -0700, Nathan Froyd wrote:
> > > On Sat, Mar 28, 2009 at 11:54:43PM +0100, Aurelien Jarno wrote:
> > > > On Sat, Mar 28, 2009 at
02:30:13PM -0700, Nathan Froyd wrote:
> > > > > I am not a TCG expert, but there are several loops in TCG over all
> > > > > globals and it seems like those loops would go faster if they didn't
> > > > > have to consider registers that would never be touched. If this patch
> > > > > series makes no difference in TCG's performance, then I'd be glad to
> > > > > have an explanation of why that's the case.
> > > >
> > > > Have you actually run a benchmark with those changes? TCG is
> > > > sometimes a bit strange, and some optimizations do not change the
> > > > execution speed, while others improve it a lot. It is very difficult to
> > > > predict what will give a gain or not.
> > > >
> > > > Suggested benchmarks: gzip/bzip2 on a big file using user emulation
> > > > or a compilation in system emulation.
> > >
> > > Benchmarking? Pffft. ;)
> > >
> > > A benchmarking session with qemu-ppc and bzip2/bunzip2 on ~400MB files
> > > and a 603e emulated CPU suggests that these changes are not terribly
> > > beneficial (maybe 1% improvement, if that). I don't imagine that a
> > > similarly stressful benchmark in system emulation would be much
> > > different. Consider the patch series withdrawn.
> > >
> >
> > I have done some profiling on qemu-system-ppc and qemu-system-mips. You
> > are actually right that the loop over the TCG variable lists takes time.
> > This is mainly due to the call of save_globals() for TCG functions marked
> > as TCG_OPF_CALL_CLOBBER.
> >
> > However it looks like it would be better to address this comment first
> > before trying to reduce the number of TCG variables:
> >
> >         /* XXX: for load/store we could do that only for the slow path
> >            (i.e. when a memory callback is called) */
> >
>
> Thinking a bit more I think we should avoid mapping FPU registers as
> global TCG variables. Those variables are mostly modified by helpers
> (except for move and load/store), and they will be written back to
> memory before the call to the helper.
This means TCG can't delay the
> memory accesses, so there is very little (or no) difference in the
> generated code if the FPU register is accessed through a global TCG
> variable or through tcg_gen_ld_tl().
>
> I have done the test with qemu-system-mips, and I have found a gain
> around 1% in speed.
>

My measurements were wrong; the gain is around 9%.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net
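
To make the argument about FPU registers concrete, here is a toy model in Python. It is not QEMU code, only a sketch under stated assumptions: every FP operation is a helper call that clobbers host registers, a save-globals pass visits every global before each call and at the end of the block, and there are 32 integer and 32 FP registers. The names `translate`, `Stats`, `NUM_GPR` and `NUM_FPR` are invented for the illustration. It counts memory accesses to the CPU state and the number of globals visited, once with FP registers mapped as globals and once with explicit loads and stores.

```python
from dataclasses import dataclass

NUM_GPR = 32  # assumed number of integer registers kept as globals
NUM_FPR = 32  # assumed number of FP registers


@dataclass
class Stats:
    loads: int = 0            # reads of the CPU state in memory
    stores: int = 0           # writes to the CPU state in memory
    globals_scanned: int = 0  # entries visited by the save-globals loop


def translate(ops, fpr_as_globals):
    """Count the work for a block of helper-based FP ops.

    ops is a list of (dst, src1, src2) FP register indices.
    """
    stats = Stats()
    num_globals = NUM_GPR + (NUM_FPR if fpr_as_globals else 0)
    cached = set()  # FP globals currently held in a host register
    dirty = set()   # cached FP globals not yet written back

    def save_globals():
        # Toy stand-in for the loop over all globals before a clobbering call.
        stats.globals_scanned += num_globals
        stats.stores += len(dirty)
        dirty.clear()
        cached.clear()

    for dst, src1, src2 in ops:
        for src in (src1, src2):
            if fpr_as_globals and src in cached:
                continue  # value is already in a host register
            stats.loads += 1
            if fpr_as_globals:
                cached.add(src)
        save_globals()  # the helper call clobbers host registers
        if fpr_as_globals:
            cached.add(dst)
            dirty.add(dst)  # written back at the next save_globals()
        else:
            stats.stores += 1  # explicit store of the helper result
    save_globals()  # end of the block
    return stats
```

In this model, a block of three independent operations such as `[(0, 1, 2), (3, 4, 5), (6, 7, 8)]` produces the same number of loads and stores in both configurations, while the globals variant visits twice as many entries in the save-globals loop. Only when an operation reuses the result of the previous one does the globals variant save a load, which is one way to read the "very little (or no) difference" above.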