Date: Tue, 10 Nov 2009 17:02:50 +0100
From: Aurelien Jarno
To: Ulrich Hecht
Cc: Laurent Desnogues, qemu-devel@nongnu.org, agraf@suse.de
Subject: Re: [Qemu-devel] [PATCH 2/9] S/390 CPU emulation
Message-ID: <20091110160250.GC9052@hall.aurel32.net>
In-Reply-To: <200911091755.24535.uli@suse.de>
References: <1255696735-21396-1-git-send-email-uli@suse.de>
 <20091102184250.GQ26129@hall.aurel32.net>
 <761ea48b0911021103l52e87487kd75beef667ec0035@mail.gmail.com>
 <200911091755.24535.uli@suse.de>

On Mon, Nov 09, 2009 at 06:55:23PM +0200, Ulrich Hecht wrote:
> On Monday 02 November 2009, Laurent Desnogues wrote:
> > That indeed looks strange: fixing the TB chaining on ARM
> > made nbench i386 three times faster. Note the gain was
> > smaller for the FP parts of the benchmark due to the use
> > of helpers.
> >
> > Out of curiosity, could you post your tb_set_jmp_target1
> > function?
>
> I'm on an AMD64 host, so it's the same code as in mainline.
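To make the tb_set_jmp_target1 question concrete, here is a minimal sketch of what "patching a chained jump" means on the two hosts being compared. It is not the mainline QEMU code: the helper names patch_jmp_rel32 and patch_branch_arm are hypothetical, and the sketch only assumes the standard encodings. On x86-64 the chained jump is a `jmp rel32` whose 4-byte displacement is relative to the end of the displacement field, and no explicit cache flush is needed. On ARM the jump is a `B` instruction with a signed 24-bit word offset relative to PC (which reads as the instruction address + 8), and the patched word has to be flushed from the data cache before it can be executed (little-endian ARM assumed for the host-order word access).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: retarget an x86-64 "jmp rel32" whose 4-byte
 * displacement field starts at jmp_field. The displacement is relative
 * to the end of that field, i.e. jmp_field + 4. */
static void patch_jmp_rel32(uint8_t *jmp_field, const uint8_t *target)
{
    ptrdiff_t disp = target - (jmp_field + 4);
    assert(disp >= INT32_MIN && disp <= INT32_MAX); /* must fit in rel32 */
    int32_t rel = (int32_t)disp;
    memcpy(jmp_field, &rel, sizeof rel); /* field may be unaligned */
    /* No explicit instruction-cache flush: x86 hardware keeps instruction
     * fetch coherent with stores like this one. */
}

/* Hypothetical helper: retarget an ARM "B" instruction at insn. The low
 * 24 bits hold a signed word offset relative to PC, which reads as
 * insn + 8; the top 8 bits (condition + opcode) are preserved. The word
 * is accessed in host byte order (little-endian ARM assumed). */
static void patch_branch_arm(uint8_t *insn, const uint8_t *target)
{
    ptrdiff_t byte_off = target - (insn + 8);
    assert(byte_off % 4 == 0); /* ARM branch targets are word aligned */
    ptrdiff_t word_off = byte_off / 4;
    assert(word_off >= -(1 << 23) && word_off < (1 << 23)); /* 24-bit range */
    uint32_t word;
    memcpy(&word, insn, sizeof word);
    word = (word & 0xff000000u) | ((uint32_t)word_off & 0x00ffffffu);
    memcpy(insn, &word, sizeof word);
    /* ARM does not keep the instruction cache coherent with data stores,
     * so the patched word must be flushed before it is executed. */
    __builtin___clear_cache((char *)insn, (char *)insn + 4);
}
```

For example, patching a `jmp` whose displacement field sits at offset 1 of a buffer to point at offset 32 stores a displacement of 27, and retargeting an ARM `B` word 0xEA000000 at offset 16 to offset 40 yields 0xEA000004. The x86 variant is a single store, which is consistent with Ulrich's point that there is no flushing in the AMD64 version.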
>
> > The only thing I can think of at the moment that
> > could make the code slower is that the program you ran
> > was not reusing blocks and/or cache flushing in
> > tb_set_jmp_target1 is overkill.
>
> There is no cache flushing in the AMD64 tb_set_jmp_target1() function,
> and the polarssl test suite is by nature rather repetitive.
>
> I did some experiments, and it seems disabling the TB chaining (by
> emptying tb_set_jmp_target()) does not have any impact on performance at
> all on AMD64. I tested it with several CPU-intensive programs (md5sum
> and the like) with AMD64 on AMD64 userspace emulation (qemu-x86_64), and
> the difference in performance with and without TB chaining is hardly
> measurable. When enabled, the chaining is performed as advertised (I
> checked that), but it does not seem to help performance.

I tested it by removing the whole block around tb_add_jump() in
cpu_exec.c. Without it, booting an x86_64 image is about 2.5x slower.

> How is this possible? Could this be related to cache size? I suspect my
> Phenom 9500 is better equipped in that area than the average ARM
> controller.

My tests were on a Core 2 Duo T7200, so I doubt it is related to cache
size.

> And does the TB chaining actually work on AMD64 at all? I checked by
> adding some debug output, and it seems to patch the jumps correctly, but
> maybe somebody can verify that.
>

Given the speed gain I see, I would guess it works.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net
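To see what the two experiments above switch off, here is a toy model of a TB dispatch loop with and without chaining. Everything in it is hypothetical (ToyTB and run_toy are not QEMU's data structures or functions); it only assumes the general scheme in which a block that has not been chained returns to the main loop, which looks up the next block, and, with chaining enabled, patches the previous block to jump there directly, roughly the role the tb_add_jump() call plays in cpu_exec(). The model counts dispatcher round-trips only; it says nothing about how expensive each round-trip is on a given host, which is what the measurements in this thread disagree about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only (hypothetical, not QEMU's data structures): each
 * translated block knows its guest successor and has one "direct jump"
 * slot that is filled in the first time the transition is seen. */
typedef struct ToyTB {
    size_t next;           /* index of the successor block */
    struct ToyTB *chained; /* patched direct jump; NULL = exit to dispatcher */
} ToyTB;

/* Execute `steps` block transitions starting at tbs[0] and return how
 * many of them went through the dispatcher's slow path (return to the
 * main loop plus a TB lookup). With `chaining` set, each slow-path
 * transition also patches the block it came from. */
static size_t run_toy(ToyTB *tbs, size_t n, size_t steps, bool chaining)
{
    size_t slow_path = 0;
    ToyTB *cur = &tbs[0];
    for (size_t i = 0; i < steps; i++) {
        ToyTB *succ = cur->chained;
        if (succ == NULL) {
            slow_path++;
            assert(cur->next < n); /* successor must exist */
            succ = &tbs[cur->next];
            if (chaining)
                cur->chained = succ; /* later runs jump directly */
        }
        cur = succ;
    }
    return slow_path;
}
```

For a loop of three blocks (0 to 1 to 2 and back to 0) run for 1000 transitions, chaining leaves 3 slow-path transitions, one per block, while without chaining all 1000 go through the dispatcher. Whether avoiding those round-trips pays off depends on the host, so this does not by itself explain the difference between the AMD64 and Core 2 results.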