From: Ulrich Hecht
Subject: Re: [Qemu-devel] [PATCH 2/9] S/390 CPU emulation
Date: Mon, 9 Nov 2009 18:55:23 +0200
To: Laurent Desnogues
Cc: qemu-devel@nongnu.org, Aurelien Jarno, agraf@suse.de

On Monday 02 November 2009, Laurent Desnogues wrote:
> That indeed looks strange: fixing the TB chaining on ARM
> made nbench i386 three times faster. Note the gain was
> less for the FP parts of the benchmark due to the use of
> helpers.
>
> Out of curiosity, could you post your tb_set_jmp_target1
> function?

I'm on an AMD64 host, so it's the same code as in mainline.

> The only thing I can think of at the moment that
> could make the code slower is that the program you ran
> was not reusing blocks and/or cache flushing in
> tb_set_jmp_target1 is overkill.

There is no cache flushing in the AMD64 tb_set_jmp_target1() function,
and the polarssl test suite is by nature rather repetitive.

I did some experiments, and it seems disabling the TB chaining (by
emptying tb_set_jmp_target()) has no impact on performance at all on
AMD64. I tested it with several CPU-intensive programs (md5sum and the
like) under AMD64-on-AMD64 userspace emulation (qemu-x86_64), and the
difference in performance with and without TB chaining is hardly
measurable. I checked that the chaining is performed as advertised when
enabled, but it does not seem to help performance.

How is this possible? Could this be related to cache size? I suspect my
Phenom 9500 is better equipped in that area than the average ARM
controller.

And does the TB chaining actually work on AMD64 at all? I checked by
adding some debug output, and it seems to patch the jumps correctly,
but maybe somebody can verify that.

CU
Uli

-- 
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
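
P.S.: For reference, the i386/x86_64 tb_set_jmp_target1() I am using
(same as mainline; this is a sketch from memory of the exec-all.h of
that era, so details may differ between versions) just patches the
32-bit relative jump displacement in place:

static inline void tb_set_jmp_target1(unsigned long jmp_addr,
                                      unsigned long addr)
{
    /* patch the branch destination: x86 direct jumps take a 32-bit
       displacement relative to the end of the jump instruction */
    *(uint32_t *)jmp_addr = addr - (jmp_addr + 4);
    /* no explicit icache flush needed on x86, unlike the ARM version */
}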
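
And "emptying tb_set_jmp_target()" above means stubbing it out along
these lines (again a sketch; if I remember right, the real function
looks up the patch location via tb->tb_jmp_offset and hands it to
tb_set_jmp_target1()):

static inline void tb_set_jmp_target(TranslationBlock *tb,
                                     int n, unsigned long addr)
{
    /* chaining disabled for the experiment: the jump is left
       unpatched, so every TB returns to the main loop instead of
       branching straight to its successor */
}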