From: Ulrich Hecht
Subject: Re: [Qemu-devel] [PATCH 2/9] S/390 CPU emulation
Date: Mon, 9 Nov 2009 18:55:23 +0200
To: Laurent Desnogues
Cc: qemu-devel@nongnu.org, Aurelien Jarno, agraf@suse.de

On Monday 02 November 2009, Laurent Desnogues wrote:
> That indeed looks strange: fixing the TB chaining on ARM
> made nbench i386 three times faster. Note the gain was
> less for the FP parts of the benchmark due to the use of
> helpers.
>
> Out of curiosity, could you post your tb_set_jmp_target1
> function?

I'm on an AMD64 host, so it's the same code as in mainline.

> The only thing I can think of at the moment that
> could make the code slower is that the program you ran
> was not reusing blocks and/or cache flushing in
> tb_set_jmp_target1 is overkill.

There is no cache flushing in the AMD64 tb_set_jmp_target1() function,
and the polarssl test suite is by nature rather repetitive.

I did some experiments, and it seems disabling the TB chaining (by
emptying tb_set_jmp_target()) has no impact on performance at all on
AMD64. I tested it with several CPU-intensive programs (md5sum and the
like) under AMD64-on-AMD64 userspace emulation (qemu-x86_64), and the
difference in performance with and without TB chaining is hardly
measurable. I checked that the chaining is performed as advertised when
enabled, but it does not seem to help performance.

How is this possible? Could this be related to cache size? I suspect my
Phenom 9500 is better equipped in that area than the average ARM
controller.

And does the TB chaining actually work on AMD64 at all? I checked by
adding some debug output, and it seems to patch the jumps correctly,
but maybe somebody can verify that.

CU
Uli

-- 
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
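
P.S.: For reference, the i386/x86_64 tb_set_jmp_target1() I am using
(same as mainline; this is a sketch from memory of the exec-all.h of
that era, so details may differ between versions) just patches the
32-bit relative jump displacement in place:

static inline void tb_set_jmp_target1(unsigned long jmp_addr,
                                      unsigned long addr)
{
    /* patch the branch destination: x86 direct jumps take a 32-bit
       displacement relative to the end of the jump instruction */
    *(uint32_t *)jmp_addr = addr - (jmp_addr + 4);
    /* no explicit icache flush needed on x86, unlike the ARM version */
}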
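
And "emptying tb_set_jmp_target()" above means stubbing it out along
these lines (again a sketch; if I remember right, the real function
looks up the patch location via tb->tb_jmp_offset and hands it to
tb_set_jmp_target1()):

static inline void tb_set_jmp_target(TranslationBlock *tb,
                                     int n, unsigned long addr)
{
    /* chaining disabled for the experiment: the jump is left
       unpatched, so every TB returns to the main loop instead of
       branching straight to its successor */
}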