From mboxrd@z Thu Jan  1 00:00:00 1970
From: Paul Brook <paul@codesourcery.com>
Subject: Re: [Qemu-devel] qemu vs gcc4
Date: Tue, 31 Oct 2006 23:00:42 +0000
References: <45391B22.1050608@palmsource.com>
	<200610312208.20278.paul@codesourcery.com>
	<4547CEA8.9040903@wanadoo.fr>
In-Reply-To: <4547CEA8.9040903@wanadoo.fr>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
Message-Id: <200610312300.43562.paul@codesourcery.com>
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: qemu-devel@nongnu.org

On Tuesday 31 October 2006 22:31, Laurent Desnogues wrote:
> Paul Brook wrote:
> > Replacing the pregenerated blocks with hand written assembly isn't
> > feasible. Each target has its own set of ops, and each host would need
> > its own assembly implementation of those ops. Multiply 11 targets by 11
> > hosts and you get an unmaintainable mess :-)
>
> Shouldn't you have 11+11 and not 11*11, given your intermediate
> representation?  And of these 11+11, 11 have to be written
> anyway (target).  Or did I miss something?

If you use qops (which is a target- and host-independent intermediate
representation) it's 11 + 11. If you just replace the existing dyngen op.c
with hand-written assembly it's 11 * 11.
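
To make the arithmetic concrete, here is a minimal sketch of the two counts.
The function names are made up for illustration and are not part of qemu;
the figure of 11 targets and 11 hosts is the one used in this thread.

```python
# Illustrative only: these helpers are hypothetical, not qemu code.
NUM_TARGETS = 11  # guest architectures, as quoted in this thread
NUM_HOSTS = 11    # host architectures, as quoted in this thread


def impls_with_ir(targets: int, hosts: int) -> int:
    """One translator per target (target -> IR) plus one per host (IR -> host)."""
    return targets + hosts


def impls_hand_written_ops(targets: int, hosts: int) -> int:
    """Each target's own op set needs a separate assembly version on every host."""
    return targets * hosts


if __name__ == "__main__":
    print("with intermediate representation:", impls_with_ir(NUM_TARGETS, NUM_HOSTS))
    print("hand-written op.c per host:", impls_hand_written_ops(NUM_TARGETS, NUM_HOSTS))
```

That is 22 pieces of code against 121, and the gap widens with every
target or host added.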

> > On RISC targets like ARM most instructions don't set the condition
> > codes, so we don't bother doing this.
>
> Except for ARM Thumb ISA which always sets flags.  ARM is a bad
> RISC example :)

Bah. Details :-)

> I was wondering if you did some profiling to know how much time
> is spent in disas_arm_insn.  Of course the profiling results
> would be very different for a Linux boot or a synthetic benchmark

The qop generator does add some overhead to the code translation. I haven't
done proper benchmarks, but in most cases it doesn't seem to be too bad
(maybe 10%). I'm hoping we can get most of that back.

> (which makes me think that you don't support MMU, do you?).

qemu does implement an MMU.
Currently this still uses the dyngen code, but that's fixable.

> There is a very nice trick to speed up decoding of ARM
> instructions:  pick up bits 20-27 and 4-7 and you (almost) get
> one instruction per case entry;  of course this means using a
> generator to write the 4096 entries, but the result was good for
> my interpreted ISS, reaching 44 M i/s on an Opteron @2.4GHz
> without any compiler dependent trick (such as gcc jump to labels).

qemu generally gets 100-200 MIPS on my 2GHz Opteron.
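
For reference, the decode trick quoted above can be sketched roughly as
follows. This is only an illustration of the indexing idea, assuming a
32-bit ARM (not Thumb) instruction word. The function name and the small
handler table are hypothetical; this is neither qemu's decoder nor the
quoted ISS.

```python
# Sketch of the table-index idea: bits 20-27 and bits 4-7 of the
# instruction word are concatenated into a 12-bit index (4096 entries).
TABLE_SIZE = 1 << 12


def arm_decode_index(insn: int) -> int:
    """Return the 12-bit dispatch index for a 32-bit ARM instruction word."""
    hi = (insn >> 20) & 0xFF  # bits 20-27
    lo = (insn >> 4) & 0xF    # bits 4-7
    return (hi << 4) | lo


# Hypothetical table: a real generator would fill all 4096 slots.
handlers = ["unhandled"] * TABLE_SIZE
handlers[0x080] = "add (register operand)"
handlers[0x009] = "mul"

if __name__ == "__main__":
    for word in (0xE0810002, 0xE0000291):  # add r0, r1, r2 / mul r0, r1, r2
        idx = arm_decode_index(word)
        print(hex(word), hex(idx), handlers[idx])
```

The condition field (bits 28-31) is deliberately left out of the index, so
it still has to be checked separately before or after the dispatch.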

Paul