From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1Gf6gB-0006Az-C2
	for qemu-devel@nongnu.org; Tue, 31 Oct 2006 22:22:15 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1Gf6g9-0006Ak-LB
	for qemu-devel@nongnu.org; Tue, 31 Oct 2006 22:22:15 -0500
Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Gf6g9-0006Ah-HM
	for qemu-devel@nongnu.org; Tue, 31 Oct 2006 22:22:13 -0500
Received: from [65.74.133.4] (helo=mail.codesourcery.com)
	by monty-python.gnu.org with esmtps
	(TLS-1.0:DHE_RSA_AES_256_CBC_SHA:32) (Exim 4.52) id 1Gf6g9-0006t8-9K
	for qemu-devel@nongnu.org; Tue, 31 Oct 2006 22:22:13 -0500
From: Paul Brook <paul@codesourcery.com>
Subject: Re: [Qemu-devel] qemu vs gcc4
Date: Wed, 1 Nov 2006 03:22:09 +0000
References: <45391B22.1050608@palmsource.com>
	<200611010029.29406.paul@codesourcery.com>
	<200610312051.30749.rob@landley.net>
In-Reply-To: <200610312051.30749.rob@landley.net>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
Message-Id: <200611010322.09885.paul@codesourcery.com>
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Rob Landley <rob@landley.net>
Cc: qemu-devel@nongnu.org

On Wednesday 01 November 2006 01:51, Rob Landley wrote:
> On Tuesday 31 October 2006 7:29 pm, Paul Brook wrote:
> > > Actually it sounds additive rather than multiplicative.  Does each
> > > target have an entirely unrelated set of ops, or is there a shared set
> > > of primitive ops plus some oddballs?
> >
> > The shared set of primitive ops is basically qops :-)
> > You probably could figure out a single common set of qops, then write
> > assembly and glue them together like we do with dyngen. However once
> > you've done
> > that you've implemented most of what's needed for fully dynamic qops, so
> > it doesn't really seem worth it.
>
> I missed a curve.  What's "fully dynamic qops"?  (There's no translation
> cache?)

I mean all the qop stuff I've implemented.

> > > > It corresponds to "T0" in dyngen. In addition to the actual CPU
> > > > state, dyngen uses 3 fixed registers as scratch workspace. For qop
> > > > purposes these are part of the guest CPU state. They're only there
> > > > to aid conversion of the translation code, they'll go away eventually.
> > >
> > > Presumably the m68k target is pure qop, and hasn't got this sort of
> > > thing?
> >
> > Correct.
> > There is one use of T0 left for communicating with the TB chaining code,
> > but that's it and will probably go away eventually.
>
> Any idea where I can get a toolchain that can output a "hello world"
> program for m68k nommu?  (Or perhaps you have a statically linked "hello
> world" program for the platform lying around?)

Funnily enough I do :-)
http://www.codesourcery.com/gnu_toolchains/coldfire/

> > Theoretically possible, but not so easy in practice. Especially when you
> > get things like partial flag clobbers, and lazy flag evaluation. Doing it
> > as a target specific hack is much simpler and quicker.
>
> I think I know what partial flag clobbers are (although if you're working
> your way back, in theory you could handle it with a mask of exposed bits),
> but what's lazy flag evaluation?  (I thought that was the point of
> eliminating the unused flag setting.  Are you saying the hardware also does
> this and we have to emulate that?)

Lazy flag evaluation is where you don't bother calculating the actual flags
when executing the flag-setting instruction. Instead you save the
operands/result and compute the flags when you actually need them.
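
As a rough illustration of the idea, here is a minimal sketch in C. It is
not taken from the qemu source: the struct layout, the flag bit values and
the names do_add, do_sub and compute_flags are all assumptions made up for
this example. The flag-setting operations only record their inputs and
result; the flags themselves are worked out when something reads them.

```c
#include <stdint.h>

/* Hypothetical flag bit assignments, chosen for this sketch only. */
#define FLAG_C 1u   /* carry / borrow */
#define FLAG_Z 2u   /* zero */
#define FLAG_N 4u   /* negative (bit 31 set) */
#define FLAG_V 8u   /* signed overflow */

enum cc_op { CC_OP_ADD, CC_OP_SUB };

/* Hypothetical guest state: instead of a flags register, remember what
 * the last flag-setting instruction did. */
struct cpu_state {
    enum cc_op cc_op;   /* which operation last set the flags */
    uint32_t cc_src1;   /* its first operand */
    uint32_t cc_src2;   /* its second operand */
    uint32_t cc_res;    /* its result */
};

/* Flag-setting add: do the arithmetic, save operands/result, no flags. */
static uint32_t do_add(struct cpu_state *s, uint32_t a, uint32_t b)
{
    uint32_t r = a + b;
    s->cc_op = CC_OP_ADD;
    s->cc_src1 = a;
    s->cc_src2 = b;
    s->cc_res = r;
    return r;
}

/* Flag-setting subtract: same pattern as do_add. */
static uint32_t do_sub(struct cpu_state *s, uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    s->cc_op = CC_OP_SUB;
    s->cc_src1 = a;
    s->cc_src2 = b;
    s->cc_res = r;
    return r;
}

/* Called only when an instruction actually consumes the flags. */
static uint32_t compute_flags(const struct cpu_state *s)
{
    uint32_t a = s->cc_src1, b = s->cc_src2, r = s->cc_res;
    uint32_t f = 0;

    if (r == 0)
        f |= FLAG_Z;
    if (r >> 31)
        f |= FLAG_N;
    if (s->cc_op == CC_OP_ADD) {
        if (r < a)                          /* unsigned carry out */
            f |= FLAG_C;
        if (((a ^ r) & (b ^ r)) >> 31)      /* operands agree, result differs */
            f |= FLAG_V;
    } else {
        if (a < b)                          /* unsigned borrow */
            f |= FLAG_C;
        if (((a ^ b) & (a ^ r)) >> 31)      /* operands differ, result flips */
            f |= FLAG_V;
    }
    return f;
}
```

If a run of flag-setting instructions is followed by a single conditional
branch, compute_flags runs once, for the last of them, and the earlier
results are simply overwritten without ever being turned into flags.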

> > > > There are three fairly independent stages:
> > > > 1) target-*/translate.c converts guest code into qops.
> > > > 2) translate-all.c messes about with those qops a bit (allocates host
> > > > registers, etc).
> > > > 3) translate-op.c, translate-qop.c and target-*/ turn those qops into
> > > > host code.
> > >
> > > Is pass 2 where the flag elimination pass goes (and presumably any
> > > other optimizations that might get added)?  No, that can't be the case
> > > or the m68k code wouldn't need its own implementation of the flag
> > > elimination pass...
> >
> > Flag elimination is at the end of step 1.
>
> Because it's platform specific?

Yes.

> > > > qops and dyngen ops are both small "functions" that are represented
> > > > in a similar way. The difference is that dyngen ops are target
> > > > specific fixed functions, whereas qops are generic parameterized
> > > > functions.
> > >
> > > So the 11x11 exponential complexity of qemu producing its own assembly
> > > output might not be as much of a problem after switching to qops?
> >
> > Right. The exponential complexity is if you write the assembly by hand
> > instead of using gcc to generate it.
>
> The exponential complexity is if you have to write different code for each
> combination of host and target.  If every target disassembles to the same
> set of target QOPs, then you could have a hand-written assembly version of
> each QOP for each host platform, and still have N rather than N^2 of them.

Right, but by the time you've got everything to use the same set of ops you
may as well teach qemu how to generate code instead of using potted
fragments.
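
To make the contrast between potted fragments and parameterized ops a bit
more concrete, here is a small sketch in C. The dyngen-style function is a
simplified illustration built on the T0/T1 naming used in this thread. The
qop side (struct qop, QOP_MOVI, QOP_ADD, qop_run) is a hypothetical layout
invented for the example and should not be read as the actual qop
implementation.

```c
#include <stdint.h>

/* dyngen style: a fixed function hard-wired to particular scratch
 * registers. Every operand combination needs its own function. */
static uint32_t T0, T1;

static void op_addl_T0_T1(void)
{
    T0 += T1;
}

/* qop style (hypothetical layout): one generic op whose operands are
 * parameters, so the same definition can serve every target. */
enum qop_code { QOP_MOVI, QOP_ADD };

struct qop {
    enum qop_code code;
    int dst;        /* destination register index */
    int src1;       /* first source register index (unused by QOP_MOVI) */
    int src2;       /* second source register index (unused by QOP_MOVI) */
    uint32_t imm;   /* immediate value (used by QOP_MOVI only) */
};

/* Interpreting the ops keeps the sketch short; a real back end would
 * emit host instructions for each qop instead of executing it here. */
static void qop_run(const struct qop *ops, int n, uint32_t *regs)
{
    for (int i = 0; i < n; i++) {
        const struct qop *q = &ops[i];
        switch (q->code) {
        case QOP_MOVI:
            regs[q->dst] = q->imm;
            break;
        case QOP_ADD:
            regs[q->dst] = regs[q->src1] + regs[q->src2];
            break;
        }
    }
}
```

Because the operands are data rather than part of the function name, a
pass such as register allocation can rewrite dst/src1/src2 before any host
code is produced, which is not possible with a fixed fragment.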

Using hand-written assembly fragments probably doesn't make qemu any faster,
it just removes the gcc dependency. Using qops also allows qemu to generate
better (faster) translated code.

Paul