From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1Gf5Gb-0006ZY-NY
	for qemu-devel@nongnu.org; Tue, 31 Oct 2006 20:51:45 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1Gf5Gb-0006ZI-2j
	for qemu-devel@nongnu.org; Tue, 31 Oct 2006 20:51:45 -0500
Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Gf5Ga-0006ZF-TT
	for qemu-devel@nongnu.org; Tue, 31 Oct 2006 20:51:45 -0500
Received: from [71.162.243.5] (helo=grelber.thyrsus.com)
	by monty-python.gnu.org with esmtps
	(TLS-1.0:DHE_RSA_AES_256_CBC_SHA:32) (Exim 4.52) id 1Gf5Ga-0000jk-Pz
	for qemu-devel@nongnu.org; Tue, 31 Oct 2006 20:51:45 -0500
From: Rob Landley <rob@landley.net>
Subject: Re: [Qemu-devel] qemu vs gcc4
Date: Tue, 31 Oct 2006 20:51:30 -0500
References: <45391B22.1050608@palmsource.com>
	<200610311900.15886.rob@landley.net>
	<200611010029.29406.paul@codesourcery.com>
In-Reply-To: <200611010029.29406.paul@codesourcery.com>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Disposition: inline
Message-Id: <200610312051.30749.rob@landley.net>
Content-Transfer-Encoding: quoted-printable
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Paul Brook <paul@codesourcery.com>
Cc: qemu-devel@nongnu.org

On Tuesday 31 October 2006 7:29 pm, Paul Brook wrote:
> > Actually it sounds additive rather than multiplicative.  Does each target
> > have an entirely unrelated set of ops, or is there a shared set of
> > primitive ops plus some oddballs?
>
> The shared set of primitive ops is basically qops :-)
> You probably could figure out a single common set of qops, then write
> assembly and glue them together like we do with dyngen. However once you've
> done that you've implemented most of what's needed for fully dynamic qops,
> so it doesn't really seem worth it.

I missed a curve.  What's "fully dynamic qops"?  (There's no translation
cache?)

> > But I already know
> > that doesn't work because it doesn't explain the "unable to find spill
> > register" problem.
>
> That's a separate gcc bug. It gets stuck when you tell it not to use half the
> registers, then ask it to do 64-bit math. This is one of the reasons
> eliminating the fixed registers is a good idea.

Sigh.  The problems motivating me to learn the code are highly esoteric
breakage, yet I'm still not quite up to the task of understanding what's
going on when all this works _right_.  Grumble...

> > > It corresponds to "T0" in dyngen. In addition to the actual CPU state,
> > > dyngen uses 3 fixed registers as scratch workspace. For qop purposes
> > > these are part of the guest CPU state. They're only there to aid
> > > conversion of the translation code; they'll go away eventually.
> >
> > Presumably the m68k target is pure qop, and hasn't got this sort of thing?
>
> Correct.
> There is one use of T0 left for communicating with the TB chaining code, but
> that's it and will probably go away eventually.
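For readers new to the distinction: a dyngen op is a fixed, target-specific function whose operands are baked into its name (the T0/T1 scratch registers), while a qop is one generic function parameterized by which qregs it uses. A minimal sketch of the two styles follows. `op_addl_T0_T1` is modelled on dyngen's naming pattern, but `CPUStateSketch` and `qop_add_i32` are hypothetical names, not qemu code, and real dyngen ops take no arguments at all because env/T0/T1 live in fixed host registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical guest state. Real dyngen code keeps env, T0 and T1 in
   fixed host registers via gcc global register variables; here they are
   ordinary fields so the sketch stays portable. */
typedef struct {
    uint32_t regs[8];   /* guest general-purpose registers */
    uint32_t T0, T1;    /* dyngen-style scratch "registers" */
} CPUStateSketch;

/* dyngen style: one fixed function per operation and operand choice.
   A different register pair means writing (or generating) another one. */
void op_addl_T0_T1(CPUStateSketch *env)
{
    env->T0 += env->T1;
}

/* qop style (hypothetical): one generic add, parameterized by the qregs
   it reads and writes. qregs is just an array of 32-bit slots here. */
void qop_add_i32(uint32_t *qregs, int nregs, int dst, int a, int b)
{
    assert(dst >= 0 && dst < nregs);
    assert(a >= 0 && a < nregs && b >= 0 && b < nregs);
    qregs[dst] = qregs[a] + qregs[b];
}
```

The design point is that the qop version covers every register combination with one function, which is what makes the later register allocation step possible.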

Any idea where I can get a toolchain that can output a "hello world" program
for m68k nommu?  (Or perhaps you have a statically linked "hello world"
program for the platform lying around?)

Building toolchains is one of my other hobbies, but it's a royal pain because
in order to get "hello world" to compile and link you have to supply kernel
headers, build binutils and gcc with various configuration options and path
overrides and such, build uClibc with the result and get them all talking to
each other.  I.e., you've got to do hours of work before you get to the first
real "did it work" point, and then backtrack to figure out why the answer is
usually "no".  (Prebuilt binary toolchains are useful just to narrow down the
number of possible things that could be broken when you first try out a new
platform.)

> > > > Possible translation: you can feed a qreg containing an I64 value to a
> > > > qop taking an I32 argument, and it'll typecast the sucker down
> > > > intelligently, but if you produce an I32 result and expect to use that
> > > > qreg's value as an I64 argument later, you have to call a
> > > > sign-extending qop on it first?
> > >
> > > Exactly.
> > > If you mix I32, F32 and/or F64 in this way Bad Things will happen.
> >
> > Presumably just the same kinds of Bad Things as "float f; *(int *)&f;"?
>
> Or qemu will get confused and crash.

I've had that happen without qops, although not recently.  (I have this nasty
habit of trying Ubuntu's PPC and x86-64 distros under qemu with each new
release.  They usually fail in amusing new ways.)
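The typing rule in the exchange above (narrowing an I64 qreg to I32 is implicit, widening needs an explicit sign-extending qop, and mixing integer and float types only reinterprets bits) can be modelled in plain C. This is a sketch assuming a qreg is a 64-bit slot; `use_as_i32`, `qop_ext32s` and `f32_bits` are hypothetical helpers, not actual qop names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: an I64 qreg used as an I32 argument. Only the low 32 bits
   are looked at, so the narrowing is implicit and harmless. */
uint32_t use_as_i32(uint64_t qreg)
{
    return (uint32_t)qreg;
}

/* Hypothetical sign-extending qop: required before an I32 result can be
   used as an I64 argument, because the upper half is otherwise garbage.
   The int32_t conversion is implementation-defined before C23, but gcc
   and clang define it as two's complement wraparound. */
uint64_t qop_ext32s(uint64_t qreg)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)qreg;
}

/* Treating an F32 value as I32 just reinterprets the bits. memcpy is the
   well-defined C spelling of the "*(int *)&f" trick. */
uint32_t f32_bits(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

The `*(int *)&f` form from the thread violates C's strict aliasing rules, which is one more way to get Bad Things from an optimizing gcc.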

> > > > seeing end with _im which I presume means "immediate".  The alternative
> > > > is _cc, but what does that mean?  (Presumably not "closed captioned".)
> > >
> > > _cc are variants that set the condition codes. I may have got T0 and T1
> > > backwards in the first 3 lines.
> >
> > Ah!
> >
> > Is this written down anywhere?  I've read Fabrice's paper and the design
> > documentation, and I'm not remembering this.  It's quite possible I missed
> > it when my brain filled up, though.
>
> Dunno.

So if at any point I actually understand this stuff, I need to write
documentation?  (I can do part 2, part 1 the jury's still out on...)
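The suffix convention Paul describes can be illustrated with three variants of one operation: a plain op, an `_im` op whose second operand is an immediate, and a `_cc` op that also sets the condition codes. The function names, the `State` struct and the flag layout below are hypothetical; real targets use their own flag bits and carry rules.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits. */
enum { FLAG_Z = 1 << 0, FLAG_N = 1 << 1, FLAG_C = 1 << 2 };

typedef struct {
    uint32_t T0, T1;
    uint32_t flags;
} State;

/* Plain variant: T0 = T0 + T1, flags untouched. */
void op_add_T0_T1(State *s)
{
    s->T0 += s->T1;
}

/* _im variant: the second operand is an immediate encoded in the op. */
void op_add_T0_im(State *s, uint32_t im)
{
    s->T0 += im;
}

/* _cc variant: same arithmetic, but also sets the condition codes. */
void op_add_T0_T1_cc(State *s)
{
    uint32_t a = s->T0;
    uint32_t r = a + s->T1;
    s->T0 = r;
    s->flags = (r == 0 ? FLAG_Z : 0)
             | ((r >> 31) ? FLAG_N : 0)
             | (r < a ? FLAG_C : 0);   /* unsigned carry out */
}
```

Keeping the plain and `_cc` forms separate is what lets a later pass swap in the cheaper plain op when nobody reads the flags.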

> It also means you don't need to reserve that register, avoiding the gcc
> unable to find spill register bug you mentioned above.

I'm all for it.

> > Um, wouldn't the flag setting code be fairly straightforward as a qop that
> > comes right _before_ the other op, as in "set the flags for doing this with
> > these registers", that does nothing but set the flags (i.e. it wouldn't
> > modify the contents of any of the registers, so it could be immediately
> > followed by the appropriate add or shift or so on), and then the flag
> > setting pass could just turn all the ones that weren't needed into
> > QOP_NULL?
>
> Theoretically possible, but not so easy in practice, especially when you get
> things like partial flag clobbers and lazy flag evaluation. Doing it as a
> target-specific hack is much simpler and quicker.

I think I know what partial flag clobbers are (although if you're working your
way back, in theory you could handle it with a mask of exposed bits), but
what's lazy flag evaluation?  (I thought that was the point of eliminating
the unused flag setting.  Are you saying the hardware also does this and we
have to emulate that?)
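In emulators, "lazy flag evaluation" usually refers to a software technique rather than a hardware feature. Instead of computing the guest's condition codes after every instruction, the translated code records which operation last set them plus its operands, and computes the actual flags only when something reads them. qemu's i386 target works this way through its `cc_op`, `cc_src` and `cc_dst` state. The sketch below models the idea for add, subtract and a logic op; the names and flag bits are illustrative, not qemu's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits. */
enum { CC_Z = 1, CC_C = 2 };

/* Which kind of operation last set the flags. */
typedef enum { CC_OP_ADD, CC_OP_SUB, CC_OP_LOGIC } CCOp;

/* Instead of the flags, remember just enough to rebuild them later. */
typedef struct {
    CCOp cc_op;
    uint32_t cc_src;   /* second operand (unused for logic ops) */
    uint32_t cc_dst;   /* result */
} LazyFlags;

uint32_t add_lazy(LazyFlags *lf, uint32_t a, uint32_t b)
{
    uint32_t r = a + b;
    lf->cc_op = CC_OP_ADD;
    lf->cc_src = b;
    lf->cc_dst = r;
    return r;
}

uint32_t sub_lazy(LazyFlags *lf, uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    lf->cc_op = CC_OP_SUB;
    lf->cc_src = b;
    lf->cc_dst = r;
    return r;
}

uint32_t and_lazy(LazyFlags *lf, uint32_t a, uint32_t b)
{
    uint32_t r = a & b;
    lf->cc_op = CC_OP_LOGIC;
    lf->cc_dst = r;
    return r;
}

/* Runs only when a later instruction actually reads the flags. */
unsigned compute_flags(const LazyFlags *lf)
{
    unsigned f = (lf->cc_dst == 0) ? CC_Z : 0;
    switch (lf->cc_op) {
    case CC_OP_ADD:     /* carry out iff the sum wrapped below an operand */
        if (lf->cc_dst < lf->cc_src)
            f |= CC_C;
        break;
    case CC_OP_SUB: {   /* borrow iff a < b; a is recovered as dst + src */
        uint32_t a = lf->cc_dst + lf->cc_src;
        if (a < lf->cc_src)
            f |= CC_C;
        break;
    }
    case CC_OP_LOGIC:   /* logic ops clear carry in this model */
        break;
    }
    return f;
}
```

Compared with eliminating dead flag computations at translation time, this defers the decision to run time: the bookkeeping stores still happen, but the expensive flag computation runs only when the flags are read.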

> > Or is that what's happening now?  (Do QOPs ever modify their input
> > registers, or only the output one?)
>
> The generic qops never modify inputs, and never read outputs. Inputs and
> outputs can be the same qreg.

Hm.
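Paul's rule (generic qops never modify inputs and never read outputs, yet an input and an output may be the same qreg) has a practical consequence for how each qop is implemented: read every input before writing any output. A two-output qop shows why. `qop_divu_i32` is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-output qop: unsigned 32-bit divide producing both a
   quotient and a remainder. Outputs may alias inputs, so every input is
   read into a local before any output is written. The two outputs
   themselves must be distinct qregs. */
void qop_divu_i32(uint32_t *q, int quot, int rem, int a, int b)
{
    uint32_t va = q[a];
    uint32_t vb = q[b];
    assert(quot != rem);
    assert(vb != 0);           /* a real target would raise a guest exception */
    q[quot] = va / vb;
    q[rem] = va % vb;          /* uses va, not q[a], which may have just changed */
}
```

Had the remainder been computed from `q[a]` after the quotient was written, the `quot == a` case with 17 / 5 would yield 3 % 5 = 3 instead of 2.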

> > > There are three fairly independent stages:
> > > 1) target-*/translate.c converts guest code into qops.
> > > 2) translate-all.c messes about with those qops a bit (allocates host
> > > registers, etc.).
> > > 3) translate-op.c, translate-qop.c and target-*/ turn those qops into
> > > host code.
> >
> > Is pass 2 where the flag elimination pass goes (and presumably any other
> > optimizations that might get added)?  No, that can't be the case or the
> > m68k code wouldn't need its own implementation of the flag elimination
> > pass...
>
> Flag elimination is at the end of step 1.

Because it's platform specific?
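A target-specific dead-flag pass of the kind discussed above can be sketched as one backward walk over a block's ops, tracking which flag bits are still live. Tracking bits in a mask, as Rob suggests, also handles partial clobbers: an op that writes only the carry flag kills only that bit. The `Op` record and flag bits are hypothetical, and this is not the m68k implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits. */
enum { F_Z = 1, F_N = 2, F_C = 4, F_V = 8, F_ALL = 15 };

/* Hypothetical op record: which flag bits an op reads and writes. */
typedef struct {
    const char *name;
    unsigned flags_read;
    unsigned flags_written;
} Op;

/* Walks the block backwards. "live" holds the flag bits that some later
   op reads before they are overwritten. Each op keeps only the flag
   writes that are live; an op left with none could use its non-_cc
   variant. Returns how many flag-setting ops lost all their flag writes. */
size_t eliminate_dead_flags(Op *ops, size_t n, unsigned live_at_end)
{
    unsigned live = live_at_end;
    size_t removed = 0;
    assert(ops != NULL || n == 0);
    for (size_t i = n; i-- > 0;) {
        unsigned written = ops[i].flags_written;
        unsigned needed = written & live;
        if (written != 0 && needed == 0)
            removed++;
        ops[i].flags_written = needed;
        /* Bits written here are dead above this op unless it reads them. */
        live = (live & ~written) | ops[i].flags_read;
    }
    return removed;
}
```

With all flags live at the end of a block, a trailing op that writes only the carry still lets an earlier `_cc` op drop its carry computation while keeping Z, N and V.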

> > > qops and dyngen ops are both small "functions" that are represented in a
> > > similar way. The difference is that dyngen ops are target-specific fixed
> > > functions, whereas qops are generic parameterized functions.
> >
> > So the 11x11 exponential complexity of qemu producing its own assembly
> > output might not be as much of a problem after switching to qops?
>
> Right. The exponential complexity is if you write the assembly by hand
> instead of using gcc to generate it.

The exponential complexity is if you have to write different code for each
combination of host and target.  If every target disassembles to the same set
of target QOPs, then you could have a hand-written assembly version of each
QOP for each host platform, and still have N rather than N^2 of them.
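The N versus N^2 point is the usual argument for a shared intermediate form. A back-of-the-envelope count, assuming 11 targets and 11 hosts as in the 11x11 figure above; the function names are made up for the illustration.

```c
#include <assert.h>

/* Direct approach: separate code for every (target, host) pair. */
unsigned direct_translators(unsigned targets, unsigned hosts)
{
    return targets * hosts;
}

/* Shared intermediate form: one frontend per target that emits qops,
   plus one backend per host that turns qops into host code. */
unsigned via_shared_ir(unsigned targets, unsigned hosts)
{
    return targets + hosts;
}
```

For 11 targets and 11 hosts that is 121 translators versus 22, and adding a twelfth target costs 11 more pieces in the first scheme but only one in the second. Strictly speaking, the direct approach grows quadratically rather than exponentially.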

And I still wanna use tcc to generate it, someday. :)

> > Possibly some of the common qops can have an asm block for 'em, and the
> > rest can go through the contortions target-*/op.c is currently doing with
> > (glue(glue(blah))) and so on.
>
> Currently we know how to generate code directly for all qops. Anything more
> complicated must be either put in a helper function or split into multiple
> qops.

Split into multiple qops I can understand.
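Splitting a complicated guest operation into several simpler qops might look like the following, using a rotate-left as the example. The qop kinds, the `Qop` record and `gen_rotl_imm` are hypothetical, and the tiny interpreter is included only so the expansion can be checked.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical minimal qop set with no rotate instruction. */
typedef enum { Q_MOVI, Q_SHL, Q_SHR, Q_OR } QopKind;

typedef struct {
    QopKind kind;
    int dst, a, b;      /* qreg indices */
    uint32_t imm;       /* immediate, used only by Q_MOVI */
} Qop;

/* Emits dst = rotl(src, n) as shl/shr/or qops. Assumes 0 < n < 32 and
   that t0..t2 are distinct temporaries that do not alias src. */
int gen_rotl_imm(Qop *out, int dst, int src, unsigned n, int t0, int t1, int t2)
{
    int k = 0;
    assert(n > 0 && n < 32);
    assert(t0 != src && t1 != src && t2 != src);
    assert(t0 != t1 && t0 != t2 && t1 != t2);
    out[k++] = (Qop){Q_MOVI, t0, 0, 0, n};
    out[k++] = (Qop){Q_SHL, t1, src, t0, 0};      /* t1 = src << n */
    out[k++] = (Qop){Q_MOVI, t0, 0, 0, 32 - n};
    out[k++] = (Qop){Q_SHR, t2, src, t0, 0};      /* t2 = src >> (32 - n) */
    out[k++] = (Qop){Q_OR, dst, t1, t2, 0};       /* dst = t1 | t2 */
    return k;
}

/* Interprets a qop sequence over the register file r. */
void run_qops(const Qop *ops, int n, uint32_t *r)
{
    for (int i = 0; i < n; i++) {
        const Qop *q = &ops[i];
        switch (q->kind) {
        case Q_MOVI: r[q->dst] = q->imm; break;
        case Q_SHL:  r[q->dst] = r[q->a] << r[q->b]; break;
        case Q_SHR:  r[q->dst] = r[q->a] >> r[q->b]; break;
        case Q_OR:   r[q->dst] = r[q->a] | r[q->b]; break;
        }
    }
}
```

The other option Paul mentions, a helper function, would instead emit a single call to an ordinary C routine that does the rotate.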

> > > I started off by saying qops were effectively instructions for an
> > > imaginary machine. translate-all.c rearranges them so they match up very
> > > closely with the instructions available on the host. Once this has been
> > > done, turning them into binary code is relatively simple.
> >
> > I sort of thought this is what it was already doing, but apparently not...
>
> We're getting confused with tenses. I mean that once translate-all.c has
> rearranged the qops, we *do* generate host instructions from them without
> too much effort.

By "already doing" I meant I thought the 0.8.2 code was doing this, before your
new tree switching everything over to qops.  (Trying to read dyngen.c reminds
me of reading cgi code that outputs html with embedded javascript.)
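As an example of rearranging qops to match host instructions: qops are three-address (`dst = a + b`), while x86 arithmetic is two-address (`dst += src`). A lowering step can reuse `dst` when it already holds an input and insert a move otherwise. This sketch prints AT&T-style text rather than binary code, and `lower_add` and its four-entry register table are made-up simplifications.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical host register table. */
static const char *const reg_name[4] = {"eax", "ebx", "ecx", "edx"};

/* Writes AT&T-style text for the qop "dst = a + b" into buf and returns
   the number of host instructions emitted. */
int lower_add(char *buf, size_t len, int dst, int a, int b)
{
    assert(dst >= 0 && dst < 4 && a >= 0 && a < 4 && b >= 0 && b < 4);
    if (dst == a) {                  /* already two-address: dst += b */
        snprintf(buf, len, "add %%%s, %%%s\n", reg_name[b], reg_name[dst]);
        return 1;
    }
    if (dst == b) {                  /* addition commutes: dst += a */
        snprintf(buf, len, "add %%%s, %%%s\n", reg_name[a], reg_name[dst]);
        return 1;
    }
    snprintf(buf, len, "mov %%%s, %%%s\nadd %%%s, %%%s\n",
             reg_name[a], reg_name[dst], reg_name[b], reg_name[dst]);
    return 2;
}
```

A register allocator that tries to assign `dst` and `a` to the same host register lets most qops hit the one-instruction cases.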

Rob
-- 
"Perfection is reached, not when there is no longer anything to add, but
when there is no longer anything to take away." - Antoine de Saint-Exupery