From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rob Landley
Subject: Re: [Qemu-devel] qemu vs gcc4
Date: Tue, 31 Oct 2006 20:51:30 -0500
References: <45391B22.1050608@palmsource.com> <200610311900.15886.rob@landley.net> <200611010029.29406.paul@codesourcery.com>
In-Reply-To: <200611010029.29406.paul@codesourcery.com>
Message-Id: <200610312051.30749.rob@landley.net>
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: Paul Brook
Cc: qemu-devel@nongnu.org

On Tuesday 31 October 2006 7:29 pm, Paul Brook wrote:
> > Actually it sounds additive rather than multiplicative. Does each target
> > have an entirely unrelated set of ops, or is there a shared set of
> > primitive ops plus some oddballs?
>
> The shared set of primitive ops is basically qops :-)
> You probably could figure out a single common set of qops, then write
> assembly and glue them together like we do with dyngen. However once you've
> done that you've implemented most of what's needed for fully dynamic qops,
> so it doesn't really seem worth it.

I missed a curve.
What's "fully dynamic qops"? (There's no translation cache?)

> > But I already know
> > that doesn't work because it doesn't explain the "unable to find spill
> > register" problem.
>
> That's a separate gcc bug. It gets stuck when you tell it not to use half the
> registers, then ask it to do 64-bit math. This is one of the reasons
> eliminating the fixed registers is a good idea.

Sigh. The problems motivating me to learn the code are highly esoteric
breakage, yet I'm still not quite up to the task of understanding what's
going on when all this works _right_. Grumble...

> > > It corresponds to "T0" in dyngen. In addition to the actual CPU state,
> > > dyngen uses 3 fixed registers as scratch workspace. For qop purposes
> > > these are part of the guest CPU state. They're only there to aid
> > > conversion of the translation code, they'll go away eventually.
> >
> > Presumably the m68k target is pure qop, and hasn't got this sort of thing?
>
> Correct.
> There is one use of T0 left for communicating with the TB chaining code, but
> that's it and will probably go away eventually.

Any idea where I can get a toolchain that can output a "hello world" program
for m68k nommu? (Or perhaps you have a statically linked "hello world"
program for the platform lying around?)

Building toolchains is one of my other hobbies but it's a royal pain because
in order to get "hello world" to compile and link you have to supply kernel
headers, build binutils and gcc with various configuration options and path
overrides and such, build uClibc with the result and get them all talking to
each other. I.E. you've got to do hours of work before you get to the first
real "did it work" point, and then backtrack to figure out why the answer is
usually "no".
(Prebuilt binary toolchains are useful just to narrow down the
number of possible things that could be broken when you first try out a new
platform.)

> > > > Possible translation: you can feed a qreg containing an I64 value to a
> > > > qop taking an I32 argument, and it'll typecast the sucker down
> > > > intelligently, but if you produce an I32 result and expect to use that
> > > > qreg's value as an I64 argument later, you have to call a
> > > > sign-extending qop on it first?
> > >
> > > Exactly.
> > > If you mix I32,F32 and/or F64 in this way Bad Things will happen.
> >
> > Presumably just the same kinds of Bad Things as "float f; *(int *)&f;"?
>
> Or qemu will get confused and crash.

I've had that happen without qops, although not recently. (I have this nasty
habit of trying Ubuntu's PPC and x86-64 distros under qemu with each new
release. They usually fail in amusing new ways.)

> > > > seeing end with _im which I presume means "immediate". The alternative
> > > > is _cc, but what does that mean? (Presumably not "closed captioned".)
> > >
> > > _cc are variants that set the condition codes. I may have got T0 and T1
> > > backwards in the first 3 lines.
> >
> > Ah!
> >
> > Is this written down anywhere? I've read Fabrice's paper and the design
> > documentation, and I'm not remembering this. It's quite possible I missed
> > it when my brain filled up, though.
>
> Dunno.

So if at any point I actually understand this stuff, I need to write
documentation? (I can do part 2, part 1 the jury's still out on...)

> It also means you don't need to reserve that register, avoiding the gcc
> unable to find spill register bug you mentioned above.

I'm all for it.

> > Um, wouldn't the flag setting code be fairly straightforward as a qop that
> > comes right _before_ the other op, as in "set the flags for doing this with
> > these registers", that does nothing but set the flags (I.E.
> > it wouldn't modify the contents of any of the registers, so it could be
> > immediately followed by the appropriate add or shift or so on), and then
> > the flag setting pass could just turn all the ones that weren't needed
> > into QOP_NULL?
>
> Theoretically possible, but not so easy in practice. Especially when you get
> things like partial flag clobbers, and lazy flag evaluation. Doing it as a
> target specific hack is much simpler and quicker.

I think I know what partial flag clobbers are (although if you're working your
way back, in theory you could handle it with a mask of exposed bits), but
what's lazy flag evaluation? (I thought that was the point of eliminating
the unused flag setting. Are you saying the hardware also does this and we
have to emulate that?)

> > Or is that what's happening now? (Do QOPs ever modify their input
> > registers, or only the output one?)
>
> The generic qops never modify inputs, and never read outputs. Inputs and
> outputs can be the same qreg.

Hm.

> > > There are three fairly independent stages:
> > > 1) target-*/translate.c converts guest code into qops.
> > > 2) translate-all.c messes about with those qops a bit (allocates host
> > > registers, etc).
> > > 3) translate-op.c,translate-qop.c and target-*/ turns those qops into
> > > host code.
> >
> > Is pass 2 where the flag elimination pass goes (and presumably any other
> > optimizations that might get added)? No, that can't be the case or the
> > m68k code wouldn't need its own implementation of the flag elimination
> > pass...
>
> Flag elimination is at the end of step 1.

Because it's platform specific?

> > > qops and dyngen ops are both small "functions" that are represented in a
> > > similar way. The difference is that dyngen ops are target specific fixed
> > > functions, whereas qops are generic parameterized functions.
> >
> > So the 11x11 exponential complexity of qemu producing its own assembly
> > output might not be as much of a problem after switching to qops?
>
> Right. The exponential complexity is if you write the assembly by hand
> instead of using gcc to generate it.

The exponential complexity is if you have to write different code for each
combination of host and target. If every target disassembles to the same set
of target QOPs, then you could have a hand-written assembly version of each
QOP for each host platform, and still have N rather than N^2 of them.

And I still wanna use tcc to generate it, someday. :)

> > Possibly some of the common qops can have an asm block for 'em, and the
> > rest can go through the contortions target-*/op.c is currently doing with
> > (glue(glue(blah))) and so on.
>
> Currently we know how to generate code directly for all qops. Anything more
> complicated must be either put in a helper function or split into multiple
> qops.

Split into multiple qops I can understand.

> > > I started off by saying qops were effectively instructions for an
> > > imaginary machine. translate-all.c rearranges them so they match up very
> > > closely with the instructions available on the host. Once this has been
> > > done turning them into binary code is relatively simple.
> >
> > I sort of thought this is what it was already doing, but apparently not...
>
> We're getting confused with tenses. I mean this once translate-all.c has
> rearranged the qops we *do* generate host instructions from them without too
> much effort.

By "already doing" I meant I thought the 0.8.2 code was doing this, before your
new tree switched everything over to qops. (Trying to read dyngen.c reminds
me of reading cgi code that outputs html with embedded javascript.)

Rob
--
"Perfection is reached, not when there is no longer anything to add, but
when there is no longer anything to take away."
- Antoine de Saint-Exupery
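
As a rough illustration of the flag-elimination idea discussed in this thread (walk the op list backwards and turn flag-setting ops that nobody reads into QOP_NULL), here is a minimal Python sketch. It is not QEMU code: the op tuples, the `eliminate_dead_flags` name, and the flag names are all made up for illustration, and only the name QOP_NULL comes from the discussion. Partial clobbers are handled by tracking a set of live flags, roughly the "mask of exposed bits" idea. Lazy flag evaluation is not modelled at all.

```python
# Hypothetical sketch, not QEMU's actual data structures or pass.
# Each op is a tuple (name, flags_read, flags_written, flag_only):
#   flags_read    - set of flag names the op consumes
#   flags_written - set of flag names the op overwrites
#   flag_only     - True if the op does nothing except set flags
QOP_NULL = ("QOP_NULL", frozenset(), frozenset(), False)

# Assumed flag names, chosen only for the example.
ALL_FLAGS = frozenset({"N", "Z", "V", "C"})


def eliminate_dead_flags(ops, live_out=ALL_FLAGS):
    """Walk ops backwards, replacing flag-only ops whose results are never read.

    live_out is the set of flags assumed to be needed after the block ends;
    defaulting to all flags is the conservative choice.
    """
    live = set(live_out)
    result = list(ops)
    for i in range(len(result) - 1, -1, -1):
        name, reads, writes, flag_only = result[i]
        if flag_only and not (writes & live):
            # Nothing downstream reads any flag this op sets: drop it.
            result[i] = QOP_NULL
            continue
        # Flags this op overwrites are dead before it; flags it reads are live.
        live -= writes
        live |= reads
    return result


if __name__ == "__main__":
    block = [
        ("setflags_add", frozenset(), ALL_FLAGS, True),  # overwritten below
        ("add", frozenset(), frozenset(), False),
        ("setflags_sub", frozenset(), ALL_FLAGS, True),  # read by the branch
        ("sub", frozenset(), frozenset(), False),
        ("branch_if_zero", frozenset({"Z"}), frozenset(), False),
    ]
    for op in eliminate_dead_flags(block):
        print(op[0])
```

In this example the first flag-setting op is replaced because the second one overwrites every flag before anything reads them, while the second survives because the branch reads Z. A per-target pass would presumably need more than this (helper calls, exceptions, and so on), which may be part of why it is described as easier to do as a target-specific hack.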