From: Paul Brook
Subject: Re: [Qemu-devel] qemu vs gcc4
Date: Wed, 1 Nov 2006 00:29:28 +0000
To: qemu-devel@nongnu.org
Message-Id: <200611010029.29406.paul@codesourcery.com>
In-Reply-To: <200610311900.15886.rob@landley.net>

> Actually it sounds additive rather than multiplicative.  Does each target
> have an entirely unrelated set of ops, or is there a shared set of
> primitive ops plus some oddballs?

The shared set of primitive ops is basically qops :-)
You probably could figure out a single common set of qops, then write assembly
and glue them together like we do with dyngen. However, once you've done that
you've implemented most of what's needed for fully dynamic qops, so it
doesn't really seem worth it.
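As a rough illustration of what "glue them together like we do with dyngen" means, here is a minimal sketch in C. It assumes each op has already been compiled to a byte snippet and that translation just concatenates snippets into a buffer. The op names, the byte values, and glue_ops() are all hypothetical, and the bytes are placeholders rather than real machine code for any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical op identifiers; real dyngen ops are generated per target. */
enum { OP_MOVL_T0_T1, OP_ADDL_T0_T1, OP_EXIT_TB, OP_COUNT };

/* One precompiled snippet: bytes extracted from a compiled op function. */
typedef struct {
    const uint8_t *code;
    size_t len;
} OpTemplate;

/* Placeholder bytes only, NOT real machine code for any host. */
static const uint8_t movl_bytes[] = { 0x11, 0x12 };
static const uint8_t addl_bytes[] = { 0x21, 0x22, 0x23 };
static const uint8_t exit_bytes[] = { 0x31 };

static const OpTemplate op_table[OP_COUNT] = {
    [OP_MOVL_T0_T1] = { movl_bytes, sizeof movl_bytes },
    [OP_ADDL_T0_T1] = { addl_bytes, sizeof addl_bytes },
    [OP_EXIT_TB]    = { exit_bytes, sizeof exit_bytes },
};

/*
 * Append the snippets for ops[0..n) to buf, in order.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t glue_ops(const int *ops, size_t n, uint8_t *buf, size_t cap)
{
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        assert(ops[i] >= 0 && ops[i] < OP_COUNT);
        const OpTemplate *t = &op_table[ops[i]];
        if (t->len > cap - pos)
            return 0;               /* snippet would overflow the buffer */
        memcpy(buf + pos, t->code, t->len);
        pos += t->len;
    }
    return pos;
}
```

Calling glue_ops() with the three ops in sequence fills the buffer with the three snippets back to back. A real implementation would also have to patch operands and relocations into each copied snippet, which this sketch leaves out.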
> But backing up and just accepting that for a moment, in theory what you
> need is some way to compile a C function to machine code, and then unwrap
> that function into a .raw file containing just the machine code.  So the
> only per-compiler thing would be this unwrapper thingy.

Right.

> But I already know
> that doesn't work because it doesn't explain the "unable to find spill
> register" problem.

That's a separate gcc bug. It gets stuck when you tell it not to use half the
registers, then ask it to do 64-bit math. This is one of the reasons
eliminating the fixed registers is a good idea.

> > It corresponds to "T0" in dyngen. In addition to the actual CPU state,
> > dyngen uses 3 fixed registers as scratch workspace. For qop purposes
> > these are part of the guest CPU state. They're only there to aid
> > conversion of the translation code, they'll go away eventually.
>
> Presumably the m68k target is pure qop, and hasn't got this sort of thing?

Correct.
There is one use of T0 left for communicating with the TB chaining code, but
that's it and will probably go away eventually.

> > > Or the value currently in a qreg has a type associated with it, but
> > > the next value stored in that qreg may have a different type?
> >
> > A qreg has a fixed type. The value stored in that qreg has that type. To
> > convert it to a different type you need to use an explicit conversion
> > qop.
>
> So values don't have types, the qregs the values are _in_ have types.  But
> I thought there were an unlimited number of them (well, 1024 or so), and
> they're dynamically allocated (at least some of the time).  How does it
> keep track of the type of a given qreg?  (When you convert, you copy values
> from one qreg into another?)

Yes. Conversion is just like any other qop. It reads one qreg, and writes the
result to a different qreg which happens to be a different type.
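To make the "qregs have fixed types, conversion is just another qop" point concrete, here is a toy model in C. It evaluates values directly instead of emitting host code, so it only illustrates the typing rule. QMODE_I32 is named elsewhere in this thread; the other mode names, new_qreg(), set_i32(), get_i64() and qop_sext_i32_i64() are made up for the example and are not the real qemu interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* QMODE_I32 appears in the thread; the other mode names are assumed. */
typedef enum { QMODE_I32, QMODE_I64, QMODE_F32, QMODE_F64 } QMode;

#define MAX_QREGS 1024          /* "1024 or so", as mentioned in the thread */

typedef struct {
    QMode mode;                 /* fixed at allocation, never changes */
    union { int32_t i32; int64_t i64; float f32; double f64; } val;
} QReg;

static QReg qregs[MAX_QREGS];
static int nb_qregs;

/* Allocate a fresh qreg of the given type and return its index. */
int new_qreg(QMode mode)
{
    assert(nb_qregs < MAX_QREGS);
    qregs[nb_qregs].mode = mode;
    return nb_qregs++;
}

/* Typed accessors: using a qreg with the wrong type is a hard error. */
void set_i32(int r, int32_t v)
{
    assert(qregs[r].mode == QMODE_I32);
    qregs[r].val.i32 = v;
}

int64_t get_i64(int r)
{
    assert(qregs[r].mode == QMODE_I64);
    return qregs[r].val.i64;
}

/* Conversion qop: reads an I32 qreg, writes the sign-extended value to a
 * different qreg whose type is I64. Neither qreg changes type. */
void qop_sext_i32_i64(int dst, int src)
{
    assert(qregs[src].mode == QMODE_I32);
    assert(qregs[dst].mode == QMODE_I64);
    qregs[dst].val.i64 = (int64_t)qregs[src].val.i32;
}
```

The type lives in the qreg table entry, not in the value, so keeping track of the type of a given qreg is just a lookup by index.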
> > > Possible translation: you can feed a qreg containing an I64 value to a
> > > qop taking an i32 argument, and it'll typecast the sucker down
> > > intelligently, but if you produce an I32 result and expect to use that
> > > qreg's value as an I64 argument later, you have to call a
> > > sign-extending qop on it first?
> >
> > Exactly.
> > If you mix I32, F32 and/or F64 in this way Bad Things will happen.
>
> Presumably just the same kinds of Bad Things as "float f; *(int *)&f;"?

Or qemu will get confused and crash.

> > > seeing end with _im which I presume means "immediate".  The alternative
> > > is _cc, but what does that mean?  (Presumably not "closed captioned".)
> >
> > _cc are variants that set the condition codes. I may have got T0 and T1
> > backwards in the first 3 lines.
>
> Ah!
>
> Is this written down anywhere?  I've read Fabrice's paper and the design
> documentation, and I'm not remembering this.  It's quite possible I missed
> it when my brain filled up, though.

Dunno.

> > > Um, is my earlier characterization of "unwrapping stuff" at all close?
> >
> > Not entirely. I'm also replacing fixed locations (T2) with dynamically
> > allocated qregs.
>
> The dynamic allocation buys you what?  (Less spilling?)

More-or-less. It makes it easier to optimize. The code generator can pick what
to put in registers, or even not put them there at all, instead of having to
do things exactly how you told it.
It also means you don't need to reserve that register, avoiding the gcc
"unable to find spill register" bug you mentioned above.

> > Most x86 instructions set the condition code flags. However most of the
> > time these flags are ignored. E.g. if you have two consecutive add
> > instructions the first will set the flags, and the second will
> > immediately overwrite them.
> >
> > qemu contains a back-propagation pass that will remove the code to set
> > the flags after the first instruction.
> > Currently this is implemented by
> > changing an addl_cc op into a plain addl op.
>
> I actually understood that.  Yay!
>
> > The flag-setting code would most likely require several qops to
> > implement, so it would be much harder to prove it is not needed and
> > remove it. So there is a mechanism for adding extra target qops, doing
> > the flag elimination pass, then expanding those to generic qops.
>
> Um, wouldn't the flag setting code be fairly straightforward as a qop that
> comes right _before_ the other op, as in "set the flags for doing this with
> these registers", that does nothing but set the flags (I.E. it wouldn't
> modify the contents of any of the registers, so it could be immediately
> followed by the appropriate add or shift or so on), and then the flag
> setting pass could just turn all the ones that weren't needed into
> QOP_NULL?

Theoretically possible, but not so easy in practice. Especially when you get
things like partial flag clobbers, and lazy flag evaluation. Doing it as a
target specific hack is much simpler and quicker.

> Or is that what's happening now?  (Do QOPs ever modify their input
> registers, or only the output one?)

The generic qops never modify inputs, and never read outputs. Inputs and
outputs can be the same qreg.

> > > Ah, hang on.  There's target_reginfo in translate-all.c, that's using
> > > some of the other values.  So what the heck does translate-all.c do?
> > > (Shared code called by all the platform-dependent translate functions?)
> >
> > There are three fairly independent stages:
> > 1) target-*/translate.c converts guest code into qops.
> > 2) translate-all.c messes about with those qops a bit (allocates host
> >    registers, etc).
> > 3) translate-op.c, translate-qop.c and target-*/ turns those qops into
> >    host code.
>
> Is pass 2 where the flag elimination pass goes (and presumably any other
> optimizations that might get added)?
> No, that can't be the case or the
> m68k code wouldn't need its own implementation of the flag elimination
> pass...

Flag elimination is at the end of step 1.

> > > > For converting targets you can probably ignore most of the
> > > > translate-all and host-*/ changes. These implement generating code
> > > > from the qops.
> > >
> > > Ok, this implies that qops are a new thing.  Which looking at the code
> > > sort of supports.  Which means I don't understand what's going on at
> > > all.
> >
> > qops and dyngen ops are both small "functions" that are represented in a
> > similar way. The difference is that dyngen ops are target specific fixed
> > functions, whereas qops are generic parameterized functions.
>
> So the 11x11 exponential complexity of qemu producing its own assembly
> output might not be as much of a problem after switching to qops?

Right. The exponential complexity is if you write the assembly by hand instead
of using gcc to generate it.

> Possibly some of the common qops can have an asm block for 'em, and the
> rest can go through the contortions target-*/op.c is currently doing with
> (glue(glue(blah))) and so on.

Currently we know how to generate code directly for all qops. Anything more
complicated must be either put in a helper function or split into multiple
qops.

> > While they are really separate things, the details have been chosen so it
> > should be possible to adapt the existing translate.c code rather than
> > having to rewrite it from scratch. Decoding x86 instruction semantics is
> > complicated :-)
>
> Yay iterative transformation with regression testing.  (And nothing says
> regression testing like booting a Linux distro under the sucker.)

Exactly.

> > Many of the simpler dyngen ops can be replaced with a single qop. Others
> > can be replaced with a sequence of a few qops. Some of the more
> > complicated ones may need to be moved into helper functions.
>
> At some point, I hope to understand helper functions.
> But I'm not there
> yet.
>
> > > I need to re-read this later.  My brain's full and I'm deeply confused.
> >
> > I started off by saying qops were effectively instructions for an
> > imaginary machine. translate-all.c rearranges them so they match up very
> > closely with the instructions available on the host. Once this has been
> > done turning them into binary code is relatively simple.
>
> I sort of thought this is what it was already doing, but apparently not...

We're getting confused with tenses. I mean that once translate-all.c has
rearranged the qops we *do* generate host instructions from them without too
much effort.

> > If native host FP is not available qemu will include appropriate bits so
> > that after macro expansion and inlining you end up with:
> >
> >   tmp = gen_new_qreg(QMODE_I32);
> >   gen_op_helper(HELPER_addf32, tmp, QREG_FOO, QREG_BAR).
> >
> > and the addf32 helper does the floating point addition using the
> > "softfloat" library. The qemu softfloat library implementation may
> > actually use hardware floating point rather than doing everything
> > manually.
>
> No reason (except speed) the code output into a translation block can't do
> function calls.  I think.

That's exactly what a helper function is. Calling functions is complicated, so
I've restricted the functions that can be called to explicitly declared
helper functions.

Paul
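For readers still unsure what "explicitly declared helper functions" looks like in practice, here is a minimal sketch in C. It assumes a fixed table of helpers indexed by ID, so generated code can only call what has been declared there. HELPER_addf32 is the name used above; HELPER_subf32, the table layout and call_helper() are hypothetical, and plain host float arithmetic stands in for the softfloat library.

```c
#include <assert.h>
#include <stddef.h>

/* HELPER_addf32 is named in the thread; HELPER_subf32 is hypothetical. */
typedef enum { HELPER_addf32, HELPER_subf32, HELPER_COUNT } HelperId;

/* For this sketch every helper takes two floats and returns one. */
typedef float (*HelperFn)(float, float);

/* Stand-ins for softfloat routines; here they just use host arithmetic. */
static float helper_addf32(float a, float b) { return a + b; }
static float helper_subf32(float a, float b) { return a - b; }

/* The only functions "translated code" is allowed to call. */
static const HelperFn helper_table[HELPER_COUNT] = {
    [HELPER_addf32] = helper_addf32,
    [HELPER_subf32] = helper_subf32,
};

/*
 * Call a declared helper by ID and store its result in *out.
 * Returns 0 on success, -1 if the ID does not name a declared helper.
 */
int call_helper(int id, float a, float b, float *out)
{
    assert(out != NULL);
    if (id < 0 || id >= HELPER_COUNT || helper_table[id] == NULL)
        return -1;
    *out = helper_table[id](a, b);
    return 0;
}
```

Restricting calls to a known table means the code generator only ever has to handle a small, fixed set of call signatures, which is one plausible reading of why arbitrary function calls are not allowed.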