[Qemu-devel] Automatic generation of code-generator components (RETRY)

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] Automatic generation of code-generator components (RETRY)
@ 2010-07-19 14:54 Eliot Moss
  2010-07-20 22:14 ` Blue Swirl
  0 siblings, 1 reply; 3+ messages in thread
From: Eliot Moss @ 2010-07-19 14:54 UTC (permalink / raw)
  To: qemu-devel

Dear developers -- I've seen no responses yet.  My proposal
is due in early August, so if anyone has feelings on this or
comments about it, please respond soon :-) ...

Thank you for your patience -- Eliot moss

-------- Original Message --------
Subject: [Qemu-devel] Automatic generation of code-generator components
Date: Tue, 13 Jul 2010 14:09:56 -0400
From: Eliot Moss <moss@cs.umass.edu>
Reply-To: moss@cs.umass.edu
To: qemu-devel@nongnu.org

Dear QEMU developers --

I have had some email conversation with a few active developers,
and with their encouragement, want to open it up for the whole list
to comment.

For several years my research group at UMass has been developing
generic code-generator generator (CGG) technology. Historic CGGs
have always been tied to a particular code-generation framework,
that is, to a particular intermediate representation (IR) and
compiler.  Our tool, called GIST (for Generator of Instruction
Selectors Tool), is designed to work from any reasonable IR and
to connect to any reasonable framework.  More technical details
below, but what we are hoping for is to be able to say that if
we make this industrial-strength with some funding from the
National Science Foundation, the QEMU community will be interested
in using it.  No commitment -- just that you think it *might*
be a good idea if we can make it go.  We would use QEMU as one
of our "demo" environments.

Ok, more details.  We have an architecture description language
called CISL (CoGenT Instruction Set Language; CoGenT is our overall
project's name).  It is somewhat like C or Java in appearance. You
define the various memories and registers, and the instructions.
To generate an instruction selector from input ISA A (generally
a compiler IR, but not necessarily) to output ISA B (generally,
but not always, a hardware architecture), you start with descriptions
of A and B in CISL -- some of which may already be around.  You
also write what we call a *mapping* from A to B, which simply
indicates where on B each memory/register of A should go.  The
tool then finds instruction selector patterns, at least one for
each instruction of the A machine.

For any given retargetable *framework* (compiler, interpreter, emulator),
we write one *adapter*, that knows how to take GIST patterns in their
internal form and write them out in the way that the framework
needs them.

Here's an example.  Suppose we are going from A = QEMU IR to
B = MIPS, that is, the same as the TCG back end for an emulator
running on the MIPS processor.  We have written a CISL
description for the QEMU IR (yes, already), and suppose we have
one for the MIPS, sufficient for code generation anyway.
[Side note: Compilers do not generally use every instruction
of their target, e.g., not the privileged mode ones, etc.
Also, in the presence of register allocation, they generally
target a slightly virtualized machine -- one with a huge
number of registers, which register allocation then resolves
to real registers and occasional spilled locations.]

The mapping would talk about how to find QEMU memory on the
MIPS (perhaps a dedicated base register), etc., and would
also capture the conventions for calling helper routines,
and so on.

The adapter for QEMU TCG back ends would generate something
like a C switch statement with one case for each QEMU IR
instruction. Each case might have some additional case
analysis. This is because (as you see in QEMU), a given
IR instruction can have special cases depending on values
of constants, whether something is in a register, etc.
GIST will have found different patterns for each of these,
and with each one there would be a *constraint*, indicating
when it applies.  For example, patterns for adding a constant
value on the MIPS would likely have a special case for
constants that fit in 16 bits, since then you can use one
immediate instruction.  Likewise, the constant 0 is a
special case since it can just be a move.  In addition
to constraints, patterns have costs, which one can develop
for any given target, but would typically be based on
number of instructions, number of instruction bytes,
number of memory references, etc. Thus the case analysis
for a given instruction would check for the lowest cost
patterns first, and would conclude with the most general
pattern (but which may be the most expensive).

The adapter would also need to generate the information
needed by the QEMU TCG register allocator.

Now, here are some things of additional interest:

- While QEMU IR -> emulation host code-generation is maybe
   the most obvious case, we can also handle the "front end"
   emulation target -> QEMU IR generation.  This probably
   requires a slightly different description of machine A
   than when A is the emulation host -- after all, we must
   handle *all* instructions, including privileged ones,
   etc.  But it is possible to make the descriptions
   modular in such a way that instructions used in both
   cases are not repeated.

- I noticed that someone is looking at interpretation
   rather than compilation.  We have seen that we can generate
   functional simulators (very close to emulators) from
   CISL descriptions.  Thus, it would be possible to generate
   a simulator for any of the machines of interest.  What
   QEMU provides is a framework with all the memory and
   device modeling, etc.

- An approach like this might make it easier to maintain a
   range of different models of the same ISA.  It might also
   facilitate moving towards multicore emulation, maybe even
   heterogeneous multicore.  It would also make it easier to
   change around how the simulated memory is organized and
   accessed, if that would be helpful.

- It would make it particularly simple to build an emulator for
   a new or extended machine.  Of course you still need a compiler
   for it, but we can use the same description to generate a C
   compiler, etc.

This would be a several year long project, with real support ($$)
for three or more years.  The goal is for GIST to have its own
self-sustaining open-source community after that.  We are in
conversation with some other software communities of interest
concerning whether they would also be in favor of the project.
These include the Jikes RVM Java Virtual Machine project
(both the optimizing and the non-optimizing compilers), another
compiler framework, and a simulator framework.

I look forward to your thoughts, questions, and reactions.

Regards -- Eliot Moss
==============================================================================
J. Eliot B. Moss, Professor               http://www.cs.umass.edu/~moss    www
Director, Arch. and Lang. Impl. Lab.      +1-413-545-4206                voice
Department of Computer Science            +1-413-695-4226                 cell
140 Governor's Drive, Room 372            +1-413-545-1249                  fax
University of Massachusetts at Amherst    moss@cs.umass.edu              email
Amherst, MA  01003-9264  USA              +1-413-545-2746 Laurie Downey  sec'y
==============================================================================

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Qemu-devel] Automatic generation of code-generator components (RETRY)
  2010-07-19 14:54 [Qemu-devel] Automatic generation of code-generator components (RETRY) Eliot Moss
@ 2010-07-20 22:14 ` Blue Swirl
  2010-07-21  0:29   ` Eliot Moss
  0 siblings, 1 reply; 3+ messages in thread
From: Blue Swirl @ 2010-07-20 22:14 UTC (permalink / raw)
  To: moss; +Cc: qemu-devel

On Mon, Jul 19, 2010 at 2:54 PM, Eliot Moss <moss@cs.umass.edu> wrote:
> Dear developers -- I've seen no responses yet.  My proposal
> is due in early August, so if anyone has feelings on this or
> comments about it, please respond soon :-) ...
>
> Thank you for your patience -- Eliot moss
>
> -------- Original Message --------
> Subject: [Qemu-devel] Automatic generation of code-generator components
> Date: Tue, 13 Jul 2010 14:09:56 -0400
> From: Eliot Moss <moss@cs.umass.edu>
> Reply-To: moss@cs.umass.edu
> To: qemu-devel@nongnu.org
>
> Dear QEMU developers --
>
> I have had some email conversation with a few active developers,
> and with their encouragement, want to open it up for the whole list
> to comment.
>
> For several years my research group at UMass has been developing
> generic code-generator generator (CGG) technology. Historic CGGs
> have always been tied to a particular code-generation framework,
> that is, to a particular intermediate representation (IR) and
> compiler.  Our tool, called GIST (for Generator of Instruction
> Selectors Tool), is designed to work from any reasonable IR and
> to connect to any reasonable framework.  More technical details
> below, but what we are hoping for is to be able to say that if
> we make this industrial-strength with some funding from the
> National Science Foundation, the QEMU community will be interested
> in using it.  No commitment -- just that you think it *might*
> be a good idea if we can make it go.  We would use QEMU as one
> of our "demo" environments.

If you keep the development process open, with patches, RFCs and other
proposals flowing regularly to upstream QEMU, I'd suppose the
developer community would welcome this. But there have been several
forks of QEMU and merging those looks hard. The problem with those was
that the development was kept behind a curtain until it was 'finished'
according to some internal standards and then it was presented to us
as something that should just be swallowed in whole. In the continuous
cooperation model the result will be seamless. It's also possible that
there's too much mismatch between the goals of the projects, but with
early involvement of both sides, it can be detected before it's too
late.

> Ok, more details.  We have an architecture description language
> called CISL (CoGenT Instruction Set Language; CoGenT is our overall
> project's name).  It is somewhat like C or Java in appearance. You
> define the various memories and registers, and the instructions.
> To generate an instruction selector from input ISA A (generally
> a compiler IR, but not necessarily) to output ISA B (generally,
> but not always, a hardware architecture), you start with descriptions
> of A and B in CISL -- some of which may already be around.  You
> also write what we call a *mapping* from A to B, which simply
> indicates where on B each memory/register of A should go.  The
> tool then finds instruction selector patterns, at least one for
> each instruction of the A machine.
>
> For any given retargetable *framework* (compiler, interpreter, emulator),
> we write one *adapter*, that knows how to take GIST patterns in their
> internal form and write them out in the way that the framework
> needs them.
>
> Here's an example.  Suppose we are going from A = QEMU IR to
> B = MIPS, that is, the same as the TCG back end for an emulator
> running on the MIPS processor.  We have written a CISL
> description for the QEMU IR (yes, already), and suppose we have
> one for the MIPS, sufficient for code generation anyway.
> [Side note: Compilers do not generally use every instruction
> of their target, e.g., not the privileged mode ones, etc.
> Also, in the presence of register allocation, they generally
> target a slightly virtualized machine -- one with a huge
> number of registers, which register allocation then resolves
> to real registers and occasional spilled locations.]
>
> The mapping would talk about how to find QEMU memory on the
> MIPS (perhaps a dedicated base register), etc., and would
> also capture the conventions for calling helper routines,
> and so on.
>
> The adapter for QEMU TCG back ends would generate something
> like a C switch statement with one case for each QEMU IR
> instruction. Each case might have some additional case
> analysis. This is because (as you see in QEMU), a given
> IR instruction can have special cases depending on values
> of constants, whether something is in a register, etc.
> GIST will have found different patterns for each of these,
> and with each one there would be a *constraint*, indicating
> when it applies.  For example, patterns for adding a constant
> value on the MIPS would likely have a special case for
> constants that fit in 16 bits, since then you can use one
> immediate instruction.  Likewise, the constant 0 is a
> special case since it can just be a move.  In addition
> to constraints, patterns have costs, which one can develop
> for any given target, but would typically be based on
> number of instructions, number of instruction bytes,
> number of memory references, etc. Thus the case analysis
> for a given instruction would check for the lowest cost
> patterns first, and would conclude with the most general
> pattern (but which may be the most expensive).
>
> The adapter would also need to generate the information
> needed by the QEMU TCG register allocator.

I think this case is interesting for the currently unimplemented
hosts, or if the performance is improved from existing hosts compared
to TCG. The extra optimization pass may waste time which will not be
gained back by faster execution time of the better generated code.
There's also KVM, which should be always the fastest for cases where
host and target CPUs match.

>
> Now, here are some things of additional interest:
>
> - While QEMU IR -> emulation host code-generation is maybe
>  the most obvious case, we can also handle the "front end"
>  emulation target -> QEMU IR generation.  This probably
>  requires a slightly different description of machine A
>  than when A is the emulation host -- after all, we must
>  handle *all* instructions, including privileged ones,
>  etc.  But it is possible to make the descriptions
>  modular in such a way that instructions used in both
>  cases are not repeated.

Again, current implementations basically consist of just a huge C
switch statement, so it's hard to improve performance from that.
Enabling new targets would be interesting though, or maybe one day an
automatically generated translator could beat a hand written one.

> - I noticed that someone is looking at interpretation
>  rather than compilation.  We have seen that we can generate
>  functional simulators (very close to emulators) from
>  CISL descriptions.  Thus, it would be possible to generate
>  a simulator for any of the machines of interest.  What
>  QEMU provides is a framework with all the memory and
>  device modeling, etc.
>
> - An approach like this might make it easier to maintain a
>  range of different models of the same ISA.  It might also
>  facilitate moving towards multicore emulation, maybe even
>  heterogeneous multicore.  It would also make it easier to
>  change around how the simulated memory is organized and
>  accessed, if that would be helpful.
>
> - It would make it particularly simple to build an emulator for
>  a new or extended machine.  Of course you still need a compiler
>  for it, but we can use the same description to generate a C
>  compiler, etc.
>
> This would be a several year long project, with real support ($$)
> for three or more years.  The goal is for GIST to have its own
> self-sustaining open-source community after that.  We are in
> conversation with some other software communities of interest
> concerning whether they would also be in favor of the project.
> These include the Jikes RVM Java Virtual Machine project
> (both the optimizing and the non-optimizing compilers), another
> compiler framework, and a simulator framework.
>
> I look forward to your thoughts, questions, and reactions.
>
> Regards -- Eliot Moss
> ==============================================================================
> J. Eliot B. Moss, Professor               http://www.cs.umass.edu/~moss
>  www
> Director, Arch. and Lang. Impl. Lab.      +1-413-545-4206
>  voice
> Department of Computer Science            +1-413-695-4226
> cell
> 140 Governor's Drive, Room 372            +1-413-545-1249
>  fax
> University of Massachusetts at Amherst    moss@cs.umass.edu
>  email
> Amherst, MA  01003-9264  USA              +1-413-545-2746 Laurie Downey
>  sec'y
> ==============================================================================
>
>
>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Qemu-devel] Automatic generation of code-generator components (RETRY)
  2010-07-20 22:14 ` Blue Swirl
@ 2010-07-21  0:29   ` Eliot Moss
  0 siblings, 0 replies; 3+ messages in thread
From: Eliot Moss @ 2010-07-21  0:29 UTC (permalink / raw)
  To: Blue Swirl; +Cc: qemu-devel

On 7/20/2010 6:14 PM, Blue Swirl wrote:

> If you keep the development process open, with patches, RFCs and other
> proposals flowing regularly to upstream QEMU, I'd suppose the
> developer community would welcome this. But there have been several
> forks of QEMU and merging those looks hard. The problem with those was
> that the development was kept behind a curtain until it was 'finished'
> according to some internal standards and then it was presented to us
> as something that should just be swallowed in whole. In the continuous
> cooperation model the result will be seamless. It's also possible that
> there's too much mismatch between the goals of the projects, but with
> early involvement of both sides, it can be detected before it's too
> late.

Yes, I agree that an incremental and open process will work much
better.  Obviously we'd need to learn the local customs :-) , but
I expect we'd start with one or two of the existing guests and
hosts, not replacing the current code but doing something like
using a different but related name.  Among other things, we'd
probably want a scheme where we could test our code side by side
a known-working version at the start -- an interesting challenge,
but perhaps not too hard if we give it some thought.  Anyway,
yes, it's important not to present something that is a huge change,
in one gulp, without experience-building and confidence-building
steps in the middle.

> I think this case is interesting for the currently unimplemented
> hosts, or if the performance is improved from existing hosts compared
> to TCG. The extra optimization pass may waste time which will not be
> gained back by faster execution time of the better generated code.
> There's also KVM, which should be always the fastest for cases where
> host and target CPUs match.

We can experiment some with different degrees of (simple)
optimization, though I expect TCG probably already chose
a sweet spot.  I am not familiar yet with KVM and should
become more so.  I gather that it is the case where guest
and host are the same, which indeed gives rise to both
unique opportunities and unique challenges in performance.
However, I remain confident that we can "encode" suitable
strategies with our tool.

You are right, though, that the most promise is for new
ISAs.  We probably cannot do much better than TCG front-ends
and back-ends tuned by experts.  However, the ease of
reworking various details may make it easier to tune and
possibly do a little better; certainly with new ISAs.

> Again, current implementations basically consist of just a huge C
> switch statement, so it's hard to improve performance from that.
> Enabling new targets would be interesting though, or maybe one day an
> automatically generated translator could beat a hand written one.

Agreed.

Again, thanks for your encouraging reply and your very
thoughtful and useful suggestions.  I have been pleased to
find this a polite community -- sad that not all open
source communities are so polite and friendly!

Best wishes -- Eliot Moss

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2010-07-21  0:29 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-07-19 14:54 [Qemu-devel] Automatic generation of code-generator components (RETRY) Eliot Moss
2010-07-20 22:14 ` Blue Swirl
2010-07-21  0:29   ` Eliot Moss

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).