* [Qemu-devel] Running programs that dynamically generate code
@ 2014-08-29 2:24 Byron Hawkins
2014-08-29 9:22 ` Peter Maydell
0 siblings, 1 reply; 4+ messages in thread
From: Byron Hawkins @ 2014-08-29 2:24 UTC (permalink / raw)
To: QEMU Developer List
Hi, I'm working on a research project to optimize binary translation for
target applications that dynamically generate code, such as browser JIT
engines. When I run the octane benchmark in Chrome v8 under QEMU (i.e.,
qemu-x86_64), it shows significant overhead compared to a native run. Can
someone tell me how QEMU maintains consistency with the target application
when it dynamically generates code? For example, does it set executable
pages readonly and catch the page fault when the target app writes to it? I
searched the documentation and mailing list, but all the references to
"dynamically generated code" and "JIT" are about code generated by QEMU, not
about code generated by the target application. If there is a document about
this somewhere, please send me a link; a basic explanation would also be
very helpful. Thanks.
Byron
* Re: [Qemu-devel] Running programs that dynamically generate code
From: Peter Maydell @ 2014-08-29 9:22 UTC (permalink / raw)
To: Byron Hawkins; +Cc: QEMU Developer List
On 29 August 2014 03:24, Byron Hawkins <byronh@uci.edu> wrote:
> Hi, I’m working on a research project to optimize binary translation for
> target applications that dynamically generate code, such as browser JIT
> engines. When I run the octane benchmark in Chrome v8 under QEMU (i.e.,
> qemu-x86_64), it shows significant overhead compared to a native run. Can
> someone tell me how QEMU maintains consistency with the target application
> when it dynamically generates code?
When we generate code from a particular page of guest RAM, we
arrange to trap into QEMU if/when the guest writes to that page
(either using QEMU's own softmmu facilities if using the -system-
emulators, or by marking the page readonly and handling the segfault
if using the linux-user emulators). If we trap then we flush the cached
translation we had for that page and let the guest write proceed.
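The trap-and-flush scheme described above can be sketched as a toy model. This is a minimal illustration in Python, not QEMU's actual code: the class `ToyTranslator`, its methods, and the 4 KiB `PAGE_SIZE` are all hypothetical names and values chosen for the example, "translation" is reduced to snapshotting the guest bytes, and each translated block is assumed not to cross a page boundary.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch only


class ToyTranslator:
    """Toy model: cache translations per guest page, flush them on a write."""

    def __init__(self, memory_size):
        self.memory = bytearray(memory_size)
        self.cache = {}         # page index -> {pc: snapshot of translated bytes}
        self.protected = set()  # pages whose writes must trap
        self.traps = 0          # number of simulated write traps

    def translate(self, pc, length):
        """'Translate' the guest code at pc (here: just copy the bytes)."""
        page = pc // PAGE_SIZE
        block = bytes(self.memory[pc:pc + length])
        self.cache.setdefault(page, {})[pc] = block
        # Arrange to trap future guest writes to this page.
        self.protected.add(page)
        return block

    def guest_write(self, addr, value):
        """Guest store of one byte; traps first if the page holds translated code."""
        page = addr // PAGE_SIZE
        if page in self.protected:
            # Simulated trap: flush every translation made from this page,
            # drop the write protection, then let the write proceed.
            self.traps += 1
            self.cache.pop(page, None)
            self.protected.discard(page)
        self.memory[addr] = value


if __name__ == "__main__":
    t = ToyTranslator(2 * PAGE_SIZE)
    t.translate(0x10, 4)              # code in page 0
    t.translate(PAGE_SIZE + 8, 4)     # code in page 1
    t.guest_write(0x20, 0x90)         # write to page 0 traps and flushes page 0
    print(sorted(t.cache), t.traps)
```

In this model a second write to the same page does not trap again, because the protection is only re-armed the next time code from that page is translated. A guest JIT that alternates between writing and executing within one page would therefore pay the trap-and-retranslate cost repeatedly.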
For guest CPUs like x86 there is some further complication to allow
writes to the page the guest is currently executing from to behave
as the guest expects. For guest CPUs like ARM where the architecture
requires guest code to perform icache/dcache maintenance operations
before the writes are visible to instruction fetch, we take advantage
of that to avoid jumping through some of the hoops.
The most obvious cause of overhead compared to a native run
would be that we emulate all our floating point and SIMD operations,
so unless Chrome's JIT is sticking strictly to integer operations
we're bound to go rather slower.
-- PMM
* Re: [Qemu-devel] Running programs that dynamically generate code
From: Byron Hawkins @ 2014-09-02 8:16 UTC (permalink / raw)
To: 'Peter Maydell'; +Cc: 'QEMU Developer List'
Thanks Peter. Can you tell me how much translated code is flushed when a write is detected in an executable region? Is it just the code translated from the one modified page?
* Re: [Qemu-devel] Running programs that dynamically generate code
From: Peter Maydell @ 2014-09-02 8:50 UTC (permalink / raw)
To: Byron Hawkins; +Cc: QEMU Developer List
On 2 September 2014 09:16, Byron Hawkins <byronh@uci.edu> wrote:
> Thanks Peter. Can you tell me how much translated code is flushed
> when a write is detected in an executable region? Is it just the code
> translated from the one modified page?
It should just be the code translated in that page (strictly speaking,
a region TARGET_PAGE_SIZE bytes long, which might be smaller
than what the guest considers its page size if the CPU family
supports several different page sizes).
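As a rough illustration of that granularity, the flushed region for a given write address can be computed by aligning the address down to a TARGET_PAGE_SIZE boundary. The helper `flushed_region` and the bit widths used below are assumptions for the example, not values taken from QEMU's source.

```python
def flushed_region(write_addr, target_page_bits):
    """Return the [start, end) range whose translations are flushed.

    target_page_bits is log2 of the assumed TARGET_PAGE_SIZE.
    """
    size = 1 << target_page_bits
    start = write_addr & ~(size - 1)  # align down to the region boundary
    return start, start + size


if __name__ == "__main__":
    # Same write address, two hypothetical TARGET_PAGE_SIZE values.
    for bits in (12, 10):
        start, end = flushed_region(0x1234, bits)
        print(f"bits={bits}: flush [{start:#x}, {end:#x})")
```

With a hypothetical 1 KiB TARGET_PAGE_SIZE and a guest that uses 4 KiB pages, a write at 0x1234 would flush only translations from 0x1000 up to 0x1400, a quarter of the guest's page.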
-- PMM