[Qemu-devel] Get only TCG code without execution

* [Qemu-devel] Get only TCG code without execution
@ 2012-01-15 23:09 Rajat Goyal
  2012-01-16  5:32 ` Mulyadi Santosa
  2012-01-16  8:41 ` Stefan Hajnoczi
  0 siblings, 2 replies; 19+ messages in thread
From: Rajat Goyal @ 2012-01-15 23:09 UTC (permalink / raw)
  To: qemu-devel

I am doing a project to build a daemonic ARM emulator using QEMU. One of
the requirements is to get the complete TCG code for any multi-threaded ARM
program that I run on QEMU. I do not need QEMU to execute the program and
show me the output; I only need the entire TCG code. The latest version of
qemu-arm seems to break while running pthread-parallel ARM binaries, i.e.,
qemu-arm terminates without completing execution, and hence the entire TCG
code cannot be captured in the log. Is there a way to get the complete TCG
code for pthread-parallel binaries without making QEMU execute the binary?

Any help would be appreciated.

-- 
Rajat Goyal
5th year undergraduate student
Integrated Master of Technology
Mathematics and Computing
Department of Mathematics
IIT Delhi

* Re: [Qemu-devel] Get only TCG code without execution
  2012-01-15 23:09 [Qemu-devel] Get only TCG code without execution Rajat Goyal
@ 2012-01-16  5:32 ` Mulyadi Santosa
  2012-01-16  8:41 ` Stefan Hajnoczi
  1 sibling, 0 replies; 19+ messages in thread
From: Mulyadi Santosa @ 2012-01-16  5:32 UTC (permalink / raw)
  To: Rajat Goyal; +Cc: qemu-devel

Hi....

On Mon, Jan 16, 2012 at 06:09, Rajat Goyal <rajat.goyal.90@gmail.com> wrote:
> Is there a way by which I can get the
> complete TCG code for pthread parallel binaries in exchange for not making
> QEMU execute the binary?

The thing is, the way I see it, TCG is meant to work like a JIT compiler,
whereas what you are describing is closer to a static compiler.

Assuming your program has no interactive part (no user input, no need
to wait for a keypress, etc.), maybe you can just comment out the part of
the QEMU code that jumps into the translated block.

NB: You were referring to QEMU user-mode emulation, right?

-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com

* Re: [Qemu-devel] Get only TCG code without execution
  2012-01-15 23:09 [Qemu-devel] Get only TCG code without execution Rajat Goyal
  2012-01-16  5:32 ` Mulyadi Santosa
@ 2012-01-16  8:41 ` Stefan Hajnoczi
  2012-01-16 12:23   ` Rajat Goyal
  1 sibling, 1 reply; 19+ messages in thread
From: Stefan Hajnoczi @ 2012-01-16  8:41 UTC (permalink / raw)
  To: Rajat Goyal; +Cc: qemu-devel

On Sun, Jan 15, 2012 at 11:09:18PM +0000, Rajat Goyal wrote:
> I am doing a project to build a daemonic ARM emulator using QEMU. One of
> the requirements is to get the complete TCG code for any multi-threaded ARM
> program that I run on QEMU. I do not need QEMU to execute the program and
> show me the output. Just the entire TCG code. The latest version of
> qemu-arm seems to break while running pthread parallel ARM binaries, ie,
> qemu-arm terminates without completing execution and hence, the entire TCG
> code cannot be captured in the log. Is there a way by which I can get the
> complete TCG code for pthread parallel binaries in exchange for not making
> QEMU execute the binary?

QEMU is a dynamic binary translator.  You don't know the next block
without executing the current block.  It's not possible to translate a
whole program without executing it - remember it can load shared
libraries, use self-modifying code, or just employ indirect jumps which
you cannot analyze statically.
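
To illustrate the point, here is a toy sketch in Python. It is not QEMU
code: the tiny guest "ISA", the PROGRAM table and all function names are
made up for illustration. The translator only learns which block comes next
by running the current one, so a block that is reachable only through an
indirect jump that was not taken is never translated.

```python
# Toy guest program: address -> (opcode, operand). Hypothetical ISA.
PROGRAM = {
    0: ("load_input", None),   # r0 = value known only at run time
    1: ("jump_reg", None),     # indirect jump to the address held in r0
    10: ("halt", "A"),
    20: ("halt", "B"),
}

def translate_block(pc):
    """'Translate' one block: decode until a control-transfer instruction."""
    ops = []
    while True:
        op = PROGRAM[pc]
        ops.append((pc, op))
        if op[0] in ("jump_reg", "halt"):
            return ops
        pc += 1

def run(runtime_input):
    """Translate-and-execute loop; returns start addresses of translated blocks."""
    cache = {}
    pc, r0 = 0, None
    while pc is not None:
        if pc not in cache:
            cache[pc] = translate_block(pc)
        for _, (opcode, _operand) in cache[pc]:
            if opcode == "load_input":
                r0 = runtime_input
            elif opcode == "jump_reg":
                pc = r0          # the target is known only after execution
            elif opcode == "halt":
                pc = None
    return sorted(cache)
```

Calling run(10) translates only the blocks at addresses 0 and 10; the block
at address 20 is never seen unless some execution actually jumps there.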

In the general case it's not possible.  Can you explain why you're
trying to do this?

Stefan

* Re: [Qemu-devel] Get only TCG code without execution
  2012-01-16  8:41 ` Stefan Hajnoczi
@ 2012-01-16 12:23   ` Rajat Goyal
  2012-01-16 12:29     ` Peter Maydell
  0 siblings, 1 reply; 19+ messages in thread
From: Rajat Goyal @ 2012-01-16 12:23 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

Thanks for your text, Stefan.

The situation is like this. The most basic multi-threaded program (using
pthreads) which just prints something like "I am Thread 1" and "I am Thread
2" does not work over the QEMU user emulator. There are no output messages
saying "I am thread 1" etc. when the program binary is run over qemu-arm or
qemu-i386. For qemu-i386, the reason is clear: there is no
implementation for the futex syscall. But for qemu-arm, the syscall trace
shows "*** longjmp causes uninitialized stack frame ***: qemu-arm
terminated". Hence, the entire TCG code for the binary is not obtained
in the QEMU log, since QEMU does not complete execution of the binary.

What is the way out of this? The reason I need TCG code is because my
project work is to write a semantics for TCG micro-operations and then
compare my semantics with a semantics for ARM instructions being written by
someone else. To test my semantics, I need the corresponding TCG code for
several different multi-threaded ARM binaries.

Many thanks in anticipation,
Rajat.

-- 
Rajat Goyal
5th year undergraduate student
Integrated Master of Technology
Mathematics and Computing
Department of Mathematics
IIT Delhi

* Re: [Qemu-devel] Get only TCG code without execution
  2012-01-16 12:23   ` Rajat Goyal
@ 2012-01-16 12:29     ` Peter Maydell
  2012-01-17  1:04       ` 陳韋任
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Maydell @ 2012-01-16 12:29 UTC (permalink / raw)
  To: Rajat Goyal; +Cc: Stefan Hajnoczi, qemu-devel

On 16 January 2012 12:23, Rajat Goyal <rajat.goyal.90@gmail.com> wrote:
> The situation is like this. The most basic multi-threaded program (using
> pthreads) which just prints something like "I am Thread 1" and "I am Thread
> 2" does not work over the QEMU user emulator. There are no output messages
> saying "I am thread 1" etc. when the program binary is run over qemu-arm or
> qemu-i386. For qemu-i386, the reason is alright - there is no implementation
> for the futex syscall. But for qemu-arm, the syscall trace shows " ***
> longjmp causes uninitialized stack frame ***: qemu-arm terminated". And
> hence, the entire TCG code for the binary is not obtained in the QEMU log
> since QEMU does not complete execution of the binary.

Which version of QEMU are you using? The "uninitialized stack frame"
bug should be fixed in 1.0: https://bugs.launchpad.net/qemu/+bug/823902

> What is the way out of this? The reason I need TCG code is because my
> project work is to write a semantics for TCG micro-operations and then
> compare my semantics with a semantics for ARM instructions being written by
> someone else. To test my semantics, I need the corresponding TCG code for
> several different multi-threaded ARM binaries.

Why does this have to be a multi-threaded binary? In the multithreaded
case, the instructions executed by QEMU won't be deterministic (it will
depend on how the host OS schedules the multiple threads) so it's going
to be hard to compare a long trace output to something else.
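
A rough way to see the scale of the problem, sketched in Python: even two
tiny threads can interleave in many different orders, and each order is a
different execution trace. The thread contents below are hypothetical
labels, not real ARM or TCG instructions, and the sketch assumes
interleaving at single-instruction granularity.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's own order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        chosen = set(positions)          # slots in the trace taken by thread a
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]

thread1 = ["t1:ldr", "t1:add", "t1:str"]
thread2 = ["t2:ldr", "t2:add", "t2:str"]
traces = list(interleavings(thread1, thread2))
print(len(traces))  # 20 possible traces for two 3-instruction threads
```

The count grows combinatorially with thread length, which is why a long
multi-threaded trace from one run is hard to compare against another.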

-- PMM

* Re: [Qemu-devel] Get only TCG code without execution
  2012-01-16 12:29     ` Peter Maydell
@ 2012-01-17  1:04       ` 陳韋任
  2012-01-17  8:33         ` Peter Maydell
  0 siblings, 1 reply; 19+ messages in thread
From: 陳韋任 @ 2012-01-17  1:04 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Stefan Hajnoczi, Rajat Goyal, qemu-devel

> > What is the way out of this? The reason I need TCG code is because my
> > project work is to write a semantics for TCG micro-operations and then
> > compare my semantics with a semantics for ARM instructions being written by
> > someone else. To test my semantics, I need the corresponding TCG code for
> > several different multi-threaded ARM binaries.
> 
> Why does this have to be a multi-threaded binary? In the multithreaded
> case, the instructions executed by QEMU won't be deterministic (it will
> depend on how the host OS schedules the multiple threads) so it's going
> to be hard to compare a long trace output to something else.

  I guess Rajat's goal is to compare the "semantics" of TCG ops and the ARM
binary, so the non-determinism might not be an issue. Or he wants to use
"semantics" to solve the non-determinism problem.

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj

* Re: [Qemu-devel] Get only TCG code without execution
  2012-01-17  1:04       ` 陳韋任
@ 2012-01-17  8:33         ` Peter Maydell
  2012-01-19 16:00           ` Rajat Goyal
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Maydell @ 2012-01-17  8:33 UTC (permalink / raw)
  To: 陳韋任; +Cc: Stefan Hajnoczi, Rajat Goyal, qemu-devel

On 17 January 2012 01:04, 陳韋任 <chenwj@iis.sinica.edu.tw> wrote:
>> > What is the way out of this? The reason I need TCG code is because my
>> > project work is to write a semantics for TCG micro-operations and then
>> > compare my semantics with a semantics for ARM instructions being written by
>> > someone else. To test my semantics, I need the corresponding TCG code for
>> > several different multi-threaded ARM binaries.
>>
>> Why does this have to be a multi-threaded binary? In the multithreaded
>> case, the instructions executed by QEMU won't be deterministic (it will
>> depend on how the host OS schedules the multiple threads) so it's going
>> to be hard to compare a long trace output to something else.
>
>  I guess Rajat's goal is to compare the "semantics" of TCG ops and ARM binary,
> therefore the non-deterministic might not be the issue. Or he want to use
> "semantics" to solve the non-deterministic problem.

But if you're looking at the semantics at a level where you don't
care about the non-determinism of the threading, you might just
as well look at them at an individual instruction or TB level,
in which case a single threaded program is just as good and less
confusing, surely?

-- PMM

* Re: [Qemu-devel] Get only TCG code without execution
  2012-01-17  8:33         ` Peter Maydell
@ 2012-01-19 16:00           ` Rajat Goyal
  2012-01-19 16:15             ` Peter Maydell
  2012-01-20  6:12             ` 陳韋任
  0 siblings, 2 replies; 19+ messages in thread
From: Rajat Goyal @ 2012-01-19 16:00 UTC (permalink / raw)
  To: Peter Maydell; +Cc: qemu-devel

Thank you so much for your help Peter. I was using version 0.15.1. On 1.0,
it works like a dream!

I was not talking about semantics of individual instructions but semantics
of the whole multi-threaded program. Multi-threaded programs can lead to
several different (most of which are unintended) states of the CPU. What
states are possible is described in a mathematically rigorous definition of
the ARM memory model. My task is to implement this memory model over TCG
ops and then compare the results on several different (multi-threaded)
litmus tests with the implementation of the memory model over ARM
instructions. For the same task, I need QEMU to give me the TCG translation
for code which it never branches into and hence, never needs to translate
and execute (because ARM multiprocessors can perform speculative execution).

Rajat.

-- 
Rajat Goyal
5th year undergraduate student
Integrated Master of Technology
Mathematics and Computing
Department of Mathematics
IIT Delhi

* Re: [Qemu-devel] Get only TCG code without execution
  2012-01-19 16:00           ` Rajat Goyal
@ 2012-01-19 16:15             ` Peter Maydell
  2012-01-20  6:38               ` 陳韋任
  2012-01-20  6:12             ` 陳韋任
  1 sibling, 1 reply; 19+ messages in thread
From: Peter Maydell @ 2012-01-19 16:15 UTC (permalink / raw)
  To: Rajat Goyal; +Cc: qemu-devel

On 19 January 2012 16:00, Rajat Goyal <rajat.goyal.90@gmail.com> wrote:
> Thank you so much for your help Peter. I was using version 0.15.1. On 1.0,
> it works like a dream!

Good.

> I was not talking about semantics of individual instructions but semantics
> of the whole multi-threaded program. Multi-threaded programs can lead to
> several different (most of which are unintended) states of the CPU. What
> states are possible is described in a mathematically rigorous definition of
> the ARM memory model. My task is to implement this memory model over TCG ops
> and then compare the results on several different (multi-threaded) litmus
> tests with the implementation of the memory model over ARM instructions.

Some points to note:
 * The current QEMU code has some known race conditions which can cause
crashes/hangs in heavily threaded programs in linux-user mode; see eg
https://bugs.launchpad.net/qemu/+bug/668799
 * We don't really make a serious attempt at implementing the ARM memory
model in QEMU; our load/store exclusive implementation is pretty hopeless,
for instance
 * In linux-user mode we basically just pass loads/stores/etc through as
host-cpu loads/stores, so you get whatever the host's memory model semantics
are, not what the guest CPU is supposed to do
 * a combination of the above plus the fact we don't implement caches in
system emulation mode means that our implementation of all the barrier
insns is a simple no-op; you'll never see barriers at the TCG op level

> For
> the same task, I need QEMU to give me the TCG translation for code which it
> never branches into and hence, never needs to translate and execute (because
> ARM multiprocessors can perform speculative execution).

QEMU does not do TCG translation for code which it doesn't branch into.
Indeed, it's not actually possible to tell whether it is code and not
data until you've branched into it...

-- PMM

* Re: [Qemu-devel] Get only TCG code without execution
  2012-01-19 16:15             ` Peter Maydell
@ 2012-01-20  6:38               ` 陳韋任
  2012-01-21  0:21                 ` Jamie Lokier
  0 siblings, 1 reply; 19+ messages in thread
From: 陳韋任 @ 2012-01-20  6:38 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Rajat Goyal, qemu-devel

> > I was not talking about semantics of individual instructions but semantics
> > of the whole multi-threaded program. Multi-threaded programs can lead to
> > several different (most of which are unintended) states of the CPU. What
> > states are possible is described in a mathematically rigorous definition of
> > the ARM memory model. My task is to implement this memory model over TCG ops
> > and then compare the results on several different (multi-threaded) litmus
> > tests with the implementation of the memory model over ARM instructions.
> 
> Some points to note:
>  * The current QEMU code has some known race conditions which can cause
> crashes/hangs in heavily threaded programs in linux-user mode; see eg
> https://bugs.launchpad.net/qemu/+bug/668799
>  * We don't really make a serious attempt at implementing the ARM memory
> model in QEMU; our load/store exclusive implementation is pretty hopeless,
> for instance
>  * In linux-user mode we basically just pass loads/stores/etc through as
> host-cpu loads/stores, so you get whatever the host's memory model semantics
> are, not what the guest CPU is supposed to do
>  * a combination of the above plus the fact we don't implement caches in
> system emulation mode means that our implementation of all the barrier
> insns is a simple no-op; you'll never see barriers at the TCG op level

  What's the load/store exclusive implementation? And as a general emulator, QEMU
shouldn't implement any architecture-specific memory model, right? What comes
to mind is that QEMU only needs to follow the guest memory operations when it
translates the guest binary to TCG ops. When it translates TCG ops to the host
binary, it also has to be careful not to mess up the memory ordering.

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj

* Re: [Qemu-devel] Get only TCG code without execution
  2012-01-20  6:38               ` 陳韋任
@ 2012-01-21  0:21                 ` Jamie Lokier
  2012-02-02 19:35                   ` Rajat Goyal
  0 siblings, 1 reply; 19+ messages in thread
From: Jamie Lokier @ 2012-01-21  0:21 UTC (permalink / raw)
  To: 陳韋任; +Cc: Peter Maydell, Rajat Goyal, qemu-devel

陳韋任 wrote:
>   What's load/store exclusive implementation?

It's how some architectures do atomic operations, instead of having
atomic instructions like x86 does.

> And as a general emulator, QEMU shouldn't implement any
> architecture-specific memory model, right? What comes into my mind
> is QEMU only need to follow guest memory operations when translates
> guest binary to TCG ops. When translate TCG ops to host binary, it
> also has to be careful not to mess up the memory ordering.

The error occurs when emulating two or more guest CPUs in parallel
using two or more host CPUs for speed.  Then "not mess up the memory
ordering" may require barrier instructions in the host binary code,
depending on the guest and host architectures.  Without barrier
instructions, the CPUs may reorder memory accesses even if the instruction
order is kept the same. The rules for which reorderings a CPU may perform
are called its memory model. TCG cannot currently produce these barrier
instructions, and it's not clear if it will ever be able to do so efficiently.
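
As a small illustration of why the ordering matters, here is a sketch in
Python of the classic "message passing" litmus test. It enumerates every
interleaving of two threads, first with each thread's accesses in program
order, then with the writer's two stores allowed to swap, which is the kind
of reordering a weakly ordered CPU may perform when no barrier separates
them. This is a toy model with made-up names, not a model of any specific
architecture or of TCG.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of a and b that keeps each sequence's own order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        chosen = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]

def outcomes(writer, reader):
    """Return the set of (r0, r1) values the reader can end up observing."""
    results = set()
    for trace in interleavings(writer, reader):
        mem = {"data": 0, "flag": 0}
        regs = {}
        for kind, target, arg in trace:
            if kind == "store":
                mem[target] = arg        # store value arg to location target
            else:
                regs[target] = mem[arg]  # load location arg into register target
        results.add((regs["r0"], regs["r1"]))
    return results

writer = [("store", "data", 1), ("store", "flag", 1)]
reader = [("load", "r0", "flag"), ("load", "r1", "data")]

in_order = outcomes(writer, reader)
# Assumption for illustration: the two stores may become visible swapped.
reordered = in_order | outcomes(writer[::-1], reader)
```

The outcome (1, 0), where the reader sees the flag set but still reads stale
data, appears only in the reordered set. In this toy model, keeping the two
stores in order (the job of a barrier between them) is what removes it.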

-- Jamie

* Re: [Qemu-devel] Get only TCG code without execution
  2012-01-21  0:21                 ` Jamie Lokier
@ 2012-02-02 19:35                   ` Rajat Goyal
  0 siblings, 0 replies; 19+ messages in thread
From: Rajat Goyal @ 2012-02-02 19:35 UTC (permalink / raw)
  To: qemu-devel

Hi,

I have modified QEMU to act as a TCG compiler and give me the TCG code for
the whole binary. However, I cannot find a way to obtain the last address
in the binary. The symbol table loaded into syminfos contains only the
address of the last symbol. Not the address of the last machine
instruction. I can obtain this if I can obtain the length of the last
section in the ELF. How can I do that in QEMU?
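
Outside QEMU, the section sizes can be read directly from the ELF section
header table. The following Python sketch assumes a little-endian 32-bit
ELF file (as a typical ARM user-space binary would be) with an intact
section header table. The function name is made up, extended section
numbering is not handled, and this is not how QEMU's own loader works. It
returns the end address of the highest executable section.

```python
import struct

SHF_EXECINSTR = 0x4  # section contains executable machine instructions

def end_of_code(elf_bytes):
    """Return max(sh_addr + sh_size) over executable sections of an ELF32 LE image."""
    if elf_bytes[:4] != b"\x7fELF" or elf_bytes[4] != 1 or elf_bytes[5] != 1:
        raise ValueError("expected a little-endian 32-bit ELF file")
    # In the ELF32 header, e_shoff is at offset 0x20,
    # e_shentsize at 0x2E and e_shnum at 0x30.
    (e_shoff,) = struct.unpack_from("<I", elf_bytes, 0x20)
    e_shentsize, e_shnum = struct.unpack_from("<HH", elf_bytes, 0x2E)
    end = 0
    for i in range(e_shnum):
        # Elf32_Shdr: name, type, flags, addr, offset, size, link, info, align, entsize
        fields = struct.unpack_from("<10I", elf_bytes, e_shoff + i * e_shentsize)
        sh_flags, sh_addr, sh_size = fields[2], fields[3], fields[5]
        if sh_flags & SHF_EXECINSTR:
            end = max(end, sh_addr + sh_size)
    return end
```

It would be called on the raw file contents, for example
end_of_code(open(path, "rb").read()) with path naming the guest binary.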

Thanks,
Rajat.

-- 
Rajat Goyal
5th year undergraduate student
Master of Technology in Mathematics and Computing - Integrated Program
Department of Mathematics
IIT Delhi

* Re: [Qemu-devel] Get only TCG code without execution
  2012-01-19 16:00           ` Rajat Goyal
  2012-01-19 16:15             ` Peter Maydell
@ 2012-01-20  6:12             ` 陳韋任
  2012-01-20  9:09               ` Peter Maydell
  1 sibling, 1 reply; 19+ messages in thread
From: 陳韋任 @ 2012-01-20  6:12 UTC (permalink / raw)
  To: Rajat Goyal; +Cc: Peter Maydell, qemu-devel

> I was not talking about semantics of individual instructions but semantics
> of the whole multi-threaded program. Multi-threaded programs can lead to
> several different (most of which are unintended) states of the CPU. What
> states are possible is described in a mathematically rigorous definition of
> the ARM memory model. My task is to implement this memory model over TCG
> ops and then compare the results on several different (multi-threaded)
> litmus tests with the implementation of the memory model over ARM
> instructions. For the same task, I need QEMU to give me the TCG translation
> for code which it never branches into and hence, never needs to translate
> and execute (because ARM multiprocessors can perform speculative execution).

  Out of curiosity, what's the ARM memory model? From Wikipedia [1], it seems
ARMv7 has the same memory model as IA64.

Regards,
chenwj

[1] http://en.wikipedia.org/wiki/Memory_ordering

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj

* Re: [Qemu-devel] Get only TCG code without execution
  2012-01-20  6:12             ` 陳韋任
@ 2012-01-20  9:09               ` Peter Maydell
  2012-01-20  9:44                 ` 陳韋任
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Maydell @ 2012-01-20  9:09 UTC (permalink / raw)
  To: 陳韋任; +Cc: Rajat Goyal, qemu-devel

On 20 January 2012 06:12, 陳韋任 <chenwj@iis.sinica.edu.tw> wrote:
>  Out of curiosity. What's ARM memory model? From the Wikipedia [1], it seems
> ARMv7 has the same memory model as IA64.

The ARM memory model is the set of semantics for memory
accesses as defined in the ARM Architecture Reference
Manual (covering not just reordering but also exclusive
accesses, alignment, barriers, etc). The manual devotes
50 pages to it so I'm not about to try to summarise it here :-)

> What's load/store exclusive implementation?

How we implement the ARM instructions LDREX/STREX/LDREXD/STREXD/etc.
These have documented (complicated!) semantics which our
implementation doesn't provide.
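
For readers unfamiliar with these instructions, the basic idea can be
sketched in Python as follows. This is a deliberately simplified toy (one
monitor per CPU, cleared by any store to the monitored address) with
invented class and method names. The real architectural semantics are
considerably more involved, and this is not QEMU's implementation.

```python
class ToyMemory:
    """Shared memory with a per-CPU exclusive monitor (toy model only)."""

    def __init__(self):
        self.mem = {}
        self.monitor = {}  # cpu id -> address marked by its last ldrex

    def ldrex(self, cpu, addr):
        """Load a value and mark addr as monitored by this CPU."""
        self.monitor[cpu] = addr
        return self.mem.get(addr, 0)

    def store(self, addr, value):
        """Plain store: clears every monitor watching this address."""
        self.mem[addr] = value
        for cpu, watched in list(self.monitor.items()):
            if watched == addr:
                del self.monitor[cpu]

    def strex(self, cpu, addr, value):
        """Return 0 on success and 1 on failure, like STREX's status result."""
        if self.monitor.get(cpu) != addr:
            return 1
        self.store(addr, value)  # also clears the monitors on addr
        return 0

def atomic_increment(memory, cpu, addr):
    """Retry loop in the usual load-exclusive/store-exclusive style."""
    while True:
        old = memory.ldrex(cpu, addr)
        if memory.strex(cpu, addr, old + 1) == 0:
            return old + 1
```

If another CPU stores to the address between the ldrex and the strex, the
strex fails and the loop retries, which is what makes the update atomic.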

> And as a general emulator, QEMU shouldn't implement any
> architecture-specific memory model, right?

Wrong, at least in theory. Ideally QEMU should implement exactly
the semantics required by the guest architecture memory model
(it's allowed to be stricter than the architecture requires, of
course), in the same way it should implement the semantics required
by the guest architecture instruction set. A guest binary for ARM
can rely on the memory ordering constraints imposed by the memory
model just as much as it can rely on the fact that the ADD instruction
adds two registers together. In practice, of course (a) this is an
enormous amount of work and also slows the emulator down drastically
and (b) guest binaries don't actually rely that much on the memory
model. And the fairly strict memory model provided by x86 means that
for x86 hosts we actually get most of the important bits of the guest
memory model right anyway.

> What comes into my mind is QEMU only need to follow guest memory
> operations when translates guest binary to TCG ops. When translate
> TCG ops to host binary, it also has to be careful not to mess up
> the memory ordering.

This might be doable if TCG provided a set of ops which allowed
you to implement the guest memory model; it doesn't. If we ever
move to emulating guest SMP in multiple host threads this will
become more important, I suspect.

From a pragmatic "we just want to run guests" point of view, what
QEMU does now is entirely sufficient; I'm just saying that for
a strictly correct emulation of the guest architecture we're a
bit lacking.

-- PMM

* Re: [Qemu-devel] Get only TCG code without execution
  2012-01-20  9:09               ` Peter Maydell
@ 2012-01-20  9:44                 ` 陳韋任
  2012-01-20 10:46                   ` Peter Maydell
  0 siblings, 1 reply; 19+ messages in thread
From: 陳韋任 @ 2012-01-20  9:44 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Rajat Goyal, qemu-devel, 陳韋任

On Fri, Jan 20, 2012 at 09:09:46AM +0000, Peter Maydell wrote:
> On 20 January 2012 06:12, 陳韋任 <chenwj@iis.sinica.edu.tw> wrote:
> >  Out of curiosity. What's ARM memory model? From the Wikipedia [1], it seems
> > ARMv7 has the same memory model as IA64.
> 
> The ARM memory model is the set of semantics for memory
> accesses as defined in the ARM Architecture Reference
> Manual (covering not just reordering but also exclusive
> accesses, alignment, barriers, etc). The manual devotes
> 50 pages to it so I'm not about to try to summarise it here :-)

  Seems the Wikipedia only lists the memory ordering part. ;)
 
> > And as a general emulator, QEMU shouldn't implement any
> > architecture-specific memory model, right?
> 
> Wrong, at least in theory. Ideally QEMU should implement exactly
> the semantics required by the guest architecture memory model
> (it's allowed to be stricter than the architecture requires, of
> course), in the same way it should implement the semantics required
> by the guest architecture instruction set. A guest binary for ARM
> can rely on the memory ordering constraints imposed by the memory
> model just as much as it can rely on the fact that the ADD instruction
> adds two registers together. In practice, of course (a) this is an
> enormous amount of work and also slows the emulator down drastically
> and (b) guest binaries don't actually rely that much on the memory
> model. And the fairly strict memory model provided by x86 means that
> for x86 hosts we actually get most of the important bits of the guest
> memory model right anyway.

  AFAIK, LLVM defines its own memory model [1], which is inspired by the C++11
memory model. That's why I think that instead of implementing an
architecture-specific memory model, QEMU should define a more general
(strict) one.

  You said,

  "guest binaries don't actually rely that much on the memory model."

I think the reason is that those guest binaries are single-threaded. The memory
model is important in the multi-threaded case. BTW, our binary translator can
now translate x86 binaries to ARM binaries, and ARM has a weaker memory model
than x86.
 
[1] http://llvm.org/docs/LangRef.html#memmodel

Regards,
chenwj

P.S. Happy Chinese New Year. :)

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj

* Re: [Qemu-devel] Get only TCG code without execution
  2012-01-20  9:44                 ` 陳韋任
@ 2012-01-20 10:46                   ` Peter Maydell
  2012-01-20 19:40                     ` Jamie Lokier
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Maydell @ 2012-01-20 10:46 UTC (permalink / raw)
  To: 陳韋任; +Cc: Rajat Goyal, qemu-devel

On 20 January 2012 09:44, 陳韋任 <chenwj@iis.sinica.edu.tw> wrote:
> On Fri, Jan 20, 2012 at 09:09:46AM +0000, Peter Maydell wrote:
>  AFAIK, LLVM defines it's own memory model [1] which is inspired by the C++11
> memory model. That's why I think instead of implementing architecture-specific
> memory model, QEMU should define a more general (strict) one.

LLVM has the advantage that it can require all its incoming code
to adhere to a common memory model (ie something like the C++ one).

>  You said,
>
>  "guest binaries don't actually rely that much on the memory model."
>
> I think the reason is those guest binaries are single thread. Memory model is
> important in multi-threaded case. BTW, our binary translator now can translate
> x86 binary to ARM binary, and ARM has weaker memory model than x86.

Yes. At the moment this works for QEMU on ARM hosts because in
system mode QEMU itself is single-threaded so the nastier interactions
between multiple guest CPUs don't occur (just about every memory model
defines that memory interactions within a single thread of execution
behave in the obvious manner). I also had in mind that guest binaries
tend to make fairly stereotypical use of things like LDREX/STREX
rather than relying on obscure details like their interaction with
plain load/stores.

> P.S. Happy Chinese New Year. :)

You too!

-- PMM

* Re: [Qemu-devel] Get only TCG code without execution
  2012-01-20 10:46                   ` Peter Maydell
@ 2012-01-20 19:40                     ` Jamie Lokier
  2012-02-06  7:25                       ` 陳韋任
  0 siblings, 1 reply; 19+ messages in thread
From: Jamie Lokier @ 2012-01-20 19:40 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Rajat Goyal, qemu-devel, 陳韋任

Peter Maydell wrote:
> >  "guest binaries don't actually rely that much on the memory model."
> >
> > I think the reason is that those guest binaries are single-threaded. The memory model
> > is important in the multi-threaded case. BTW, our binary translator can now translate
> > x86 binaries to ARM binaries, and ARM has a weaker memory model than x86.
> 
> Yes. At the moment this works for QEMU on ARM hosts because in
> system mode QEMU itself is single-threaded so the nastier interactions
> between multiple guest CPUs don't occur (just about every memory model
> defines that memory interactions within a single thread of execution
> behave in the obvious manner).

> I also had in mind that guest binaries
> tend to make fairly stereotypical use of things like LDREX/STREX
> rather than relying on obscure details like their interaction with
> plain load/stores.

As x86 code rarely uses or needs explicit barrier instructions, when
translating x86 to run on (say) an ARM host, multi-threaded code that
needs barriers isn't easy to detect, so barriers may be required between
every memory access in the generated ARM code.
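
As a rough illustration of that conservative approach, the sketch below is a toy translation pass in Python that places a full barrier between every pair of consecutive memory accesses. The function insert_barriers and the (kind, operand) tuple format are assumptions made up for this example and are not QEMU or TCG interfaces; on an ARM host the "barrier" op would correspond to an instruction such as DMB.

```python
MEMORY_OPS = {"load", "store"}


def insert_barriers(guest_ops):
    """Return host ops with a full barrier between consecutive memory accesses.

    guest_ops is a list of (kind, operand) tuples. Kinds other than
    "load" and "store" are treated as non-memory instructions.
    """
    host_ops = []
    seen_access = False  # True once at least one memory access was emitted
    for kind, operand in guest_ops:
        if kind in MEMORY_OPS:
            if seen_access:
                # Conservative: order this access after all earlier ones.
                host_ops.append(("barrier", None))
            seen_access = True
        host_ops.append((kind, operand))
    return host_ops
```

For example, a load, an ALU op, a store and another load come out with two barriers, one in front of the store and one in front of the final load. That is more ordering than x86 strictly guarantees, but without knowing which accesses are shared the translator cannot safely drop any of them.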

-- Jamie

* Re: [Qemu-devel] Get only TCG code without execution
  2012-01-20 19:40                     ` Jamie Lokier
@ 2012-02-06  7:25                       ` 陳韋任
  2012-02-10  3:08                         ` Jamie Lokier
  0 siblings, 1 reply; 19+ messages in thread
From: 陳韋任 @ 2012-02-06  7:25 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Peter Maydell, Rajat Goyal, qemu-devel, 陳韋任

> As x86 code rarely uses or needs explicit barrier instructions, when
> translating x86 to run on (say) an ARM host, multi-threaded code that
> needs barriers isn't easy to detect, so barriers may be required between
> every memory access in the generated ARM code.

  Sounds awful to me. Regardless of QEMU's current support for multi-threaded
applications, is it possible to emulate an architecture with a stronger memory
model on a weaker one?

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
Homepage: http://people.cs.nctu.edu.tw/~chenwj

* Re: [Qemu-devel] Get only TCG code without execution
  2012-02-06  7:25                       ` 陳韋任
@ 2012-02-10  3:08                         ` Jamie Lokier
  0 siblings, 0 replies; 19+ messages in thread
From: Jamie Lokier @ 2012-02-10  3:08 UTC (permalink / raw)
  To: 陳韋任; +Cc: Peter Maydell, Rajat Goyal, qemu-devel

陳韋任 wrote:
> > As x86 code rarely uses or needs explicit barrier instructions, when
> > translating x86 to run on (say) an ARM host, multi-threaded code that
> > needs barriers isn't easy to detect, so barriers may be required between
> > every memory access in the generated ARM code.
> 
>   Sounds awful to me. Regardless of QEMU's current support for multi-threaded
> applications, is it possible to emulate an architecture with a stronger memory
> model on a weaker one?

It's possible. Unfortunately, those barriers tend to be quite
expensive and they are needed often, so it would run slowly. Probably
a lot slower than using a single host thread with preemption to
simulate multiple guest CPUs. But someone should try it and find out.
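
For comparison, the single-host-thread approach can be sketched as a round-robin scheduler. This is a toy Python model, with guest_cpu and run_round_robin as hypothetical names and generators standing in for guest CPUs. Because only one guest CPU runs at a time, every guest access is trivially ordered and no host barriers are needed.

```python
def guest_cpu(name, shared, steps):
    """Hypothetical guest CPU: bumps a shared counter, yielding after each step."""
    for _ in range(steps):
        shared["counter"] += 1
        shared["trace"].append(name)
        yield  # preemption point: control returns to the scheduler


def run_round_robin(cpus):
    """Run all guest CPUs on one host thread, one step each in turn."""
    active = list(cpus)
    while active:
        still_running = []
        for cpu in active:
            try:
                next(cpu)
                still_running.append(cpu)
            except StopIteration:
                pass  # this guest CPU has finished its work
        active = still_running
```

The cost is that the guest CPUs never run in parallel, so the open question is whether barrier-heavy parallel code would actually beat this.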

It might be possible to do some deep analysis of the guest to work out
which memory accesses don't need barriers, but it's a hard research
problem with no guarantee of a good solution.

One strategy which comes to mind is simulated MESI or MOESI (cache
coherency protocols) at the page level, so independent guest threads
never have unsynchronised access to the same page. Or at finer
granularity, with more emulation overhead (but still maybe less than
barriers). Another is software transactional memory techniques.
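
A minimal sketch of the page-level idea, as a toy Python model: each page is either held exclusively by one thread or shared read-only by several, and only ownership changes need real synchronisation. The class PageCoherence, the 4096-byte page size and the transfers counter are assumptions for illustration. A real implementation would presumably enforce this with host page protections, which the model does not attempt.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class PageCoherence:
    """Toy page-granularity ownership tracker in the spirit of MESI."""

    def __init__(self):
        self.owner = {}     # page -> thread holding it exclusively
        self.sharers = {}   # page -> set of threads holding it read-only
        self.transfers = 0  # ownership changes, i.e. points needing real sync

    def read(self, thread, addr):
        page = addr // PAGE_SIZE
        owner = self.owner.get(page)
        if owner == thread or thread in self.sharers.get(page, set()):
            return  # already readable by this thread: fast path
        if owner is not None:
            # Downgrade the exclusive owner to a sharer.
            del self.owner[page]
            self.sharers.setdefault(page, set()).add(owner)
        self.sharers.setdefault(page, set()).add(thread)
        self.transfers += 1

    def write(self, thread, addr):
        page = addr // PAGE_SIZE
        if self.owner.get(page) == thread:
            return  # already exclusive: fast path
        # Invalidate every other copy, then take exclusive ownership.
        self.sharers.pop(page, None)
        self.owner[page] = thread
        self.transfers += 1
```

Threads that mostly stay on their own pages hit the fast path almost every time, while threads that share a page pay for a transfer on each handover.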

Neither will run system software at great speed, but certain kinds of
mostly-independent processing, for example a guest running mainly
userspace number crunching in independent processes, might work
alright.

-- Jamie

end of thread, other threads:[~2012-02-10  3:08 UTC | newest]

Thread overview: 19+ messages
2012-01-15 23:09 [Qemu-devel] Get only TCG code without execution Rajat Goyal
2012-01-16  5:32 ` Mulyadi Santosa
2012-01-16  8:41 ` Stefan Hajnoczi
2012-01-16 12:23   ` Rajat Goyal
2012-01-16 12:29     ` Peter Maydell
2012-01-17  1:04       ` 陳韋任
2012-01-17  8:33         ` Peter Maydell
2012-01-19 16:00           ` Rajat Goyal
2012-01-19 16:15             ` Peter Maydell
2012-01-20  6:38               ` 陳韋任
2012-01-21  0:21                 ` Jamie Lokier
2012-02-02 19:35                   ` Rajat Goyal
2012-01-20  6:12             ` 陳韋任
2012-01-20  9:09               ` Peter Maydell
2012-01-20  9:44                 ` 陳韋任
2012-01-20 10:46                   ` Peter Maydell
2012-01-20 19:40                     ` Jamie Lokier
2012-02-06  7:25                       ` 陳韋任
2012-02-10  3:08                         ` Jamie Lokier
