* [Qemu-devel] [RFC] Towards an Heterogeneous QEMU
@ 2015-07-27 13:54 Christian Pinto
2015-07-31 12:03 ` Christopher Covington
2015-07-31 13:10 ` Paolo Bonzini
0 siblings, 2 replies; 5+ messages in thread
From: Christian Pinto @ 2015-07-27 13:54 UTC (permalink / raw)
To: qemu-devel
Cc: Jani Kokkonen, VirtualOpenSystems Technical Team, Claudio Fontana
Hi all,
this message presents a QEMU enhancement we are working on, and asks for
feedback on it. Most state-of-the-art SoCs follow the heterogeneous
paradigm, in which a Master processor is surrounded by multiple (Slave)
co-processors (other CPUs, MCUs, hardware accelerators, etc.) that usually
share the very same physical memory. An example is a multi-core ARM CPU
working alongside two Cortex-M microcontrollers.
From the user's point of view, an operating system (e.g. Linux) usually
boots on the Master processor at platform startup, while the other
processors are used to offload some computation from the Master or to deal
with real-time interfaces. It is the Master OS that triggers the boot of
the Slave processors and also provides them with the binary code to execute
(e.g. RTOS, binary firmware), by placing it into a pre-defined memory area
that is accessible to the Slaves. Usually the memory for the Slaves is
carved out of the memory available to the Master OS during boot. Once a
Slave is booted, the two processors can communicate through queues in
shared memory and inter-processor interrupts (IPIs). In Linux, it is the
remoteproc/rpmsg framework that enables the control (boot/shutdown) of
Slave processors and establishes a communication channel based on virtio
queues.
Currently, QEMU is not able to model such an architecture, mainly because
only a single processor can be emulated at one time, and the OS binary
image needs to be placed in memory at model startup.
We are working on some QEMU extensions that enable the modeling of
Heterogeneous SoCs. In our proposal each processor of the target
Heterogeneous SoC is represented by a separate QEMU process, one of which
acts as the Master of the target platform. The physical shared memory
abstraction is built on POSIX shared memory. At model boot the Master QEMU
allocates the whole memory of the target platform as a POSIX shared memory
segment, using the hostmem-file backend. The Slave QEMU instances, instead,
do not allocate any memory but wait, on a Unix socket, to receive the file
descriptor of the POSIX shared memory segment allocated by the Master,
together with an offset. Once received, the file descriptor is mmap-ed
starting from the received offset and used as the memory backend of the
Slave instance. For this purpose a new memory backend will be defined for
Slave QEMU instances, which receives the file descriptor from a socket
instead of allocating the RAM of the model from a file or regular memory.
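The fd-plus-offset handoff described above can be sketched outside of QEMU
with a few lines of Python (3.9 or later, on a Unix host). This is only an
illustration of the mechanism, not QEMU code: the names master_share,
slave_attach, PLATFORM_RAM and SLAVE_OFFSET are hypothetical, the message
layout (two 64-bit integers) is an assumption, and a temporary file stands
in for the shared memory segment that a real setup would place under
/dev/shm.

```python
import mmap
import os
import socket
import struct
import tempfile

# Toy sizes (assumptions); mmap offsets must be multiples of the granularity.
PLATFORM_RAM = 4 * mmap.ALLOCATIONGRANULARITY   # whole platform memory
SLAVE_OFFSET = 2 * mmap.ALLOCATIONGRANULARITY   # start of the Slave carve-out
SLAVE_SIZE = PLATFORM_RAM - SLAVE_OFFSET


def master_share(sock, fd, offset, size):
    """Master side: pass the fd (via SCM_RIGHTS) together with offset and size."""
    socket.send_fds(sock, [struct.pack("<QQ", offset, size)], [fd])


def slave_attach(sock):
    """Slave side: receive the fd and map only the window starting at offset."""
    msg, fds, _flags, _addr = socket.recv_fds(sock, 16, 1)
    offset, size = struct.unpack("<QQ", msg)
    ram = mmap.mmap(fds[0], size, flags=mmap.MAP_SHARED, offset=offset)
    os.close(fds[0])          # the mapping stays valid after the fd is closed
    return ram


def demo():
    firmware = b"\x7fRTOS-image"   # stand-in for the Slave binary image
    master_sock, slave_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with tempfile.TemporaryFile() as backing:
        fd = backing.fileno()
        os.ftruncate(fd, PLATFORM_RAM)
        master_ram = mmap.mmap(fd, PLATFORM_RAM, flags=mmap.MAP_SHARED)
        # The Master places the image at the start of the Slave region ...
        master_ram[SLAVE_OFFSET:SLAVE_OFFSET + len(firmware)] = firmware
        # ... and only then hands the memory over to the Slave.
        master_share(master_sock, fd, SLAVE_OFFSET, SLAVE_SIZE)
        slave_ram = slave_attach(slave_sock)
        seen = slave_ram[:len(firmware)]   # Slave address 0 is platform offset
        window = len(slave_ram)
        slave_ram.close()
        master_ram.close()
    master_sock.close()
    slave_sock.close()
    return firmware, seen, window


if __name__ == "__main__":
    fw, seen, window = demo()
    print(seen == fw, window)
```

In this sketch the Slave mapping covers only the bytes from SLAVE_OFFSET
onwards, so the image written by the Master appears at address 0 of the
Slave view.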
To resemble the behavior of a real platform, the Slave QEMU instances will
not jump into the target code until the information on the memory to be
used is received from the Master. This happens only when, at some point
during execution, an application running on the Master OS needs to use one
of the co-processors and triggers its boot. The initialization and boot
phase of a Slave QEMU will differ from the regular one as follows:
- No RAM is allocated for the model.
- No binary image is copied into memory.
- After the model initialization is complete, QEMU enters a wait state in
  which no code is executed (since the memory is not yet available).
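As a rough illustration of the three points above, the following Python
sketch models a Slave whose "vCPU" thread is parked until memory is
attached. The class SlaveModel and its methods are hypothetical and exist
only for this example; they do not correspond to any QEMU structure.

```python
import threading


class SlaveModel:
    """Toy Slave instance: initialized, but parked until memory arrives."""

    def __init__(self):
        self.ram = None                       # no RAM is allocated at init
        self.trace = ["init-done"]            # no binary image is loaded either
        self._memory_ready = threading.Event()
        self._vcpu = threading.Thread(target=self._vcpu_loop)
        self._vcpu.start()

    def _vcpu_loop(self):
        self.trace.append("waiting")
        self._memory_ready.wait()             # wait state: no guest code runs
        # Memory is now available; "execute" whatever the Master placed there.
        self.trace.append("executing:" + self.ram[:4].decode())

    def attach_memory(self, ram):
        """Called when the memory information is received from the Master."""
        self.ram = ram
        self._memory_ready.set()

    def join(self):
        self._vcpu.join()


if __name__ == "__main__":
    slave = SlaveModel()
    slave.attach_memory(bytearray(b"BOOT-image"))
    slave.join()
    print(slave.trace)
```

The point of the sketch is only the ordering: initialization completes
first, execution starts strictly after the memory has been handed over.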
When the Slave receives the fd and the offset of its memory within the
platform memory, it will also find in that memory the binary image to be
executed and any other information needed to complete the boot process.
The Slave QEMU instances will mmap the shared memory segment only starting
from a specific offset, so they cannot corrupt the Master memory, which is
not visible to the target Slave OS.
Finally, a new QEMU device, the Interrupt Distribution Module (IDM), will
be implemented to model a hardware mailbox/inter-processor interrupt
module, used to send interrupts across all the QEMU instances involved in
the heterogeneous model. The module will be based on eventfd, with the file
descriptors exchanged with the Master over a Unix domain socket. Each QEMU
instance participating in the heterogeneous model will embed this new
hardware module in its memory map. As an example, in real rpmsg
applications such hardware mailboxes and IPI modules are used to signal the
kick of a virtio queue to a remote processor with an interrupt.
The proposed changes are to be considered the minimal building blocks
needed to enable the emulation of a Heterogeneous SoC. They allow
programmers to experiment with various intra-SoC communication frameworks
(e.g. remoteproc/rpmsg) and to perform a functional validation of their
drivers and software targeting a heterogeneous SoC.
This work has been sponsored by Huawei Technologies Duesseldorf GmbH,
Huawei ERC Munich.
Looking forward to your feedback.
Christian
* Re: [Qemu-devel] [RFC] Towards an Heterogeneous QEMU
2015-07-27 13:54 [Qemu-devel] [RFC] Towards an Heterogeneous QEMU Christian Pinto
@ 2015-07-31 12:03 ` Christopher Covington
2015-07-31 16:23 ` Christian Pinto
2015-07-31 13:10 ` Paolo Bonzini
1 sibling, 1 reply; 5+ messages in thread
From: Christopher Covington @ 2015-07-31 12:03 UTC (permalink / raw)
To: Christian Pinto, qemu-devel
Cc: Jani Kokkonen, VirtualOpenSystems Technical Team, Claudio Fontana
Hi Christian,
On 07/27/2015 09:54 AM, Christian Pinto wrote:
> Hi all,
>
> this message is to present, and get feedback, on a QEMU enhancement which we
> are working on. Most of the state-of-the-art SoCs use the heterogeneous
> paradigm, in which a Master processor is surrounded by multiple (Slave) co-
> processors (other CPUs, MCUs, hardware accelerators, etc) that usually share
> the very same physical memory. An example is a multi-core ARM CPU working
> alongside with two Cortex-M micro controllers.
>
> From the user point of view there is usually an operating system booting on
> the Master processor (e.g. Linux) at platform startup, while the other
> processors are used to offload the Master one from some computation or to deal
> with real-time interfaces. It is the Master OS that triggers the boot of the
> Slave processors, and provides them also the binary code to execute (e.g.
> RTOS, binary firmware) by placing it into a pre-defined memory area that is
> accessible to the Slaves. Usually the memory for the Slaves is carved out from
> the Master OS during boot. Once a Slave is booted the two processors can
> communicate through queues in shared memory and inter-processor interrupts
> (IPIs). In Linux, it is the remoteproc/rpmsg framework that enables the
> control (boot/shutdown) of Slave processors, and also to establish a
> communication channel based on virtio queues.
>
> Currently, QEMU is not able to model such an architecture, mainly because only
> a single processor can be emulated at one time, and the OS binary image needs
> to be placed in memory at model startup.
>
> We are working on some extensions in QEMU, that enable Heterogeneous SoCs
> modeling. In our proposal each processor of the target Heterogeneous SoC is
> represented by a separate QEMU process, one of which will act as the Master of
> the target platform. The physical shared memory abstraction is created by
> leveraging on Posix shared memory. At model boot the Master QEMU will allocate
> the whole memory of the target platform as a Posix shared memory segment, by
> using the hostmem-file backend. The Slave QEMU instances, instead, will not
> allocate any memory but wait, over a Unix socket, to receive the file
> descriptor of the Posix shared memory segment allocated by the Master and an
> offset. Once received, the file descriptor is mmap-ed starting from the
> received offset and used as memory backend for the Slave instance. For a Slave
> QEMU instance a new memory backend will be defined, to receive the file
> descriptor from a socket instead of allocating the RAM of the model from a
> file or regular memory.
>
> To resemble the behavior of a real platform, the Slave QEMU instances will not
> jump into the target code until the information on the memory to be used is
> received from the Master. This happens only when at a certain point during
> execution, an application running on the Master OS needs to use one of the co-
> processors and triggers its boot. The initialization and boot phase of a Slave
> QEMU will differ from the regular one in the following:
>
> - No RAM memory is allocated for the model.
> - No binary image is copied into memory.
> - After the model initialization is complete, QEMU will jump into a wait state
> in which no code is executed (since the memory is not yet available).
>
> When the Slave receives the fd and offset of its memory into the platform one,
> it will find into such memory also the binary image to be executed and any
> other information needed to complete the boot process. The Slave QEMU
> instances will mmap the shared memory segment only starting from a specific
> offset, thus there will be no possibility for them to corrupt the Master memory
> since it will not be visible to the target Slave OS.
>
> Finally a new QEMU device, the Interrupt Distribution Module (IDM), will be
> implemented to model a hardware mailbox/inter processor interrupt module, to be
> used to send interrupts across all the QEMU instances involved in the
> heterogeneous model. Such module will be based on eventfd, whose file
> descriptors are exchanged with the Master using a Unix domain socket. Each QEMU
> instance participating to the heterogeneous model will embed this new hardware
> module into its memory map. As an example, such hardware mailboxes and IPI
> modules are used in real rpmsg applications to signal with an interrupt the kick
> of a virtio queue to a remote processor.
>
> The proposed changes are to be considered as the minimal building blocks to
> enable the emulation of an Heterogeneous SoC, that allow programmers to
> experiment with various intra-SoC communication frameworks (e.g.
> remoteproc/rpmsg) and perform a functional validation of their drivers and
> software targeting a heterogeneous SoC.
How does this multiprocess architecture compare to current efforts for
multithreaded TCG?
Do you anticipate needing a mechanism to keep processes roughly in sync with
each other, so that one doesn't unrealistically get way far ahead of the rest?
Thanks,
Christopher Covington
--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
* Re: [Qemu-devel] [RFC] Towards an Heterogeneous QEMU
2015-07-27 13:54 [Qemu-devel] [RFC] Towards an Heterogeneous QEMU Christian Pinto
2015-07-31 12:03 ` Christopher Covington
@ 2015-07-31 13:10 ` Paolo Bonzini
2015-07-31 16:47 ` Christian Pinto
1 sibling, 1 reply; 5+ messages in thread
From: Paolo Bonzini @ 2015-07-31 13:10 UTC (permalink / raw)
To: Christian Pinto, qemu-devel
Cc: Jani Kokkonen, VirtualOpenSystems Technical Team, Claudio Fontana
On 27/07/2015 15:54, Christian Pinto wrote:
> From the user point of view there is usually an operating system
> booting on the Master processor (e.g. Linux) at platform startup,
> while the other processors are used to offload the Master one from
> some computation or to deal with real-time interfaces. It is the
> Master OS that triggers the boot of the Slave processors, and
> provides them also the binary code to execute (e.g. RTOS, binary
> firmware) by placing it into a pre-defined memory area that is
> accessible to the Slaves. Usually the memory for the Slaves is
> carved out from the Master OS during boot. Once a Slave is booted the
> two processors can communicate through queues in shared memory and
> inter-processor interrupts (IPIs). In Linux, it is the
> remoteproc/rpmsg framework that enables the control (boot/shutdown)
> of Slave processors, and also to establish a communication channel
> based on virtio queues.
>
> Currently, QEMU is not able to model such an architecture, mainly
> because only a single processor can be emulated at one time, and the
> OS binary image needs to be placed in memory at model startup.
Hi, you may be interested in the "multi-arch" patches here:
http://thread.gmane.org/gmane.comp.emulators.qemu/351808
These do roughly what you are saying, though in a single QEMU process.
Paolo
* Re: [Qemu-devel] [RFC] Towards an Heterogeneous QEMU
2015-07-31 12:03 ` Christopher Covington
@ 2015-07-31 16:23 ` Christian Pinto
0 siblings, 0 replies; 5+ messages in thread
From: Christian Pinto @ 2015-07-31 16:23 UTC (permalink / raw)
To: Christopher Covington, qemu-devel
Cc: Jani Kokkonen, VirtualOpenSystems Technical Team, Claudio Fontana
Hello Christopher,
On 31/07/2015 14:03, Christopher Covington wrote:
> Hi Christian,
>
> On 07/27/2015 09:54 AM, Christian Pinto wrote:
>> Hi all,
>>
>> this message is to present, and get feedback, on a QEMU enhancement which we
>> are working on. Most of the state-of-the-art SoCs use the heterogeneous
>> paradigm, in which a Master processor is surrounded by multiple (Slave) co-
>> processors (other CPUs, MCUs, hardware accelerators, etc) that usually share
>> the very same physical memory. An example is a multi-core ARM CPU working
>> alongside with two Cortex-M micro controllers.
>>
>> From the user point of view there is usually an operating system booting on
>> the Master processor (e.g. Linux) at platform startup, while the other
>> processors are used to offload the Master one from some computation or to deal
>> with real-time interfaces. It is the Master OS that triggers the boot of the
>> Slave processors, and provides them also the binary code to execute (e.g.
>> RTOS, binary firmware) by placing it into a pre-defined memory area that is
>> accessible to the Slaves. Usually the memory for the Slaves is carved out from
>> the Master OS during boot. Once a Slave is booted the two processors can
>> communicate through queues in shared memory and inter-processor interrupts
>> (IPIs). In Linux, it is the remoteproc/rpmsg framework that enables the
>> control (boot/shutdown) of Slave processors, and also to establish a
>> communication channel based on virtio queues.
>>
>> Currently, QEMU is not able to model such an architecture, mainly because only
>> a single processor can be emulated at one time, and the OS binary image needs
>> to be placed in memory at model startup.
>>
>> We are working on some extensions in QEMU, that enable Heterogeneous SoCs
>> modeling. In our proposal each processor of the target Heterogeneous SoC is
>> represented by a separate QEMU process, one of which will act as the Master of
>> the target platform. The physical shared memory abstraction is created by
>> leveraging on Posix shared memory. At model boot the Master QEMU will allocate
>> the whole memory of the target platform as a Posix shared memory segment, by
>> using the hostmem-file backend. The Slave QEMU instances, instead, will not
>> allocate any memory but wait, over a Unix socket, to receive the file
>> descriptor of the Posix shared memory segment allocated by the Master and an
>> offset. Once received, the file descriptor is mmap-ed starting from the
>> received offset and used as memory backend for the Slave instance. For a Slave
>> QEMU instance a new memory backend will be defined, to receive the file
>> descriptor from a socket instead of allocating the RAM of the model from a
>> file or regular memory.
>>
>> To resemble the behavior of a real platform, the Slave QEMU instances will not
>> jump into the target code until the information on the memory to be used is
>> received from the Master. This happens only when at a certain point during
>> execution, an application running on the Master OS needs to use one of the co-
>> processors and triggers its boot. The initialization and boot phase of a Slave
>> QEMU will differ from the regular one in the following:
>>
>> - No RAM memory is allocated for the model.
>> - No binary image is copied into memory.
>> - After the model initialization is complete, QEMU will jump into a wait state
>> in which no code is executed (since the memory is not yet available).
>>
>> When the Slave receives the fd and offset of its memory into the platform one,
>> it will find into such memory also the binary image to be executed and any
>> other information needed to complete the boot process. The Slave QEMU
>> instances will mmap the shared memory segment only starting from a specific
>> offset, thus there will be no possibility for them to corrupt the Master memory
>> since it will not be visible to the target Slave OS.
>>
>> Finally a new QEMU device, the Interrupt Distribution Module (IDM), will be
>> implemented to model a hardware mailbox/inter processor interrupt module, to be
>> used to send interrupts across all the QEMU instances involved in the
>> heterogeneous model. Such module will be based on eventfd, whose file
>> descriptors are exchanged with the Master using a Unix domain socket. Each QEMU
>> instance participating to the heterogeneous model will embed this new hardware
>> module into its memory map. As an example, such hardware mailboxes and IPI
>> modules are used in real rpmsg applications to signal with an interrupt the kick
>> of a virtio queue to a remote processor.
>>
>> The proposed changes are to be considered as the minimal building blocks to
>> enable the emulation of an Heterogeneous SoC, that allow programmers to
>> experiment with various intra-SoC communication frameworks (e.g.
>> remoteproc/rpmsg) and perform a functional validation of their drivers and
>> software targeting a heterogeneous SoC.
> How does this multiprocess architecture compare to current efforts for
> multithreaded TCG?
The multi-threaded TCG work is orthogonal to what we propose here: if one
wants to model, through our extensions, a heterogeneous system with a
multi-core master plus a multi-core slave, it will still be possible to
exploit multi-threaded TCG in both QEMU instances to obtain higher
performance. Neither of the two works excludes the other.
>
> Do you anticipate needing a mechanism to keep processes roughly in sync with
> each other, so that one doesn't unrealistically get way far ahead of the rest?
For the use case we are looking at, a remoteproc/rpmsg type of
communication, we don't see the need for synchronization between the
processes. In this type of interaction two (or more) processors exchange
messages using explicit synchronization points (e.g. virtio queue kicks
through inter-processor interrupts), and do not rely on global timers or
shared time-based resources.
Do you see any use case where the two processes might need to be
synchronized?
Christian
>
> Thanks,
> Christopher Covington
>
* Re: [Qemu-devel] [RFC] Towards an Heterogeneous QEMU
2015-07-31 13:10 ` Paolo Bonzini
@ 2015-07-31 16:47 ` Christian Pinto
0 siblings, 0 replies; 5+ messages in thread
From: Christian Pinto @ 2015-07-31 16:47 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
Cc: Jani Kokkonen, VirtualOpenSystems Technical Team, Claudio Fontana
Hello Paolo,
On 31/07/2015 15:10, Paolo Bonzini wrote:
>
> On 27/07/2015 15:54, Christian Pinto wrote:
>> From the user point of view there is usually an operating system
>> booting on the Master processor (e.g. Linux) at platform startup,
>> while the other processors are used to offload the Master one from
>> some computation or to deal with real-time interfaces. It is the
>> Master OS that triggers the boot of the Slave processors, and
>> provides them also the binary code to execute (e.g. RTOS, binary
>> firmware) by placing it into a pre-defined memory area that is
>> accessible to the Slaves. Usually the memory for the Slaves is
>> carved out from the Master OS during boot. Once a Slave is booted the
>> two processors can communicate through queues in shared memory and
>> inter-processor interrupts (IPIs). In Linux, it is the
>> remoteproc/rpmsg framework that enables the control (boot/shutdown)
>> of Slave processors, and also to establish a communication channel
>> based on virtio queues.
>>
>> Currently, QEMU is not able to model such an architecture, mainly
>> because only a single processor can be emulated at one time, and the
>> OS binary image needs to be placed in memory at model startup.
> Hi, you may be interested in the "multi-arch" patches here:
> http://thread.gmane.org/gmane.comp.emulators.qemu/351808
>
> These do roughly what you are saying, though in a single QEMU process.
Thanks for pointing this out. In fact the final goal is the same: to model
a system embedding a heterogeneous set of processors.
The approach presented in this RFC is to provide the building blocks, and
extensions, to experiment with the functional modeling of heterogeneous
systems by exploiting the current "architecture" of QEMU in the least
invasive way possible.
In addition, once the multi-arch work is finished, there will still be the
need to implement some sort of inter-processor signaling/communication, and
part of our work (i.e. the Interrupt Distribution Module) might be used for
that purpose.
Thanks,
Christian
>
> Paolo