Re: [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU

From: Christian Pinto <c.pinto@virtualopensystems.com>
To: Peter Crosthwaite <crosthwaitepeter@gmail.com>
Cc: Edgar Iglesias <edgar.iglesias@xilinx.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	mst@redhat.com, Claudio.Fontana@huawei.com,
	"qemu-devel@nongnu.org Developers" <qemu-devel@nongnu.org>,
	Jani.Kokkonen@huawei.com, tech@virtualopensystems.com,
	"mar.krzeminski" <mar.krzeminski@gmail.com>
Subject: Re: [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU
Date: Tue, 27 Oct 2015 11:30:54 +0100
Message-ID: <562F525E.7060604@virtualopensystems.com>
In-Reply-To: <CAPokK=rAkNFKFcJmH7nDMB2o5bqY9a9tbHtYzhtmZDMV7T9Oxw@mail.gmail.com>

On 25/10/2015 22:38, Peter Crosthwaite wrote:
> On Thu, Oct 22, 2015 at 2:21 AM, Christian Pinto
> <c.pinto@virtualopensystems.com>  wrote:
>> Hello Peter,
>>
>>
>> On 07/10/2015 17:48, Peter Crosthwaite wrote:
>>> On Mon, Oct 5, 2015 at 8:50 AM, Christian Pinto
>>> <c.pinto@virtualopensystems.com>  wrote:
>>>> Hello Peter,
>>>>
>>>> thanks for your comments
>>>>
>>>> On 01/10/2015 18:26, Peter Crosthwaite wrote:
>>>>> On Tue, Sep 29, 2015 at 6:57 AM, Christian Pinto
>>>>> <c.pinto@virtualopensystems.com>   wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> This RFC patch-series introduces the set of changes enabling the
>>>>>> architectural elements to model the architecture presented in a
>>>>>> previous
>>>>>> RFC
>>>>>> letter: "[Qemu-devel][RFC] Towards an Heterogeneous QEMU".
>>>>>> and the OS binary image needs
>>>>>> to be placed in memory at model startup.
>>>>>>
>>>>> I don't see what this limitation is exactly. Can you explain more? I
>>>>> do see a need to work on the ARM bootloader for AMP flows, it is a
>>>>> pure SMP bootloader that assumes total control.
>>>> the problem here was to me that when we launch QEMU a binary needs to be
>>>> provided and put in memory
>>>> in order to be executed. In this patch series the slave doesn't have a
>>>> proper memory allocated when first launched.
>>> But it could though couldn't it? Can't the slave guest just have full
>>> access to its own address space (probably very similar to the master's
>>> address space) from machine init time? This seems more realistic than
>>> setting up the hardware based on guest level information.
>> Actually the address space for a slave is built at init time, the thing that
>> is not
>> completely configured is the memory region modeling the RAM. Such region is
>> configured
>> in terms of size, but there is no pointer to the actual memory. The pointer
>> is mmap-ed later
>> before the slave boots.
>>
> based on what information? Is the master guest controlling this? If so
> what is the real-hardware analogue for this concept where the address
> map of the slave can change (i.e. be configured) at runtime?
Hello Peter,

The memory map of a slave is not controlled by the master guest, since it
depends on the machine model used for the slave. The only thing the master
controls is the subset of the main memory that is assigned to a slave. Saying
that the memory pointer is sent to the slave later, before the boot, is like
setting the boot address for that specific slave within the whole platform
memory. So essentially the offset passed to mmap runs from the beginning of
the master memory up to the beginning of the memory carved out for the
specific slave. I see this as a way to protect the master memory from
malicious accesses from the slave side: this way the slave will only "see"
the part of the memory that was assigned to it.
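
To make the fd + offset idea concrete, here is a minimal sketch in plain C.
It is not code from the patch series: map_window() and demo_slave_window()
are hypothetical names, and a temporary file stands in for the master's
memory backend. It only illustrates that a mapping created at a page-aligned
offset exposes the carved-out slice and nothing below it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the patch series: map `size` bytes
 * of the backing file starting at `offset`. The offset must be
 * page-aligned, as mmap() requires.
 */
static void *map_window(int fd, off_t offset, size_t size)
{
    assert(fd >= 0 && size > 0);
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
    return p == MAP_FAILED ? NULL : p;
}

/* Demo: a temporary file stands in for the master's memory backend. */
static int demo_slave_window(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) {
        return -1;
    }
    size_t total = 4 * (size_t)page;     /* whole "platform" memory */
    off_t slave_off = 2 * (off_t)page;   /* start of the slave's slice */

    FILE *f = tmpfile();
    if (!f) {
        return -1;
    }
    int fd = fileno(f);
    int rc = -1;
    if (ftruncate(fd, (off_t)total) == 0) {
        unsigned char *master = map_window(fd, 0, total);
        unsigned char *slave = map_window(fd, slave_off, (size_t)page);
        if (master && slave) {
            master[slave_off] = 0xAB;    /* master writes into the slice */
            slave[1] = 0xCD;             /* slave writes at its offset 1 */
            /* Slave address 0 corresponds to master address slave_off. */
            if (slave[0] == 0xAB && master[slave_off + 1] == 0xCD) {
                rc = 0;
            }
        }
        if (slave) {
            munmap(slave, (size_t)page);
        }
        if (master) {
            munmap(master, total);
        }
    }
    fclose(f);
    return rc;
}
```

Calling demo_slave_window() returns 0 when both views agree. The slave-side
mapping has no address that reaches the pages before slave_off, which is the
isolation property described above.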

>>>> The information about memory (fd + offset for mmap) is sent only later
>>>> when
>>>> the boot is triggered. This is also
>>>> safe since the slave will be waiting in the incoming state, and thus no
>>>> corruption or errors can happen before the
>>>> boot is triggered.
>>> I was thinking more about your comment about slave-to-slave
>>> interrupts. This would just trivially be a local software-generated
>>> interrupts of some form within the slave cluster.
>> Sorry, I did not catch your comment at first time. You are right, if cores
>> are in the same cluster
>> a software generated interrupt is going to be enough. Of course the eventfd
>> based interrupts
>> make sense for a remote QEMU.
>>
> Is eventfd a better implementation of remote-port GPIOs as in the Xilinx work?

Functionally I think they provide the same behavior. We went for eventfd
because, when designing the IDM code, we based it on what is already
available in upstream QEMU to signal events between processes (e.g., eventfd).
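
As a rough illustration of the eventfd mechanism itself (Linux only), the
sketch below raises and consumes a signal through an eventfd counter. The
names peer_notify(), peer_drain() and demo_eventfd_signal() are hypothetical
and not part of the IDM code. For brevity both ends live in one process; in
the multi-QEMU case the descriptor would first have to be handed to the
other process, for example over a Unix socket.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Raise the signal: adds 1 to the eventfd counter. */
static int peer_notify(int efd)
{
    uint64_t one = 1;
    assert(efd >= 0);
    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Consume pending signals: returns how many were raised since the last
 * call and resets the counter, or -1 if nothing is pending (the
 * descriptor is non-blocking) or on error.
 */
static int64_t peer_drain(int efd)
{
    uint64_t count;
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return -1;
    }
    return (int64_t)count;
}

static int demo_eventfd_signal(void)
{
    int efd = eventfd(0, EFD_NONBLOCK);
    if (efd < 0) {
        return -1;
    }
    int ok = peer_notify(efd) == 0 && peer_notify(efd) == 0
          && peer_drain(efd) == 2    /* two notifications coalesce */
          && peer_drain(efd) == -1;  /* counter is back to zero */
    close(efd);
    return ok ? 0 : -1;
}
```

Note that notifications coalesce into a counter, so the receiver learns how
many events were raised but not any payload. That fits an interrupt-like
signal, while anything richer still needs a separate data channel.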

> Re the terminology, I don't like the idea of thinking of inter-qemu
> "interrupts" as whatever system we decide on should be able to support
> arbitrary signals going from one QEMU to another. I think the Xilinx
> work already has reset signals going between the QEMU peers.

We used the term inter-qemu interrupt since the signal is triggered from the
IDM and is an interrupt. But I see your point and agree that it could be a
generic inter-qemu signaling mechanism, which is used as an interrupt for
this specific purpose.

>>>>>> The multi client-socket is used for the master to trigger
>>>>>>          the boot of a slave, and also for each master-slave couple to
>>>>>> exchange the
>>>>>>          eventfd file descriptors. The IDM device can be instantiated
>>>>>> either
>>>>>> as a
>>>>>>          PCI or sysbus device.
>>>>>>
>>>>> So if everything is is one QEMU, IPIs can be implemented with just a
>>>> of registers makes the master in
>>>> "control" each of the slaves. The IDM device is already seen as a regular
>>>> device by each of the QEMU instances
>>>> involved.
>>>>
>>> I'm starting to think this series is two things that should be
>>> decoupled. One is the abstract device(s) to facilitate your AMP, the
>>> other is the inter-qemu communication. For the abstract device, I
>>> guess this would be a new virtio-idm device. We should try and involve
>>> virtio people perhaps. I can see the value in it quite separate from
>>> modelling the real sysctrl hardware.
>> Interesting, which other value/usage do you see in it? For me the IDM was
>> meant to
> It has value in prototyping with your abstract toolkit even with
> homogeneous hardware. E.g. I should be able to just use single-QEMU
> ARM virt machine -smp 2 and create one of these virtio-AMP setups.
> Homogeneous hardware with heterogeneous software using your new pieces
> of abstract hardware.
>
> It is also more practical for getting a merge of your work as you are
> targeting two different audiences with the work. People interested in
> virtio can handle the new devices you create, while the core
> maintainers can handle your multi-QEMU work. It is two rather big new
> features.

This is true, too much meat on the fire for the same patch makes it
difficult to get merged. Thanks.
We could split it into the multi-client socket work, the inter-qemu
communication and virtio-idm.

>
>> work as an abstract system controller to centralize the management
>> of the slaves (boot_regs and interrupts).
>>
>>
>>> But I think the implementation
>>> should be free of any inter-QEMU awareness. E.g. from P4 of this
>>> series:
>>>
>>> +static void send_shmem_fd(IDMState *s, MSClient *c)
>>> +{
>>> +    int fd, len;
>>> +    uint32_t *message;
>>> +    HostMemoryBackend *backend = MEMORY_BACKEND(s->hostmem);
>>> +
>>> +    len = strlen(SEND_MEM_FD_CMD)/4 + 3;
>>> +    message = malloc(len * sizeof(uint32_t));
>>> +    strcpy((char *) message, SEND_MEM_FD_CMD);
>>> +    message[len - 2] = s->pboot_size;
>>> +    message[len - 1] = s->pboot_offset;
>>> +
>>> +    fd = memory_region_get_fd(&backend->mr);
>>> +
>>> +    multi_socket_send_fds_to(c, &fd, 1, (char *) message, len *
>>> sizeof(uint32_t));
>>>
>>> The device itself is aware of shared-memory and multi-sockets. Using
>>> the device for single-QEMU AMP would require neither - can the IDM
>>> device be used in a homogeneous AMP flow in one of our existing SMP
>>> machine models (eg on a dual core A9 with one core being master and
>>> the other slave)?
>>>
>>> Can this be architected in two phases for greater utility, with the
>>> AMP devices as just normal devices, and the inter-qemu communication
>>> as a separate feature?
>> I see your point, and it is an interesting proposal.
>>
>> What I can think here to remove the awareness of how the IDM communicates
>> with
>> the slaves, is to define a kind of AMP Slave interface. So there will be an
>> instance of the interface for each of the slaves, encapsulating the
>> communication part (being either local or based on sockets).
>> The AMP Slave interfaces would be what you called the AMP devices, with one
>> device per slave.
>>
> Do we need this hard definition of master and slave in the hardware?
> Can the virtio-device be more peer-peer and the master-slave
> relationship is purely implemented by the guest?

I think we can architect it in a way that the virtio-idm simply connects
two or more peers which, depending on how the software uses them, behave
as master on one side and slave on the other.
I used the term slave AMP interface; I should have used AMP client
interface, to indicate the cores/processors the IDM has to interconnect
(whether local or on another QEMU instance).
So there would be one implementation of the AMP client interface that
is based on the assumption that all the processors are in the same
instance, and one based on sockets for the remote instances.

To give an example, for a single qemu instance with -smp 2
you would add something like:

-smp 2
-device amp-local-client,core_id=0,id=client0
-device amp-local-client,core_id=1,id=client1
-device virtio-idm,clients=2,id=idm

while for remote qemu instances something like
(with the mirrored configuration instantiated on the other instance):

-device amp-local-client,id=client0
-device amp-remote-client,chardev=chdev_id,id=client1
-device virtio-idm,clients=2,id=idm-dev

This way the IDM only knows about clients (all clients are the
same for the IDM). The software running on the processors
will enable the interaction between the clients by writing
into the IDM device registers.

At first glance, and according to my current proposal, I see
such AMP client interfaces exporting the following methods:

  * raise_interrupt() function: called by the IDM to trigger an
    interrupt towards the destination client

  * boot_trigger() function: called by the IDM to trigger the boot of
    the client

If the clients are remote, socket communication will be used and hidden
in the AMP client interface implementation.
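
The two methods above can be sketched as a small table of function pointers
with one local and one remote implementation. This is only an illustration
in plain C, not QOM code and not a proposed API: AMPClient, AMPClientOps,
the idm_*() wrappers and the one-byte 'B'/'I' commands are all hypothetical,
and a pipe stands in for the socket to the remote instance.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct AMPClient AMPClient;

/* The interface the IDM sees; identical for local and remote clients. */
typedef struct AMPClientOps {
    int (*raise_interrupt)(AMPClient *c);
    int (*boot_trigger)(AMPClient *c);
} AMPClientOps;

struct AMPClient {
    const AMPClientOps *ops;
    int irq_pending;  /* local client state */
    int booted;       /* local client state */
    int chan_fd;      /* remote client: stand-in for the socket */
};

/* Local implementation: acts directly on state in the same instance. */
static int local_raise(AMPClient *c) { c->irq_pending = 1; return 0; }
static int local_boot(AMPClient *c)  { c->booted = 1; return 0; }

/* Remote implementation: forwards a one-byte command over the channel. */
static int remote_send(AMPClient *c, char cmd)
{
    return write(c->chan_fd, &cmd, 1) == 1 ? 0 : -1;
}
static int remote_raise(AMPClient *c) { return remote_send(c, 'I'); }
static int remote_boot(AMPClient *c)  { return remote_send(c, 'B'); }

static const AMPClientOps local_ops  = { local_raise, local_boot };
static const AMPClientOps remote_ops = { remote_raise, remote_boot };

/* IDM side: no knowledge of where the client actually lives. */
static int idm_boot(AMPClient *c)
{
    assert(c != NULL && c->ops != NULL);
    return c->ops->boot_trigger(c);
}
static int idm_interrupt(AMPClient *c)
{
    assert(c != NULL && c->ops != NULL);
    return c->ops->raise_interrupt(c);
}

static int demo_amp_clients(void)
{
    int p[2];
    if (pipe(p) != 0) {
        return -1;
    }
    AMPClient local  = { &local_ops, 0, 0, -1 };
    AMPClient remote = { &remote_ops, 0, 0, p[1] };
    AMPClient *clients[] = { &local, &remote };
    int ok = 1;

    for (size_t i = 0; i < 2; i++) {
        ok = ok && idm_boot(clients[i]) == 0 && idm_interrupt(clients[i]) == 0;
    }
    char buf[2] = { 0, 0 };
    ok = ok && read(p[0], buf, sizeof(buf)) == 2
            && buf[0] == 'B' && buf[1] == 'I'       /* remote got both */
            && local.booted && local.irq_pending;   /* local handled inline */
    close(p[0]);
    close(p[1]);
    return ok ? 0 : -1;
}
```

The point of the split is that idm_boot() and idm_interrupt() never branch
on the client type, so something like the shared-memory fd transfer could
live entirely inside the remote implementation.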


Do you foresee a different type of interface for the use-case
you have in mind? I ask because if, for example, the clients are
cores of the same cluster (and same instance), interrupts could
simply be software generated from the linux-kernel/firmware
running on top of the processors, and there is theoretically no
need to go through the IDM; the same goes, I guess, for the boot.

Another thing that needs to be defined clearly is the interface between
the IDM and the software running on the cores.
At the moment I am using a set of registers, namely the boot and
the interrupt registers. By writing the ID of a client in such registers
it is possible to forward an interrupt or trigger its boot.


Thanks,

Christian


>
> Regards,
> Peter
>
>> At master side, besides the IDM, one would instantiate
>> as many interface devices as slaves. During the initialization the IDM will
>> link
>> with all those interfaces, and only call functions like: send_interrupt() or
>> boot_slave() to interact with the slaves. The interface will be the same for
>> both local or remote slaves, while the implementation of the methods will
>> differ and reside in the specific AMP Slave Interface device.
>> On the slave side, if the slave is remote, another instance of the
>> interface is instantiated so to connect to socket/eventfd.
>>
>> So as an example the send_shmem_fd function you pointed could be hidden in
>> the
>> slave interface, and invoked only when the IDM will invoke the slave_boot()
>> function of a remote slave interface.
>>
>> This would raise the level of abstraction and open the door to potentially
>> any
>> communication mechanism between master and slave, without the need to adapt
>> the
>> IDM device to the specific case. Or, eventually, to mix between local and
>> remote instances.
>>
>>
>> Thanks,
>>
>> Christian
>>
>>> Regards,
>>> Peter



