From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:60743)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mar.krzeminski@gmail.com>) id 1ZqlKJ-0003LH-7B
	for qemu-devel@nongnu.org; Mon, 26 Oct 2015 13:12:57 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <mar.krzeminski@gmail.com>) id 1ZqlKD-0003i1-S6
	for qemu-devel@nongnu.org; Mon, 26 Oct 2015 13:12:54 -0400
Received: from mail-lf0-x22d.google.com ([2a00:1450:4010:c07::22d]:36419)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mar.krzeminski@gmail.com>) id 1ZqlKD-0003ho-Gy
	for qemu-devel@nongnu.org; Mon, 26 Oct 2015 13:12:49 -0400
Received: by lffz202 with SMTP id z202so157304030lff.3
	for <qemu-devel@nongnu.org>; Mon, 26 Oct 2015 10:12:48 -0700 (PDT)
References: <1443535059-26010-1-git-send-email-c.pinto@virtualopensystems.com>
	<CAPokK=pmTsavj6xS6Pd7SXusEhKmwqBNE8BKT2jxh_7WFYzvKA@mail.gmail.com>
	<56129C63.1090401@virtualopensystems.com>
	<CAPokK=qdgtkLpkEbKytbvh7DQg+yhHvvasVxCjug=Y=LNoXkqQ@mail.gmail.com>
	<5628AA92.6040600@virtualopensystems.com>
	<CAPokK=rAkNFKFcJmH7nDMB2o5bqY9a9tbHtYzhtmZDMV7T9Oxw@mail.gmail.com>
From: "mar.krzeminski" <mar.krzeminski@gmail.com>
Message-ID: <562E5F0E.8010901@gmail.com>
Date: Mon, 26 Oct 2015 18:12:46 +0100
MIME-Version: 1.0
In-Reply-To: <CAPokK=rAkNFKFcJmH7nDMB2o5bqY9a9tbHtYzhtmZDMV7T9Oxw@mail.gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC PATCH 0/8] Towards an Heterogeneous QEMU
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Peter Crosthwaite <crosthwaitepeter@gmail.com>, Christian Pinto <c.pinto@virtualopensystems.com>
Cc: Edgar Iglesias <edgar.iglesias@xilinx.com>, Peter Maydell <peter.maydell@linaro.org>, mst@redhat.com, Claudio.Fontana@huawei.com, "qemu-devel@nongnu.org Developers" <qemu-devel@nongnu.org>, Jani.Kokkonen@huawei.com, tech@virtualopensystems.com


W dniu 25.10.2015 o 22:38, Peter Crosthwaite pisze:
> On Thu, Oct 22, 2015 at 2:21 AM, Christian Pinto
> <c.pinto@virtualopensystems.com> wrote:
>> Hello Peter,
>>
>>
>> On 07/10/2015 17:48, Peter Crosthwaite wrote:
>>> On Mon, Oct 5, 2015 at 8:50 AM, Christian Pinto
>>> <c.pinto@virtualopensystems.com> wrote:
>>>> Hello Peter,
>>>>
>>>> thanks for your comments
>>>>
>>>> On 01/10/2015 18:26, Peter Crosthwaite wrote:
>>>>> On Tue, Sep 29, 2015 at 6:57 AM, Christian Pinto
>>>>> <c.pinto@virtualopensystems.com>  wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> This RFC patch-series introduces the set of changes enabling the
>>>>>> architectural elements to model the architecture presented in a
>>>>>> previous
>>>>>> RFC
>>>>>> letter: "[Qemu-devel][RFC] Towards an Heterogeneous QEMU".
Sorry for late response, unfortunately my M3+A9 SoC can not be published 
(yet).
But I am working on it.
>>>>>> and the OS binary image needs
>>>>>> to be placed in memory at model startup.
>>>>>>
>>>>> I don't see what this limitation is exactly. Can you explain more? I
>>>>> do see a need to work on the ARM bootloader for AMP flows, it is a
>>>>> pure SMP bootloader than assumes total control.
>>>> the problem here was to me that when we launch QEMU a binary needs to be
>>>> provided and put in memory
>>>> in order to be executed. In this patch series the slave doesn't have a
>>>> proper memory allocated when first launched.
>>> But it could though couldn't it? Can't the slave guest just have full
>>> access to it's own address space (probably very similar to the masters
>>> address space) from machine init time? This seems more realistic than
>>> setting up the hardware based on guest level information.
>>
>> Actually the address space for a slave is built at init time, the thing that
>> is not
>> completely configured is the memory region modeling the RAM. Such region is
>> configured
>> in terms of size, but there is no pointer to the actual memory. The pointer
>> is mmap-ed later
>> before the slave boots.
>>
> based on what information? Is the master guest controlling this? If so
> what is the real-hardware analogue for this concept where the address
> map of the slave can change (i.e. be configured) at runtime?
I am not sure if it is the case since I haven't emulated this yet (and 
it has very low priority),
but I might have a real case in my M3+A9 - M3 has 256MiB window that can 
be moved over the 1GiB system memory at runtime.

>
>>>> The information about memory (fd + offset for mmap) is sent only later
>>>> when
>>>> the boot is triggered. This is also
>>>> safe since the slave will be waiting in the incoming state, and thus no
>>>> corruption or errors can happen before the
>>>> boot is triggered.
>>> I was thinking more about your comment about slave-to-slave
>>> interrupts. This would just trivially be a local software-generated
>>> interrupts of some form within the slave cluster.
>>
>> Sorry, I did not catch your comment at first time. You are right, if cores
>> are in the same cluster
>> a software generated interrupt is going to be enough. Of course the eventfd
>> based interrupts
>> make sense for a remote QEMU.
>>
> Is eventfd a better implementation of remote-port GPIOs as in the Xilinx work?
>
> Re the terminology, I don't like the idea of thinking of inter-qemu
> "interrupts" as whatever system we decide on should be able to support
> arbitrary signals going from one QEMU to another. I think the Xilinx
> work already has reset signals going between the QEMU peers.
>
>>>>>> The multi client-socket is used for the master to trigger
>>>>>>          the boot of a slave, and also for each master-slave couple to
>>>>>> exchancge the
>>>>>>          eventd file descriptors. The IDM device can be instantiated
>>>>>> either
>>>>>> as a
>>>>>>          PCI or sysbus device.
>>>>>>
>>>>> So if everything is is one QEMU, IPIs can be implemented with just a
>>>> of registers makes the master in
>>>> "control" each of the slaves. The IDM device is already seen as a regular
>>>> device by each of the QEMU instances
>>>> involved.
>>>>
>>> I'm starting to think this series is two things that should be
>>> decoupled. One is the abstract device(s) to facilitate your AMP, the
>>> other is the inter-qemu communication. For the abstract device, I
>>> guess this would be a new virtio-idm device. We should try and involve
>>> virtio people perhaps. I can see the value in it quite separate from
>>> modelling the real sysctrl hardware.
>>
>> Interesting, which other value/usage do you see in it? For me the IDM was
>> meant to
> It has value in prototyping with your abstract toolkit even with
> homogeneous hardware. E.g. I should be able to just use single-QEMU
> ARM virt machine -smp 2 and create one of these virtio-AMP setups.
> Homogeneous hardware with heterogenous software using your new pieces
> of abstract hardware.
>
> It is also more practical for getting a merge of your work as you are
> targetting two different audiences with the work. People intersted in
> virtio can handle the new devices you create, while the core
> maintainers can handle your multi-QEMU work. It is two rather big new
> features.
>
>> work as an abstract system controller to centralize the management
>> of the slaves (boot_regs and interrupts).
>>
>>
>>> But I think the implementation
>>> should be free of any inter-QEMU awareness. E.g. from P4 of this
>>> series:
>>>
>>> +static void send_shmem_fd(IDMState *s, MSClient *c)
>>> +{
>>> +    int fd, len;
>>> +    uint32_t *message;
>>> +    HostMemoryBackend *backend = MEMORY_BACKEND(s->hostmem);
>>> +
>>> +    len = strlen(SEND_MEM_FD_CMD)/4 + 3;
>>> +    message = malloc(len * sizeof(uint32_t));
>>> +    strcpy((char *) message, SEND_MEM_FD_CMD);
>>> +    message[len - 2] = s->pboot_size;
>>> +    message[len - 1] = s->pboot_offset;
>>> +
>>> +    fd = memory_region_get_fd(&backend->mr);
>>> +
>>> +    multi_socket_send_fds_to(c, &fd, 1, (char *) message, len *
>>> sizeof(uint32_t));
>>>
>>> The device itself is aware of shared-memory and multi-sockets. Using
>>> the device for single-QEMU AMP would require neither - can the IDM
>>> device be used in a homogeneous AMP flow in one of our existing SMP
>>> machine models (eg on a dual core A9 with one core being master and
>>> the other slave)?
>>>
>>> Can this be architected in two phases for greater utility, with the
>>> AMP devices as just normal devices, and the inter-qemu communication
>>> as a separate feature?
>>
>> I see your point, and it is an interesting proposal.
>>
>> What I can think here to remove the awareness of how the IDM communicates
>> with
>> the slaves, is to define a kind of AMP Slave interface. So there will be an
>> instance of the interface for each of the slaves, encapsulating the
>> communication part (being either local or based on sockets).
>> The AMP Slave interfaces would be what you called the AMP devices, with one
>> device per slave.
>>
> Do we need this hard definition of master and slave in the hardware?
> Can the virtio-device be more peer-peer and the master-slave
> relationship is purely implemented by the guest?
>
> Regards,
> Peter
>
>> At master side, besides the IDM, one would instantiate
>> as many interface devices as slaves. During the initialization the IDM will
>> link
>> with all those interfaces, and only call functions like: send_interrupt() or
>> boot_slave() to interact with the slaves. The interface will be the same for
>> both local or remote slaves, while the implementation of the methods will
>> differ and reside in the specific AMP Slave Interface device.
>> On the slave side, if the slave is remote, another instance of the
>> interface is instantiated so to connect to socket/eventfd.
>>
>> So as an example the send_shmem_fd function you pointed could be hidden in
>> the
>> slave interface, and invoked only when the IDM will invoke the slave_boot()
>> function of a remote slave interface.
>>
>> This would higher the level of abstraction and open the door to potentially
>> any
>> communication mechanism between master and slave, without the need to adapt
>> the
>> IDM device to the specific case. Or, eventually, to mix between local and
>> remote instances.
>>
>>
>> Thanks,
>>
>> Christian
>>
>>> Regards,
>>> Peter
>>
Regards,
Marcin