[virtio-dev] Enabling hypervisor agnosticism for VirtIO backends

Discussion of the implementations of VIRTIO specification

* [virtio-dev] Enabling hypervisor agnosticism for VirtIO backends
From: Alex Bennée @ 2021-08-04  9:04 UTC
  To: Stratos Mailing List, virtio-dev
  Cc: Arnd Bergmann, Viresh Kumar, AKASHI Takahiro, Stefano Stabellini,
	stefanha, Jan Kiszka, Carl van Schaik, pratikp,
	Srivatsa Vaddagiri, Jean-Philippe Brucker, Mathieu Poirier

Hi,

One of the goals of Project Stratos is to enable hypervisor agnostic
backends so we can enable as much re-use of code as possible and avoid
repeating ourselves. This is the flip side of the front end where
multiple front-end implementations are required - one per OS, assuming
you don't just want Linux guests. The resultant guests are trivially
movable between hypervisors modulo any abstracted paravirt type
interfaces.

In my original thumbnail sketch of a solution I envisioned vhost-user
daemons running in a broadly POSIX-like environment. The interface to
the daemon is fairly simple, requiring only some mapped memory and some
sort of signalling for events (on Linux this is eventfd). The idea was
that a stub binary would be responsible for any hypervisor-specific
setup and then launch a common binary to deal with the actual virtqueue
requests themselves.

Since that original sketch we've seen an expansion in the sort of ways
backends could be created. There is interest in encapsulating backends
in RTOSes or unikernels for solutions like SCMI. The interest in Rust
has prompted ideas of using the trait interface to abstract differences
away as well as the idea of bare-metal Rust backends.

We have a card (STR-12) called "Hypercall Standardisation" which
calls for a description of the APIs needed from the hypervisor side to
support VirtIO guests and their backends. However we are some way off
from that at the moment as I think we need to at least demonstrate one
portable backend before we start codifying requirements. To that end I
want to think about what we need for a backend to function.

Configuration
=============

In the type-2 setup this is typically fairly simple because the host
system can orchestrate the various modules that make up the complete
system. In the type-1 case (or even type-2 with delegated service VMs)
we need some sort of mechanism to inform the backend VM about key
details about the system:

  - where virt queue memory is in its address space
  - how it's going to receive (interrupt) and trigger (kick) events
  - what (if any) resources the backend needs to connect to

Obviously you can gloss over configuration issues by having static
configurations and baking the assumptions into your guest images, but
this isn't scalable in the long term. The obvious solution seems to be
extending a subset of Device Tree data to user space, but perhaps there
are other approaches?
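
As a rough illustration of the three items listed above, the record
below is a minimal sketch of what a backend might need to be told at
start-up. All class and field names, and the example values, are
assumptions made for the sketch; nothing here is an existing Device
Tree binding or Stratos interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VirtqueueRegion:
    """Where one virtqueue lives in the backend's address space (hypothetical)."""
    guest_phys_addr: int
    size: int


@dataclass
class BackendConfig:
    """Illustrative record of what a backend VM must learn about the system."""
    queues: List[VirtqueueRegion]
    irq: Optional[int] = None       # how the backend receives events
    kick: Optional[int] = None      # how the backend triggers events
    resources: List[str] = field(default_factory=list)  # what it connects to

    def validate(self) -> None:
        """Reject empty, malformed or overlapping virtqueue regions."""
        if not self.queues:
            raise ValueError("at least one virtqueue region is required")
        for q in self.queues:
            if q.size <= 0 or q.guest_phys_addr < 0:
                raise ValueError(f"bad region: {q}")
        ordered = sorted(self.queues, key=lambda q: q.guest_phys_addr)
        for a, b in zip(ordered, ordered[1:]):
            if a.guest_phys_addr + a.size > b.guest_phys_addr:
                raise ValueError("virtqueue regions overlap")


# Example values are made up for the sketch.
cfg = BackendConfig(
    queues=[VirtqueueRegion(0x4000_0000, 0x1000),
            VirtqueueRegion(0x4000_1000, 0x1000)],
    irq=42,
    kick=7,
    resources=["/dev/vda"],
)
cfg.validate()
```

A static configuration baked into a guest image would amount to
hard-coding one such record; the open question is who fills it in at
run time.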

Before any virtio transactions can take place the appropriate memory
mappings need to be made between the FE guest and the BE guest.
Currently the whole of the FE guest's address space needs to be visible
to whatever is serving the virtio requests. I can envision 3 approaches:

 * BE guest boots with memory already mapped

 This would entail the guest OS knowing which parts of its Guest Physical
 Address space are already taken up and avoiding clashes. I would assume
 that in this case you would want a standard interface to userspace to
 then make that address space visible to the backend daemon.

 * BE guest boots with a hypervisor handle to memory

 The BE guest is then free to map the FE's memory to wherever it wants in
 the BE's guest physical address space. Activating the mapping will
 require some sort of hypercall to the hypervisor. I can see two options
 at this point:

  - expose the handle to userspace for a daemon/helper to trigger the
    mapping via existing hypercall interfaces. If using a helper, you
    would have a hypervisor-specific one so the daemon does not have to
    care too much about the details. Alternatively, push that complexity
    into a compile-time option for the daemon, which would result in
    different binaries built from a common source base.

  - expose a new kernel ABI to abstract the hypercall differences away
    in the guest kernel. In this case the userspace would essentially
    ask for an abstract "map guest N memory to userspace ptr" and let
    the kernel deal with the different hypercall interfaces. This of
    course assumes the majority of BE guests would be Linux kernels and
    leaves the bare-metal/unikernel approaches to their own devices.
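
The abstract "map guest N memory to userspace ptr" request could be
sketched as follows. This is only a model of the idea: the class and
method names are hypothetical, and the stand-in mapper backs each guest
with a bytearray where a real one would issue a hypervisor-specific
hypercall or kernel ioctl.

```python
from abc import ABC, abstractmethod


class GuestMemoryMapper(ABC):
    """Hypothetical abstraction over 'map guest N memory to userspace ptr'."""

    @abstractmethod
    def map_guest(self, guest_id: int, offset: int, length: int) -> memoryview:
        """Return a writable view of part of another guest's memory."""


class FakeHypervisorMapper(GuestMemoryMapper):
    """Stand-in implementation: no hypercall, just a bytearray per guest."""

    def __init__(self, guest_ram: dict):
        self._ram = guest_ram  # guest_id -> bytearray

    def map_guest(self, guest_id, offset, length):
        ram = self._ram[guest_id]
        if offset < 0 or length < 0 or offset + length > len(ram):
            raise ValueError("mapping outside guest memory")
        return memoryview(ram)[offset:offset + length]


def backend_write_status(mapper: GuestMemoryMapper, guest_id: int) -> None:
    """Hypervisor-agnostic backend code: it only sees the abstract mapper."""
    window = mapper.map_guest(guest_id, offset=16, length=4)
    window[:] = b"DONE"


fe_ram = bytearray(64)
backend_write_status(FakeHypervisorMapper({1: fe_ram}), guest_id=1)
```

Whether the hypervisor-specific part sits in a helper, behind a
compile-time option or in the guest kernel only changes which
implementation of the mapper the common backend code is handed.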

Operation
=========

The core of the operation of VirtIO is fairly simple. Once the
vhost-user feature negotiation is done it's a case of receiving update
events and parsing the resultant virt queue for data. The vhost-user
specification handles a bunch of setup before that point, mostly to
detail where the virt queues are and to set up FDs for memory and event
communication. This is where the envisioned stub process would be
responsible for getting the daemon up and ready to run. This is
currently done inside a big VMM like QEMU but I suspect a modern
approach would be to use the rust-vmm vhost crate. It would then either
communicate with the kernel's abstracted ABI or be re-targeted as a
build option for the various hypervisors.

One question is how best to handle notifications and kicks. The existing
vhost-user framework uses eventfd to signal the daemon (although QEMU
is quite capable of simulating them when you use TCG). Xen has its own
IOREQ mechanism. However, latency is an important factor and having
events go through the stub would add quite a lot of it.

Could we consider the kernel internally converting IOREQ messages from
the Xen hypervisor to eventfd events? Would this scale with other kernel
hypercall interfaces?
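
To make the question concrete, here is a small user-space model of the
translation step. It is a sketch under loud assumptions: a pipe stands
in for an eventfd (an eventfd carries a counter, a pipe carries bytes),
the event tuples are invented, and `translate` merely plays the role
the kernel would take.

```python
import os
import select


class Notifier:
    """Pipe-backed stand-in for an eventfd: a descriptor the daemon can poll."""

    def __init__(self):
        self._r, self._w = os.pipe()

    def fileno(self) -> int:
        return self._r

    def signal(self) -> None:
        os.write(self._w, b"\x01")

    def drain(self) -> int:
        """Return how many signals were pending and clear them."""
        return len(os.read(self._r, 4096))

    def close(self) -> None:
        os.close(self._r)
        os.close(self._w)


def translate(hypervisor_events, notifiers):
    """Hypothetical kernel-side step: route each event to its queue's descriptor."""
    for queue_index, _kind in hypervisor_events:
        notifiers[queue_index].signal()


notifiers = {0: Notifier(), 1: Notifier()}
translate([(0, "ioreq"), (1, "ioreq"), (0, "ioreq")], notifiers)

# The daemon side stays hypervisor agnostic: it only polls descriptors.
ready, _, _ = select.select(list(notifiers.values()), [], [], 0)
pending = {idx: n.drain() for idx, n in notifiers.items() if n in ready}

for n in notifiers.values():
    n.close()
```

The daemon never sees the "ioreq" label, which is the property being
asked for; whether other hypercall interfaces fit the same mould is the
open part.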

So any thoughts on what directions are worth experimenting with?

-- 
Alex Bennée

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


* [virtio-dev] Re: Enabling hypervisor agnosticism for VirtIO backends
From: Stefan Hajnoczi @ 2021-08-05 15:48 UTC
  To: Alex Bennée
  Cc: Stratos Mailing List, virtio-dev, Arnd Bergmann, Viresh Kumar,
	AKASHI Takahiro, Stefano Stabellini, Jan Kiszka, Carl van Schaik,
	pratikp, Srivatsa Vaddagiri, Jean-Philippe Brucker,
	Mathieu Poirier


On Wed, Aug 04, 2021 at 10:04:30AM +0100, Alex Bennée wrote:
> Before any virtio transactions can take place the appropriate memory
> mappings need to be made between the FE guest and the BE guest.
> Currently the whole of the FE guest's address space needs to be visible
> to whatever is serving the virtio requests. I can envision 3 approaches:
> 
>  * BE guest boots with memory already mapped
> 
>  This would entail the guest OS knowing which parts of its Guest Physical
>  Address space are already taken up and avoiding clashes. I would assume
>  that in this case you would want a standard interface to userspace to
>  then make that address space visible to the backend daemon.
> 
>  * BE guest boots with a hypervisor handle to memory
> 
>  The BE guest is then free to map the FE's memory to where it wants in
>  the BE's guest physical address space. To activate the mapping will
>  require some sort of hypercall to the hypervisor. I can see two options
>  at this point:
> 
>   - expose the handle to userspace for daemon/helper to trigger the
>     mapping via existing hypercall interfaces. If using a helper you
>     would have a hypervisor specific one to avoid the daemon having to
>     care too much about the details or push that complexity into a
>     compile time option for the daemon which would result in different
>     binaries although a common source base.
> 
>   - expose a new kernel ABI to abstract the hypercall differences away
>     in the guest kernel. In this case the userspace would essentially
>     ask for an abstract "map guest N memory to userspace ptr" and let
>     the kernel deal with the different hypercall interfaces. This of
>     course assumes the majority of BE guests would be Linux kernels and
>     leaves the bare-metal/unikernel approaches to their own devices.

VIRTIO typically uses the vring memory layout but doesn't need to. The
VIRTIO device model deals with virtqueues. The shared memory vring
layout is part of the VIRTIO transport (PCI, MMIO, and CCW use vrings).
Alternative transports with other virtqueue representations are possible
(e.g. VIRTIO-over-TCP). They don't need to involve a BE mapping shared
memory and processing a vring owned by the FE.

For example, there could be BE hypercalls to pop a virtqueue element,
push a virtqueue element, and access buffers (basically DMA
read/write). The FE could either be a traditional virtio-mmio/pci device
with a vring or use FE hypercalls to add available elements to a
virtqueue and get used elements.
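
A toy model of that arrangement might look like the following. The
method names are hypothetical stand-ins for the FE and BE hypercalls,
and the queue lives inside the object (playing the hypervisor) rather
than in a shared vring.

```python
from collections import deque


class MediatedVirtqueue:
    """Toy virtqueue held by the hypervisor instead of in a shared vring."""

    def __init__(self, fe_memory: bytearray):
        self._fe_memory = fe_memory
        self._avail = deque()  # (addr, length) buffers offered by the FE
        self._used = deque()   # (addr, bytes_written) returned by the BE

    def _check(self, addr, length):
        if addr < 0 or length < 0 or addr + length > len(self._fe_memory):
            raise ValueError("access outside FE memory")

    # FE-side calls
    def fe_add_avail(self, addr, length):
        self._check(addr, length)
        self._avail.append((addr, length))

    def fe_get_used(self):
        return self._used.popleft() if self._used else None

    # BE-side calls
    def be_pop(self):
        return self._avail.popleft() if self._avail else None

    def be_dma_read(self, addr, length) -> bytes:
        self._check(addr, length)
        return bytes(self._fe_memory[addr:addr + length])

    def be_dma_write(self, addr, data: bytes):
        self._check(addr, len(data))
        self._fe_memory[addr:addr + len(data)] = data

    def be_push(self, addr, written):
        self._used.append((addr, written))


fe_memory = bytearray(b"ping" + bytes(12))
vq = MediatedVirtqueue(fe_memory)
vq.fe_add_avail(addr=0, length=4)      # FE offers a 4-byte buffer

addr, length = vq.be_pop()             # BE takes the element
reply = vq.be_dma_read(addr, length).upper()
vq.be_dma_write(addr, reply)           # BE never maps FE memory itself
vq.be_push(addr, len(reply))

used = vq.fe_get_used()
```

The BE only ever touches FE memory through the read/write calls, so
the hypervisor can bounds-check every access.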

I don't know the goals of Project Stratos or whether this helps, but it
might allow other architectures that have different security,
complexity, etc. properties.

Stefan



* [virtio-dev] Re: Enabling hypervisor agnosticism for VirtIO backends
From: Stefan Hajnoczi @ 2021-08-17 10:41 UTC
  To: Stefano Stabellini
  Cc: Alex Bennée, Stratos Mailing List, virtio-dev, Arnd Bergmann,
	Viresh Kumar, AKASHI Takahiro, Stefano Stabellini, Jan Kiszka,
	Carl van Schaik, pratikp, Srivatsa Vaddagiri,
	Jean-Philippe Brucker, Mathieu Poirier, Wei.Chen, olekstysh,
	Oleksandr_Tyshchenko, Bertrand.Marquis, Artem_Mygaiev, julien,
	jgross, paul, xen-devel


On Wed, Aug 04, 2021 at 12:20:01PM -0700, Stefano Stabellini wrote:
> > Could we consider the kernel internally converting IOREQ messages from
> > the Xen hypervisor to eventfd events? Would this scale with other kernel
> > hypercall interfaces?
> > 
> > So any thoughts on what directions are worth experimenting with?
>  
> One option we should consider is for each backend to connect to Xen via
> the IOREQ interface. We could generalize the IOREQ interface and make it
> hypervisor agnostic. The interface is really trivial and easy to add.
> The only Xen-specific part is the notification mechanism, which is an
> event channel. If we replaced the event channel with something else the
> interface would be generic. See:
> https://gitlab.com/xen-project/xen/-/blob/staging/xen/include/public/hvm/ioreq.h#L52

There have been experiments with something kind of similar in KVM
recently (see struct ioregionfd_cmd):
https://lore.kernel.org/kvm/dad3d025bcf15ece11d9df0ff685e8ab0a4f2edd.1613828727.git.eafanasova@gmail.com/

> There is also another problem. IOREQ is probably not the only
> interface needed. Have a look at
> https://marc.info/?l=xen-devel&m=162373754705233&w=2. Don't we also need
> an interface for the backend to inject interrupts into the frontend? And
> if the backend requires dynamic memory mappings of frontend pages, then
> we would also need an interface to map/unmap domU pages.
> 
> These interfaces are a lot more problematic than IOREQ: IOREQ is tiny
> and self-contained. It is easy to add anywhere. A new interface to
> inject interrupts or map pages is more difficult to manage because it
> would require changes scattered across the various emulators.

Something like ioreq is indeed necessary to implement arbitrary devices,
but if you are willing to restrict yourself to VIRTIO then other
interfaces are possible too because the VIRTIO device model is different
from the general purpose x86 PIO/MMIO that Xen's ioreq seems to support.

Stefan



* Re: [virtio-dev] Enabling hypervisor agnosticism for VirtIO backends
From: Matias Ezequiel Vara Larsen @ 2021-08-19  9:11 UTC
  To: Alex Bennée
  Cc: Stratos Mailing List, virtio-dev, Arnd Bergmann, Viresh Kumar,
	AKASHI Takahiro, Stefano Stabellini, stefanha, Jan Kiszka,
	Carl van Schaik, pratikp, Srivatsa Vaddagiri,
	Jean-Philippe Brucker, Mathieu Poirier

Hello Alex,

I can share my experience from working on a PoC (library) that allows
the implementation of virtio devices that are hypervisor/OS agnostic.
I focused on two use cases:
1. A type-1 hypervisor in which the backend runs as a VM. This
is an in-house hypervisor that does not support VMExits.
2. Linux user space. In this case, the library is just used for
communication between threads. The goal of this use case is merely testing.

I have chosen virtio-mmio as the way to exchange information
between the frontend and backend. I found it hard to synchronize
access to the virtio-mmio layout without VMExits. I had to add some
extra bits to allow the frontend and backend to synchronize, which is
required during device-status initialization. These extra bits would
not be needed if the hypervisor supports VMExits, e.g., KVM.
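
One way such extra bits could work is sketched below. This is a guess
at the general shape, not the PoC's actual layout: the status values
are the standard VIRTIO device-status bits, while `FE_WROTE` and the
two helper functions are invented for the sketch. Real shared memory
would also need atomic accesses and memory barriers, which plain
Python does not model.

```python
ACKNOWLEDGE, DRIVER, DRIVER_OK, FEATURES_OK = 1, 2, 4, 8  # VIRTIO status bits
FE_WROTE = 0x100  # hypothetical extra bit: FE has written, BE has not seen it


class SharedStatus:
    """A status word in shared memory; writes to it cannot be trapped."""

    def __init__(self):
        self.word = 0


def fe_set_status(reg, bits):
    """FE side: publish new status bits and raise the hand-over flag."""
    if reg.word & FE_WROTE:
        return False  # BE has not consumed the previous write yet
    reg.word = (reg.word | bits) | FE_WROTE
    return True


def be_poll(reg, seen):
    """BE side: poll instead of trapping; return the newly observed bits."""
    if not reg.word & FE_WROTE:
        return 0
    status = reg.word & ~FE_WROTE
    new_bits = status & ~seen
    reg.word = status  # clearing the flag acknowledges the write
    return new_bits


reg = SharedStatus()
seen = 0
log = []
for step in (ACKNOWLEDGE, DRIVER, FEATURES_OK, DRIVER_OK):
    fe_set_status(reg, step)
    new = be_poll(reg, seen)
    seen |= new
    log.append(new)
```

With a trapping hypervisor the write itself would notify the backend,
so no flag would be needed.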

Each guest has a memory region that is shared with the backend.
The frontend uses this memory region to allocate the io-buffers. The
region also maps the virtio-mmio layout, which is initialized by the
backend. For the moment, this region is defined when the guest is
created. One limitation is that the memory for io-buffers is fixed.
At some point, the guest shall be able to balloon this region.
Notifications between the frontend and the backend are implemented
using a hypercall. The hypercall mechanism and the memory allocation
are abstracted away by a platform layer that exposes an interface that
is hypervisor/OS agnostic.
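
A platform layer of this kind could be sketched as below, here with
only a user-space variant in which two threads share a buffer. The
interface and all names are assumptions made for illustration; a
hypervisor variant would implement the same three methods with a
hypercall and a pre-defined shared region.

```python
import threading
from abc import ABC, abstractmethod


class Platform(ABC):
    """Hypothetical platform layer: everything hypervisor/OS specific lives here."""

    @abstractmethod
    def shared_region(self) -> bytearray: ...

    @abstractmethod
    def notify_peer(self) -> None: ...

    @abstractmethod
    def wait_for_peer(self, timeout: float) -> bool: ...


class ThreadPlatform(Platform):
    """User-space variant: two endpoints in one process sharing a buffer."""

    def __init__(self, region, mine, theirs):
        self._region, self._mine, self._theirs = region, mine, theirs

    @classmethod
    def pair(cls, size):
        region = bytearray(size)
        a, b = threading.Event(), threading.Event()
        return cls(region, a, b), cls(region, b, a)

    def shared_region(self):
        return self._region

    def notify_peer(self):
        self._theirs.set()

    def wait_for_peer(self, timeout):
        ok = self._mine.wait(timeout)
        self._mine.clear()
        return ok


def run_backend(platform: Platform):
    """Platform-agnostic backend step: wait, reverse four bytes, notify."""
    if platform.wait_for_peer(timeout=5.0):
        region = platform.shared_region()
        region[0:4] = bytes(reversed(region[0:4]))
        platform.notify_peer()


frontend_side, backend_side = ThreadPlatform.pair(16)
worker = threading.Thread(target=run_backend, args=(backend_side,))
worker.start()
frontend_side.shared_region()[0:4] = b"abcd"
frontend_side.notify_peer()
completed = frontend_side.wait_for_peer(timeout=5.0)
worker.join()
```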

I split the backend into a virtio-device driver and a
backend driver. The virtio-device driver handles the virtqueues and the
backend driver gets packets from the virtqueue for
post-processing. For example, in the case of virtio-net, the backend
driver would decide whether the packet goes to the hardware or to another
virtio-net device. The virtio-device drivers may be implemented in
different ways, for example by using a single thread, multiple threads,
or one thread for all the virtio devices.
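
The split could be pictured roughly as follows, using the virtio-net
example. This is an illustrative sketch only: the class names, the
string destinations and the single-threaded polling loop are
assumptions, not the PoC's code.

```python
from collections import deque


class VirtioDeviceDriver:
    """Owns one device's queue and hands packets to a backend driver."""

    def __init__(self, name, backend_driver):
        self.name = name
        self.rx = deque()   # packets delivered to this device's guest
        self._tx = deque()  # packets queued by this device's guest
        self._backend = backend_driver

    def guest_transmit(self, dest, payload):
        self._tx.append((dest, payload))

    def process_queue(self):
        """Drain the transmit queue; could equally run in its own thread."""
        while self._tx:
            dest, payload = self._tx.popleft()
            self._backend.handle(self, dest, payload)


class NetBackendDriver:
    """Decides whether a packet goes to another virtio-net device or to hardware."""

    def __init__(self):
        self.devices = {}
        self.hardware_tx = []

    def attach(self, device):
        self.devices[device.name] = device

    def handle(self, source, dest, payload):
        target = self.devices.get(dest)
        if target is not None and target is not source:
            target.rx.append(payload)
        else:
            self.hardware_tx.append((dest, payload))


net_backend = NetBackendDriver()
net0 = VirtioDeviceDriver("net0", net_backend)
net1 = VirtioDeviceDriver("net1", net_backend)
net_backend.attach(net0)
net_backend.attach(net1)

net0.guest_transmit("net1", b"hello")    # stays between guests
net0.guest_transmit("uplink", b"world")  # leaves via hardware
net0.process_queue()
```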

In this PoC, I just tackled two very simple use cases. These
use cases allowed me to extract some requirements for a hypervisor to
support virtio.

Matias



* Re: [virtio-dev] Enabling hypervisor agnosticism for VirtIO backends
From: Matias Ezequiel Vara Larsen @ 2021-08-21 14:08 UTC
  To: AKASHI Takahiro
  Cc: Alex Bennée, Stratos Mailing List, virtio-dev, Arnd Bergmann,
	Viresh Kumar, Stefano Stabellini, stefanha, Jan Kiszka,
	Carl van Schaik, pratikp, Srivatsa Vaddagiri,
	Jean-Philippe Brucker, Mathieu Poirier

Hello,

On Fri, Aug 20, 2021 at 03:05:58PM +0900, AKASHI Takahiro wrote:
> Hi Matias,
> 
> On Thu, Aug 19, 2021 at 11:11:55AM +0200, Matias Ezequiel Vara Larsen wrote:
> > Hello Alex,
> > 
> > I can tell you my experience from working on a PoC (library) 
> > to allow the implementation of virtio-devices that are hypervisor/OS agnostic. 
> 
> What hypervisor are you using for your PoC here?
> 

I am using an in-house hypervisor, which is similar to Jailhouse.

> > I focused on two use cases:
> > 1. type-I hypervisor in which the backend is running as a VM. This
> > is an in-house hypervisor that does not support VMExits.
> > 2. Linux user-space. In this case, the library is just used to
> > communicate threads. The goal of this use case is merely testing.
> > 
> > I have chosen virtio-mmio as the way to exchange information
> > between the frontend and backend. I found it hard to synchronize the
> > access to the virtio-mmio layout without VMExits. I had to add some extra bits to allow 
> 
> Can you explain how MMIOs to registers in the virtio-mmio layout
> (which I think means a configuration space?) will be propagated to the BE?
> 

In this PoC, the BE guest is created with a fixed number of memory
regions, each of which represents a device. The BE initializes these
regions and then waits for the FEs to begin the initialization.

> > the front-end and back-end to synchronize, which is required
> > during the device-status initialization. These extra bits would not be 
> > needed in case the hypervisor supports VMExits, e.g., KVM.
> > 
> > Each guest has a memory region that is shared with the backend. 
> > This memory region is used by the frontend to allocate the io-buffers. This region also 
> > maps the virtio-mmio layout that is initialized by the backend. For the moment, this region 
> > is defined when the guest is created. One limitation is that the memory for io-buffers is fixed. 
> 
> So in summary, you have a single memory region that is used
> for virtio-mmio layout and io-buffers (I think they are for payload)
> and you assume that the region will be (at least for now) statically
> shared between FE and BE so that you can eliminate 'mmap' at every
> time to access the payload.
> Correct?
>

Yes, it is.

> If so, it can be an alternative solution for memory access issue,
> and a similar technique is used in some implementations:
> - (Jailhouse's) ivshmem
> - Arnd's fat virtqueue
>
> In either case, however, you will have to allocate payload from the region
> and so you will see some impact on FE code (at least at some low level).
> (In ivshmem, dma_ops in the kernel is defined for this purpose.)
> Correct?

Yes, it is. The FE implements a sort of malloc() to organize the allocation of io-buffers from that
memory region.
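
A malloc()-like allocator over a fixed region can be quite small. The
first-fit sketch below is only an illustration of the idea; the class
name, the 16-byte alignment and the addresses are assumptions and do
not describe the PoC's implementation.

```python
class RegionAllocator:
    """First-fit allocator over a fixed shared region (illustrative only)."""

    def __init__(self, base, size, align=16):
        self._align = align
        self._free = [(base, size)]  # sorted list of (start, length) holes

    def _round_up(self, length):
        if length <= 0:
            raise ValueError("length must be positive")
        return -(-length // self._align) * self._align

    def alloc(self, length):
        """Return the start address of a buffer, or None if the region is full."""
        length = self._round_up(length)
        for i, (start, hole) in enumerate(self._free):
            if hole >= length:
                if hole == length:
                    del self._free[i]
                else:
                    self._free[i] = (start + length, hole - length)
                return start
        return None  # the region is fixed in size and cannot grow

    def free(self, start, length):
        """Return a buffer to the pool and merge adjacent holes."""
        self._free.append((start, self._round_up(length)))
        self._free.sort()
        merged = [self._free[0]]
        for s, l in self._free[1:]:
            prev_start, prev_len = merged[-1]
            if prev_start + prev_len == s:
                merged[-1] = (prev_start, prev_len + l)
            else:
                merged.append((s, l))
        self._free = merged


pool = RegionAllocator(base=0x1000, size=64)
a = pool.alloc(20)  # rounded up to 32 bytes
b = pool.alloc(32)
c = pool.alloc(1)   # nothing left in the fixed region
pool.free(a, 20)
d = pool.alloc(16)  # reuses the hole left by `a`
```

The `None` returned for `c` is the fixed-size limitation mentioned
earlier: without ballooning, an exhausted region simply fails the
allocation.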

Thinking again about VMExits, I am not sure how this mechanism could be used when both the FE and
the BE are VMs. Using VMExits may require involving the hypervisor.

Matias
> 
> -Takahiro Akashi
> 
> > At some point, the guest shall be able to balloon this region. Notifications between 
> > the frontend and the backend are implemented by using an hypercall. The hypercall 
> > mechanism and the memory allocation are abstracted away by a platform layer that 
> > exposes an interface that is hypervisor/os agnostic.
> > 
> > I split the backend into a virtio-device driver and a
> > backend driver. The virtio-device driver is the virtqueues and the
> > backend driver gets packets from the virtqueue for
> > post-processing. For example, in the case of virtio-net, the backend
> > driver would decide if the packet goes to the hardware or to another
> > virtio-net device. The virtio-device drivers may be
> > implemented in different ways like by using a single thread, multiple threads, 
> > or one thread for all the virtio-devices.
> > 
> > In this PoC, I just tackled two very simple use-cases. These
> > use-cases allowed me to extract some requirements for an hypervisor to
> > support virtio.
> > 
> > Matias
> > 
> > On Wed, Aug 04, 2021 at 10:04:30AM +0100, Alex Bennée wrote:
> > > Hi,
> > > 
> > > One of the goals of Project Stratos is to enable hypervisor agnostic
> > > backends so we can enable as much re-use of code as possible and avoid
> > > repeating ourselves. This is the flip side of the front end where
> > > multiple front-end implementations are required - one per OS, assuming
> > > you don't just want Linux guests. The resultant guests are trivially
> > > movable between hypervisors modulo any abstracted paravirt type
> > > interfaces.
> > > 
> > > In my original thumb nail sketch of a solution I envisioned vhost-user
> > > daemons running in a broadly POSIX like environment. The interface to
> > > the daemon is fairly simple requiring only some mapped memory and some
> > > sort of signalling for events (on Linux this is eventfd). The idea was a
> > > stub binary would be responsible for any hypervisor specific setup and
> > > then launch a common binary to deal with the actual virtqueue requests
> > > themselves.
> > > 
> > > Since that original sketch we've seen an expansion in the sort of ways
> > > backends could be created. There is interest in encapsulating backends
> > > in RTOSes or unikernels for solutions like SCMI. There interest in Rust
> > > has prompted ideas of using the trait interface to abstract differences
> > > away as well as the idea of bare-metal Rust backends.
> > > 
> > > We have a card (STR-12) called "Hypercall Standardisation" which
> > > calls for a description of the APIs needed from the hypervisor side to
> > > support VirtIO guests and their backends. However we are some way off
> > > from that at the moment as I think we need to at least demonstrate one
> > > portable backend before we start codifying requirements. To that end I
> > > want to think about what we need for a backend to function.
> > > 
> > > Configuration
> > > =============
> > > 
> > > In the type-2 setup this is typically fairly simple because the host
> > > system can orchestrate the various modules that make up the complete
> > > system. In the type-1 case (or even type-2 with delegated service VMs)
> > > we need some sort of mechanism to inform the backend VM about key
> > > details about the system:
> > > 
> > >   - where virt queue memory is in it's address space
> > >   - how it's going to receive (interrupt) and trigger (kick) events
> > >   - what (if any) resources the backend needs to connect to
> > > 
> > > Obviously you can elide over configuration issues by having static
> > > configurations and baking the assumptions into your guest images however
> > > this isn't scalable in the long term. The obvious solution seems to be
> > > extending a subset of Device Tree data to user space but perhaps there
> > > are other approaches?
> > > 
> > > Before any virtio transactions can take place the appropriate memory
> > > mappings need to be made between the FE guest and the BE guest.
> > > Currently the whole of the FE guest's address space needs to be visible
> > > to whatever is serving the virtio requests. I can envision 3 approaches:
> > > 
> > >  * BE guest boots with memory already mapped
> > > 
> > >  This would entail the guest OS knowing where in its Guest Physical
> > >  Address space is already taken up and avoiding clashing. I would assume
> > >  in this case you would want a standard interface to userspace to then
> > >  make that address space visible to the backend daemon.
> > > 
> > >  * BE guest boots with a hypervisor handle to memory
> > > 
> > >  The BE guest is then free to map the FE's memory to where it wants in
> > >  the BE's guest physical address space. To activate the mapping will
> > >  require some sort of hypercall to the hypervisor. I can see two options
> > >  at this point:
> > > 
> > >   - expose the handle to userspace for daemon/helper to trigger the
> > >     mapping via existing hypercall interfaces. If using a helper you
> > >     would have a hypervisor specific one to avoid the daemon having to
> > >     care too much about the details or push that complexity into a
> > >     compile time option for the daemon which would result in different
> > >     binaries although a common source base.
> > > 
> > >   - expose a new kernel ABI to abstract the hypercall differences away
> > >     in the guest kernel. In this case the userspace would essentially
> > >     ask for an abstract "map guest N memory to userspace ptr" and let
> > >     the kernel deal with the different hypercall interfaces. This of
> > >     course assumes the majority of BE guests would be Linux kernels and
> > >     leaves the bare-metal/unikernel approaches to their own devices.
> > > 
> > > Operation
> > > =========
> > > 
> > > The core of the operation of VirtIO is fairly simple. Once the
> > > vhost-user feature negotiation is done it's a case of receiving update
> > > events and parsing the resultant virt queue for data. The vhost-user
> > > specification handles a bunch of setup before that point, mostly to
> > > detail where the virt queues are and to set up FDs for memory and event
> > > communication. This is where the envisioned stub process would be
> > > responsible for getting the daemon up and ready to run. This is
> > > currently done inside a big VMM like QEMU but I suspect a modern
> > > approach would be to use the rust-vmm vhost crate. It would then either
> > > communicate with the kernel's abstracted ABI or be re-targeted as a
> > > build option for the various hypervisors.
> > > 
> > > One question is how to best handle notification and kicks. The existing
> > > vhost-user framework uses eventfd to signal the daemon (although QEMU
> > > is quite capable of simulating them when you use TCG). Xen has its own
> > > IOREQ mechanism. However latency is an important factor and having
> > > events go through the stub would add quite a lot.
> > > 
> > > Could we consider the kernel internally converting IOREQ messages from
> > > the Xen hypervisor to eventfd events? Would this scale with other kernel
> > > hypercall interfaces?
> > > 
> > > So any thoughts on what directions are worth experimenting with?
> > > 
> > > -- 
> > > Alex Bennée
> > > 
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [virtio-dev] Re: Enabling hypervisor agnosticism for VirtIO backends
       [not found]     ` <20210823062500.GC40863@laputa>
@ 2021-08-23  9:58       ` Stefan Hajnoczi
       [not found]         ` <20210825102945.GA89209@laputa>
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Hajnoczi @ 2021-08-23  9:58 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: Stefano Stabellini, Alex Bennée, Stratos Mailing List,
	virtio-dev, Arnd Bergmann, Viresh Kumar, Stefano Stabellini,
	Jan Kiszka, Carl van Schaik, pratikp, Srivatsa Vaddagiri,
	Jean-Philippe Brucker, Mathieu Poirier, Wei.Chen, olekstysh,
	Oleksandr_Tyshchenko, Bertrand.Marquis, Artem_Mygaiev, julien,
	jgross, paul, xen-devel

[-- Attachment #1: Type: text/plain, Size: 3995 bytes --]

On Mon, Aug 23, 2021 at 03:25:00PM +0900, AKASHI Takahiro wrote:
> Hi Stefan,
> 
> On Tue, Aug 17, 2021 at 11:41:01AM +0100, Stefan Hajnoczi wrote:
> > On Wed, Aug 04, 2021 at 12:20:01PM -0700, Stefano Stabellini wrote:
> > > > Could we consider the kernel internally converting IOREQ messages from
> > > > the Xen hypervisor to eventfd events? Would this scale with other kernel
> > > > hypercall interfaces?
> > > > 
> > > > So any thoughts on what directions are worth experimenting with?
> > >  
> > > One option we should consider is for each backend to connect to Xen via
> > > the IOREQ interface. We could generalize the IOREQ interface and make it
> > > hypervisor agnostic. The interface is really trivial and easy to add.
> > > The only Xen-specific part is the notification mechanism, which is an
> > > event channel. If we replaced the event channel with something else the
> > > interface would be generic. See:
> > > https://gitlab.com/xen-project/xen/-/blob/staging/xen/include/public/hvm/ioreq.h#L52
> > 
> > There have been experiments with something kind of similar in KVM
> > recently (see struct ioregionfd_cmd):
> > https://lore.kernel.org/kvm/dad3d025bcf15ece11d9df0ff685e8ab0a4f2edd.1613828727.git.eafanasova@gmail.com/
> 
> Do you know the current status of Elena's work?
> It was last February that she posted her latest patch
> and it has not been merged upstream yet.

Elena worked on this during her Outreachy internship. At the moment no
one is actively working on the patches.

> > > There is also another problem. IOREQ is probably not the only
> > > interface needed. Have a look at
> > > https://marc.info/?l=xen-devel&m=162373754705233&w=2. Don't we also need
> > > an interface for the backend to inject interrupts into the frontend? And
> > > if the backend requires dynamic memory mappings of frontend pages, then
> > > we would also need an interface to map/unmap domU pages.
> > > 
> > > These interfaces are a lot more problematic than IOREQ: IOREQ is tiny
> > > and self-contained. It is easy to add anywhere. A new interface to
> > > inject interrupts or map pages is more difficult to manage because it
> > > would require changes scattered across the various emulators.
> > 
> > Something like ioreq is indeed necessary to implement arbitrary devices,
> > but if you are willing to restrict yourself to VIRTIO then other
> > interfaces are possible too because the VIRTIO device model is different
> > from the general purpose x86 PIO/MMIO that Xen's ioreq seems to support.
> 
> Can you please elaborate on your thoughts a bit more here?
> 
> It seems to me that trapping MMIOs to configuration space and
> forwarding those events to BE (or device emulation) is a quite
> straight-forward way to emulate device MMIOs.
> Or are you thinking of something like the protocols used in vhost-user?
> 
> # On the contrary, virtio-ivshmem only requires a driver to explicitly
> # forward a "write" request of MMIO accesses to BE. But I don't think
> # it's your point. 

See my first reply to this email thread about alternative interfaces for
VIRTIO device emulation. The main thing to note was that although the
shared memory vring is used by VIRTIO transports today, the device model
actually allows transports to implement virtqueues differently (e.g.
making it possible to create a VIRTIO over TCP transport without shared
memory in the future).

It's possible to define a hypercall interface as a new VIRTIO transport
that provides higher-level virtqueue operations. Doing this is more work
than using vrings though since existing guest driver and device
emulation code already supports vrings.
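
As a rough illustration of what "higher-level virtqueue operations" could
look like, here is a minimal C sketch. Everything in it is an assumption
made up for the example: `struct vq_ops`, `struct copy_queue`, and the
function names are hypothetical and do not come from the VIRTIO
specification or from any hypervisor ABI. The `copy_queue` stands in for
state a hypervisor might own behind a hypercall; buffers are copied in and
out, so the driver and device sides never share memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical transport-neutral virtqueue operations (illustration only). */
struct vq_ops {
    /* Driver side: hand one buffer to the device. Returns 0 or -1. */
    int (*add_buf)(void *ctx, const void *data, size_t len);
    /* Device side: fetch the oldest pending buffer. Returns 0 or -1. */
    int (*get_buf)(void *ctx, void *data, size_t cap, size_t *len);
};

#define MSG_SLOTS 4   /* pending buffers the queue can hold */
#define MSG_MAX   64  /* largest buffer accepted, in bytes */

/* Stand-in for queue state owned by the hypervisor, not by either guest. */
struct copy_queue {
    uint8_t  slot[MSG_SLOTS][MSG_MAX];
    size_t   len[MSG_SLOTS];
    unsigned head;  /* total buffers added (free-running counter) */
    unsigned tail;  /* total buffers consumed (free-running counter) */
};

/* Copy the caller's buffer into the queue; fails if full or too large. */
static int copy_add_buf(void *ctx, const void *data, size_t len)
{
    struct copy_queue *q = ctx;
    unsigned i;

    if (len > MSG_MAX || q->head - q->tail == MSG_SLOTS)
        return -1;
    i = q->head % MSG_SLOTS;
    memcpy(q->slot[i], data, len);
    q->len[i] = len;
    q->head++;
    return 0;
}

/* Copy the oldest pending buffer out; fails if empty or cap is too small. */
static int copy_get_buf(void *ctx, void *data, size_t cap, size_t *len)
{
    struct copy_queue *q = ctx;
    unsigned i;

    if (q->head == q->tail)
        return -1;
    i = q->tail % MSG_SLOTS;
    if (q->len[i] > cap)
        return -1;
    memcpy(data, q->slot[i], q->len[i]);
    *len = q->len[i];
    q->tail++;
    return 0;
}

static const struct vq_ops copy_ops = { copy_add_buf, copy_get_buf };
```

A driver would call `copy_ops.add_buf(&q, ...)` and the device side
`copy_ops.get_buf(&q, ...)`. A vring-backed implementation could sit behind
the same two function pointers, which is the layering point being made.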

I don't know the requirements of Stratos so I can't say if creating a
new hypervisor-independent interface (VIRTIO transport) that doesn't
rely on shared memory vrings makes sense. I just wanted to raise the
idea in case you find that VIRTIO's vrings don't meet your requirements.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [virtio-dev] Re: Enabling hypervisor agnosticism for VirtIO backends
       [not found]         ` <20210825102945.GA89209@laputa>
@ 2021-08-25 15:02           ` Stefan Hajnoczi
  0 siblings, 0 replies; 19+ messages in thread
From: Stefan Hajnoczi @ 2021-08-25 15:02 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: Stefano Stabellini, Alex Bennée, Stratos Mailing List,
	virtio-dev, Arnd Bergmann, Viresh Kumar, Stefano Stabellini,
	Jan Kiszka, Carl van Schaik, pratikp, Srivatsa Vaddagiri,
	Jean-Philippe Brucker, Mathieu Poirier, Wei.Chen, olekstysh,
	Oleksandr_Tyshchenko, Bertrand.Marquis, Artem_Mygaiev, julien,
	jgross, paul, xen-devel

[-- Attachment #1: Type: text/plain, Size: 5474 bytes --]

On Wed, Aug 25, 2021 at 07:29:45PM +0900, AKASHI Takahiro wrote:
> On Mon, Aug 23, 2021 at 10:58:46AM +0100, Stefan Hajnoczi wrote:
> > On Mon, Aug 23, 2021 at 03:25:00PM +0900, AKASHI Takahiro wrote:
> > > Hi Stefan,
> > > 
> > > On Tue, Aug 17, 2021 at 11:41:01AM +0100, Stefan Hajnoczi wrote:
> > > > On Wed, Aug 04, 2021 at 12:20:01PM -0700, Stefano Stabellini wrote:
> > > > > > Could we consider the kernel internally converting IOREQ messages from
> > > > > > the Xen hypervisor to eventfd events? Would this scale with other kernel
> > > > > > hypercall interfaces?
> > > > > > 
> > > > > > So any thoughts on what directions are worth experimenting with?
> > > > >  
> > > > > One option we should consider is for each backend to connect to Xen via
> > > > > the IOREQ interface. We could generalize the IOREQ interface and make it
> > > > > hypervisor agnostic. The interface is really trivial and easy to add.
> > > > > The only Xen-specific part is the notification mechanism, which is an
> > > > > event channel. If we replaced the event channel with something else the
> > > > > interface would be generic. See:
> > > > > https://gitlab.com/xen-project/xen/-/blob/staging/xen/include/public/hvm/ioreq.h#L52
> > > > 
> > > > There have been experiments with something kind of similar in KVM
> > > > recently (see struct ioregionfd_cmd):
> > > > https://lore.kernel.org/kvm/dad3d025bcf15ece11d9df0ff685e8ab0a4f2edd.1613828727.git.eafanasova@gmail.com/
> > > 
> > > Do you know the current status of Elena's work?
> > > It was last February that she posted her latest patch
> > > and it has not been merged upstream yet.
> > 
> > Elena worked on this during her Outreachy internship. At the moment no
> > one is actively working on the patches.
> 
> Does Red Hat plan to take over or follow up on her work?
> # I'm simply asking out of curiosity.

At the moment I'm not aware of anyone from Red Hat working on it. If
someone decides they need this KVM API then that could change.

> > > > > There is also another problem. IOREQ is probably not the only
> > > > > interface needed. Have a look at
> > > > > https://marc.info/?l=xen-devel&m=162373754705233&w=2. Don't we also need
> > > > > an interface for the backend to inject interrupts into the frontend? And
> > > > > if the backend requires dynamic memory mappings of frontend pages, then
> > > > > we would also need an interface to map/unmap domU pages.
> > > > > 
> > > > > These interfaces are a lot more problematic than IOREQ: IOREQ is tiny
> > > > > and self-contained. It is easy to add anywhere. A new interface to
> > > > > inject interrupts or map pages is more difficult to manage because it
> > > > > would require changes scattered across the various emulators.
> > > > 
> > > > Something like ioreq is indeed necessary to implement arbitrary devices,
> > > > but if you are willing to restrict yourself to VIRTIO then other
> > > > interfaces are possible too because the VIRTIO device model is different
> > > > from the general purpose x86 PIO/MMIO that Xen's ioreq seems to support.
> > > 
> > > Can you please elaborate on your thoughts a bit more here?
> > > 
> > > It seems to me that trapping MMIOs to configuration space and
> > > forwarding those events to BE (or device emulation) is a quite
> > > straight-forward way to emulate device MMIOs.
> > > Or are you thinking of something like the protocols used in vhost-user?
> > > 
> > > # On the contrary, virtio-ivshmem only requires a driver to explicitly
> > > # forward a "write" request of MMIO accesses to BE. But I don't think
> > > # it's your point. 
> > 
> > See my first reply to this email thread about alternative interfaces for
> > VIRTIO device emulation. The main thing to note was that although the
> > shared memory vring is used by VIRTIO transports today, the device model
> > actually allows transports to implement virtqueues differently (e.g.
> > making it possible to create a VIRTIO over TCP transport without shared
> > memory in the future).
> 
> Do you have any example of such use cases or systems?

This aspect of VIRTIO isn't being exploited today AFAIK. But the
layering to allow other virtqueue implementations is there. For example,
Linux's virtqueue API is independent of struct vring, so existing
drivers generally aren't tied to vrings.

> > It's possible to define a hypercall interface as a new VIRTIO transport
> > that provides higher-level virtqueue operations. Doing this is more work
> > than using vrings though since existing guest driver and device
> > emulation code already supports vrings.
> 
> Personally, I'm open to discussing your point, but
> 
> > I don't know the requirements of Stratos so I can't say if creating a
> > new hypervisor-independent interface (VIRTIO transport) that doesn't
> > rely on shared memory vrings makes sense. I just wanted to raise the
> > idea in case you find that VIRTIO's vrings don't meet your requirements.
> 
> While I cannot represent the project's view, the JIRA task that is
> assigned to me describes:
>   Deliverables
>     * Low level library allowing:
>     * management of virtio rings and buffers
>   [and so on]
> So supporting the shared memory-based vring is one of our assumptions.

If shared memory is allowed then vrings are the natural choice. That way
existing virtio code will work with minimal modifications.
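
For readers less familiar with vrings, here is a small C sketch of the
shared-memory idea. The field layout of the three structures follows the
split virtqueue in the VIRTIO specification, but the rest is an assumption
for illustration: the `vq_*` names, `struct shared_region`, and the helper
functions are hypothetical, `addr` holds an offset into the region rather
than a guest physical address, and endianness conversion, memory barriers
and notifications are all omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QSZ    8   /* queue size; split rings require a power of two */
#define BUF_SZ 64  /* size of each data buffer in this sketch */

/* Layout mirrors the split virtqueue structures from the VIRTIO spec. */
struct vq_desc      { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct vq_avail     { uint16_t flags; uint16_t idx; uint16_t ring[QSZ]; };
struct vq_used_elem { uint32_t id; uint32_t len; };
struct vq_used      { uint16_t flags; uint16_t idx; struct vq_used_elem ring[QSZ]; };

/* Hypothetical region that both the FE and the BE would have mapped. */
struct shared_region {
    struct vq_desc  desc[QSZ];
    struct vq_avail avail;
    struct vq_used  used;
    uint8_t         data[QSZ][BUF_SZ];
};

/* Driver side: copy a payload into a buffer, describe it, publish it. */
static int driver_publish(struct shared_region *r, uint16_t slot,
                          const void *payload, uint32_t len)
{
    if (slot >= QSZ || len > BUF_SZ)
        return -1;
    memcpy(r->data[slot], payload, len);
    r->desc[slot].addr  = offsetof(struct shared_region, data)
                        + (uint64_t)slot * BUF_SZ;
    r->desc[slot].len   = len;
    r->desc[slot].flags = 0;  /* device-readable, no chaining */
    r->desc[slot].next  = 0;
    r->avail.ring[r->avail.idx % QSZ] = slot;
    r->avail.idx++;           /* real code needs a write barrier first */
    return 0;
}

/* Device side: consume one available buffer and mark it used.
 * last_avail is the device's private copy of how far it has read. */
static int device_consume(struct shared_region *r, uint16_t *last_avail,
                          void *out, uint32_t cap, uint32_t *len)
{
    uint16_t head;
    const struct vq_desc *d;

    if (*last_avail == r->avail.idx)
        return -1;  /* nothing new */
    head = r->avail.ring[*last_avail % QSZ];
    if (head >= QSZ)
        return -1;  /* never trust indices written by the other side */
    d = &r->desc[head];
    if (d->len > cap || d->addr > sizeof(*r) || d->len > sizeof(*r) - d->addr)
        return -1;
    memcpy(out, (const uint8_t *)r + d->addr, d->len);
    *len = d->len;
    (*last_avail)++;
    r->used.ring[r->used.idx % QSZ].id  = head;
    r->used.ring[r->used.idx % QSZ].len = 0;  /* device wrote no bytes back */
    r->used.idx++;
    return 0;
}
```

The only thing the two sides need from the hypervisor here is the mapping
of `struct shared_region` and a way to signal each other, which is why
existing vring code carries over with few changes once shared memory is
available.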

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [virtio-dev] Re: Enabling hypervisor agnosticism for VirtIO backends
       [not found]                       ` <CACMJ4GbmNgbB5ponYt3NGEk3j6YCksot+kDy2qs8HMdFXWnQbw@mail.gmail.com>
@ 2021-08-30 19:53                         ` Christopher Clark
       [not found]                           ` <20210902071902.GC71098@laputa>
  0 siblings, 1 reply; 19+ messages in thread
From: Christopher Clark @ 2021-08-30 19:53 UTC (permalink / raw)
  To: Wei Chen
  Cc: AKASHI Takahiro, Oleksandr Tyshchenko, Stefano Stabellini,
	Alex Bennée, Kaly Xin, Stratos Mailing List,
	virtio-dev@lists.oasis-open.org, Arnd Bergmann, Viresh Kumar,
	Stefano Stabellini, stefanha@redhat.com, Jan Kiszka,
	Carl van Schaik, pratikp@quicinc.com, Srivatsa Vaddagiri,
	Jean-Philippe Brucker, Mathieu Poirier, Oleksandr Tyshchenko,
	Bertrand Marquis, Artem Mygaiev, Julien Grall, Juergen Gross,
	Paul Durrant, Xen Devel, Rich Persaud, Daniel Smith

[-- Attachment #1: Type: text/plain, Size: 47054 bytes --]

[ resending message to ensure delivery to the CCd mailing lists
post-subscription ]

Apologies for being late to this thread, but I hope to be able to
contribute to this discussion in a meaningful way. I am grateful for the
level of interest in this topic. I would like to draw your attention to
Argo as a suitable technology for development of VirtIO's
hypervisor-agnostic interfaces.

* Argo is an interdomain communication mechanism in Xen (on x86 and Arm)
  that can send and receive hypervisor-mediated notifications and messages
  between domains (VMs). [1] The hypervisor can enforce Mandatory Access
  Control over all communication between domains. It is derived from the
  earlier v4v, which has been deployed on millions of machines with the
  HP/Bromium uXen hypervisor and with OpenXT.

* Argo has a simple interface with a small number of operations that was
  designed for ease of integration into OS primitives on both Linux
  (sockets) and Windows (ReadFile/WriteFile) [2].
    - A unikernel example of using it has also been developed for XTF. [3]

* There has been recent discussion and support in the Xen community for
  making revisions to the Argo interface to make it hypervisor-agnostic,
  and support implementations of Argo on other hypervisors. This will
  enable a single interface for an OS kernel binary to use for inter-VM
  communication that will work on multiple hypervisors -- this applies
  equally to both backend and frontend implementations. [4]

* Here are the design documents for building VirtIO-over-Argo, to support a
  hypervisor-agnostic frontend VirtIO transport driver using Argo.

The Development Plan to build VirtIO virtual device support over Argo
transport:
https://openxt.atlassian.net/wiki/spaces/DC/pages/1696169985/VirtIO-Argo+Development+Phase+1

A design for using VirtIO over Argo, describing how VirtIO data structures
and communication are handled over the Argo transport:
https://openxt.atlassian.net/wiki/spaces/DC/pages/1348763698/VirtIO+Argo

Diagram (from the above document) showing how VirtIO rings are synchronized
between domains without using shared memory:
https://openxt.atlassian.net/46e1c93b-2b87-4cb2-951e-abd4377a1194#media-blob-url=true&id=01f7d0e1-7686-4f0b-88e1-457c1d30df40&collection=contentId-1348763698&contextId=1348763698&mimeType=image%2Fpng&name=device-buffer-access-virtio-argo.png&size=243175&width=1106&height=1241

Please note that the above design documents show that the existing VirtIO
device drivers, and both vring and virtqueue data structures, can be
preserved while interdomain communication can be performed with no shared
memory required for most drivers (the exceptions where further design is
required are those such as virtual framebuffer devices where shared memory
regions are intentionally added to the communication structure beyond the
vrings and virtqueues).

An analysis of VirtIO and Argo, informing the design:
https://openxt.atlassian.net/wiki/spaces/DC/pages/1333428225/Analysis+of+Argo+as+a+transport+medium+for+VirtIO

* Argo can be used for a communication path for configuration between the
  backend and the toolstack, avoiding the need for a dependency on
  XenStore, which is an advantage for any hypervisor-agnostic design. It is
  also amenable to a notification mechanism that is not based on Xen event
  channels.

* Argo does not use or require shared memory between VMs and provides an
  alternative to the use of foreign shared memory mappings. It avoids some
  of the complexities involved with using grants (e.g. XSA-300).

* Argo supports Mandatory Access Control by the hypervisor, satisfying a
  common certification requirement.

* The Argo headers are BSD-licensed and the Xen hypervisor implementation
  is GPLv2 but accessible via the hypercall interface. The licensing should
  not present an obstacle to adoption of Argo in guest software or
  implementation by other hypervisors.

* Since the interface that Argo presents to a guest VM is similar to DMA, a
  VirtIO-Argo frontend transport driver should be able to operate with a
  physical VirtIO-enabled smart-NIC if the toolstack and an Argo-aware
  backend provide support.

The next Xen Community Call is next week and I would be happy to answer
questions about Argo and on this topic. I will also be following this
thread.

Christopher
(Argo maintainer, Xen Community)

--------------------------------------------------------------------------------
[1]
An introduction to Argo:
https://static.sched.com/hosted_files/xensummit19/92/Argo%20and%20HMX%20-%20OpenXT%20-%20Christopher%20Clark%20-%20Xen%20Summit%202019.pdf
https://www.youtube.com/watch?v=cnC0Tg3jqJQ
Xen Wiki page for Argo:
https://wiki.xenproject.org/wiki/Argo:_Hypervisor-Mediated_Exchange_(HMX)_for_Xen

[2]
OpenXT Linux Argo driver and userspace library:
https://github.com/openxt/linux-xen-argo

Windows V4V at OpenXT wiki:
https://openxt.atlassian.net/wiki/spaces/DC/pages/14844007/V4V
Windows v4v driver source:
https://github.com/OpenXT/xc-windows/tree/master/xenv4v

HP/Bromium uXen V4V driver:
https://github.com/uxen-virt/uxen/tree/ascara/windows/uxenv4vlib

[3]
v2 of the Argo test unikernel for XTF:
https://lists.xenproject.org/archives/html/xen-devel/2021-01/msg02234.html

[4]
Argo HMX Transport for VirtIO meeting minutes:
https://lists.xenproject.org/archives/html/xen-devel/2021-02/msg01422.html

VirtIO-Argo Development wiki page:
https://openxt.atlassian.net/wiki/spaces/DC/pages/1696169985/VirtIO-Argo+Development+Phase+1


> On Thu, Aug 26, 2021 at 5:11 AM Wei Chen <Wei.Chen@arm.com> wrote:
>
>> Hi Akashi,
>>
>> > -----Original Message-----
>> > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
>> > Sent: 2021年8月26日 17:41
>> > To: Wei Chen <Wei.Chen@arm.com>
>> > Cc: Oleksandr Tyshchenko <olekstysh@gmail.com>; Stefano Stabellini
>> > <sstabellini@kernel.org>; Alex Bennée <alex.bennee@linaro.org>; Kaly
>> Xin
>> > <Kaly.Xin@arm.com>; Stratos Mailing List <
>> stratos-dev@op-lists.linaro.org>;
>> > virtio-dev@lists.oasis-open.org; Arnd Bergmann <
>> arnd.bergmann@linaro.org>;
>> > Viresh Kumar <viresh.kumar@linaro.org>; Stefano Stabellini
>> > <stefano.stabellini@xilinx.com>; stefanha@redhat.com; Jan Kiszka
>> > <jan.kiszka@siemens.com>; Carl van Schaik <cvanscha@qti.qualcomm.com>;
>> > pratikp@quicinc.com; Srivatsa Vaddagiri <vatsa@codeaurora.org>; Jean-
>> > Philippe Brucker <jean-philippe@linaro.org>; Mathieu Poirier
>> > <mathieu.poirier@linaro.org>; Oleksandr Tyshchenko
>> > <Oleksandr_Tyshchenko@epam.com>; Bertrand Marquis
>> > <Bertrand.Marquis@arm.com>; Artem Mygaiev <Artem_Mygaiev@epam.com>;
>> Julien
>> > Grall <julien@xen.org>; Juergen Gross <jgross@suse.com>; Paul Durrant
>> > <paul@xen.org>; Xen Devel <xen-devel@lists.xen.org>
>> > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
>> >
>> > Hi Wei,
>> >
>> > On Fri, Aug 20, 2021 at 03:41:50PM +0900, AKASHI Takahiro wrote:
>> > > On Wed, Aug 18, 2021 at 08:35:51AM +0000, Wei Chen wrote:
>> > > > Hi Akashi,
>> > > >
>> > > > > -----Original Message-----
>> > > > > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
>> > > > > Sent: 2021年8月18日 13:39
>> > > > > To: Wei Chen <Wei.Chen@arm.com>
>> > > > > Cc: Oleksandr Tyshchenko <olekstysh@gmail.com>; Stefano
>> Stabellini
>> > > > > <sstabellini@kernel.org>; Alex Bennée <alex.bennee@linaro.org>;
>> > Stratos
>> > > > > Mailing List <stratos-dev@op-lists.linaro.org>; virtio-
>> > dev@lists.oasis-
>> > > > > open.org; Arnd Bergmann <arnd.bergmann@linaro.org>; Viresh Kumar
>> > > > > <viresh.kumar@linaro.org>; Stefano Stabellini
>> > > > > <stefano.stabellini@xilinx.com>; stefanha@redhat.com; Jan Kiszka
>> > > > > <jan.kiszka@siemens.com>; Carl van Schaik
>> > <cvanscha@qti.qualcomm.com>;
>> > > > > pratikp@quicinc.com; Srivatsa Vaddagiri <vatsa@codeaurora.org>;
>> > Jean-
>> > > > > Philippe Brucker <jean-philippe@linaro.org>; Mathieu Poirier
>> > > > > <mathieu.poirier@linaro.org>; Oleksandr Tyshchenko
>> > > > > <Oleksandr_Tyshchenko@epam.com>; Bertrand Marquis
>> > > > > <Bertrand.Marquis@arm.com>; Artem Mygaiev <Artem_Mygaiev@epam.com
>> >;
>> > Julien
>> > > > > Grall <julien@xen.org>; Juergen Gross <jgross@suse.com>; Paul
>> > Durrant
>> > > > > <paul@xen.org>; Xen Devel <xen-devel@lists.xen.org>
>> > > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO backends
>> > > > >
>> > > > > On Tue, Aug 17, 2021 at 08:39:09AM +0000, Wei Chen wrote:
>> > > > > > Hi Akashi,
>> > > > > >
>> > > > > > > -----Original Message-----
>> > > > > > > From: AKASHI Takahiro <takahiro.akashi@linaro.org>
>> > > > > > > Sent: 2021年8月17日 16:08
>> > > > > > > To: Wei Chen <Wei.Chen@arm.com>
>> > > > > > > Cc: Oleksandr Tyshchenko <olekstysh@gmail.com>; Stefano
>> > Stabellini
>> > > > > > > <sstabellini@kernel.org>; Alex Bennée <
>> alex.bennee@linaro.org>;
>> > > > > Stratos
>> > > > > > > Mailing List <stratos-dev@op-lists.linaro.org>; virtio-
>> > > > > dev@lists.oasis-
>> > > > > > > open.org; Arnd Bergmann <arnd.bergmann@linaro.org>; Viresh
>> Kumar
>> > > > > > > <viresh.kumar@linaro.org>; Stefano Stabellini
>> > > > > > > <stefano.stabellini@xilinx.com>; stefanha@redhat.com; Jan
>> Kiszka
>> > > > > > > <jan.kiszka@siemens.com>; Carl van Schaik
>> > <cvanscha@qti.qualcomm.com>;
>> > > > > > > pratikp@quicinc.com; Srivatsa Vaddagiri <vatsa@codeaurora.org
>> >;
>> > Jean-
>> > > > > > > Philippe Brucker <jean-philippe@linaro.org>; Mathieu Poirier
>> > > > > > > <mathieu.poirier@linaro.org>; Oleksandr Tyshchenko
>> > > > > > > <Oleksandr_Tyshchenko@epam.com>; Bertrand Marquis
>> > > > > > > <Bertrand.Marquis@arm.com>; Artem Mygaiev
>> > <Artem_Mygaiev@epam.com>;
>> > > > > Julien
>> > > > > > > Grall <julien@xen.org>; Juergen Gross <jgross@suse.com>; Paul
>> > Durrant
>> > > > > > > <paul@xen.org>; Xen Devel <xen-devel@lists.xen.org>
>> > > > > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO
>> backends
>> > > > > > >
>> > > > > > > Hi Wei, Oleksandr,
>> > > > > > >
>> > > > > > > On Mon, Aug 16, 2021 at 10:04:03AM +0000, Wei Chen wrote:
>> > > > > > > > Hi All,
>> > > > > > > >
>> > > > > > > > Thanks to Stefano for linking my kvmtool for Xen proposal here.
>> > > > > > > > This proposal is still being discussed in the Xen and KVM
>> > > > > > > > communities. The main work is to decouple kvmtool from KVM so
>> > > > > > > > that other hypervisors can reuse the virtual device
>> > > > > > > > implementations.
>> > > > > > > >
>> > > > > > > > In this case, we need to introduce an intermediate hypervisor
>> > > > > > > > layer for VMM abstraction, which is, I think, very close to
>> > > > > > > > Stratos' virtio hypervisor agnosticism work.
>> > > > > > >
>> > > > > > > # My proposal[1] comes from my own idea and doesn't always
>> > > > > > > # represent Linaro's view on this subject nor reflect Alex's
>> > > > > > > # concerns. Nevertheless,
>> > > > > > >
>> > > > > > > Your idea and my proposal seem to share the same background.
>> > > > > > > Both have a similar goal, both currently start with Xen, and
>> > > > > > > both are based on kvm-tool. (Actually, my work is derived from
>> > > > > > > EPAM's virtio-disk, which is also based on kvm-tool.)
>> > > > > > >
>> > > > > > > In particular, the abstraction of hypervisor interfaces has the
>> > > > > > > same set of interfaces (for your "struct vmm_impl" and my "RPC
>> > > > > > > interfaces"). This is not a coincidence, as we both share the
>> > > > > > > same origin, as I said above. And so we will also share the same
>> > > > > > > issues. One of them is a way of "sharing/mapping FE's memory".
>> > > > > > > There is some trade-off between portability and the performance
>> > > > > > > impact. So we can discuss the topic here in this ML, too.
>> > > > > > > (See Alex's original email, too).
>> > > > > > >
>> > > > > > Yes, I agree.
>> > > > > >
>> > > > > > > On the other hand, my approach aims to create a "single-binary"
>> > > > > > > solution in which the same binary of the BE VM could run on any
>> > > > > > > hypervisor. It is somewhat similar to your "proposal-#2" in [2],
>> > > > > > > but in my solution, all the hypervisor-specific code would be
>> > > > > > > put into another entity (VM), named "virtio-proxy", and the
>> > > > > > > abstracted operations are served via RPC. (In this sense, BE is
>> > > > > > > hypervisor-agnostic but might have OS dependency.)
>> > > > > > > But I know that we need to discuss whether this is a requirement
>> > > > > > > even in the Stratos project or not. (Maybe not)
>> > > > > > >
>> > > > > >
>> > > > > > Sorry, I haven't had time to finish reading your virtio-proxy
>> > > > > > proposal completely (I will do it ASAP). But from your
>> > > > > > description, it seems we need a 3rd VM between FE and BE? My
>> > > > > > concern is that, if my assumption is right, will it increase the
>> > > > > > latency in the data transport path? Even if we're using some
>> > > > > > lightweight guest like an RTOS or unikernel,
>> > > > >
>> > > > > Yes, you're right. But I'm afraid that it is a matter of degree.
>> > > > > As long as we execute 'mapping' operations at every fetch of
>> > > > > payload, we will see a latency issue (even in your case), and if
>> > > > > we have some solution for it, we won't see it in my proposal
>> > > > > either :)
>> > > > >
>> > > >
>> > > > Oleksandr has sent a proposal to the Xen mailing list to reduce this
>> > > > kind of "mapping/unmapping" operation. So the latency caused by this
>> > > > behavior on Xen may eventually be eliminated, and Linux-KVM doesn't
>> > > > have that problem.
>> > >
>> > > Obviously, I have not yet caught up there in the discussion.
>> > > Which patch specifically?
>> >
>> > Can you give me the link to the discussion or patch, please?
>> >
>>
>> It's an RFC discussion. We have tested this RFC patch internally.
>> https://lists.xenproject.org/archives/html/xen-devel/2021-07/msg01532.html
>>
>> > Thanks,
>> > -Takahiro Akashi
>> >
>> > > -Takahiro Akashi
>> > >
>> > > > > > > Specifically speaking about kvm-tool, I have a concern about its
>> > > > > > > license terms. Since we target different hypervisors and
>> > > > > > > different OSs (which I assume includes RTOSes), the resultant
>> > > > > > > library should be permissively licensed, and the GPL for
>> > > > > > > kvm-tool might be an issue.
>> > > > > > > Any thoughts?
>> > > > > > >
>> > > > > >
>> > > > > > Yes. If a user wants to implement a FreeBSD device model but the
>> > > > > > virtio library is GPL, then the GPL would be a problem. If we
>> > > > > > have another good candidate, I am open to it.
>> > > > >
>> > > > > I have some candidates, particularly for vq/vring, in my mind:
>> > > > > * Open-AMP, or
>> > > > > * corresponding Free-BSD code
>> > > > >
>> > > >
>> > > > Interesting, I will look into them : )
>> > > >
>> > > > Cheers,
>> > > > Wei Chen
>> > > >
>> > > > > -Takahiro Akashi
>> > > > >
>> > > > >
>> > > > > > > -Takahiro Akashi
>> > > > > > >
>> > > > > > >
>> > > > > > > [1] https://op-lists.linaro.org/pipermail/stratos-dev/2021-August/000548.html
>> > > > > > > [2] https://marc.info/?l=xen-devel&m=162373754705233&w=2
>> > > > > > >
>> > > > > > > >
>> > > > > > > > > From: Oleksandr Tyshchenko <olekstysh@gmail.com>
>> > > > > > > > > Sent: 2021年8月14日 23:38
>> > > > > > > > > To: AKASHI Takahiro <takahiro.akashi@linaro.org>; Stefano
>> > > > > Stabellini
>> > > > > > > <sstabellini@kernel.org>
>> > > > > > > > > Cc: Alex Bennée <alex.bennee@linaro.org>; Stratos
>> Mailing
>> > List
>> > > > > > > <stratos-dev@op-lists.linaro.org>; virtio-dev@lists.oasis-
>> > open.org;
>> > > > > Arnd
>> > > > > > > Bergmann <arnd.bergmann@linaro.org>; Viresh Kumar
>> > > > > > > <viresh.kumar@linaro.org>; Stefano Stabellini
>> > > > > > > <stefano.stabellini@xilinx.com>; stefanha@redhat.com; Jan
>> Kiszka
>> > > > > > > <jan.kiszka@siemens.com>; Carl van Schaik
>> > <cvanscha@qti.qualcomm.com>;
>> > > > > > > pratikp@quicinc.com; Srivatsa Vaddagiri <vatsa@codeaurora.org
>> >;
>> > Jean-
>> > > > > > > Philippe Brucker <jean-philippe@linaro.org>; Mathieu Poirier
>> > > > > > > <mathieu.poirier@linaro.org>; Wei Chen <Wei.Chen@arm.com>;
>> > Oleksandr
>> > > > > > > Tyshchenko <Oleksandr_Tyshchenko@epam.com>; Bertrand Marquis
>> > > > > > > <Bertrand.Marquis@arm.com>; Artem Mygaiev
>> > <Artem_Mygaiev@epam.com>;
>> > > > > Julien
>> > > > > > > Grall <julien@xen.org>; Juergen Gross <jgross@suse.com>; Paul
>> > Durrant
>> > > > > > > <paul@xen.org>; Xen Devel <xen-devel@lists.xen.org>
>> > > > > > > > > Subject: Re: Enabling hypervisor agnosticism for VirtIO
>> > backends
>> > > > > > > > >
>> > > > > > > > > Hello, all.
>> > > > > > > > >
>> > > > > > > > > Please see some comments below. And sorry for the possible
>> > format
>> > > > > > > issues.
>> > > > > > > > >
>> > > > > > > > > > On Wed, Aug 11, 2021 at 9:27 AM AKASHI Takahiro
>> > > > > > > <mailto:takahiro.akashi@linaro.org> wrote:
>> > > > > > > > > > On Wed, Aug 04, 2021 at 12:20:01PM -0700, Stefano
>> > Stabellini
>> > > > > wrote:
>> > > > > > > > > > > CCing people working on Xen+VirtIO and IOREQs. Not
>> > trimming
>> > > > > the
>> > > > > > > original
>> > > > > > > > > > > email to let them read the full context.
>> > > > > > > > > > >
>> > > > > > > > > > > My comments below are related to a potential Xen
>> > > > > implementation,
>> > > > > > > not
>> > > > > > > > > > > because it is the only implementation that matters,
>> but
>> > > > > because it
>> > > > > > > is
>> > > > > > > > > > > the one I know best.
>> > > > > > > > > >
>> > > > > > > > > > Please note that my proposal (and hence the working
>> > prototype)[1]
>> > > > > > > > > > is based on Xen's virtio implementation (i.e. IOREQ) and
>> > > > > > > particularly
>> > > > > > > > > > EPAM's virtio-disk application (backend server).
>> > > > > > > > > > It has been, I believe, well generalized but is still a
>> > bit
>> > > > > biased
>> > > > > > > > > > toward this original design.
>> > > > > > > > > >
>> > > > > > > > > > So I hope you like my approach :)
>> > > > > > > > > >
>> > > > > > > > > > [1] https://op-lists.linaro.org/pipermail/stratos-
>> > dev/2021-
>> > > > > > > August/000546.html
>> > > > > > > > > >
>> > > > > > > > > > Let me take this opportunity to explain a bit more about
>> > my
>> > > > > approach
>> > > > > > > below.
>> > > > > > > > > >
>> > > > > > > > > > > Also, please see this relevant email thread:
>> > > > > > > > > > > https://marc.info/?l=xen-devel&m=162373754705233&w=2
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > On Wed, 4 Aug 2021, Alex Bennée wrote:
>> > > > > > > > > > > > Hi,
>> > > > > > > > > > > >
>> > > > > > > > > > > > One of the goals of Project Stratos is to enable
>> > hypervisor
>> > > > > > > agnostic
>> > > > > > > > > > > > backends so we can enable as much re-use of code as
>> > possible
>> > > > > and
>> > > > > > > avoid
>> > > > > > > > > > > > repeating ourselves. This is the flip side of the
>> > front end
>> > > > > > > where
>> > > > > > > > > > > > multiple front-end implementations are required -
>> one
>> > per OS,
>> > > > > > > assuming
>> > > > > > > > > > > > you don't just want Linux guests. The resultant
>> guests
>> > are
>> > > > > > > trivially
>> > > > > > > > > > > > movable between hypervisors modulo any abstracted
>> > paravirt
>> > > > > type
>> > > > > > > > > > > > interfaces.
>> > > > > > > > > > > >
>> > > > > > > > > > > > In my original thumb nail sketch of a solution I
>> > envisioned
>> > > > > > > vhost-user
>> > > > > > > > > > > > daemons running in a broadly POSIX like environment.
>> > The
>> > > > > > > interface to
>> > > > > > > > > > > > the daemon is fairly simple requiring only some
>> mapped
>> > > > > memory
>> > > > > > > and some
>> > > > > > > > > > > > sort of signalling for events (on Linux this is
>> > eventfd).
>> > > > > The
>> > > > > > > idea was a
>> > > > > > > > > > > > stub binary would be responsible for any hypervisor
>> > specific
>> > > > > > > setup and
>> > > > > > > > > > > > then launch a common binary to deal with the actual
>> > > > > virtqueue
>> > > > > > > requests
>> > > > > > > > > > > > themselves.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Since that original sketch we've seen an expansion
>> in
>> > the
>> > > > > sort
>> > > > > > > of ways
>> > > > > > > > > > > > backends could be created. There is interest in
>> > > > > encapsulating
>> > > > > > > backends
>> > > > > > > > > > > > in RTOSes or unikernels for solutions like SCMI.
>> The
>> > > > > interest
>> > > > > > > in Rust
>> > > > > > > > > > > > has prompted ideas of using the trait interface to
>> > abstract
>> > > > > > > differences
>> > > > > > > > > > > > away as well as the idea of bare-metal Rust
>> backends.
>> > > > > > > > > > > >
>> > > > > > > > > > > > We have a card (STR-12) called "Hypercall
>> > Standardisation"
>> > > > > which
>> > > > > > > > > > > > calls for a description of the APIs needed from the
>> > > > > hypervisor
>> > > > > > > side to
>> > > > > > > > > > > > support VirtIO guests and their backends. However we
>> > are
>> > > > > some
>> > > > > > > way off
>> > > > > > > > > > > > from that at the moment as I think we need to at
>> least
>> > > > > > > demonstrate one
>> > > > > > > > > > > > portable backend before we start codifying
>> > requirements. To
>> > > > > that
>> > > > > > > end I
>> > > > > > > > > > > > want to think about what we need for a backend to
>> > function.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Configuration
>> > > > > > > > > > > > =============
>> > > > > > > > > > > >
>> > > > > > > > > > > > In the type-2 setup this is typically fairly simple
>> > because
>> > > > > the
>> > > > > > > host
>> > > > > > > > > > > > system can orchestrate the various modules that make
>> > up the
>> > > > > > > complete
>> > > > > > > > > > > > system. In the type-1 case (or even type-2 with
>> > delegated
>> > > > > > > service VMs)
>> > > > > > > > > > > > we need some sort of mechanism to inform the backend
>> > VM
>> > > > > about
>> > > > > > > key
>> > > > > > > > > > > > details about the system:
>> > > > > > > > > > > >
>> > > > > > > > > > > > >   - where virt queue memory is in its address space
>> > > > > > > > > > > >   - how it's going to receive (interrupt) and
>> trigger
>> > (kick)
>> > > > > > > events
>> > > > > > > > > > > >   - what (if any) resources the backend needs to
>> > connect to
>> > > > > > > > > > > >
>> > > > > > > > > > > > Obviously you can elide over configuration issues by
>> > having
>> > > > > > > static
>> > > > > > > > > > > > configurations and baking the assumptions into your
>> > guest
>> > > > > images
>> > > > > > > however
>> > > > > > > > > > > > this isn't scalable in the long term. The obvious
>> > solution
>> > > > > seems
>> > > > > > > to be
>> > > > > > > > > > > > extending a subset of Device Tree data to user space
>> > but
>> > > > > perhaps
>> > > > > > > there
>> > > > > > > > > > > > are other approaches?
>> > > > > > > > > > > >
>> > > > > > > > > > > > Before any virtio transactions can take place the
>> > > > > appropriate
>> > > > > > > memory
>> > > > > > > > > > > > mappings need to be made between the FE guest and
>> the
>> > BE
>> > > > > guest.
>> > > > > > > > > > >
>> > > > > > > > > > > > > Currently the whole of the FE guest's address space
>> > needs to
>> > > > > be
>> > > > > > > visible
>> > > > > > > > > > > > to whatever is serving the virtio requests. I can
>> > envision 3
>> > > > > > > approaches:
>> > > > > > > > > > > >
>> > > > > > > > > > > >  * BE guest boots with memory already mapped
>> > > > > > > > > > > >
>> > > > > > > > > > > >  This would entail the guest OS knowing where in
>> its
>> > Guest
>> > > > > > > Physical
>> > > > > > > > > > > >  Address space is already taken up and avoiding
>> > clashing. I
>> > > > > > > would assume
>> > > > > > > > > > > >  in this case you would want a standard interface to
>> > > > > userspace
>> > > > > > > to then
>> > > > > > > > > > > >  make that address space visible to the backend
>> daemon.
>> > > > > > > > > >
>> > > > > > > > > > Yet another way here is that we would have well known
>> > "shared
>> > > > > > > memory" between
>> > > > > > > > > > VMs. I think that Jailhouse's ivshmem gives us good
>> > insights on
>> > > > > this
>> > > > > > > matter
>> > > > > > > > > > and that it can even be an alternative for hypervisor-
>> > agnostic
>> > > > > > > solution.
>> > > > > > > > > >
>> > > > > > > > > > (Please note memory regions in ivshmem appear as a PCI
>> > device
>> > > > > and
>> > > > > > > can be
>> > > > > > > > > > mapped locally.)
>> > > > > > > > > >
>> > > > > > > > > > I want to add this shared memory aspect to my
>> virtio-proxy,
>> > but
>> > > > > > > > > > the resultant solution would eventually look similar to
>> > ivshmem.
>> > > > > > > > > >
>> > > > > > > > > > > > >  * BE guest boots with a hypervisor handle to
>> memory
>> > > > > > > > > > > >
>> > > > > > > > > > > >  The BE guest is then free to map the FE's memory to
>> > where
>> > > > > it
>> > > > > > > wants in
>> > > > > > > > > > > >  the BE's guest physical address space.
>> > > > > > > > > > >
>> > > > > > > > > > > I cannot see how this could work for Xen. There is no
>> > "handle"
>> > > > > to
>> > > > > > > give
>> > > > > > > > > > > to the backend if the backend is not running in dom0.
>> So
>> > for
>> > > > > Xen I
>> > > > > > > think
>> > > > > > > > > > > the memory has to be already mapped
>> > > > > > > > > >
>> > > > > > > > > > In Xen's IOREQ solution (virtio-blk), the following
>> > information
>> > > > > is
>> > > > > > > expected
>> > > > > > > > > > to be exposed to BE via Xenstore:
>> > > > > > > > > > (I know that this is a tentative approach though.)
>> > > > > > > > > >    - the start address of configuration space
>> > > > > > > > > >    - interrupt number
>> > > > > > > > > >    - file path for backing storage
>> > > > > > > > > >    - read-only flag
>> > > > > > > > > > And the BE server has to call a particular hypervisor
>> > interface
>> > > > > to
>> > > > > > > > > > map the configuration space.
>> > > > > > > > >
>> > > > > > > > > Yes, Xenstore was chosen as a simple way to pass
>> > configuration
>> > > > > info to
>> > > > > > > the backend running in a non-toolstack domain.
>> > > > > > > > > I remember, there was a wish to avoid using Xenstore in
>> > Virtio
>> > > > > backend
>> > > > > > > itself if possible, so for non-toolstack domain, this could
>> be done
>> > with
>> > > > > > > adjusting devd (daemon that listens for devices and launches
>> > backends)
>> > > > > > > > > to read backend configuration from the Xenstore anyway and
>> > pass it
>> > > > > to
>> > > > > > > the backend via command line arguments.
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > Yes, in current PoC code we're using xenstore to pass device
>> > > > > > > configuration.
>> > > > > > > > We also designed a static device configuration parse method
>> > for
>> > > > > Dom0less
>> > > > > > > or
>> > > > > > > > other scenarios that don't have xentool. Yes, it's from device
>> > model
>> > > > > command
>> > > > > > > line
>> > > > > > > > or a config file.
>> > > > > > > >
>> > > > > > > > > But, if ...
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > In my approach (virtio-proxy), all the Xen (or
>> > hypervisor)-
>> > > > > > > specific
>> > > > > > > > > > stuff is contained in virtio-proxy, yet another VM, to
>> > hide
>> > > > > all
>> > > > > > > details.
>> > > > > > > > >
>> > > > > > > > > ... the solution how to overcome that is already found and
>> > proven
>> > > > > to
>> > > > > > > work then even better.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > > # My point is that a "handle" is not mandatory for
>> > executing
>> > > > > mapping.
>> > > > > > > > > >
>> > > > > > > > > > > and the mapping is probably done by the
>> > > > > > > > > > > toolstack (also see below.) Or we would have to
>> invent a
>> > new
>> > > > > Xen
>> > > > > > > > > > > hypervisor interface and Xen virtual machine
>> privileges
>> > to
>> > > > > allow
>> > > > > > > this
>> > > > > > > > > > > kind of mapping.
>> > > > > > > > > >
>> > > > > > > > > > > If we run the backend in Dom0 then we have no problems
>> > of
>> > > > > course.
>> > > > > > > > > >
>> > > > > > > > > > One of the difficulties on Xen that I found in my approach
>> is
>> > that
>> > > > > > > calling
>> > > > > > > > > > such hypervisor interfaces (registering IOREQ, mapping
>> > memory) is
>> > > > > > > only
>> > > > > > > > > > allowed on BE servers themselves and so we will have to
>> > extend
>> > > > > > > those
>> > > > > > > > > > interfaces.
>> > > > > > > > > > This, however, will raise some concern on security and
>> > privilege
>> > > > > > > distribution
>> > > > > > > > > > as Stefan suggested.
>> > > > > > > > >
>> > > > > > > > > We also faced policy related issues with Virtio backend
>> > running in
>> > > > > > > other than Dom0 domain in a "dummy" xsm mode. In our target
>> > system we
>> > > > > run
>> > > > > > > the backend in a driver
>> > > > > > > > > domain (we call it DomD) where the underlying H/W resides.
>> > We
>> > > > > trust it,
>> > > > > > > so we wrote policy rules (to be used in "flask" xsm mode) to
>> > provide
>> > > > > it
>> > > > > > > with a little bit more privileges than a simple DomU had.
>> > > > > > > > > Now it is permitted to issue device-model, resource and
>> > memory
>> > > > > > > mappings, etc calls.
>> > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > > To activate the mapping will
>> > > > > > > > > > > >  require some sort of hypercall to the hypervisor. I
>> > can see
>> > > > > two
>> > > > > > > options
>> > > > > > > > > > > >  at this point:
>> > > > > > > > > > > >
>> > > > > > > > > > > >   - expose the handle to userspace for daemon/helper
>> > to
>> > > > > trigger
>> > > > > > > the
>> > > > > > > > > > > >     mapping via existing hypercall interfaces. If
>> > using a
>> > > > > helper
>> > > > > > > you
>> > > > > > > > > > > >     would have a hypervisor specific one to avoid
>> the
>> > daemon
>> > > > > > > having to
>> > > > > > > > > > > >     care too much about the details or push that
>> > complexity
>> > > > > into
>> > > > > > > a
>> > > > > > > > > > > >     compile time option for the daemon which would
>> > result in
>> > > > > > > different
>> > > > > > > > > > > >     binaries although a common source base.
>> > > > > > > > > > > >
>> > > > > > > > > > > >   - expose a new kernel ABI to abstract the
>> hypercall
>> > > > > > > differences away
>> > > > > > > > > > > >     in the guest kernel. In this case the userspace
>> > would
>> > > > > > > essentially
>> > > > > > > > > > > >     ask for an abstract "map guest N memory to
>> > userspace
>> > > > > ptr"
>> > > > > > > and let
>> > > > > > > > > > > >     the kernel deal with the different hypercall
>> > interfaces.
>> > > > > > > This of
>> > > > > > > > > > > >     course assumes the majority of BE guests would
>> be
>> > Linux
>> > > > > > > kernels and
>> > > > > > > > > > > >     leaves the bare-metal/unikernel approaches to
>> > their own
>> > > > > > > devices.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Operation
>> > > > > > > > > > > > =========
>> > > > > > > > > > > >
>> > > > > > > > > > > > The core of the operation of VirtIO is fairly
>> simple.
>> > Once
>> > > > > the
>> > > > > > > > > > > > vhost-user feature negotiation is done it's a case
>> of
>> > > > > receiving
>> > > > > > > update
>> > > > > > > > > > > > events and parsing the resultant virt queue for
>> data.
>> > The
>> > > > > vhost-
>> > > > > > > user
>> > > > > > > > > > > > specification handles a bunch of setup before that
>> > point,
>> > > > > mostly
>> > > > > > > to
>> > > > > > > > > > > > > detail where the virt queues are and set up FDs for
>> > memory and
>> > > > > > > event
>> > > > > > > > > > > > communication. This is where the envisioned stub
>> > process
>> > > > > would
>> > > > > > > be
>> > > > > > > > > > > > responsible for getting the daemon up and ready to
>> run.
>> > This
>> > > > > is
>> > > > > > > > > > > > currently done inside a big VMM like QEMU but I
>> > suspect a
>> > > > > modern
>> > > > > > > > > > > > approach would be to use the rust-vmm vhost crate.
>> It
>> > would
>> > > > > then
>> > > > > > > either
>> > > > > > > > > > > > communicate with the kernel's abstracted ABI or be
>> re-
>> > > > > targeted
>> > > > > > > as a
>> > > > > > > > > > > > build option for the various hypervisors.
>> > > > > > > > > > >
>> > > > > > > > > > > One thing I mentioned before to Alex is that Xen
>> doesn't
>> > have
>> > > > > VMMs
>> > > > > > > the
>> > > > > > > > > > > way they are typically envisioned and described in
>> other
>> > > > > > > environments.
>> > > > > > > > > > > Instead, Xen has IOREQ servers. Each of them connects
>> > > > > > > independently to
>> > > > > > > > > > > Xen via the IOREQ interface. E.g. today multiple QEMUs
>> > could
>> > > > > be
>> > > > > > > used as
>> > > > > > > > > > > emulators for a single Xen VM, each of them connecting
>> > to Xen
>> > > > > > > > > > > independently via the IOREQ interface.
>> > > > > > > > > > >
>> > > > > > > > > > > The component responsible for starting a daemon and/or
>> > setting
>> > > > > up
>> > > > > > > shared
>> > > > > > > > > > > interfaces is the toolstack: the xl command and the
>> > > > > libxl/libxc
>> > > > > > > > > > > libraries.
>> > > > > > > > > >
>> > > > > > > > > > I think that VM configuration management (or
>> orchestration
>> > in
>> > > > > > > Stratos
>> > > > > > > > > > jargon?) is a subject to debate in parallel.
>> > > > > > > > > > Otherwise, is there any good assumption to avoid it
>> right
>> > now?
>> > > > > > > > > >
>> > > > > > > > > > > Oleksandr and others I CCed have been working on ways
>> > for the
>> > > > > > > toolstack
>> > > > > > > > > > > to create virtio backends and setup memory mappings.
>> > They
>> > > > > might be
>> > > > > > > able
>> > > > > > > > > > > to provide more info on the subject. I do think we
>> miss
>> > a way
>> > > > > to
>> > > > > > > provide
>> > > > > > > > > > > the configuration to the backend and anything else
>> that
>> > the
>> > > > > > > backend
>> > > > > > > > > > > might require to start doing its job.
>> > > > > > > > >
>> > > > > > > > > Yes, some work has been done for the toolstack to handle
>> > Virtio
>> > > > > MMIO
>> > > > > > > devices in
>> > > > > > > > > general and Virtio block devices in particular. However,
>> it
>> > has
>> > > > > not
>> > > > > > > been upstreamed yet.
>> > > > > > > > > Updated patches are on review now:
>> > > > > > > > > https://lore.kernel.org/xen-devel/1621626361-29076-1-git-
>> > send-
>> > > > > email-
>> > > > > > > olekstysh@gmail.com/
>> > > > > > > > >
>> > > > > > > > > There is an additional (also important) activity to
>> > improve/fix
>> > > > > > > foreign memory mapping on Arm which I am also involved in.
>> > > > > > > > > The foreign memory mapping is proposed to be used for
>> Virtio
>> > > > > backends
>> > > > > > > (device emulators) if there is a need to run guest OS
>> completely
>> > > > > > > unmodified.
>> > > > > > > > > Of course, the more secure way would be to use grant
>> memory
>> > > > > mapping.
>> > > > > > > Briefly, the main difference between them is that with foreign
>> > mapping
>> > > > > the
>> > > > > > > backend
>> > > > > > > > > can map any guest memory it wants to map, but with grant
>> > mapping
>> > > > > it is
>> > > > > > > allowed to map only what was previously granted by the
>> frontend.
>> > > > > > > > >
>> > > > > > > > > So, there might be a problem if we want to pre-map some
>> > guest
>> > > > > memory
>> > > > > > > in advance or to cache mappings in the backend in order to
>> > improve
>> > > > > > > performance (because the mapping/unmapping guest pages every
>> > request
>> > > > > > > requires a lot of back and forth to Xen + P2M updates). In a
>> > nutshell,
>> > > > > > > currently, in order to map a guest page into the backend
>> address
>> > space
>> > > > > we
>> > > > > > > need to steal a real physical page from the backend domain.
>> So,
>> > with
>> > > > > the
>> > > > > > > said optimizations we might end up with no free memory in the
>> > backend
>> > > > > > > domain (see XSA-300). And what we try to achieve is to not
>> waste
>> > a
>> > > > > real
>> > > > > > > domain memory at all by providing safe non-allocated-yet (so
>> > unused)
>> > > > > > > address space for the foreign (and grant) pages to be mapped
>> > into,
>> > > > > this
>> > > > > > > enabling work implies Xen and Linux (and likely DTB bindings)
>> > changes.
>> > > > > > > However, as it turned out, for this to work in a proper and
>> safe
>> > way
>> > > > > some
>> > > > > > > prereq work needs to be done.
>> > > > > > > > > You can find the related Xen discussion at:
>> > > > > > > > > https://lore.kernel.org/xen-devel/1627489110-25633-1-git-
>> > send-
>> > > > > email-
>> > > > > > > olekstysh@gmail.com/
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > > One question is how to best handle notification and
>> > kicks.
>> > > > > The
>> > > > > > > existing
>> > > > > > > > > > > > vhost-user framework uses eventfd to signal the
>> daemon
>> > > > > (although
>> > > > > > > QEMU
>> > > > > > > > > > > > is quite capable of simulating them when you use
>> TCG).
>> > Xen
>> > > > > has
>> > > > > > > its own
>> > > > > > > > > > > > IOREQ mechanism. However latency is an important
>> > factor and
>> > > > > > > having
>> > > > > > > > > > > > events go through the stub would add quite a lot.
>> > > > > > > > > > >
>> > > > > > > > > > > Yeah I think, regardless of anything else, we want the
>> > > > > backends to
>> > > > > > > > > > > connect directly to the Xen hypervisor.
>> > > > > > > > > >
>> > > > > > > > > > In my approach,
>> > > > > > > > > >  a) BE -> FE: interrupts triggered by BE calling a
>> > hypervisor
>> > > > > > > interface
>> > > > > > > > > >               via virtio-proxy
>> > > > > > > > > >  b) FE -> BE: MMIO to config raises events (in event
>> > channels),
>> > > > > > > which is
>> > > > > > > > > >               converted to a callback to BE via virtio-
>> > proxy
>> > > > > > > > > >               (Xen's event channel is internally
>> > implemented by
>> > > > > > > interrupts.)
>> > > > > > > > > >
>> > > > > > > > > > I don't know what "connect directly" means here, but
>> > sending
>> > > > > > > interrupts
>> > > > > > > > > > to the opposite side would be most efficient.
>> > > > > > > > > > Ivshmem, I suppose, takes this approach by utilizing
>> PCI's
>> > msi-x
>> > > > > > > mechanism.
>> > > > > > > > >
>> > > > > > > > > Agree that MSI would be more efficient than SPI...
>> > > > > > > > > At the moment, in order to notify the frontend, the
>> backend
>> > issues
>> > > > > a
>> > > > > > > specific device-model call to query Xen to inject a
>> > corresponding SPI
>> > > > > to
>> > > > > > > the guest.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > > Could we consider the kernel internally converting
>> > IOREQ
>> > > > > > > messages from
>> > > > > > > > > > > > the Xen hypervisor to eventfd events? Would this
>> scale
>> > with
>> > > > > > > other kernel
>> > > > > > > > > > > > hypercall interfaces?
>> > > > > > > > > > > >
>> > > > > > > > > > > > So any thoughts on what directions are worth
>> > experimenting
>> > > > > with?
>> > > > > > > > > > >
>> > > > > > > > > > > One option we should consider is for each backend to
>> > connect
>> > > > > to
>> > > > > > > Xen via
>> > > > > > > > > > > the IOREQ interface. We could generalize the IOREQ
>> > interface
>> > > > > and
>> > > > > > > make it
>> > > > > > > > > > > hypervisor agnostic. The interface is really trivial
>> and
>> > easy
>> > > > > to
>> > > > > > > add.
>> > > > > > > > > >
>> > > > > > > > > > As I said above, my proposal does the same thing that
>> you
>> > > > > mentioned
>> > > > > > > here :)
>> > > > > > > > > > The difference is that I do call hypervisor interfaces
>> via
>> > > > > virtio-
>> > > > > > > proxy.
>> > > > > > > > > >
>> > > > > > > > > > > The only Xen-specific part is the notification
>> mechanism,
>> > > > > which is
>> > > > > > > an
>> > > > > > > > > > > event channel. If we replaced the event channel with
>> > something
>> > > > > > > else the
>> > > > > > > > > > > interface would be generic. See:
>> > > > > > > > > > > https://gitlab.com/xen-project/xen/-
>> > > > > > > /blob/staging/xen/include/public/hvm/ioreq.h#L52
>> > > > > > > > > > >
>> > > > > > > > > > > I don't think that translating IOREQs to eventfd in
>> the
>> > kernel
>> > > > > is
>> > > > > > > a
>> > > > > > > > > > > good idea: it feels like it would be extra complexity
>> > and that
>> > > > > the
>> > > > > > > > > > > kernel shouldn't be involved as this is a backend-
>> > hypervisor
>> > > > > > > interface.
>> > > > > > > > > >
>> > > > > > > > > > Given that we may want to implement BE as a bare-metal
>> > > > > application
>> > > > > > > > > > as I did on Zephyr, I don't think that the translation
>> > would
>> > > > > be
>> > > > > > > > > > a big issue, especially on RTOS's.
>> > > > > > > > > > It will be some kind of abstraction layer of interrupt
>> > handling
>> > > > > > > > > > (or nothing but a callback mechanism).
>> > > > > > > > > >
>> > > > > > > > > > > Also, eventfd is very Linux-centric and we are trying
>> to
>> > > > > design an
>> > > > > > > > > > > interface that could work well for RTOSes too. If we
>> > want to
>> > > > > do
>> > > > > > > > > > > something different, both OS-agnostic and hypervisor-
>> > agnostic,
>> > > > > > > perhaps
>> > > > > > > > > > > we could design a new interface. One that could be
>> > > > > implementable
>> > > > > > > in the
>> > > > > > > > > > > Xen hypervisor itself (like IOREQ) and of course any
>> > other
>> > > > > > > hypervisor
>> > > > > > > > > > > too.
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > There is also another problem. IOREQ is probably not
>> > the
>> > > > > only
>> > > > > > > > > > > interface needed. Have a look at
>> > > > > > > > > > > https://marc.info/?l=xen-devel&m=162373754705233&w=2.
>> > Don't we
>> > > > > > > also need
>> > > > > > > > > > > an interface for the backend to inject interrupts into
>> > the
>> > > > > > > frontend? And
>> > > > > > > > > > > if the backend requires dynamic memory mappings of
>> > frontend
>> > > > > pages,
>> > > > > > > then
>> > > > > > > > > > > we would also need an interface to map/unmap domU
>> pages.
>> > > > > > > > > >
>> > > > > > > > > > My proposal document might help here; All the interfaces
>> > > > > required
>> > > > > > > for
>> > > > > > > > > > virtio-proxy (or hypervisor-related interfaces) are
>> listed
>> > as
>> > > > > > > > > > RPC protocols :)
>> > > > > > > > > >
>> > > > > > > > > > > These interfaces are a lot more problematic than
>> IOREQ:
>> > IOREQ
>> > > > > is
>> > > > > > > tiny
>> > > > > > > > > > > and self-contained. It is easy to add anywhere. A new
>> > > > > interface to
>> > > > > > > > > > > inject interrupts or map pages is more difficult to
>> > manage
>> > > > > because
>> > > > > > > it
>> > > > > > > > > > > would require changes scattered across the various
>> > emulators.
>> > > > > > > > > >
>> > > > > > > > > > Exactly. I am not confident yet that my approach will
>> > also
>> > > > > apply
>> > > > > > > > > > to other hypervisors than Xen.
>> > > > > > > > > > Technically, yes, but whether people can accept it or
>> not
>> > is a
>> > > > > > > different
>> > > > > > > > > > matter.
>> > > > > > > > > >
>> > > > > > > > > > Thanks,
>> > > > > > > > > > -Takahiro Akashi
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > > Regards,
>> > > > > > > > >
>> > > > > > > > > Oleksandr Tyshchenko
>> > > > > > > > IMPORTANT NOTICE: The contents of this email and any
>> > attachments are
>> > > > > > > confidential and may also be privileged. If you are not the
>> > intended
>> > > > > > > recipient, please notify the sender immediately and do not
>> > disclose
>> > > > > the
>> > > > > > > contents to any other person, use it for any purpose, or store
>> > or copy
>> > > > > the
>> > > > > > > information in any medium. Thank you.
>>
>


* Re: [virtio-dev] Enabling hypervisor agnosticism for VirtIO backends
  2021-08-19  9:11 ` [virtio-dev] " Matias Ezequiel Vara Larsen
       [not found]   ` <20210820060558.GB13452@laputa>
@ 2021-09-01  8:43   ` Alex Bennée
  1 sibling, 0 replies; 19+ messages in thread
From: Alex Bennée @ 2021-09-01  8:43 UTC (permalink / raw)
  To: Matias Ezequiel Vara Larsen
  Cc: Stratos Mailing List, virtio-dev, Arnd Bergmann, Viresh Kumar,
	AKASHI Takahiro, Stefano Stabellini, stefanha, Jan Kiszka,
	Carl van Schaik, pratikp, Srivatsa Vaddagiri,
	Jean-Philippe Brucker, Mathieu Poirier


Matias Ezequiel Vara Larsen <matiasevara@gmail.com> writes:

> Hello Alex,
>
> I can tell you my experience from working on a PoC (library) 
> to allow the implementation of virtio-devices that are hypervisor/OS agnostic. 
> I focused on two use cases:
> 1. type-1 hypervisor in which the backend is running as a VM. This
> is an in-house hypervisor that does not support VMExits.
> 2. Linux user-space. In this case, the library is just used to
> communicate between threads. The goal of this use case is merely testing.
>
> I have chosen virtio-mmio as the way to exchange information
> between the frontend and backend. I found it hard to synchronize the
> access to the virtio-mmio layout without VMExits. I had to add some extra bits to allow 
> the front-end and back-end to synchronize, which is required
> during the device-status initialization. These extra bits would not be 
> needed in case the hypervisor supports VMExits, e.g., KVM.

The support for a vmexit seems rather fundamental to type-2 hypervisors
(like KVM) as the VMM is intrinsically linked to a vCPU's run loop. This
makes handling a condition like a bit of MMIO fairly natural to
implement. For type-1 cases the line of execution between "guest
accesses MMIO" and "something services that request" is a little
trickier to pin down. Ultimately at that point you are relying on the
hypervisor itself to make the scheduling decision to stop executing the
guest and allow the backend to do its thing. We don't really want to
expose the exact details about that as it probably varies a lot between
hypervisors. However, would a backend API semantic that expresses:

  - guest has done some MMIO
  - hypervisor has stopped execution of guest
  - guest will be restarted when response conditions are set by backend

cover the needs of a virtio backend and could the userspace facing
portion of that be agnostic?
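
To make the question concrete, here is a minimal Python sketch of that
three-step semantic. Every name in it (MmioAccess, AgnosticBackend,
ToyHypervisor, RegisterFile) is hypothetical and invented for illustration;
it is not an existing API, and the toy hypervisor only models "the guest
stays stopped until the backend has set a response".

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class MmioAccess:
    """One trapped guest access (hypothetical record, not a real ABI)."""
    addr: int
    size: int
    is_write: bool
    data: int = 0  # value written by the guest; ignored for reads


class AgnosticBackend(Protocol):
    """What a backend would implement, with no hypervisor knowledge."""

    def handle_mmio(self, access: MmioAccess) -> Optional[int]:
        """Return the value for a read, or None for a write."""


class ToyHypervisor:
    """Stand-in for the hypervisor-specific side of the interface."""

    def __init__(self, backend: AgnosticBackend) -> None:
        self.backend = backend
        self.guest_running = True

    def guest_mmio(self, access: MmioAccess) -> Optional[int]:
        # 1. guest has done some MMIO
        # 2. hypervisor has stopped execution of guest
        self.guest_running = False
        # 3. guest is restarted once the backend has set the response
        response = self.backend.handle_mmio(access)
        self.guest_running = True
        return response


class RegisterFile:
    """Trivial backend: a bank of 32-bit registers keyed by address."""

    def __init__(self) -> None:
        self.regs = {}

    def handle_mmio(self, access: MmioAccess) -> Optional[int]:
        if access.is_write:
            self.regs[access.addr] = access.data & 0xFFFFFFFF
            return None
        return self.regs.get(access.addr, 0)


hv = ToyHypervisor(RegisterFile())
# 0x70 is just an arbitrary register offset for the example.
hv.guest_mmio(MmioAccess(addr=0x70, size=4, is_write=True, data=0x1))
status = hv.guest_mmio(MmioAccess(addr=0x70, size=4, is_write=False))
```

The point of the split is that only ToyHypervisor would need a
per-hypervisor implementation; RegisterFile never learns how the guest was
stopped or resumed.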

>
> Each guest has a memory region that is shared with the backend. 
> This memory region is used by the frontend to allocate the io-buffers. This region also 
> maps the virtio-mmio layout that is initialized by the backend. For the moment, this region 
> is defined when the guest is created. One limitation is that the memory for io-buffers is fixed. 
> At some point, the guest shall be able to balloon this region. Notifications between 
> the frontend and the backend are implemented by using a hypercall. The hypercall
> mechanism and the memory allocation are abstracted away by a platform layer that
> exposes an interface that is hypervisor/OS agnostic.
>
> I split the backend into a virtio-device driver and a
> backend driver. The virtio-device driver is the virtqueues and the
> backend driver gets packets from the virtqueue for
> post-processing. For example, in the case of virtio-net, the backend
> driver would decide if the packet goes to the hardware or to another
> virtio-net device. The virtio-device drivers may be
> implemented in different ways like by using a single thread, multiple threads, 
> or one thread for all the virtio-devices.
>
> In this PoC, I just tackled two very simple use-cases. These
> use-cases allowed me to extract some requirements for a hypervisor to
> support virtio.
>
> Matias
<snip>

-- 
Alex Bennée

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [virtio-dev] Re: [Stratos-dev] Enabling hypervisor agnosticism for VirtIO backends
       [not found]       ` <20210813051038.GA77540@laputa>
@ 2021-09-01  8:57         ` Alex Bennée
  0 siblings, 0 replies; 19+ messages in thread
From: Alex Bennée @ 2021-09-01  8:57 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: François Ozog, Stefano Stabellini, paul, Stratos Mailing List,
	virtio-dev, Jan Kiszka, Arnd Bergmann, jgross, julien,
	Carl van Schaik, Bertrand.Marquis, stefanha, Artem_Mygaiev,
	xen-devel, olekstysh, Oleksandr_Tyshchenko, xen-devel


AKASHI Takahiro <takahiro.akashi@linaro.org> writes:

> Hi François,
>
> On Thu, Aug 12, 2021 at 09:55:52AM +0200, François Ozog wrote:
>> I top post as I find it difficult to identify where to make the comments.
>
> Thank you for the posting. 
> I think that we should first discuss more about the goal/requirements/
> practical use cases for the framework.
>
>> 1) BE acceleration
>> Network and storage backends may actually be executed in SmartNICs. As
>> virtio 1.1 is hardware friendly, there may be SmartNICs with virtio 1.1 PCI
>> VFs. Is it a valid use case for the generic BE framework to be used in this
>> context?
>> DPDK is used in some BE to significantly accelerate switching. DPDK is also
>> used sometimes in guests. In that case, there is no event injection, just a
>> high performance memory scheme. Is this considered a use case?
>
> I'm not quite familiar with DPDK but it seems to be heavily reliant
> on not only virtqueues but also kvm/linux features/functionality, say,
> according to [1].
> I'm afraid that DPDK is not suitable for primary (at least, initial)
> target use.
> # In my proposal, virtio-proxy, I have in mind the assumption that we would
> # create BE VM as a baremetal application on RTOS (and/or unikernel.)
>
> But as far as virtqueue is concerned, I think we can discuss in general
> technical details as Alex suggested, including:
> - sharing or mapping memory regions for data payload
> - efficient notification mechanism
>
> [1] https://www.redhat.com/en/blog/journey-vhost-users-realm
>
>> 2) Virtio as OS HAL
>> Panasonic CTO has been calling for a virtio based HAL and based on the
>> teachings of Google GKI, an internal HAL seems inevitable in the long term.
>> Virtio is then a contender to Google promoted Android HAL. Could the
>> framework be used in that context?
>
> In this case, where will the implementation of "HAL" reside?
> I don't think the portability of "HAL" code (as a set of virtio BEs)
> is a requirement here.

When I hear people referring to VirtIO HALs I'm thinking mainly of
VirtIO FEs living in a Linux kernel. There are certainly more devices
that can get added but the commonality on the guest side I think is
pretty much a solved problem (modulo Linux-isms creeping into the
virtio spec).

-- 
Alex Bennée



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [virtio-dev] Re: Enabling hypervisor agnosticism for VirtIO backends
  2021-08-17 10:41   ` Stefan Hajnoczi
       [not found]     ` <20210823062500.GC40863@laputa>
@ 2021-09-01 12:53     ` Alex Bennée
  2021-09-02  9:12       ` Stefan Hajnoczi
       [not found]       ` <20210903080609.GD47953@laputa>
  1 sibling, 2 replies; 19+ messages in thread
From: Alex Bennée @ 2021-09-01 12:53 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Stefano Stabellini, Stratos Mailing List, virtio-dev,
	Arnd Bergmann, Viresh Kumar, AKASHI Takahiro, Stefano Stabellini,
	Jan Kiszka, Carl van Schaik, pratikp, Srivatsa Vaddagiri,
	Jean-Philippe Brucker, Mathieu Poirier, Wei.Chen, olekstysh,
	Oleksandr_Tyshchenko, Bertrand.Marquis, Artem_Mygaiev, julien,
	jgross, paul, xen-devel, Elena Afanasova


Stefan Hajnoczi <stefanha@redhat.com> writes:

> On Wed, Aug 04, 2021 at 12:20:01PM -0700, Stefano Stabellini wrote:
>> > Could we consider the kernel internally converting IOREQ messages from
>> > the Xen hypervisor to eventfd events? Would this scale with other kernel
>> > hypercall interfaces?
>> > 
>> > So any thoughts on what directions are worth experimenting with?
>>  
>> One option we should consider is for each backend to connect to Xen via
>> the IOREQ interface. We could generalize the IOREQ interface and make it
>> hypervisor agnostic. The interface is really trivial and easy to add.
>> The only Xen-specific part is the notification mechanism, which is an
>> event channel. If we replaced the event channel with something else the
>> interface would be generic. See:
>> https://gitlab.com/xen-project/xen/-/blob/staging/xen/include/public/hvm/ioreq.h#L52
>
> There have been experiments with something kind of similar in KVM
> recently (see struct ioregionfd_cmd):
> https://lore.kernel.org/kvm/dad3d025bcf15ece11d9df0ff685e8ab0a4f2edd.1613828727.git.eafanasova@gmail.com/

Reading the cover letter was very useful in showing how this provides a
separate channel for signalling IO events to userspace instead of using
the normal type-2 vmexit type event. I wonder how deeply tied the
userspace facing side of this is to KVM? Could it provide a common FD
type interface to IOREQ?

As I understand IOREQ this is currently a direct communication between
userspace and the hypervisor using the existing Xen message bus. My
worry would be that by adding knowledge of what the underlying
hypervisor is we'd end up with excess complexity in the kernel. For one
thing we certainly wouldn't want an API version dependency on the kernel
to understand which version of the Xen hypervisor it was running on.

>> There is also another problem. IOREQ is probably not the only
>> interface needed. Have a look at
>> https://marc.info/?l=xen-devel&m=162373754705233&w=2. Don't we also need
>> an interface for the backend to inject interrupts into the frontend? And
>> if the backend requires dynamic memory mappings of frontend pages, then
>> we would also need an interface to map/unmap domU pages.
>> 
>> These interfaces are a lot more problematic than IOREQ: IOREQ is tiny
>> and self-contained. It is easy to add anywhere. A new interface to
>> inject interrupts or map pages is more difficult to manage because it
>> would require changes scattered across the various emulators.
>
> Something like ioreq is indeed necessary to implement arbitrary devices,
> but if you are willing to restrict yourself to VIRTIO then other
> interfaces are possible too because the VIRTIO device model is different
> from the general purpose x86 PIO/MMIO that Xen's ioreq seems to
> support.

It's true our focus is just VirtIO, which does support alternative
transport options; however, most implementations seem to be targeting
virtio-mmio for its relative simplicity and understood semantics
(modulo a desire for MSI to reduce round trip latency handling
signalling).

>
> Stefan


-- 
Alex Bennée



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [virtio-dev] Re: Enabling hypervisor agnosticism for VirtIO backends
  2021-09-01 12:53     ` Alex Bennée
@ 2021-09-02  9:12       ` Stefan Hajnoczi
       [not found]       ` <20210903080609.GD47953@laputa>
  1 sibling, 0 replies; 19+ messages in thread
From: Stefan Hajnoczi @ 2021-09-02  9:12 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Stefano Stabellini, Stratos Mailing List, virtio-dev,
	Arnd Bergmann, Viresh Kumar, AKASHI Takahiro, Stefano Stabellini,
	Jan Kiszka, Carl van Schaik, pratikp, Srivatsa Vaddagiri,
	Jean-Philippe Brucker, Mathieu Poirier, Wei.Chen, olekstysh,
	Oleksandr_Tyshchenko, Bertrand.Marquis, Artem_Mygaiev, julien,
	jgross, paul, xen-devel, Elena Afanasova

[-- Attachment #1: Type: text/plain, Size: 3717 bytes --]

On Wed, Sep 01, 2021 at 01:53:34PM +0100, Alex Bennée wrote:
> 
> Stefan Hajnoczi <stefanha@redhat.com> writes:
> 
> > On Wed, Aug 04, 2021 at 12:20:01PM -0700, Stefano Stabellini wrote:
> >> > Could we consider the kernel internally converting IOREQ messages from
> >> > the Xen hypervisor to eventfd events? Would this scale with other kernel
> >> > hypercall interfaces?
> >> > 
> >> > So any thoughts on what directions are worth experimenting with?
> >>  
> >> One option we should consider is for each backend to connect to Xen via
> >> the IOREQ interface. We could generalize the IOREQ interface and make it
> >> hypervisor agnostic. The interface is really trivial and easy to add.
> >> The only Xen-specific part is the notification mechanism, which is an
> >> event channel. If we replaced the event channel with something else the
> >> interface would be generic. See:
> >> https://gitlab.com/xen-project/xen/-/blob/staging/xen/include/public/hvm/ioreq.h#L52
> >
> > There have been experiments with something kind of similar in KVM
> > recently (see struct ioregionfd_cmd):
> > https://lore.kernel.org/kvm/dad3d025bcf15ece11d9df0ff685e8ab0a4f2edd.1613828727.git.eafanasova@gmail.com/
> 
> Reading the cover letter was very useful in showing how this provides a
> separate channel for signalling IO events to userspace instead of using
> the normal type-2 vmexit type event. I wonder how deeply tied the
> userspace facing side of this is to KVM? Could it provide a common FD
> type interface to IOREQ?

I wondered this too after reading Stefano's link to Xen's ioreq. They
seem to be quite similar. ioregionfd is closer to how PIO/MMIO vmexits
are handled in KVM while I guess ioreq is closer to how Xen handles
them, but those are small details.

It may be possible to use the ioreq struct instead of ioregionfd in KVM,
but I haven't checked each field.

> As I understand IOREQ this is currently a direct communication between
> userspace and the hypervisor using the existing Xen message bus. My
> worry would be that by adding knowledge of what the underlying
> hypervisor is we'd end up with excess complexity in the kernel. For one
> thing we certainly wouldn't want an API version dependency on the kernel
> to understand which version of the Xen hypervisor it was running on.
> 
> >> There is also another problem. IOREQ is probably not the only
> >> interface needed. Have a look at
> >> https://marc.info/?l=xen-devel&m=162373754705233&w=2. Don't we also need
> >> an interface for the backend to inject interrupts into the frontend? And
> >> if the backend requires dynamic memory mappings of frontend pages, then
> >> we would also need an interface to map/unmap domU pages.
> >> 
> >> These interfaces are a lot more problematic than IOREQ: IOREQ is tiny
> >> and self-contained. It is easy to add anywhere. A new interface to
> >> inject interrupts or map pages is more difficult to manage because it
> >> would require changes scattered across the various emulators.
> >
> > Something like ioreq is indeed necessary to implement arbitrary devices,
> > but if you are willing to restrict yourself to VIRTIO then other
> > interfaces are possible too because the VIRTIO device model is different
> > from the general purpose x86 PIO/MMIO that Xen's ioreq seems to
> > support.
> 
> It's true our focus is just VirtIO, which does support alternative
> transport options; however, most implementations seem to be targeting
> virtio-mmio for its relative simplicity and understood semantics
> (modulo a desire for MSI to reduce round trip latency handling
> signalling).

Okay.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [virtio-dev] Re: Enabling hypervisor agnosticism for VirtIO backends
       [not found]       ` <20210903080609.GD47953@laputa>
@ 2021-09-03  9:28         ` Alex Bennée
       [not found]           ` <20210906022356.GD40187@laputa>
  0 siblings, 1 reply; 19+ messages in thread
From: Alex Bennée @ 2021-09-03  9:28 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: Stefan Hajnoczi, Stefano Stabellini, Stratos Mailing List,
	virtio-dev, Arnd Bergmann, Viresh Kumar, Stefano Stabellini,
	Jan Kiszka, Carl van Schaik, pratikp, Srivatsa Vaddagiri,
	Jean-Philippe Brucker, Mathieu Poirier, Wei.Chen, olekstysh,
	Oleksandr_Tyshchenko, Bertrand.Marquis, Artem_Mygaiev, julien,
	jgross, paul, xen-devel, Elena Afanasova


AKASHI Takahiro <takahiro.akashi@linaro.org> writes:

> Alex,
>
> On Wed, Sep 01, 2021 at 01:53:34PM +0100, Alex Bennée wrote:
>> 
>> Stefan Hajnoczi <stefanha@redhat.com> writes:
>> 
>> > On Wed, Aug 04, 2021 at 12:20:01PM -0700, Stefano Stabellini wrote:
>> >> > Could we consider the kernel internally converting IOREQ messages from
>> >> > the Xen hypervisor to eventfd events? Would this scale with other kernel
>> >> > hypercall interfaces?
>> >> > 
>> >> > So any thoughts on what directions are worth experimenting with?
>> >>  
>> >> One option we should consider is for each backend to connect to Xen via
>> >> the IOREQ interface. We could generalize the IOREQ interface and make it
>> >> hypervisor agnostic. The interface is really trivial and easy to add.
>> >> The only Xen-specific part is the notification mechanism, which is an
>> >> event channel. If we replaced the event channel with something else the
>> >> interface would be generic. See:
>> >> https://gitlab.com/xen-project/xen/-/blob/staging/xen/include/public/hvm/ioreq.h#L52
>> >
>> > There have been experiments with something kind of similar in KVM
>> > recently (see struct ioregionfd_cmd):
>> > https://lore.kernel.org/kvm/dad3d025bcf15ece11d9df0ff685e8ab0a4f2edd.1613828727.git.eafanasova@gmail.com/
>> 
>> Reading the cover letter was very useful in showing how this provides a
>> separate channel for signalling IO events to userspace instead of using
>> the normal type-2 vmexit type event. I wonder how deeply tied the
>> userspace facing side of this is to KVM? Could it provide a common FD
>> type interface to IOREQ?
>
> Why do you stick to a "FD" type interface?

I mean most user space interfaces on POSIX start with a file descriptor
and the usual read/write semantics or a series of ioctls.

>> As I understand IOREQ this is currently a direct communication between
>> userspace and the hypervisor using the existing Xen message bus. My
>
> With IOREQ server, IO event occurrences are notified to BE via Xen's event
> channel, while the actual contexts of IO events (see struct ioreq in ioreq.h)
> are put in a queue on a single shared memory page which is to be assigned
> beforehand with xenforeignmemory_map_resource hypervisor call.

If we abstracted the IOREQ via the kernel interface you would probably
just want to put the ioreq structure on a queue rather than expose the
shared page to userspace. 
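
As a rough illustration of what "an FD plus read semantics" could look like
from the backend's side, here is a small Python sketch. The record layout
(IO_EVENT) is hypothetical: it is loosely inspired by the fields of Xen's
struct ioreq but is not the real Xen layout or any existing kernel ABI, and
a pipe stands in for whatever queue FD the kernel would hand out.

```python
import os
import struct

# Hypothetical fixed-size record: addr, data, size, is_read, 3 pad bytes.
# This is an assumption for illustration, NOT the real struct ioreq layout.
IO_EVENT = struct.Struct("<QQIB3x")


def pack_event(addr: int, data: int, size: int, is_read: bool) -> bytes:
    """Encode one IO event the way the (imagined) kernel side would."""
    return IO_EVENT.pack(addr, data, size, int(is_read))


def read_event(fd: int) -> dict:
    """Block until one complete record is available on fd, then decode it."""
    buf = b""
    while len(buf) < IO_EVENT.size:
        chunk = os.read(fd, IO_EVENT.size - len(buf))
        if not chunk:
            raise EOFError("event queue closed")
        buf += chunk
    addr, data, size, is_read = IO_EVENT.unpack(buf)
    return {"addr": addr, "data": data, "size": size, "is_read": bool(is_read)}


# A pipe stands in for the kernel-provided queue FD.
rfd, wfd = os.pipe()
os.write(wfd, pack_event(0x0A003E50, 0x1, 4, is_read=False))
event = read_event(rfd)
os.close(rfd)
os.close(wfd)
```

With this shape the backend only ever sees decoded events; whether they
originated on a Xen shared page or somewhere else stays behind the FD.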

>> worry would be that by adding knowledge of what the underlying
>> hypervisor is we'd end up with excess complexity in the kernel. For one
>> thing we certainly wouldn't want an API version dependency on the kernel
>> to understand which version of the Xen hypervisor it was running on.
>
> That's exactly what virtio-proxy in my proposal[1] does; all the hypervisor-
> specific details of IO event handling are contained in virtio-proxy
> and virtio BE will communicate with virtio-proxy through a virtqueue
> (yes, virtio-proxy is seen as yet another virtio device on BE) and will
> get IO event-related *RPC* callbacks, either MMIO read or write, from
> virtio-proxy.
>
> See page 8 (protocol flow) and 10 (interfaces) in [1].

There are two areas of concern with the proxy approach at the moment.
The first is how the bootstrap of the virtio-proxy channel happens and
the second is how many context switches are involved in a transaction.
Of course, as with all things, there is a trade-off. Use cases with the
very tightest latency requirements would probably opt for a bare metal
backend, which I think would imply hypervisor knowledge in the backend
binary.

>
> If kvm's ioregionfd can fit into this protocol, virtio-proxy for kvm
> will hopefully be implemented using ioregionfd.
>
> -Takahiro Akashi
>
> [1] https://op-lists.linaro.org/pipermail/stratos-dev/2021-August/000548.html

-- 
Alex Bennée



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [virtio-dev] Re: Enabling hypervisor agnosticism for VirtIO backends
       [not found]                           ` <20210902071902.GC71098@laputa>
@ 2021-09-07  0:57                             ` Christopher Clark
       [not found]                               ` <20210907115501.GC49004@laputa>
  0 siblings, 1 reply; 19+ messages in thread
From: Christopher Clark @ 2021-09-07  0:57 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: Wei Chen, Oleksandr Tyshchenko, Stefano Stabellini, Alex Bennée,
	Kaly Xin, Stratos Mailing List, virtio-dev@lists.oasis-open.org,
	Arnd Bergmann, Viresh Kumar, Stefano Stabellini,
	stefanha@redhat.com, Jan Kiszka, Carl van Schaik,
	pratikp@quicinc.com, Srivatsa Vaddagiri, Jean-Philippe Brucker,
	Mathieu Poirier, Oleksandr Tyshchenko, Bertrand Marquis,
	Artem Mygaiev, Julien Grall, Juergen Gross, Paul Durrant,
	Xen Devel, Rich Persaud, Daniel Smith, James McKenzie,
	Andrew Cooper

[-- Attachment #1: Type: text/plain, Size: 12671 bytes --]

On Thu, Sep 2, 2021 at 12:19 AM AKASHI Takahiro <takahiro.akashi@linaro.org>
wrote:

> Hi Christopher,
>
> Thank you for your feedback.
>
> On Mon, Aug 30, 2021 at 12:53:00PM -0700, Christopher Clark wrote:
> > [ resending message to ensure delivery to the CCd mailing lists
> > post-subscription ]
> >
> > Apologies for being late to this thread, but I hope to be able to
> > contribute to this discussion in a meaningful way. I am grateful for the
> > level of interest in this topic. I would like to draw your attention to
> > Argo as a suitable technology for development of VirtIO's
> > hypervisor-agnostic interfaces.
> >
> > * Argo is an interdomain communication mechanism in Xen (on x86 and Arm)
> >   that can send and receive hypervisor-mediated notifications and messages
> >   between domains (VMs). [1] The hypervisor can enforce Mandatory Access
> >   Control over all communication between domains. It is derived from the
> >   earlier v4v, which has been deployed on millions of machines with the
> >   HP/Bromium uXen hypervisor and with OpenXT.
> >
> > * Argo has a simple interface with a small number of operations that was
> >   designed for ease of integration into OS primitives on both Linux
> >   (sockets) and Windows (ReadFile/WriteFile) [2].
> >     - A unikernel example of using it has also been developed for XTF. [3]
> >
> > * There has been recent discussion and support in the Xen community for
> >   making revisions to the Argo interface to make it hypervisor-agnostic,
> >   and support implementations of Argo on other hypervisors. This will
> >   enable a single interface for an OS kernel binary to use for inter-VM
> >   communication that will work on multiple hypervisors -- this applies
> >   equally to both backends and frontend implementations. [4]
>
> Regarding virtio-over-Argo, let me ask a few questions:
> (In figure "Virtual device buffer access:Virtio+Argo" in [4])
>

(for ref, this diagram is from this document:
 https://openxt.atlassian.net/wiki/spaces/DC/pages/1348763698 )

Takahiro, thanks for reading the Virtio-Argo materials.

Some relevant context before answering your questions below: the Argo
request interface from the hypervisor to a guest, which is currently
exposed only via a dedicated hypercall op, has been discussed within the
Xen community and is open to being changed in order to better enable
support for guest VM access to Argo functions in a hypervisor-agnostic way.

The proposal is to allow hypervisors the option to implement and expose any
of multiple access mechanisms for Argo, and then enable a guest device
driver to probe the hypervisor for methods that it is aware of and able to
use. The hypercall op is likely to be retained (in some form), and
complemented at least on x86 with another interface via MSRs presented to
the guests.
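
A minimal sketch of that probing step, in Python for brevity. The mechanism
names ("msr", "hypercall"), the preference order and the probe callables
are all assumptions made up for this example; how a real driver would
detect each mechanism has not been specified.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical mechanism names, in the driver's order of preference.
DRIVER_PREFERENCE: List[str] = ["msr", "hypercall"]


def select_access_method(
    probes: Dict[str, Callable[[], bool]],
    preference: List[str] = DRIVER_PREFERENCE,
) -> Optional[str]:
    """Return the first mechanism the driver knows about that probes OK."""
    for name in preference:
        probe = probes.get(name)
        if probe is not None and probe():
            return name
    return None


# Toy "hypervisors": each maps a mechanism name to a probe function.
xen_like = {"hypercall": lambda: True}
other_hv = {"msr": lambda: True, "hypercall": lambda: False}

chosen_on_xen = select_access_method(xen_like)
chosen_on_other = select_access_method(other_hv)
chosen_on_none = select_access_method({})
```

The guest driver binary stays the same in all three cases; only the result
of the probe differs.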



> 1) How is the configuration managed?
>    On either virtio-mmio or virtio-pci, there always takes place
>    some negotiation between the FE and BE through the "configuration"
>    space. How can this be done in virtio-over-Argo?
>

Just to be clear about my understanding: your question, in the context of a
Linux kernel virtio device driver implementation, is about how a virtio-argo
transport driver would implement the get_features function of the
virtio_config_ops, as a parallel to the work that vp_get_features does for
virtio-pci, and vm_get_features does for virtio-mmio.

The design is still open on this and options have been discussed, including:

* an extension to Argo to allow the system toolstack (which is responsible
  for managing guest VMs and enabling connections from front-to-backends)
  to manage a table of "implicit destinations", so a guest can transmit Argo
  messages to eg. "my storage service" port and the hypervisor will deliver
  it based on a destination table pre-programmed by the toolstack for the
  VM. [1]
     - ref: Notes from the December 2019 Xen F2F meeting in Cambridge, UK:
       [1] https://lists.archive.carbon60.com/xen/devel/577800#577800

  So within that feature negotiation function, communication with the
  backend via that Argo channel will occur.

* IOREQ
  The Xen IOREQ implementation is not currently appropriate for virtio-argo
  since it requires the use of foreign memory mappings of frontend memory in
  the backend guest. However, a new HMX interface from the hypervisor could
  support a new DMA Device Model Op to allow the backend to request the
  hypervisor to retrieve specified bytes from the frontend guest, which
  would enable plumbing for device configuration between an IOREQ server
  (device model backend implementation) and the guest driver. [2]

  Feature negotiation in the front end in this case would look very similar
  to the virtio-mmio implementation.

  ref: Argo HMX Transport for VirtIO meeting minutes, from January 2021:
  [2] https://lists.xenproject.org/archives/html/xen-devel/2021-02/msg01422.html

* guest ACPI tables that surface the address of a remote Argo endpoint
  on behalf of the toolstack, and Argo communication can then negotiate
  features

* emulation of a basic PCI device by the hypervisor (though details not
  determined)
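
To illustrate the first option above (the "implicit destinations" table),
here is a toy Python model. The class, its method names and the
(domain id, port) endpoint shape are hypothetical and chosen only for this
sketch; the actual Argo extension has not been designed at this level of
detail.

```python
from typing import Dict, Tuple

Endpoint = Tuple[int, int]  # (destination domain id, Argo port); illustrative


class ImplicitDestinations:
    """Toy model of a per-guest table programmed by the toolstack."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[int, str], Endpoint] = {}

    def program(self, guest: int, service: str, endpoint: Endpoint) -> None:
        """Toolstack side: bind a service name to an endpoint for one guest."""
        self._table[(guest, service)] = endpoint

    def resolve(self, guest: int, service: str) -> Endpoint:
        """Hypervisor side: look up where this guest's message should go."""
        try:
            return self._table[(guest, service)]
        except KeyError:
            raise PermissionError(
                f"guest {guest} has no destination for {service!r}"
            ) from None


table = ImplicitDestinations()
table.program(guest=5, service="storage", endpoint=(2, 7000))
table.program(guest=6, service="storage", endpoint=(3, 7000))
dest_for_5 = table.resolve(5, "storage")
dest_for_6 = table.resolve(6, "storage")
```

Both guests address the same name, "storage", and never learn which domain
actually serves them; a guest with no entry is simply refused.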



> 2) Do there physically exist virtio's available/used vrings as well as
>    descriptors, or are they virtually emulated over Argo (rings)?
>

In short: the latter.

In the analysis that I did when looking at this, my observation was that
each side (front and backend) should be able to accurately maintain their
own local copy of the available/used vrings as well as descriptors, and
both be kept synchronized by ensuring that updates are transmitted to the
other side when they are written to. eg. As part of this, in the Linux
front end implementation the virtqueue_notify function uses a function
pointer in the virtqueue that is populated by the transport driver, ie. the
virtio-argo driver in this case, which can implement the necessary logic to
coordinate with the backend.
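
A toy Python model of that idea, under stated assumptions: MirroredRing and
the update message format are invented for this sketch, and a plain list
stands in for the Argo transport. It is not how the Linux virtqueue code or
a virtio-argo driver is actually structured.

```python
from typing import Callable, Dict, List

Update = Dict[str, int]  # {"index": slot, "value": entry}; illustrative only


class MirroredRing:
    """Local copy of a ring; writes are forwarded to the peer via `send`."""

    def __init__(self, size: int, send: Callable[[Update], None]) -> None:
        self.entries: List[int] = [0] * size
        self._send = send

    def write(self, index: int, value: int) -> None:
        """Local write, followed by one message so the peer can catch up."""
        self.entries[index] = value
        self._send({"index": index, "value": value})

    def apply(self, update: Update) -> None:
        """Apply an update received from the peer, without echoing it back."""
        self.entries[update["index"]] = update["value"]


# Wire two copies together; a Python list stands in for the transport.
in_flight: List[Update] = []
frontend = MirroredRing(4, in_flight.append)
backend = MirroredRing(4, lambda update: None)

frontend.write(0, 17)  # e.g. publish descriptor head 17 as available
frontend.write(1, 23)
messages_sent = len(in_flight)
for update in in_flight:  # the "notify" step delivers pending updates
    backend.apply(update)
```

Neither side ever reads the other's memory; the two copies agree only
because every write was also sent as a message.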


> 3) The payload in a request will be copied into the receiver's Argo ring.
>    What does the address in a descriptor mean?
>    Address/offset in a ring buffer?
>

Effectively yes. I would treat it as a handle that is used to identify and
retrieve data from messages exchanged between frontend transport driver and
the backend via Argo rings established for moving data for the data path.
In the diagram, those are "Argo ring for reads" and "Argo ring for writes".
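
As a sketch of "address as handle", in Python: the Descriptor and DataRing
types below are hypothetical and only model the lookup, with a dictionary
standing in for an Argo data ring. The real message framing is not defined
here.

```python
from typing import Dict, NamedTuple


class Descriptor(NamedTuple):
    handle: int  # carried in the field that would normally hold an address
    length: int


class DataRing:
    """Toy stand-in for an Argo data ring: payloads keyed by a handle."""

    def __init__(self) -> None:
        self._messages: Dict[int, bytes] = {}
        self._next_handle = 1

    def send(self, payload: bytes) -> Descriptor:
        """Sender side: copy the payload in and describe it by handle."""
        handle = self._next_handle
        self._next_handle += 1
        self._messages[handle] = bytes(payload)
        return Descriptor(handle=handle, length=len(payload))

    def receive(self, desc: Descriptor) -> bytes:
        """Receiver side: resolve the handle; it is never dereferenced."""
        payload = self._messages.pop(desc.handle)
        if len(payload) != desc.length:
            raise ValueError("descriptor length does not match payload")
        return payload


ring_for_writes = DataRing()
desc = ring_for_writes.send(b"block 42 contents")
data = ring_for_writes.receive(desc)
```

The receiver treats the value purely as a lookup key, so it stays meaningful
even though the two sides share no address space.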


> 4) Estimate of performance or latency?
>

Different access methods to Argo (ie. related to my answer to your question
'1)' above) will have different performance characteristics.

Data copying will necessarily be involved for any Hypervisor-Mediated data
eXchange (HMX) mechanism[1], such as Argo, where there is no shared memory
between guest VMs, but the performance profile on modern CPUs with sizable
caches has been demonstrated to be acceptable for the guest virtual device
drivers use case in the HP/Bromium vSentry uXen product. The VirtIO
structure is somewhat different though.

Further performance profiling and measurement will be valuable for enabling
tuning of the implementation and development of additional interfaces (eg.
such as an asynchronous send primitive) - some of this has been discussed
and described on the VirtIO-Argo-Development-Phase-1 wiki page[2].

[1]
https://wiki.xenproject.org/wiki/Argo:_Hypervisor-Mediated_Exchange_(HMX)_for_Xen

[2]
https://openxt.atlassian.net/wiki/spaces/DC/pages/1696169985/VirtIO-Argo+Development%3A+Phase+1


>    It appears that, on FE side, at least three hypervisor calls (and data
>    copying) need to be invoked at every request, right?
>

For a write, counting FE sendv ops:
1: the write data payload is sent via the "Argo ring for writes"
2: the descriptor is sent via a sync of the available/descriptor ring
  -- is there a third one that I am missing?

Christopher


>
> Thanks,
> -Takahiro Akashi
>
>
> > * Here are the design documents for building VirtIO-over-Argo, to support
> >   a hypervisor-agnostic frontend VirtIO transport driver using Argo.
> >
> > The Development Plan to build VirtIO virtual device support over Argo
> > transport:
> >
> https://openxt.atlassian.net/wiki/spaces/DC/pages/1696169985/VirtIO-Argo+Development+Phase+1
> >
> > A design for using VirtIO over Argo, describing how VirtIO data
> > structures and communication is handled over the Argo transport:
> > https://openxt.atlassian.net/wiki/spaces/DC/pages/1348763698/VirtIO+Argo
> >
> > Diagram (from the above document) showing how VirtIO rings are
> > synchronized between domains without using shared memory:
> >
> https://openxt.atlassian.net/46e1c93b-2b87-4cb2-951e-abd4377a1194#media-blob-url=true&id=01f7d0e1-7686-4f0b-88e1-457c1d30df40&collection=contentId-1348763698&contextId=1348763698&mimeType=image%2Fpng&name=device-buffer-access-virtio-argo.png&size=243175&width=1106&height=1241
> >
> > Please note that the above design documents show that the existing VirtIO
> > device drivers, and both vring and virtqueue data structures can be
> > preserved while interdomain communication can be performed with no shared
> > memory required for most drivers; (the exceptions where further design is
> > required are those such as virtual framebuffer devices where shared memory
> > regions are intentionally added to the communication structure beyond the
> > vrings and virtqueues).
> >
> > An analysis of VirtIO and Argo, informing the design:
> >
> https://openxt.atlassian.net/wiki/spaces/DC/pages/1333428225/Analysis+of+Argo+as+a+transport+medium+for+VirtIO
> >
> > * Argo can be used for a communication path for configuration between the
> >   backend and the toolstack, avoiding the need for a dependency on
> >   XenStore, which is an advantage for any hypervisor-agnostic design. It is
> >   also amenable to a notification mechanism that is not based on Xen event
> >   channels.
> >
> > * Argo does not use or require shared memory between VMs and provides an
> >   alternative to the use of foreign shared memory mappings. It avoids some
> >   of the complexities involved with using grants (eg. XSA-300).
> >
> > * Argo supports Mandatory Access Control by the hypervisor, satisfying a
> >   common certification requirement.
> >
> > * The Argo headers are BSD-licensed and the Xen hypervisor implementation
> >   is GPLv2 but accessible via the hypercall interface. The licensing should
> >   not present an obstacle to adoption of Argo in guest software or
> >   implementation by other hypervisors.
> >
> > * Since the interface that Argo presents to a guest VM is similar to DMA, a
> >   VirtIO-Argo frontend transport driver should be able to operate with a
> >   physical VirtIO-enabled smart-NIC if the toolstack and an Argo-aware
> >   backend provide support.
> >
> > The next Xen Community Call is next week and I would be happy to answer
> > questions about Argo and on this topic. I will also be following this
> > thread.
> >
> > Christopher
> > (Argo maintainer, Xen Community)
> >
> >
> --------------------------------------------------------------------------------
> > [1]
> > An introduction to Argo:
> >
> https://static.sched.com/hosted_files/xensummit19/92/Argo%20and%20HMX%20-%20OpenXT%20-%20Christopher%20Clark%20-%20Xen%20Summit%202019.pdf
> > https://www.youtube.com/watch?v=cnC0Tg3jqJQ
> > Xen Wiki page for Argo:
> >
> https://wiki.xenproject.org/wiki/Argo:_Hypervisor-Mediated_Exchange_(HMX)_for_Xen
> >
> > [2]
> > OpenXT Linux Argo driver and userspace library:
> > https://github.com/openxt/linux-xen-argo
> >
> > Windows V4V at OpenXT wiki:
> > https://openxt.atlassian.net/wiki/spaces/DC/pages/14844007/V4V
> > Windows v4v driver source:
> > https://github.com/OpenXT/xc-windows/tree/master/xenv4v
> >
> > HP/Bromium uXen V4V driver:
> > https://github.com/uxen-virt/uxen/tree/ascara/windows/uxenv4vlib
> >
> > [3]
> > v2 of the Argo test unikernel for XTF:
> >
> https://lists.xenproject.org/archives/html/xen-devel/2021-01/msg02234.html
> >
> > [4]
> > Argo HMX Transport for VirtIO meeting minutes:
> >
> https://lists.xenproject.org/archives/html/xen-devel/2021-02/msg01422.html
> >
> > VirtIO-Argo Development wiki page:
> >
> https://openxt.atlassian.net/wiki/spaces/DC/pages/1696169985/VirtIO-Argo+Development+Phase+1
> >
>
>

[-- Attachment #2: Type: text/html, Size: 17467 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [virtio-dev] Re: [Stratos-dev] Enabling hypervisor agnosticism for VirtIO backends
       [not found]           ` <20210906022356.GD40187@laputa>
@ 2021-09-07  2:41             ` Christopher Clark
  2021-09-10  9:35               ` Alex Bennée
       [not found]             ` <alpine.DEB.2.21.2109131615570.10523@sstabellini-ThinkPad-T480s>
  1 sibling, 1 reply; 19+ messages in thread
From: Christopher Clark @ 2021-09-07  2:41 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: Alex Bennée, Wei Chen, Paul Durrant, Stratos Mailing List,
	virtio-dev, Stefano Stabellini, Jan Kiszka, Arnd Bergmann,
	Juergen Gross, Julien Grall, Carl van Schaik, Bertrand Marquis,
	Stefan Hajnoczi, Artem Mygaiev, Xen-devel, Oleksandr Tyshchenko,
	Oleksandr Tyshchenko, Elena Afanasova, James McKenzie,
	Andrew Cooper, Rich Persaud, Daniel Smith, Jason Andryuk,
	eric chanudet, Roger Pau Monné

[-- Attachment #1: Type: text/plain, Size: 6131 bytes --]

On Sun, Sep 5, 2021 at 7:24 PM AKASHI Takahiro via Stratos-dev <
stratos-dev@op-lists.linaro.org> wrote:

> Alex,
>
> On Fri, Sep 03, 2021 at 10:28:06AM +0100, Alex Bennée wrote:
> >
> > AKASHI Takahiro <takahiro.akashi@linaro.org> writes:
> >
> > > Alex,
> > >
> > > On Wed, Sep 01, 2021 at 01:53:34PM +0100, Alex Bennée wrote:
> > >>
> > >> Stefan Hajnoczi <stefanha@redhat.com> writes:
> > >>
> > >> > [[PGP Signed Part:Undecided]]
> > >> > On Wed, Aug 04, 2021 at 12:20:01PM -0700, Stefano Stabellini wrote:
> > >> >> > Could we consider the kernel internally converting IOREQ messages from
> > >> >> > the Xen hypervisor to eventfd events? Would this scale with other kernel
> > >> >> > hypercall interfaces?
> > >> >> >
> > >> >> > So any thoughts on what directions are worth experimenting with?
> > >> >>
> > >> >> One option we should consider is for each backend to connect to Xen via
> > >> >> the IOREQ interface. We could generalize the IOREQ interface and make it
> > >> >> hypervisor agnostic. The interface is really trivial and easy to add.
> > >> >> The only Xen-specific part is the notification mechanism, which is an
> > >> >> event channel. If we replaced the event channel with something else the
> > >> >> interface would be generic. See:
> > >> >> https://gitlab.com/xen-project/xen/-/blob/staging/xen/include/public/hvm/ioreq.h#L52
> > >> >
> > >> > There have been experiments with something kind of similar in KVM
> > >> > recently (see struct ioregionfd_cmd):
> > >> > https://lore.kernel.org/kvm/dad3d025bcf15ece11d9df0ff685e8ab0a4f2edd.1613828727.git.eafanasova@gmail.com/
> > >>
> > >> Reading the cover letter was very useful in showing how this provides a
> > >> separate channel for signalling IO events to userspace instead of using
> > >> the normal type-2 vmexit type event. I wonder how deeply tied the
> > >> userspace facing side of this is to KVM? Could it provide a common FD
> > >> type interface to IOREQ?
> > >
> > > Why do you stick to a "FD" type interface?
> >
> > I mean most user space interfaces on POSIX start with a file descriptor
> > and the usual read/write semantics or a series of ioctls.
>
> Who do you assume is responsible for implementing this kind of
> fd semantics, OSs on BE or hypervisor itself?
>
> I think such interfaces can only be easily implemented on type-2
> hypervisors.
>
> # In this sense, I don't think rust-vmm, as it is, can be
> # a general solution.
>
> > >> As I understand IOREQ this is currently a direct communication between
> > >> userspace and the hypervisor using the existing Xen message bus. My
> > >
> > > With IOREQ server, IO event occurrences are notified to BE via Xen's event
> > > channel, while the actual contexts of IO events (see struct ioreq in ioreq.h)
> > > are put in a queue on a single shared memory page which is to be assigned
> > > beforehand with xenforeignmemory_map_resource hypervisor call.
> >
> > If we abstracted the IOREQ via the kernel interface you would probably
> > just want to put the ioreq structure on a queue rather than expose the
> > shared page to userspace.
>
> Where is that queue?
>
> > >> worry would be that by adding knowledge of what the underlying
> > >> hypervisor is we'd end up with excess complexity in the kernel. For one
> > >> thing we certainly wouldn't want an API version dependency on the kernel
> > >> to understand which version of the Xen hypervisor it was running on.
> > >
> > > That's exactly what virtio-proxy in my proposal[1] does; All the
> > > hypervisor-specific details of IO event handlings are contained in virtio-proxy
> > > and virtio BE will communicate with virtio-proxy through a virtqueue
> > > (yes, virtio-proxy is seen as yet another virtio device on BE) and will
> > > get IO event-related *RPC* callbacks, either MMIO read or write, from
> > > virtio-proxy.
> > >
> > > See page 8 (protocol flow) and 10 (interfaces) in [1].
> >
> > There are two areas of concern with the proxy approach at the moment.
> > The first is how the bootstrap of the virtio-proxy channel happens and
>
> As I said, from BE point of view, virtio-proxy would be seen
> as yet another virtio device by which BE could talk to "virtio
> proxy" vm or whatever else.
>
> This way we guarantee BE's hypervisor-agnosticism instead of having
> "common" hypervisor interfaces. That is the base of my idea.
>
> > the second is how many context switches are involved in a transaction.
> > Of course with all things there is a trade off. Things involving the
> > very tightest latency would probably opt for a bare metal backend which
> > I think would imply hypervisor knowledge in the backend binary.
>
> In configuration phase of virtio device, the latency won't be a big matter.
> In device operations (i.e. read/write to block devices), if we can
> resolve 'mmap' issue, as Oleksandr is proposing right now, the only issue is
> how efficiently we can deliver notification to the opposite side. Right?
> And this is a very common problem whatever approach we would take.
>
> Anyhow, if we do care about the latency in my approach, most of virtio-proxy-
> related code can be re-implemented just as a stub (or shim?) library
> since the protocols are defined as RPCs.
> In this case, however, we would lose the benefit of providing a "single
> binary" BE.
> (I know this is an arguable requirement, though.)
>
> # Would we better discuss what "hypervisor-agnosticism" means?
>

Is there a call that you could recommend that we join to discuss this and
the topics of this thread?
There is definitely interest in pursuing a new interface for Argo that can
be implemented in other hypervisors and enable guest binary portability
between them, at least on the same hardware architecture, with VirtIO
transport as a primary use case.

The notes from the Xen Summit Design Session on VirtIO Cross-Project BoF
for Xen and Guest OS, which include context about the several separate
approaches to VirtIO on Xen, have now been posted here:
https://lists.xenproject.org/archives/html/xen-devel/2021-09/msg00472.html

Christopher



> -Takahiro Akashi
>
>
>

[-- Attachment #2: Type: text/html, Size: 8183 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [virtio-dev] Re: Enabling hypervisor agnosticism for VirtIO backends
       [not found]                               ` <20210907115501.GC49004@laputa>
@ 2021-09-07 18:09                                 ` Christopher Clark
  0 siblings, 0 replies; 19+ messages in thread
From: Christopher Clark @ 2021-09-07 18:09 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: Wei Chen, Oleksandr Tyshchenko, Stefano Stabellini, Alex Bennée,
	Kaly Xin, Stratos Mailing List, virtio-dev@lists.oasis-open.org,
	Arnd Bergmann, Viresh Kumar, Stefano Stabellini,
	stefanha@redhat.com, Jan Kiszka, Carl van Schaik,
	pratikp@quicinc.com, Srivatsa Vaddagiri, Jean-Philippe Brucker,
	Mathieu Poirier, Oleksandr Tyshchenko, Bertrand Marquis,
	Artem Mygaiev, Julien Grall, Juergen Gross, Paul Durrant,
	Xen Devel, Rich Persaud, Daniel Smith, James McKenzie,
	Andrew Cooper

[-- Attachment #1: Type: text/plain, Size: 7922 bytes --]

On Tue, Sep 7, 2021 at 4:55 AM AKASHI Takahiro <takahiro.akashi@linaro.org>
wrote:

> Hi,
>
> I have not covered all your comments below yet.
> So just one comment:
>
> On Mon, Sep 06, 2021 at 05:57:43PM -0700, Christopher Clark wrote:
> > On Thu, Sep 2, 2021 at 12:19 AM AKASHI Takahiro <takahiro.akashi@linaro.org>
> > wrote:
>
> (snip)
>
> > >    It appears that, on FE side, at least three hypervisor calls (and data
> > >    copying) need to be invoked at every request, right?
> > >
> >
> > For a write, counting FE sendv ops:
> > 1: the write data payload is sent via the "Argo ring for writes"
> > 2: the descriptor is sent via a sync of the available/descriptor ring
> >   -- is there a third one that I am missing?
>
> In the picture, I can see
> a) Data transmitted by Argo sendv
> b) Descriptor written after data sendv
> c) VirtIO ring sync'd to back-end via separate sendv
>
> Oops, (b) is not a hypervisor call, is it?
>

That's correct, it is not - the blue arrows in the diagram are not
hypercalls, they are intended to show data movement or action in the flow
of performing the operation, and (b) is a data write within the guest's
address space into the descriptor ring.



> (But I guess that you will have to have yet another call for notification
> since there is no config register of QueueNotify?)
>

Reasoning about hypercalls necessary for data movement:

VirtIO transport drivers are responsible for instantiating virtqueues
(setup_vq) and are able to populate the notify function pointer in the
virtqueue that they supply. The virtio-argo transport driver can provide a
suitable notify function implementation that will issue the Argo sendv
hypercall(s) for sending data from the guest frontend to the backend.
By issuing the sendv at the time of the QueueNotify, rather than as each
buffer is added to the virtqueue, the cost of the sendv hypercall can be
amortized over multiple buffer additions to the virtqueue.
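
As a rough illustration of that amortization, here is a minimal Python model
(not driver code). FakeArgo, Virtqueue, add_buf and kick are hypothetical
names invented for this sketch; only the idea of issuing sendv from the
notify hook, rather than once per buffer, comes from the description above.

```python
class FakeArgo:
    """Stand-in for the hypervisor: counts modelled sendv hypercalls."""

    def __init__(self):
        self.sendv_calls = 0
        self.delivered = []

    def sendv(self, iov):
        # One modelled hypercall carries a whole list of buffers (an iovec).
        self.sendv_calls += 1
        self.delivered.extend(iov)


class Virtqueue:
    """Toy virtqueue whose notify hook is supplied by the transport."""

    def __init__(self, notify):
        self.pending = []
        self.notify = notify

    def add_buf(self, buf):
        # Adding a buffer only touches guest memory: no hypercall here.
        self.pending.append(buf)

    def kick(self):
        # The transport's notify function runs once per kick.
        if self.pending:
            self.notify(self.pending)
            self.pending = []


argo = FakeArgo()
vq = Virtqueue(notify=argo.sendv)
for i in range(8):
    vq.add_buf(f"request-{i}".encode())
vq.kick()
print(argo.sendv_calls, len(argo.delivered))
```

In this model eight buffers reach the backend with one modelled sendv. A real
transport would also need to deal with a full Argo ring and partial sends,
which are not shown here.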

I also understand that there has been some recent work in the Linaro
Project Stratos on "Fat Virtqueues", where the data to be transmitted is
included within an expanded virtqueue, which could further reduce the
number of hypercalls required, since the data can be transmitted inline
with the descriptors.
Reference here:
https://linaro.atlassian.net/wiki/spaces/STR/pages/25626313982/2021-01-21+Project+Stratos+Sync+Meeting+notes
https://linaro.atlassian.net/browse/STR-25

As a result of the above, I think that a single hypercall could be
sufficient for communicating data for multiple requests, and that a
two-hypercall-per-request (worst case) upper bound could also be
established.

Christopher



>
> Thanks,
> -Takahiro Akashi
>
>
> > Christopher
> >
> >
> > >
> > > Thanks,
> > > -Takahiro Akashi
> > >
> > >
> > > > * Here are the design documents for building VirtIO-over-Argo, to support a
> > > >   hypervisor-agnostic frontend VirtIO transport driver using Argo.
> > > >
> > > > The Development Plan to build VirtIO virtual device support over Argo
> > > > transport:
> > > > https://openxt.atlassian.net/wiki/spaces/DC/pages/1696169985/VirtIO-Argo+Development+Phase+1
> > > >
> > > > A design for using VirtIO over Argo, describing how VirtIO data structures
> > > > and communication is handled over the Argo transport:
> > > > https://openxt.atlassian.net/wiki/spaces/DC/pages/1348763698/VirtIO+Argo
> > > >
> > > > Diagram (from the above document) showing how VirtIO rings are synchronized
> > > > between domains without using shared memory:
> > > > https://openxt.atlassian.net/46e1c93b-2b87-4cb2-951e-abd4377a1194#media-blob-url=true&id=01f7d0e1-7686-4f0b-88e1-457c1d30df40&collection=contentId-1348763698&contextId=1348763698&mimeType=image%2Fpng&name=device-buffer-access-virtio-argo.png&size=243175&width=1106&height=1241
> > > >
> > > > Please note that the above design documents show that the existing VirtIO
> > > > device drivers, and both vring and virtqueue data structures can be
> > > > preserved while interdomain communication can be performed with no shared
> > > > memory required for most drivers; (the exceptions where further design is
> > > > required are those such as virtual framebuffer devices where shared memory
> > > > regions are intentionally added to the communication structure beyond the
> > > > vrings and virtqueues).
> > > >
> > > > An analysis of VirtIO and Argo, informing the design:
> > > > https://openxt.atlassian.net/wiki/spaces/DC/pages/1333428225/Analysis+of+Argo+as+a+transport+medium+for+VirtIO
> > > >
> > > > * Argo can be used for a communication path for configuration between the
> > > >   backend and the toolstack, avoiding the need for a dependency on XenStore,
> > > >   which is an advantage for any hypervisor-agnostic design. It is also
> > > >   amenable to a notification mechanism that is not based on Xen event
> > > >   channels.
> > > >
> > > > * Argo does not use or require shared memory between VMs and provides an
> > > >   alternative to the use of foreign shared memory mappings. It avoids some
> > > >   of the complexities involved with using grants (eg. XSA-300).
> > > >
> > > > * Argo supports Mandatory Access Control by the hypervisor, satisfying a
> > > >   common certification requirement.
> > > >
> > > > * The Argo headers are BSD-licensed and the Xen hypervisor implementation
> > > >   is GPLv2 but accessible via the hypercall interface. The licensing should
> > > >   not present an obstacle to adoption of Argo in guest software or
> > > >   implementation by other hypervisors.
> > > >
> > > > * Since the interface that Argo presents to a guest VM is similar to DMA, a
> > > >   VirtIO-Argo frontend transport driver should be able to operate with a
> > > >   physical VirtIO-enabled smart-NIC if the toolstack and an Argo-aware
> > > >   backend provide support.
> > > >
> > > > The next Xen Community Call is next week and I would be happy to answer
> > > > questions about Argo and on this topic. I will also be following this thread.
> > > > Christopher
> > > > (Argo maintainer, Xen Community)
> > > >
> > > >
> > > > --------------------------------------------------------------------------------
> > > > [1]
> > > > An introduction to Argo:
> > > > https://static.sched.com/hosted_files/xensummit19/92/Argo%20and%20HMX%20-%20OpenXT%20-%20Christopher%20Clark%20-%20Xen%20Summit%202019.pdf
> > > > https://www.youtube.com/watch?v=cnC0Tg3jqJQ
> > > > Xen Wiki page for Argo:
> > > > https://wiki.xenproject.org/wiki/Argo:_Hypervisor-Mediated_Exchange_(HMX)_for_Xen
> > > >
> > > > [2]
> > > > OpenXT Linux Argo driver and userspace library:
> > > > https://github.com/openxt/linux-xen-argo
> > > >
> > > > Windows V4V at OpenXT wiki:
> > > > https://openxt.atlassian.net/wiki/spaces/DC/pages/14844007/V4V
> > > > Windows v4v driver source:
> > > > https://github.com/OpenXT/xc-windows/tree/master/xenv4v
> > > >
> > > > HP/Bromium uXen V4V driver:
> > > > https://github.com/uxen-virt/uxen/tree/ascara/windows/uxenv4vlib
> > > >
> > > > [3]
> > > > v2 of the Argo test unikernel for XTF:
> > > > https://lists.xenproject.org/archives/html/xen-devel/2021-01/msg02234.html
> > > >
> > > > [4]
> > > > Argo HMX Transport for VirtIO meeting minutes:
> > > > https://lists.xenproject.org/archives/html/xen-devel/2021-02/msg01422.html
> > > >
> > > > VirtIO-Argo Development wiki page:
> > > > https://openxt.atlassian.net/wiki/spaces/DC/pages/1696169985/VirtIO-Argo+Development+Phase+1
> > > >
> > >
> > >
>

[-- Attachment #2: Type: text/html, Size: 12365 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [virtio-dev] Re: [Stratos-dev] Enabling hypervisor agnosticism for VirtIO backends
  2021-09-07  2:41             ` [virtio-dev] Re: [Stratos-dev] " Christopher Clark
@ 2021-09-10  9:35               ` Alex Bennée
  0 siblings, 0 replies; 19+ messages in thread
From: Alex Bennée @ 2021-09-10  9:35 UTC (permalink / raw)
  To: Christopher Clark
  Cc: AKASHI Takahiro, Wei Chen, Paul Durrant, Stratos Mailing List,
	virtio-dev, Stefano Stabellini, Jan Kiszka, Arnd Bergmann,
	Juergen Gross, Julien Grall, Carl van Schaik, Bertrand Marquis,
	Stefan Hajnoczi, Artem Mygaiev, Xen-devel, Oleksandr Tyshchenko,
	Oleksandr Tyshchenko, Elena Afanasova, James McKenzie,
	Andrew Cooper, Rich Persaud, Daniel Smith, Jason Andryuk,
	eric chanudet, Roger Pau Monné


Christopher Clark <christopher.w.clark@gmail.com> writes:

> On Sun, Sep 5, 2021 at 7:24 PM AKASHI Takahiro via Stratos-dev <stratos-dev@op-lists.linaro.org> wrote:
>
>  Alex,
>
>  On Fri, Sep 03, 2021 at 10:28:06AM +0100, Alex Bennée wrote:
<snip>
>
>  In configuration phase of virtio device, the latency won't be a big matter.
>  In device operations (i.e. read/write to block devices), if we can
>  resolve 'mmap' issue, as Oleksandr is proposing right now, the only issue is
>  how efficiently we can deliver notification to the opposite side. Right?
>  And this is a very common problem whatever approach we would take.
>
>  Anyhow, if we do care about the latency in my approach, most of virtio-proxy-
>  related code can be re-implemented just as a stub (or shim?) library
>  since the protocols are defined as RPCs.
>  In this case, however, we would lose the benefit of providing "single binary"
>  BE.
>  (I know this is an arguable requirement, though.)

The proposal for a single binary would always require something to shim
between hypervisors. This is still an area of discussion though. Having
a compile time selectable approach is practically unavoidable for "bare
metal" backends though because there are no other processes/layers that
communication with the hypervisor can be delegated to.

>
>  # Would we better discuss what "hypervisor-agnosticism" means?
>
> Is there a call that you could recommend that we join to discuss this and the topics of this thread?
> There is definitely interest in pursuing a new interface for Argo that can be implemented in other hypervisors and enable guest binary
> portability between them, at least on the same hardware architecture,
> with VirtIO transport as a primary use case.

There is indeed ;-)

We have a regular open call every two weeks for the Stratos project which
you are welcome to attend. You can find the details on the project
overview page:

  https://linaro.atlassian.net/wiki/spaces/STR/overview

We regularly have teams from outside the project present their work as well.

> The notes from the Xen Summit Design Session on VirtIO Cross-Project BoF for Xen and Guest OS, which include context about the
> several separate approaches to VirtIO on Xen, have now been posted here:
> https://lists.xenproject.org/archives/html/xen-devel/2021-09/msg00472.html

Thanks for the link - looks like a very detailed summary.

>
> Christopher
>
>  
>  -Takahiro Akashi


-- 
Alex Bennée

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [virtio-dev] Re: Enabling hypervisor agnosticism for VirtIO backends
       [not found]             ` <alpine.DEB.2.21.2109131615570.10523@sstabellini-ThinkPad-T480s>
@ 2021-09-14 14:25               ` Alex Bennée
  0 siblings, 0 replies; 19+ messages in thread
From: Alex Bennée @ 2021-09-14 14:25 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: AKASHI Takahiro, Stefan Hajnoczi, Stefano Stabellini,
	Stratos Mailing List, virtio-dev, Arnd Bergmann, Viresh Kumar,
	Jan Kiszka, Carl van Schaik, pratikp, Srivatsa Vaddagiri,
	Jean-Philippe Brucker, Mathieu Poirier, Wei.Chen, olekstysh,
	Oleksandr_Tyshchenko, Bertrand.Marquis, Artem_Mygaiev, julien,
	jgross, paul, xen-devel, Elena Afanasova


Stefano Stabellini <stefano.stabellini@xilinx.com> writes:

> On Mon, 6 Sep 2021, AKASHI Takahiro wrote:
>> > the second is how many context switches are involved in a transaction.
>> > Of course with all things there is a trade off. Things involving the
>> > very tightest latency would probably opt for a bare metal backend which
>> > I think would imply hypervisor knowledge in the backend binary.
>> 
>> In configuration phase of virtio device, the latency won't be a big matter.
>> In device operations (i.e. read/write to block devices), if we can
>> resolve 'mmap' issue, as Oleksandr is proposing right now, the only issue is
>> how efficiently we can deliver notification to the opposite side. Right?
>> And this is a very common problem whatever approach we would take.
>> 
>> Anyhow, if we do care about the latency in my approach, most of virtio-proxy-
>> related code can be re-implemented just as a stub (or shim?) library
>> since the protocols are defined as RPCs.
>> In this case, however, we would lose the benefit of providing "single binary"
>> BE.
>> (I know this is an arguable requirement, though.)
>
> In my experience, latency, performance, and security are far more
> important than providing a single binary.
>
> In my opinion, we should optimize for the best performance and security,
> then be practical on the topic of hypervisor agnosticism. For instance,
> a shared source with a small hypervisor-specific component, with one
> implementation of the small component for each hypervisor, would provide
> a good enough hypervisor abstraction. It is good to be hypervisor
> agnostic, but I wouldn't go extra lengths to have a single binary.

I agree it shouldn't be a primary goal although a single binary working
with helpers to bridge the gap would make a cool demo. The real aim of
agnosticism is to avoid having multiple implementations of the backend
itself for no other reason than a change in hypervisor.
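
To make the shared-source idea concrete, here is a small Python sketch of one
backend source paired with a small per-hypervisor component. It is only an
assumption about the general shape: HypervisorOps, XenOps, KvmOps and
EchoBackend are hypothetical names, and the notification methods return
strings instead of touching any real event channel or eventfd.

```python
from abc import ABC, abstractmethod


class HypervisorOps(ABC):
    """The small hypervisor-specific component (hypothetical interface)."""

    @abstractmethod
    def notify_frontend(self, queue_index: int) -> str:
        """Signal the frontend that a queue has used buffers."""


class XenOps(HypervisorOps):
    def notify_frontend(self, queue_index: int) -> str:
        # A real port would signal a Xen event channel here.
        return f"xen: event channel signalled for queue {queue_index}"


class KvmOps(HypervisorOps):
    def notify_frontend(self, queue_index: int) -> str:
        # A real port would write to an eventfd here.
        return f"kvm: eventfd written for queue {queue_index}"


class EchoBackend:
    """Shared backend logic: the same source for every hypervisor."""

    def __init__(self, ops: HypervisorOps):
        self.ops = ops

    def handle(self, queue_index: int, request: bytes):
        response = request.upper()  # placeholder for real device work
        return response, self.ops.notify_frontend(queue_index)


for ops in (XenOps(), KvmOps()):
    print(EchoBackend(ops).handle(0, b"ping"))
```

Only the HypervisorOps subclasses differ between targets. Whether the choice
is made at build time or at run time is a separate question from keeping the
backend logic itself in one place.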

> I cannot picture a case where a BE binary needs to be moved between
> different hypervisors and a recompilation is impossible (BE, not FE).
> Instead, I can definitely imagine detailed requirements on IRQ latency
> having to be lower than 10us or bandwidth higher than 500 MB/sec.
>
> Instead of virtio-proxy, my suggestion is to work together on a common
> project and common source with others interested in the same problem.
>
> I would pick something like kvmtool as a basis. It doesn't have to be
> kvmtool, and kvmtool specifically is GPL-licensed, which is
> unfortunate because it would help if the license was BSD-style for ease
> of integration with Zephyr and other RTOSes.

This does imply making some choices, especially the implementation
language. However I feel that C is really the lowest common denominator
here and I get the sense that people would rather avoid it if they could
given the potential security implications of a bug prone back end. This
is what is prompting interest in Rust.

> As long as the project is open to working together on multiple
> hypervisors and deployment models then it is fine. For instance, the
> shared source could be based on OpenAMP kvmtool [1] (the original
> kvmtool likely prefers to stay small and narrow-focused on KVM). OpenAMP
> kvmtool was created to add support for hypervisor-less virtio but they
> are very open to hypervisors too. It could be a good place to add a Xen
> implementation, a KVM fatqueue implementation, a Jailhouse
> implementation, etc. -- work together toward the common goal of a single
> BE source (not binary) supporting multiple different deployment models.
>
>
> [1] https://github.com/OpenAMP/kvmtool


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [virtio-dev] Enabling hypervisor agnosticism for VirtIO backends
       [not found]       ` <20210823012029.GB40863@laputa>
@ 2021-10-04 11:33         ` Matias Ezequiel Vara Larsen
  0 siblings, 0 replies; 19+ messages in thread
From: Matias Ezequiel Vara Larsen @ 2021-10-04 11:33 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: Alex Bennée, Stratos Mailing List, virtio-dev, Arnd Bergmann,
	Viresh Kumar, Stefano Stabellini, stefanha, Jan Kiszka,
	Carl van Schaik, pratikp, Srivatsa Vaddagiri,
	Jean-Philippe Brucker, Mathieu Poirier

On Mon, Aug 23, 2021 at 10:20:29AM +0900, AKASHI Takahiro wrote:
> Hi Matias,
> 
> On Sat, Aug 21, 2021 at 04:08:20PM +0200, Matias Ezequiel Vara Larsen wrote:
> > Hello,
> > 
> > On Fri, Aug 20, 2021 at 03:05:58PM +0900, AKASHI Takahiro wrote:
> > > Hi Matias,
> > > 
> > > On Thu, Aug 19, 2021 at 11:11:55AM +0200, Matias Ezequiel Vara Larsen wrote:
> > > > Hello Alex,
> > > > 
> > > > I can tell you my experience from working on a PoC (library) 
> > > > to allow the implementation of virtio-devices that are hypervisor/OS agnostic. 
> > > 
> > > What hypervisor are you using for your PoC here?
> > > 
> > 
> > I am using an in-house hypervisor, which is similar to Jailhouse.
> > 
> > > > I focused on two use cases:
> > > > 1. type-I hypervisor in which the backend is running as a VM. This
> > > > is an in-house hypervisor that does not support VMExits.
> > > > 2. Linux user-space. In this case, the library is just used to
> > > > communicate between threads. The goal of this use case is merely testing.
> > > > 
> > > > I have chosen virtio-mmio as the way to exchange information
> > > > between the frontend and backend. I found it hard to synchronize the
> > > > access to the virtio-mmio layout without VMExits. I had to add some extra bits to allow 
> > > 
> > > > Can you explain how MMIOs to registers in virtio-mmio layout
> > > (which I think means a configuration space?) will be propagated to BE?
> > > 
> > 
> > In this PoC, the BE guest is created with a fixed number of regions
> > of memory that represents each device. The BE initializes these regions, and then, waits
> > for the FEs to begin the initialization. 
> 
> Let me ask you in another way; When FE tries to write a register
> in configuration space, say QueueSel, how is BE notified of this event?
> 
In my PoC, the BE is never notified when the FE writes to a register. For
example, QueueSel is only used in one of the steps of the device status
configuration, and the BE is only notified when the FE is in that step. When
the FE is setting up the vrings, it sets the address, sets the QueueSel, and
then blocks until the BE can get the values. The BE gets the values and resumes
the FE, which moves to the next step.
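
A minimal sketch of that blocking handshake, modelled with two Python threads
rather than two VMs (much like the Linux user-space test case). The two events
stand in for the extra synchronization bits; fe_ready, be_done and the
queue_addr key are names made up for this illustration and are not part of
virtio-mmio or of the PoC.

```python
import threading

regs = {}                       # models the shared virtio-mmio layout
fe_ready = threading.Event()    # extra bit: FE has published its values
be_done = threading.Event()     # extra bit: BE has consumed them
seen_by_be = {}


def frontend():
    regs["queue_addr"] = 0x1000
    regs["QueueSel"] = 0
    fe_ready.set()              # this setup step is complete
    be_done.wait(timeout=5)     # block until the BE resumes us


def backend():
    fe_ready.wait(timeout=5)    # no notification per register write
    seen_by_be.update(regs)     # fetch all the values for this step
    be_done.set()               # resume the FE


threads = [threading.Thread(target=backend), threading.Thread(target=frontend)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(seen_by_be)
```

The BE sees a consistent pair of values because it only looks at the layout
once the FE has signalled that the whole step is done.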

> > > > the front-end and back-end to synchronize, which is required
> > > > during the device-status initialization. These extra bits would not be 
> > > > needed in case the hypervisor supports VMExits, e.g., KVM.
> > > > 
> > > > Each guest has a memory region that is shared with the backend. 
> > > > This memory region is used by the frontend to allocate the io-buffers. This region also 
> > > > maps the virtio-mmio layout that is initialized by the backend. For the moment, this region 
> > > > is defined when the guest is created. One limitation is that the memory for io-buffers is fixed. 
> > > 
> > > So in summary, you have a single memory region that is used
> > > for virtio-mmio layout and io-buffers (I think they are for payload)
> > > > and you assume that the region will be (at least for now) statically
> > > shared between FE and BE so that you can eliminate 'mmap' at every
> > > time to access the payload.
> > > Correct?
> > >
> > 
> > Yes, it is.
> > 
> > > If so, it can be an alternative solution for memory access issue,
> > > and a similar technique is used in some implementations:
> > > - (Jailhouse's) ivshmem
> > > - Arnd's fat virtqueue
> > >
> > > In either case, however, you will have to allocate payload from the region
> > > and so you will see some impact on FE code (at least at some low level).
> > > (In ivshmem, dma_ops in the kernel is defined for this purpose.)
> > > Correct?
> > 
> > Yes, it is. The FE implements a sort of malloc() to organize the allocation of io-buffers from that
> > memory region.
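
As an illustration only, such an allocator could be as simple as a first-fit
free list over the fixed region. This Python sketch is an assumption about the
general shape, not the PoC's code; RegionAllocator, alloc and release are
hypothetical names, and offsets are relative to the start of the shared region.

```python
class RegionAllocator:
    """First-fit allocator over a fixed-size shared region (hypothetical)."""

    def __init__(self, size):
        self.free = [(0, size)]          # sorted list of (offset, length) holes

    def alloc(self, length):
        for i, (off, hole) in enumerate(self.free):
            if hole >= length:
                if hole == length:
                    del self.free[i]
                else:
                    self.free[i] = (off + length, hole - length)
                return off
        return None                      # region exhausted: it cannot grow

    def release(self, off, length):
        self.free.append((off, length))
        self.free.sort()
        merged = [self.free[0]]
        for o, n in self.free[1:]:
            prev_off, prev_len = merged[-1]
            if prev_off + prev_len == o:
                merged[-1] = (prev_off, prev_len + n)   # coalesce neighbours
            else:
                merged.append((o, n))
        self.free = merged


pool = RegionAllocator(4096)
a = pool.alloc(1024)
b = pool.alloc(2048)
c = pool.alloc(2048)    # does not fit in what is left
pool.release(a, 1024)
print(a, b, c, pool.free)
```

Because the region is fixed when the guest is created, a failed allocation has
to be handled by the FE (for example by waiting for buffers to be returned)
rather than by growing the pool.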
> > 
> > Thinking again about VMExits, I am not sure how this mechanism could be used when both the FE and
> > the BE are VMs. The use of VMExits may require involving the hypervisor.
> 
> Maybe I misunderstand something. Are FE/BE not VMs in your PoC?
> 

Yes, both are VMs. I meant, in case that both are VMs AND a VMExit
mechanism is used, such a mechanism would require the hypervisor to
forward the traps. In my PoC, both are VMs BUT there is not a VMExit
mechanism.

Matias
> -Takahiro Akashi
> 
> > Matias
> > > 
> > > -Takahiro Akashi
> > > 
> > > > At some point, the guest shall be able to balloon this region. Notifications between 
> > > > > the frontend and the backend are implemented by using a hypercall. The hypercall
> > > > mechanism and the memory allocation are abstracted away by a platform layer that 
> > > > exposes an interface that is hypervisor/os agnostic.
> > > > 
> > > > I split the backend into a virtio-device driver and a
> > > > > backend driver. The virtio-device driver handles the virtqueues and the
> > > > backend driver gets packets from the virtqueue for
> > > > post-processing. For example, in the case of virtio-net, the backend
> > > > driver would decide if the packet goes to the hardware or to another
> > > > virtio-net device. The virtio-device drivers may be
> > > > implemented in different ways like by using a single thread, multiple threads, 
> > > > or one thread for all the virtio-devices.
> > > > 
> > > > In this PoC, I just tackled two very simple use-cases. These
> > > > > use-cases allowed me to extract some requirements for a hypervisor to
> > > > support virtio.
> > > > 
> > > > Matias
> > > > 
> > > > On Wed, Aug 04, 2021 at 10:04:30AM +0100, Alex Bennée wrote:
> > > > > Hi,
> > > > > 
> > > > > One of the goals of Project Stratos is to enable hypervisor agnostic
> > > > > backends so we can enable as much re-use of code as possible and avoid
> > > > > repeating ourselves. This is the flip side of the front end where
> > > > > multiple front-end implementations are required - one per OS, assuming
> > > > > you don't just want Linux guests. The resultant guests are trivially
> > > > > movable between hypervisors modulo any abstracted paravirt type
> > > > > interfaces.
> > > > > 
> > > > > In my original thumb nail sketch of a solution I envisioned vhost-user
> > > > > daemons running in a broadly POSIX like environment. The interface to
> > > > > the daemon is fairly simple requiring only some mapped memory and some
> > > > > sort of signalling for events (on Linux this is eventfd). The idea was a
> > > > > stub binary would be responsible for any hypervisor specific setup and
> > > > > then launch a common binary to deal with the actual virtqueue requests
> > > > > themselves.
> > > > > 
> > > > > Since that original sketch we've seen an expansion in the sort of ways
> > > > > backends could be created. There is interest in encapsulating backends
> > > > > > in RTOSes or unikernels for solutions like SCMI. The interest in Rust
> > > > > has prompted ideas of using the trait interface to abstract differences
> > > > > away as well as the idea of bare-metal Rust backends.
> > > > > 
> > > > > We have a card (STR-12) called "Hypercall Standardisation" which
> > > > > calls for a description of the APIs needed from the hypervisor side to
> > > > > support VirtIO guests and their backends. However we are some way off
> > > > > from that at the moment as I think we need to at least demonstrate one
> > > > > portable backend before we start codifying requirements. To that end I
> > > > > want to think about what we need for a backend to function.
> > > > > 
> > > > > Configuration
> > > > > =============
> > > > > 
> > > > > In the type-2 setup this is typically fairly simple because the host
> > > > > system can orchestrate the various modules that make up the complete
> > > > > system. In the type-1 case (or even type-2 with delegated service VMs)
> > > > > we need some sort of mechanism to inform the backend VM about key
> > > > > details about the system:
> > > > > 
> > > > > >   - where virt queue memory is in its address space
> > > > >   - how it's going to receive (interrupt) and trigger (kick) events
> > > > >   - what (if any) resources the backend needs to connect to
> > > > > 
> > > > > > Obviously you can gloss over configuration issues by having static
> > > > > configurations and baking the assumptions into your guest images however
> > > > > this isn't scalable in the long term. The obvious solution seems to be
> > > > > extending a subset of Device Tree data to user space but perhaps there
> > > > > are other approaches?
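
To make the three items above concrete, here is a minimal sketch of the
information a backend would need at start-up. Everything in it is an
assumption for illustration: the BackendConfig type, the parse_config()
helper and the property names are hypothetical, loosely modelled on
Device Tree conventions, and are not taken from any existing binding.

```python
from dataclasses import dataclass, field


@dataclass
class BackendConfig:
    """Hypothetical description of what a backend VM needs to know."""
    queue_base: int   # where the virt queue memory sits in the backend's address space
    queue_size: int   # size of that region in bytes
    irq: int          # how the backend receives events (interrupt number)
    kick: str         # how the backend triggers events towards the frontend
    resources: list = field(default_factory=list)  # anything else it must connect to


def parse_config(node: dict) -> BackendConfig:
    """Build a BackendConfig from a Device Tree-like property dictionary.

    The property names ("reg", "interrupts", "kick-method", "resources")
    are assumptions made for this sketch, not an agreed schema.
    """
    base, size = node["reg"]
    return BackendConfig(
        queue_base=base,
        queue_size=size,
        irq=node["interrupts"][0],
        kick=node.get("kick-method", "eventfd"),
        resources=list(node.get("resources", [])),
    )


# Example node, with made-up values.
example_node = {
    "reg": (0x4000_0000, 0x10_0000),
    "interrupts": [42],
    "resources": ["/dev/vda"],
}
config = parse_config(example_node)
```

A static configuration would hard-code these values in the guest image;
the point of the sketch is only that the same small record could be
filled in from data handed to user space instead.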
> > > > > 
> > > > > Before any virtio transactions can take place the appropriate memory
> > > > > mappings need to be made between the FE guest and the BE guest.
> > > > > > Currently the whole of the FE guest's address space needs to be visible
> > > > > to whatever is serving the virtio requests. I can envision 3 approaches:
> > > > > 
> > > > >  * BE guest boots with memory already mapped
> > > > > 
> > > > > >  This would entail the guest OS knowing which parts of its Guest Physical
> > > > > >  Address space are already taken up and avoiding clashes. I would assume
> > > > >  in this case you would want a standard interface to userspace to then
> > > > >  make that address space visible to the backend daemon.
> > > > > 
> > > > > >  * BE guest boots with a hypervisor handle to memory
> > > > > 
> > > > >  The BE guest is then free to map the FE's memory to where it wants in
> > > > >  the BE's guest physical address space. To activate the mapping will
> > > > >  require some sort of hypercall to the hypervisor. I can see two options
> > > > >  at this point:
> > > > > 
> > > > >   - expose the handle to userspace for daemon/helper to trigger the
> > > > >     mapping via existing hypercall interfaces. If using a helper you
> > > > >     would have a hypervisor specific one to avoid the daemon having to
> > > > >     care too much about the details or push that complexity into a
> > > > >     compile time option for the daemon which would result in different
> > > > >     binaries although a common source base.
> > > > > 
> > > > >   - expose a new kernel ABI to abstract the hypercall differences away
> > > > >     in the guest kernel. In this case the userspace would essentially
> > > > >     ask for an abstract "map guest N memory to userspace ptr" and let
> > > > >     the kernel deal with the different hypercall interfaces. This of
> > > > >     course assumes the majority of BE guests would be Linux kernels and
> > > > >     leaves the bare-metal/unikernel approaches to their own devices.
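
The abstract "map guest N memory to userspace ptr" request could be
sketched as below. This is only an illustration under stated
assumptions: GuestMemoryMapper and AnonymousMapper are hypothetical
names, and the stand-in implementation hands out anonymous memory where
a real one would issue a hypervisor-specific hypercall or go through a
kernel ABI.

```python
import mmap
from abc import ABC, abstractmethod


class GuestMemoryMapper(ABC):
    """Hypothetical interface hiding how frontend memory gets mapped."""

    @abstractmethod
    def map_guest(self, guest_id: int, size: int) -> mmap.mmap:
        """Return a writable view of `size` bytes of guest `guest_id`."""


class AnonymousMapper(GuestMemoryMapper):
    """Stand-in for testing: anonymous memory instead of real guest memory.

    A hypervisor-specific subclass would perform the actual mapping here.
    """

    def __init__(self) -> None:
        self.regions = {}

    def map_guest(self, guest_id: int, size: int) -> mmap.mmap:
        region = mmap.mmap(-1, size)  # anonymous, zero-filled mapping
        self.regions[guest_id] = region
        return region


def write_marker(mapper: GuestMemoryMapper, guest_id: int) -> bytes:
    """Daemon-side code that only ever talks to the abstract interface."""
    mem = mapper.map_guest(guest_id, 4096)
    mem[0:4] = b"virt"
    return bytes(mem[0:4])
```

The daemon code in write_marker() does not change when the mapper does,
which is the property both options above are trying to achieve, whether
the hypervisor-specific part lives in a helper or in the kernel.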
> > > > > 
> > > > > Operation
> > > > > =========
> > > > > 
> > > > > The core of the operation of VirtIO is fairly simple. Once the
> > > > > vhost-user feature negotiation is done it's a case of receiving update
> > > > > events and parsing the resultant virt queue for data. The vhost-user
> > > > > > specification handles a bunch of setup before that point, mostly to
> > > > > > detail where the virt queues are and to set up FDs for memory and event
> > > > > communication. This is where the envisioned stub process would be
> > > > > responsible for getting the daemon up and ready to run. This is
> > > > > currently done inside a big VMM like QEMU but I suspect a modern
> > > > > approach would be to use the rust-vmm vhost crate. It would then either
> > > > > communicate with the kernel's abstracted ABI or be re-targeted as a
> > > > > build option for the various hypervisors.
> > > > > 
> > > > > One question is how to best handle notification and kicks. The existing
> > > > > vhost-user framework uses eventfd to signal the daemon (although QEMU
> > > > > > is quite capable of simulating them when you use TCG). Xen has its own
> > > > > IOREQ mechanism. However latency is an important factor and having
> > > > > events go through the stub would add quite a lot.
> > > > > 
> > > > > Could we consider the kernel internally converting IOREQ messages from
> > > > > the Xen hypervisor to eventfd events? Would this scale with other kernel
> > > > > hypercall interfaces?
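
As a rough user-space model of that conversion, the sketch below turns
hypervisor-specific requests into plain counter-style notifications.
The Notifier class and forward_ioreq() are hypothetical, and the
dictionary standing in for an IOREQ message is an assumption that does
not reflect the real Xen structure. It uses os.eventfd() where Python
provides it (3.10+ on Linux) and falls back to a non-blocking pipe
elsewhere.

```python
import os


class Notifier:
    """Counts pending kicks using eventfd, or a pipe where eventfd is missing."""

    def __init__(self) -> None:
        if hasattr(os, "eventfd"):
            self._efd = os.eventfd(0, os.EFD_NONBLOCK)
            self._pipe = None
        else:
            self._efd = None
            read_end, write_end = os.pipe()
            os.set_blocking(read_end, False)
            self._pipe = (read_end, write_end)

    def kick(self) -> None:
        """Signal one event to whoever drains this notifier."""
        if self._efd is not None:
            os.eventfd_write(self._efd, 1)
        else:
            os.write(self._pipe[1], b"\x01")

    def drain(self) -> int:
        """Return the number of kicks since the last drain (0 if none)."""
        try:
            if self._efd is not None:
                return os.eventfd_read(self._efd)
            return len(os.read(self._pipe[0], 4096))
        except BlockingIOError:
            return 0

    def close(self) -> None:
        fds = [self._efd] if self._efd is not None else list(self._pipe)
        for fd in fds:
            os.close(fd)


def forward_ioreq(ioreq: dict, notifiers: dict) -> None:
    """Translate a made-up IOREQ-like message into a kick on the right queue."""
    notifiers[ioreq["queue"]].kick()
```

The daemon would only ever see the notifier side, so the same daemon
could in principle sit behind a different hypercall interface as long as
something performs the equivalent of forward_ioreq() for it. Whether
that translation is cheap enough in the kernel is exactly the latency
question raised above.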
> > > > > 
> > > > > So any thoughts on what directions are worth experimenting with?
> > > > > 
> > > > > -- 
> > > > > Alex Bennée
> > > > > 
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > 


end of thread, other threads:[~2021-10-04 11:35 UTC | newest]

Thread overview: 19+ messages
2021-08-04  9:04 [virtio-dev] Enabling hypervisor agnosticism for VirtIO backends Alex Bennée
2021-08-05 15:48 ` [virtio-dev] " Stefan Hajnoczi
     [not found] ` <alpine.DEB.2.21.2108041055390.9768@sstabellini-ThinkPad-T480s>
2021-08-17 10:41   ` Stefan Hajnoczi
     [not found]     ` <20210823062500.GC40863@laputa>
2021-08-23  9:58       ` Stefan Hajnoczi
     [not found]         ` <20210825102945.GA89209@laputa>
2021-08-25 15:02           ` Stefan Hajnoczi
2021-09-01 12:53     ` Alex Bennée
2021-09-02  9:12       ` Stefan Hajnoczi
     [not found]       ` <20210903080609.GD47953@laputa>
2021-09-03  9:28         ` Alex Bennée
     [not found]           ` <20210906022356.GD40187@laputa>
2021-09-07  2:41             ` [virtio-dev] Re: [Stratos-dev] " Christopher Clark
2021-09-10  9:35               ` Alex Bennée
     [not found]             ` <alpine.DEB.2.21.2109131615570.10523@sstabellini-ThinkPad-T480s>
2021-09-14 14:25               ` [virtio-dev] " Alex Bennée
     [not found]   ` <20210811062748.GB54169@laputa>
     [not found]     ` <CAPD2p-mMeY=MDbAdLGrmmioSkJo147aMDrK=Qzr=PCa4jztGGg@mail.gmail.com>
     [not found]       ` <DB9PR08MB685767CFAA4A8BCE7D2225A89EFD9@DB9PR08MB6857.eurprd08.prod.outlook.com>
     [not found]         ` <20210817080757.GC43203@laputa>
     [not found]           ` <DB9PR08MB6857C656472153A42FB438C49EFE9@DB9PR08MB6857.eurprd08.prod.outlook.com>
     [not found]             ` <20210818053840.GE39588@laputa>
     [not found]               ` <DB9PR08MB6857D1BE810B1D1DAF7B12AE9EFF9@DB9PR08MB6857.eurprd08.prod.outlook.com>
     [not found]                 ` <20210820064150.GC13452@laputa>
     [not found]                   ` <20210826094047.GA55218@laputa>
     [not found]                     ` <DB9PR08MB68578198FF352EDC473D619E9EC79@DB9PR08MB6857.eurprd08.prod.outlook.com>
     [not found]                       ` <CACMJ4GbmNgbB5ponYt3NGEk3j6YCksot+kDy2qs8HMdFXWnQbw@mail.gmail.com>
2021-08-30 19:53                         ` Christopher Clark
     [not found]                           ` <20210902071902.GC71098@laputa>
2021-09-07  0:57                             ` Christopher Clark
     [not found]                               ` <20210907115501.GC49004@laputa>
2021-09-07 18:09                                 ` Christopher Clark
     [not found]   ` <0100017b33e585a5-06d4248e-b1a7-485e-800c-7ead89e5f916-000000@email.amazonses.com>
     [not found]     ` <CAHFG_=WKjJ1riKtaWC8jm13shc3RtVsNNqd3j9WD9Fq0NeRS2Q@mail.gmail.com>
     [not found]       ` <20210813051038.GA77540@laputa>
2021-09-01  8:57         ` [virtio-dev] Re: [Stratos-dev] " Alex Bennée
2021-08-19  9:11 ` [virtio-dev] " Matias Ezequiel Vara Larsen
     [not found]   ` <20210820060558.GB13452@laputa>
2021-08-21 14:08     ` Matias Ezequiel Vara Larsen
     [not found]       ` <20210823012029.GB40863@laputa>
2021-10-04 11:33         ` Matias Ezequiel Vara Larsen
2021-09-01  8:43   ` Alex Bennée
