From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Alex Bennée" <alex.bennee@linaro.org>
Cc: virtio-dev@lists.oasis-open.org
Subject: Re: [virtio-dev] Backend libraries for VirtIO device emulation
Date: Mon, 9 Mar 2020 15:08:31 +0000
Message-ID: <20200309150831.GF3088@work-vm>
In-Reply-To: <87a74pk9rc.fsf@linaro.org>
* Alex Bennée (alex.bennee@linaro.org) wrote:
>
> Dr. David Alan Gilbert <dgilbert@redhat.com> writes:
>
> > * Alex Bennée (alex.bennee@linaro.org) wrote:
> >>
> >> Dr. David Alan Gilbert <dgilbert@redhat.com> writes:
> >>
> >> > * Alex Bennée (alex.bennee@linaro.org) wrote:
> >> >> Hi,
> >> >>
> >> >> So the context of my question is what sort of common software layer is
> >> >> required to implement a virtio backend entirely in userspace?
> >> >>
> >> >> Currently most virtio backends are embedded directly in various VMMs
> >> >> which emulate a number of devices as well as deal with handling devices
> >> >> that are vhost aware and link with the host kernel. However there seems
> >> >> to be a growing interest in having backends implemented in separate
> >> >> processes, potentially even hosted in other guest VMs.
> >> >>
> >> >> As far as I can tell there is a lot of duplicated effort in handling the
> >> >> low level navigation of virt queues and buffers. QEMU has code in
> >> >> hw/virtio as well as contrib/libvhost-user which is used by the recent
> >> >> virtiofsd daemon. kvm-tool has a virtio subdirectory that implements a
> >> >> similar set of functionality for its emulation. The Rust-vmm project
> >> >> has libraries for implementing the device traits.
> >> >>
> >> >> Another aspect to this is the growing interest in carrying virtio over
> >> >> other hypervisors. I'm wondering if there is enough abstraction possible
> >> >> to have a common library that is hypervisor agnostic? Can a device
> >> >> backend be emulated purely with some shared memory and some sockets for
> >> >> passing messages/kicks from/to the VMM which then deals with the hypervisor
> >> >> specifics of the virtio-transport?
> >> >
> >> > It's a little tricky because it has to interface tightly with the way
> >> > that the memory-mapping works for the hypervisor, so that the external
> >> > process can access the memory of the queues.
> >>
> >> I suspect the problem space can be reduced to at least a
> >> POSIX-like environment - if that makes things simpler. The setting up of
> >> memory-mappings should be the problem of the VMM, which would possibly
> >> be hypervisor specific. After that it is simply(?) a question of sharing
> >> the appropriate bit of memory between the VMM and the device process.
> >
> > The 'simply(?)' is actually pretty tricky.
>
> Well I am at the start of this journey so may be hand waving a bit ;-)
>
> Let's drill down:
>
> > You have to share the mapping of all the RAM blocks in which the virtio
> > queues or the data to which they point might reside
>
> Aren't all the queues in one section of memory?
I don't think so; the queues are just allocated in guest RAM, which can
be split into multiple blocks due to NUMA etc.
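
To make the multiple-blocks point concrete, here is a rough sketch (in
Python, purely illustrative) of the lookup a separate backend process
would need: a table of regions, each giving a guest physical base, a
size, and the offset at which the backend has that block mapped. The
names Region and translate, and the example layout, are made up for
this sketch; they are not taken from QEMU or libvhost-user.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Region:
    """One block of guest RAM as seen by the backend (hypothetical layout)."""
    guest_phys_addr: int  # where the block starts in guest physical space
    size: int             # length of the block in bytes
    mmap_offset: int      # where the backend has it mapped (assumed offset)


def translate(regions: List[Region], gpa: int, length: int) -> Tuple[Region, int]:
    """Map a guest physical range to (region, backend offset).

    Raises ValueError if the range is not fully inside a single region.
    """
    for r in regions:
        start = r.guest_phys_addr
        end = start + r.size
        if start <= gpa and gpa + length <= end:
            return r, r.mmap_offset + (gpa - start)
    raise ValueError(f"guest range {gpa:#x}+{length:#x} is not in any shared region")


# Two RAM blocks with a hole between them, e.g. one per NUMA node (assumed).
regions = [
    Region(guest_phys_addr=0x0000_0000, size=0x8000_0000, mmap_offset=0x0),
    Region(guest_phys_addr=0x1_0000_0000, size=0x8000_0000, mmap_offset=0x8000_0000),
]
```

A buffer that straddles two blocks is rejected here; a real backend
would have to split it, and the table would have to be refreshed
whenever the mapping changes.
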
> As for where the data is, doesn't this depend on the structure of the
> device? As I understand the behaviour of virtiofs there is a direct
> relationship between the guest page cache and the host page cache to
> take advantage of DAX. This by definition means the backend needs access
> to the entire address space of the guest.
>
> Is this also the case for other devices?
virtiofs's DAX shared memory is a bit different from normal virtio
queues and data. Normal queues and data just live in normal guest RAM
and their location is chosen by the guest.
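
As an illustration of "location is chosen by the guest", here is a
small Python sketch of reading split-virtqueue descriptors. The 16-byte
layout (le64 addr, le32 len, le16 flags, le16 next) and the flag values
are from the virtio spec; the helper names read_desc and walk_chain are
invented for this example. Each addr that comes back is a guest
physical address the guest picked, so the backend still has to
translate it before touching the data.

```python
import struct
from typing import List, NamedTuple

# Split-virtqueue descriptor: le64 addr, le32 len, le16 flags, le16 next.
DESC_FORMAT = "<QIHH"
DESC_SIZE = struct.calcsize(DESC_FORMAT)  # 16 bytes, no padding with "<"

VIRTQ_DESC_F_NEXT = 1   # chain continues at the 'next' index
VIRTQ_DESC_F_WRITE = 2  # buffer is device-writable


class Desc(NamedTuple):
    addr: int        # guest physical address chosen by the guest
    length: int      # buffer length in bytes
    flags: int
    next_index: int


def read_desc(table: bytes, index: int) -> Desc:
    """Decode descriptor number 'index' from a raw descriptor table."""
    return Desc(*struct.unpack_from(DESC_FORMAT, table, index * DESC_SIZE))


def walk_chain(table: bytes, head: int, queue_size: int) -> List[Desc]:
    """Follow a descriptor chain from 'head', refusing bad indices and loops."""
    chain: List[Desc] = []
    index = head
    while True:
        if index >= queue_size or len(chain) >= queue_size:
            raise ValueError("bad or looping descriptor chain")
        d = read_desc(table, index)
        chain.append(d)
        if not d.flags & VIRTQ_DESC_F_NEXT:
            return chain
        index = d.next_index
```

The bounds and loop checks matter because the table lives in guest
memory and the guest can write anything it likes there.
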
> > and you also have
> > to let the other process know where in Guest physical address space they
> > live. That mapping is also not constant, either with hotplug, or with
> > architecture specific things that cause physical address mapping to
> > change.
>
> It sounds like the solution here would be to have bounce buffers as part
> of the virtio spec which could be part of the virtio memory block? I
> guess another option is for guests to keep their internal data (as
> referenced by virtio drivers) in a fixed guest physical address but that
> gets real complicated quick and I suspect is harder to audit from a
> security point of view.
Bounce buffers are expensive - they're used in some cases (like SEV
encrypted memory).
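
A toy Python sketch of where the cost comes from, under the assumption
that a bytearray stands in for mapped guest RAM and a second, fixed
bytearray stands in for the bounce window (both names are made up for
this example). Direct access is just a view onto the same memory; the
bounce path has to copy every byte, on every request.

```python
guest_ram = bytearray(1 << 20)  # stand-in for mapped guest memory (assumed)
bounce = bytearray(4096)        # stand-in for a fixed shared bounce window


def direct_view(gpa: int, length: int) -> memoryview:
    """Zero-copy: the backend reads the guest's buffer in place."""
    return memoryview(guest_ram)[gpa:gpa + length]


def copy_to_bounce(gpa: int, length: int) -> int:
    """Bounce path: data is copied into the shared window first."""
    if length > len(bounce):
        raise ValueError("request larger than the bounce window")
    memoryview(bounce)[:length] = memoryview(guest_ram)[gpa:gpa + length]
    return length
```

Besides the copy itself, anything bigger than the window has to be
split into several round trips.
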
> >> The other model would be the device process runs inside another guest -
> >> most likely a Linux VM. Here the guest kernel can be told an area of
> >> memory is special in some way and provide a device node that can be
> >> mmaped in more or less the same way. In this configuration it can't even
> >> be aware of what the underlying hypervisor is - just a block of memory
> >> and a way to receive message queue events.
> >
> > Doing it between VMs works in my mind; but again you still need to
> > handle that mapping.
> >
> >> > QEMU's vhost-user has a fair amount of code for handling the mappings,
> >> > dirty logging for migration, IOMMUs and things like reset (which is
> >> > pretty hairy, and probably needs more work).
> >>
> >> I suspect all of these multi-process models just hand wave away details
> >> like migration because that really does benefit from a single process
> >> with total awareness of the state of the system.
> >
> > Vhost-user has it pretty well defined; it works - as long as the user
> > process does the dirty map updates. Postcopy can also be made to work.
> >
> >> That said I wonder how
> >> robust a guest can be if the device emulation may go away at any time?
> >
> > That one I've not thought too much about, but the opposite case; making
> > the separate process survive even when the guest behaves
> > badly/resets/etc is quite nasty.
>
> I guess whatever orchestrates the start-up of the VMs has to worry about
> that. Some of the models I've been looking at have very simplistic
> setups where the guest VMs are described in the platform data to be spawned
> directly by the hypervisor. I guess in those cases you need to restart
> everything.
That depends; the orchestrator doesn't normally see a guest reset - even
a nasty one.
Dave
> >
> > Dave
> >
> >> I guess in virtio if you never signal the consumption of a virt-queue it
> >> will still be there waiting until you restart the emulation process and
> >> pick up from where you left off?
> >>
> >> >
> >> > Dave
> >> >
> >> >> Thoughts?
> >> >>
> >> >> --
> >> >> Alex Bennée
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> >> >> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >> >>
> >>
> >>
> >> --
> >> Alex Bennée
> >>
>
>
> --
> Alex Bennée
>
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK