Discussion of the implementations of VIRTIO specification
 help / color / mirror / Atom feed
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>,
	virtio-dev@lists.oasis-open.org,
	Miklos Szeredi <mszeredi@redhat.com>,
	Sage Weil <sweil@redhat.com>, Vivek Goyal <vgoyal@redhat.com>,
	Steven Whitehouse <swhiteho@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [virtio-dev] [PATCH v3 2/2] virtio-fs: add DAX window
Date: Tue, 25 Jun 2019 10:55:15 +0100	[thread overview]
Message-ID: <20190625095515.GG3226@work-vm> (raw)
In-Reply-To: <20190624100448-mutt-send-email-mst@kernel.org>

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Mon, Jun 24, 2019 at 02:58:08PM +0100, Stefan Hajnoczi wrote:
> > On Tue, Jun 18, 2019 at 09:41:25PM -0400, Michael S. Tsirkin wrote:
> > > On Wed, Feb 20, 2019 at 12:46:13PM +0000, Stefan Hajnoczi wrote:
> > > > Describe how shared memory region ID 0 is the DAX window and how
> > > > FUSE_SETUPMAPPING maps file ranges into the window.
> > > > 
> > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > ---
> > > > Note that this depends on the shared memory resource specification
> > > > extension that David Gilbert is working on.
> > > > https://lists.oasis-open.org/archives/virtio-comment/201901/msg00000.html
> > > > 
> > > > The FUSE_SETUPMAPPING message is part of the virtio-fs Linux patches:
> > > > https://gitlab.com/virtio-fs/linux/blob/virtio-fs/include/uapi/linux/fuse.h
> > > > ---
> > > >  virtio-fs.tex | 25 +++++++++++++++++++++++++
> > > >  1 file changed, 25 insertions(+)
> > > > 
> > > > diff --git a/virtio-fs.tex b/virtio-fs.tex
> > > > index 5df5b9c..abb1e48 100644
> > > > --- a/virtio-fs.tex
> > > > +++ b/virtio-fs.tex
> > > > @@ -157,6 +157,31 @@ The driver MUST submit FUSE_INTERRUPT, FUSE_FORGET, and FUSE_BATCH_FORGET reques
> > > >  
> > > >  The driver MUST anticipate that request queues are processed concurrently with the hiprio queue.
> > > >  
> > > > +\subsubsection{Device Operation: DAX Window}\label{sec:Device Types / File System Device / Device Operation / Device Operation: DAX Window}
> > > > +
> > > > +FUSE\_READ and FUSE\_WRITE requests transfer file contents between the
> > > > +driver-provided buffer and the device.  In cases where data transfer is
> > > > +undesirable, the device can map file contents into the DAX window shared memory
> > > > +region.  The driver then accesses file contents directly in device-owned memory
> > > > +without a data transfer.
> > > > +
> > > > +Shared memory region ID 0 is called the DAX window.  The driver maps a file
> > > > +range into the DAX window using the FUSE\_SETUPMAPPING request.  The mapping is
> > > > +removed using the FUSE\_REMOVEMAPPING request.
> > > 
> > > I don't see FUSE\_SETUPMAPPING or FUSE\_REMOVEMAPPING  under
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/fuse.h
> > > Is it just me?
> > 
> > They are not upstream yet and can be found here:
> > 
> > https://gitlab.com/virtio-fs/linux/blob/virtio-fs/include/uapi/linux/fuse.h#L384
> > 
> > There is a chicken-and-egg problem.  Linux should merge this once the
> > spec has been accepted.  The spec makes reference to a new FUSE command
> > that is being added to Linux.  :D
> > 
> > I suggest we break it by merging the VIRTIO spec change first.  There
> > won't be a spec release so soon anyway and we can revert it in case
> > there are issues Linux.  Miklos, the FUSE maintainer, is well aware of
> > virtio-fs and contributes to it, so it's unlikely that Linux will reject
> > these commands.
> > 
> > > > +
> > > > +After FUSE\_SETUPMAPPING has completed successfully the file range is accessible
> > > > +from the DAX window at the offset provided by the driver in the request.
> > > 
> > > Dgilbert's patches describing shared memory say that
> > > the legal ways to set up mappings are all implementation-dependent.
> > > How does driver know which attributes to use for the
> > > mapping?
> > 
> > Two different types of mappings:
> > 1. The DAX window shared memory region described by DaveG's spec.
> > 2. The file mappings established using FUSE_SETUPMAPPING.
> > 
> > The virtio_fs.ko driver maps the DAX window, e.g. from a PCI BAR in an
> > implementation-defined way.  virtio_pci_*.c in Linux will have to help
> > out with the implementation-specific details here.
> > 
> > The only flags currently supported by FUSE_SETUPMAPPING are READ and
> > WRITE.  This depends on the file's access mode.  There is nothing
> > implementation-specific in FUSE_SETUPMAPPING.
> 
> Sorry - I'm being unclear.
> The guest driver maps parts of the PCI BAR.
> What are the attributes of this mapping?
> This is unrelated to FUSE_SETUPMAPPING things -
> mapping is created by creatig PTEs and such
> within guest, not by virtio things.

By attributes you mean... memory ordering, cachability etc?

> 
> > > Also, we recently had a discussion about DAX support on hosts
> > > and safety wrt crashes. Do we need to expose this
> > > information to guests maybe?
> > 
> > No.  Although virtio-fs uses the DAX subsystem, it does not use NVDIMM's
> > persistence model (e.g. CPU cache flush for persistence).  FUSE_FSYNC is
> > sent when persistence is required.  Therefore virtio-fs is still using
> > the traditional file/block persistence model.  No changes necessary for
> > power failure, etc.
> > 
> > > Finally, do we want to have a way to express that the filesystem
> > > only allows RO mappings?
> > 
> > Thanks for this idea.  I'm discussing it with the FUSE community because
> > mount -o ro with FUSE currently doesn't involve the file system daemon.
> > 
> > > > +
> > > > +\devicenormative{\paragraph}{Device Operation: DAX Window}{Device Types / File System Device / Device Operation / Device Operation: DAX Window}
> > > > +
> > > > +The device MUST allow mappings that completely or partially overlap existing mappings within the DAX window.
> > > 
> > > 
> > > Any alignment requirements?
> > 
> > Good point.  There are alignment requirements and the driver has no way
> > of knowing what they are.  I'll find a way to communicate them into the
> > guest, either via virtio or via FUSE.
> > 
> > > Also, with no limit on mappings, it looks like guest can use up lots of
> > > host VMAs quickly. Shouldn't there be a limit on # of mappings?
> > 
> > The VM can only deteriorate its own performance, right?
> 
> Only if QEMU is put in a container where virtual memory is
> limited.
> It's generally not a good idea where the only way for
> host to make progress is to allocate more memory
> without any limit.
> 
> If we are in a situation where we need to either kill
> the guest or hit swap, none of the choices is good.

There is a bound; it's cache region size / page size - so
that's ~1M mappings worst case (e.g. 4GB cache, 4kB page size)
That limit can be bought down if we impose a larger granularity
somewhere (and the reality is our kernel uses 2MB mapping chunks I
think).

> > We haven't seen catastrophic problems that bring the system to it's
> > knees.
> 
> Because you are not running malicious guests?

Hmm, I didn't realise a process having an excessive number of mappings
could harm any other process.

Dave

> >  But we're aware that increasing the number VMAs slows down the
> > lookup.  There is currently no imposed limit.
> > 
> > Ideas have been discussed to avoid using (so many) VMAs but it seems
> > like that will take some time to develop and get upstream.  This will
> > not affect the virtio specification because the device interface doesn't
> > need to know about this.
> > 
> > Stefan
> 
> 
> One way to address this is to expose the # of mappings
> in the config space.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


  reply	other threads:[~2019-06-25  9:55 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-20 12:46 [virtio-dev] [PATCH v3 0/2] virtio-fs: add virtio file system device Stefan Hajnoczi
2019-02-20 12:46 ` [virtio-dev] [PATCH v3 1/2] content: " Stefan Hajnoczi
2019-02-22 14:31   ` Dr. David Alan Gilbert
2019-02-25 15:54     ` Stefan Hajnoczi
2019-02-25 16:11   ` [virtio-dev] " Dr. David Alan Gilbert
2019-02-27 16:19     ` Stefan Hajnoczi
2019-06-19  1:29   ` [virtio-dev] " Michael S. Tsirkin
2019-07-23 15:58     ` Stefan Hajnoczi
2019-02-20 12:46 ` [virtio-dev] [PATCH v3 2/2] virtio-fs: add DAX window Stefan Hajnoczi
2019-06-19  1:41   ` Michael S. Tsirkin
2019-06-24 13:58     ` Stefan Hajnoczi
2019-06-24 14:10       ` Michael S. Tsirkin
2019-06-25  9:55         ` Dr. David Alan Gilbert [this message]
2019-06-27 14:09           ` Michael S. Tsirkin
2019-07-17 10:48             ` Stefan Hajnoczi
     [not found]             ` <20190717124258.GA13761@redhat.com>
2019-07-23 13:32               ` Stefan Hajnoczi
     [not found]                 ` <20190723140855.GA11628@redhat.com>
2019-07-23 14:52                   ` Stefan Hajnoczi
     [not found]                     ` <20190723155623.GA19189@redhat.com>
2019-07-24  8:33                       ` Stefan Hajnoczi
2019-06-19  1:30 ` [virtio-dev] [PATCH v3 0/2] virtio-fs: add virtio file system device Michael S. Tsirkin
2019-06-24 12:23   ` Stefan Hajnoczi
2019-06-24 13:57     ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190625095515.GG3226@work-vm \
    --to=dgilbert@redhat.com \
    --cc=mst@redhat.com \
    --cc=mszeredi@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=sweil@redhat.com \
    --cc=swhiteho@redhat.com \
    --cc=vgoyal@redhat.com \
    --cc=virtio-dev@lists.oasis-open.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox