Re: [PATCH 3/3] util/userfaultfd: Support /dev/userfaultfd

From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Peter Xu <peterx@redhat.com>,
	qemu-devel@nongnu.org,
	Leonardo Bras Soares Passos <lsoaresp@redhat.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	Juan Quintela <quintela@redhat.com>
Subject: Re: [PATCH 3/3] util/userfaultfd: Support /dev/userfaultfd
Date: Thu, 26 Jan 2023 09:05:01 +0000
Message-ID: <Y9JCPTHLuLKwz2Ge@redhat.com>
In-Reply-To: <Y9JBkR5xDHZEAN6p@redhat.com>

On Thu, Jan 26, 2023 at 09:02:09AM +0000, Daniel P. Berrangé wrote:
> On Wed, Jan 25, 2023 at 05:40:16PM -0500, Peter Xu wrote:
> > Teach QEMU to use /dev/userfaultfd when it exists, and fall back to the
> > system call if the device is either missing or not accessible with the
> > current permissions.
> > 
> > Firstly, as long as the app has permission to access /dev/userfaultfd, it
> > always has the ability to trap kernel faults, which is what QEMU mostly
> > wants. Meanwhile, in some contexts (e.g. containers) the userfaultfd
> > syscall can be forbidden, so the device can be the main way to use
> > postcopy in a restricted environment with a strict seccomp setup.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  util/trace-events  |  1 +
> >  util/userfaultfd.c | 36 ++++++++++++++++++++++++++++++++++++
> >  2 files changed, 37 insertions(+)
> > 
> > diff --git a/util/trace-events b/util/trace-events
> > index c8f53d7d9f..16f78d8fe5 100644
> > --- a/util/trace-events
> > +++ b/util/trace-events
> > @@ -93,6 +93,7 @@ qemu_vfio_region_info(const char *desc, uint64_t region_ofs, uint64_t region_siz
> >  qemu_vfio_pci_map_bar(int index, uint64_t region_ofs, uint64_t region_size, int ofs, void *host) "map region bar#%d addr 0x%"PRIx64" size 0x%"PRIx64" ofs 0x%x host %p"
> >  
> >  #userfaultfd.c
> > +uffd_detect_open_mode(int mode) "%d"
> >  uffd_query_features_nosys(int err) "errno: %i"
> >  uffd_query_features_api_failed(int err) "errno: %i"
> >  uffd_create_fd_nosys(int err) "errno: %i"
> > diff --git a/util/userfaultfd.c b/util/userfaultfd.c
> > index 9845a2ec81..360ecf8084 100644
> > --- a/util/userfaultfd.c
> > +++ b/util/userfaultfd.c
> > @@ -18,10 +18,46 @@
> >  #include <poll.h>
> >  #include <sys/syscall.h>
> >  #include <sys/ioctl.h>
> > +#include <fcntl.h>
> > +
> > +typedef enum {
> > +    UFFD_UNINITIALIZED = 0,
> > +    UFFD_USE_DEV_PATH,
> > +    UFFD_USE_SYSCALL,
> > +} uffd_open_mode;
> > +
> > +static uffd_open_mode open_mode;
> > +static int uffd_dev;
> > +
> > +static uffd_open_mode uffd_detect_open_mode(void)
> > +{
> > +    if (open_mode == UFFD_UNINITIALIZED) {
> > +        /*
> > +         * Make /dev/userfaultfd the default approach because it has better
> > +         * permission controls and allows trapping kernel faults without
> > +         * any privilege requirement (e.g. CAP_SYS_PTRACE).
> > +         */
> > +        uffd_dev = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);
> 
> qemu_open(), otherwise FD passing from the mgmt app won't work.
> 
> > +        if (uffd_dev >= 0) {
> > +            open_mode = UFFD_USE_DEV_PATH;
> > +        } else {
> > +            /* Fall back to the system call */
> > +            open_mode = UFFD_USE_SYSCALL;
> > +        }
> > +        trace_uffd_detect_open_mode(open_mode);
> > +    }
> > +
> > +    return open_mode;
> > +}
> 
> This leaves the /dev/userfaultfd FD open forever after its first use.
> Is this really needed? IIUC, the place where we call this is
> not going to be impacted if we open + close it every time we need to
> create a new FD, and it'll simplify this code right down.

Having said that, if we want to support passing the FD in from the
mgmt app, we need to keep it open persistently.
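
To make the trade-off concrete, here is a minimal standalone sketch of
both variants. It is not the QEMU code: every sketch_* name is
hypothetical, it uses plain open() rather than qemu_open(), and the
fallback #define only repeats what I believe <linux/userfaultfd.h>
provides since v6.1 so that the example builds against older headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: same value as <linux/userfaultfd.h> in Linux >= 6.1. */
#ifndef USERFAULTFD_IOC_NEW
#define USERFAULTFD_IOC_NEW _IO(0xAA, 0x00)
#endif

/* Hypothetical state: -1 = not probed, -2 = device unusable, >= 0 = FD. */
int sketch_dev_fd = -1;

/* Variant A: open and close the device around every request. */
int sketch_uffd_new_transient(int flags)
{
    int dev = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);
    int uffd, saved_errno;

    if (dev < 0) {
        return -1;
    }
    uffd = ioctl(dev, USERFAULTFD_IOC_NEW, flags);
    saved_errno = errno;        /* close() must not clobber the ioctl errno */
    close(dev);
    errno = saved_errno;
    return uffd;
}

/*
 * Variant B: probe once and keep the device FD open for the lifetime of
 * the process. This is the shape that would also work if the FD were
 * handed in by a management application instead of opened here.
 */
int sketch_uffd_open(int flags)
{
    if (sketch_dev_fd == -1) {
        int dev = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);
        sketch_dev_fd = dev >= 0 ? dev : -2;
    }
    if (sketch_dev_fd >= 0) {
        return ioctl(sketch_dev_fd, USERFAULTFD_IOC_NEW, flags);
    }
#ifdef __NR_userfaultfd
    /* Device unusable: fall back to the system call, as in the patch. */
    return syscall(__NR_userfaultfd, flags);
#else
    errno = ENOSYS;
    return -1;
#endif
}
```

Both functions return a new userfaultfd on success or -1 with errno
set. Variant A holds no long-lived state, but it needs the device node
to be reachable on every call; variant B only needs it once, at the
cost of one FD kept open.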

> 
> >  
> >  int uffd_open(int flags)
> >  {
> >  #if defined(__linux__) && defined(__NR_userfaultfd)
> > +    if (uffd_detect_open_mode() == UFFD_USE_DEV_PATH) {
> > +        assert(uffd_dev >= 0);
> > +        return ioctl(uffd_dev, USERFAULTFD_IOC_NEW, flags);
> > +    }
> > +
> >      return syscall(__NR_userfaultfd, flags);
> >  #else
> >      return -EINVAL;
> > -- 
> > 2.37.3
> > 
> > 
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Thread overview: 23+ messages
2023-01-25 22:40 [PATCH 0/3] util/userfaultfd: Support /dev/userfaultfd Peter Xu
2023-01-25 22:40 ` [PATCH 1/3] linux-headers: Update to v6.1 Peter Xu
2023-01-25 22:40 ` [PATCH 2/3] util/userfaultfd: Add uffd_open() Peter Xu
2023-01-25 23:04   ` Philippe Mathieu-Daudé
2023-01-26 15:58     ` Peter Xu
2023-01-25 22:40 ` [PATCH 3/3] util/userfaultfd: Support /dev/userfaultfd Peter Xu
2023-01-25 23:08   ` Philippe Mathieu-Daudé
2023-01-26 17:33     ` Peter Xu
2023-01-26  9:02   ` Daniel P. Berrangé
2023-01-26  9:05     ` Daniel P. Berrangé [this message]
2023-01-26 20:03       ` Peter Xu
2023-01-26 14:13 ` [PATCH 0/3] " Michal Prívozník
2023-01-26 14:15   ` Dr. David Alan Gilbert
2023-01-26 15:25     ` Peter Xu
2023-01-26 15:29       ` Michal Prívozník
2023-01-26 15:49         ` Peter Xu
2023-01-26 15:59       ` Daniel P. Berrangé
2023-01-26 17:26         ` Peter Xu
2023-01-31 19:48           ` Peter Xu
2023-01-31 20:06             ` Daniel P. Berrangé
2023-01-31 21:01               ` Peter Xu
2023-02-01  7:55                 ` Michal Prívozník
2023-02-01 14:58                   ` Peter Xu
