From: Peter Xu <peterx@redhat.com>
To: Juan Quintela <quintela@redhat.com>
Cc: qemu-devel@nongnu.org,
	"Leonardo Bras Soares Passos" <lsoaresp@redhat.com>,
	"Michal Prívozník" <mprivozn@redhat.com>,
	"Daniel P . Berrangé" <berrange@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [PATCH v2 3/3] util/userfaultfd: Support /dev/userfaultfd
Date: Thu, 2 Feb 2023 15:41:08 -0500
Message-ID: <Y9wf5AI4xmHhNCTM@x1n>
In-Reply-To: <87cz6stk4a.fsf@secure.mitica>

On Thu, Feb 02, 2023 at 11:52:21AM +0100, Juan Quintela wrote:
> Peter Xu <peterx@redhat.com> wrote:
> > Teach QEMU to use /dev/userfaultfd when it exists, and fall back to the
> > system call if either it's not there or doesn't have enough permission.
> >
> > Firstly, as long as the app has permission to access /dev/userfaultfd, it
> > always has the ability to trap kernel faults, which QEMU mostly wants.
> > Meanwhile, in some context (e.g. containers) the userfaultfd syscall can be
> > forbidden, so it can be the major way to use postcopy in a restricted
> > environment with strict seccomp setup.
> >
> > Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> 
> Hi

Hi, Juan,

> 
> Can we change this code to not use the global variable?
> 
> > ---
> >  util/trace-events  |  1 +
> >  util/userfaultfd.c | 37 +++++++++++++++++++++++++++++++++++++
> >  2 files changed, 38 insertions(+)
> >
> > diff --git a/util/trace-events b/util/trace-events
> > index c8f53d7d9f..16f78d8fe5 100644
> > --- a/util/trace-events
> > +++ b/util/trace-events
> > @@ -93,6 +93,7 @@ qemu_vfio_region_info(const char *desc, uint64_t region_ofs, uint64_t region_siz
> >  qemu_vfio_pci_map_bar(int index, uint64_t region_ofs, uint64_t region_size, int ofs, void *host) "map region bar#%d addr 0x%"PRIx64" size 0x%"PRIx64" ofs 0x%x host %p"
> >  
> >  #userfaultfd.c
> > +uffd_detect_open_mode(int mode) "%d"
> >  uffd_query_features_nosys(int err) "errno: %i"
> >  uffd_query_features_api_failed(int err) "errno: %i"
> >  uffd_create_fd_nosys(int err) "errno: %i"
> > diff --git a/util/userfaultfd.c b/util/userfaultfd.c
> > index 9845a2ec81..7dceab51d6 100644
> > --- a/util/userfaultfd.c
> > +++ b/util/userfaultfd.c
> > @@ -18,10 +18,47 @@
> >  #include <poll.h>
> >  #include <sys/syscall.h>
> >  #include <sys/ioctl.h>
> > +#include <fcntl.h>
> > +
> > +typedef enum {
> > +    UFFD_UNINITIALIZED = 0,
> > +    UFFD_USE_DEV_PATH,
> > +    UFFD_USE_SYSCALL,
> > +} uffd_open_mode;
> > +
> > +static int uffd_dev;
> > +
> > +static uffd_open_mode uffd_detect_open_mode(void)
> > +{
> > +    static uffd_open_mode open_mode;
> > +
> > +    if (open_mode == UFFD_UNINITIALIZED) {
> > +        /*
> > +         * Make /dev/userfaultfd the default approach because it has better
> > +         * permission controls, and it also allows trapping kernel faults
> > +         * without any privilege requirement (e.g. CAP_SYS_PTRACE).
> > +         */
> > +        uffd_dev = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);
> > +        if (uffd_dev >= 0) {
> > +            open_mode = UFFD_USE_DEV_PATH;
> > +        } else {
> > +            /* Fallback to the system call */
> > +            open_mode = UFFD_USE_SYSCALL;
> > +        }
> > +        trace_uffd_detect_open_mode(open_mode);
> > +    }
> > +
> > +    return open_mode;
> > +}
> >  
> >  int uffd_open(int flags)
> >  {
> >  #if defined(__linux__) && defined(__NR_userfaultfd)
> > +    if (uffd_detect_open_mode() == UFFD_USE_DEV_PATH) {
> > +        assert(uffd_dev >= 0);
> > +        return ioctl(uffd_dev, USERFAULTFD_IOC_NEW, flags);
> > +    }
> > +
> >      return syscall(__NR_userfaultfd, flags);
> >  #else
> >      return -EINVAL;
> 
> static int open_userfaultd(void)
> {
>     /*
>      * Make /dev/userfaultfd the default approach because it has better
>      * permission controls, and it also allows trapping kernel faults
>      * without any privilege requirement (e.g. CAP_SYS_PTRACE).
>      */
>     int uffd = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);
>     if (uffd >= 0) {
>         return uffd;
>     }
>     return -1;
> }
> 
> int uffd_open(int flags)
> {
> #if defined(__linux__) && defined(__NR_userfaultfd)
>     static int uffd = -2;
>     if (uffd == -2) {
>         uffd = open_userfaultd();
>     }
>     if (uffd >= 0) {
>         return ioctl(uffd, USERFAULTFD_IOC_NEW, flags);
>     }
>     return syscall(__NR_userfaultfd, flags);
> #else
>     return -EINVAL;
> 
> 27 lines vs 42
> 
> No need for enum type
> No need for global variable
> 
> What do you think?

Yes, as I replied to Phil earlier, I think it can be simplified.  I did it
this way mainly for (1) better readability, and (2) being crystal clear
about which way we use to create the userfaultfd, so that we're guaranteed
to keep using it.  So at least I'd prefer keeping things like
trace_uffd_detect_open_mode().

I also plan to add another mode once fd-mode is there, even if it'll reuse
the same USERFAULTFD_IOC_NEW; the modes can be useful information when a
failure happens.
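
For illustration only, here is a rough sketch (not part of the posted patch)
of one possible middle ground: keep the explicit mode so it can still be
traced, but move the device fd next to it into a function-local static, so
no file-scope variable is needed.  The names uffd_open_state,
uffd_open_state_get() and uffd_open_sketch() are made up for this example,
and the fallback #define of USERFAULTFD_IOC_NEW is an assumption that
mirrors <linux/userfaultfd.h> from v6.1, there only so the snippet builds
standalone with older headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: same values as <linux/userfaultfd.h> in v6.1 and later. */
#ifndef USERFAULTFD_IOC_NEW
#define USERFAULTFD_IOC 0xAA
#define USERFAULTFD_IOC_NEW _IO(USERFAULTFD_IOC, 0x00)
#endif

typedef enum {
    UFFD_UNINITIALIZED = 0,
    UFFD_USE_DEV_PATH,
    UFFD_USE_SYSCALL,
} uffd_open_mode;

/* Hypothetical helper type: the detected mode plus the device fd. */
typedef struct {
    uffd_open_mode mode;
    int dev_fd;
} uffd_open_state;

/* Detect the open mode once and cache it together with the device fd. */
static const uffd_open_state *uffd_open_state_get(void)
{
    static uffd_open_state state = { UFFD_UNINITIALIZED, -1 };

    if (state.mode == UFFD_UNINITIALIZED) {
        state.dev_fd = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);
        state.mode = state.dev_fd >= 0 ? UFFD_USE_DEV_PATH
                                       : UFFD_USE_SYSCALL;
        /* In QEMU, trace_uffd_detect_open_mode(state.mode) would go here. */
    }

    return &state;
}

int uffd_open_sketch(int flags)
{
#if defined(__linux__) && defined(__NR_userfaultfd)
    const uffd_open_state *s = uffd_open_state_get();

    if (s->mode == UFFD_USE_DEV_PATH) {
        assert(s->dev_fd >= 0);
        return ioctl(s->dev_fd, USERFAULTFD_IOC_NEW, flags);
    }

    /* Fall back to the system call. */
    return syscall(__NR_userfaultfd, flags);
#else
    (void)flags;
    return -EINVAL;
#endif
}
```

Like the posted version, this detects the mode only once and is not
thread-safe on first use; it is longer than the 27-line variant, but a
future mode would only need a new enum value.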

Though if you insist, I can switch to the simple version too.

-- 
Peter Xu



