From: Juan Quintela <quintela@redhat.com>
To: qemu-devel@nongnu.org
Cc: "Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Cornelia Huck" <cohuck@redhat.com>,
"Thomas Huth" <thuth@redhat.com>,
"Marc-André Lureau" <marcandre.lureau@redhat.com>,
"Juan Quintela" <quintela@redhat.com>,
"Daniel P. Berrangé" <berrange@redhat.com>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
kvm@vger.kernel.org, "Michael S. Tsirkin" <mst@redhat.com>,
"Peter Xu" <peterx@redhat.com>
Subject: [PULL 05/22] util/userfaultfd: Support /dev/userfaultfd
Date: Mon, 13 Feb 2023 03:51:33 +0100
Message-ID: <20230213025150.71537-6-quintela@redhat.com>
In-Reply-To: <20230213025150.71537-1-quintela@redhat.com>
From: Peter Xu <peterx@redhat.com>
Teach QEMU to use /dev/userfaultfd when it exists, and fall back to the
system call if the device is not there or QEMU lacks permission to open it.
Firstly, as long as the app has permission to access /dev/userfaultfd, it
always has the ability to trap kernel faults, which QEMU mostly wants.
Meanwhile, in some contexts (e.g. containers) the userfaultfd syscall can be
forbidden, so the device node can be the main way to use postcopy in a
restricted environment with a strict seccomp setup.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
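For readers who want to try the detection logic outside the QEMU tree, the
fallback described in the commit message can be sketched as a standalone
function. This is only an illustration under stated assumptions, not the
code being merged: the demo_* names are hypothetical, the fallback
definition of USERFAULTFD_IOC_NEW is there only for kernel headers older
than v6.1, and the non-Linux branch sets errno instead of returning -EINVAL
as the patch does.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Assumption: headers older than Linux 6.1 lack this ioctl number.
 * The value mirrors the definition in <linux/userfaultfd.h>.
 */
#ifndef USERFAULTFD_IOC_NEW
#define USERFAULTFD_IOC_NEW _IO(0xAA, 0x00)
#endif

/* Hypothetical names; the patch uses uffd_open_mode and UFFD_* instead. */
typedef enum {
    DEMO_UNINITIALIZED = 0,
    DEMO_USE_DEV_PATH,
    DEMO_USE_SYSCALL,
} demo_open_mode;

static demo_open_mode demo_mode;   /* decided once, on the first call */
static int demo_dev_fd = -1;       /* kept open for later ioctl calls */

/* Returns a new userfaultfd descriptor, or -1 with errno set. */
int demo_uffd_open(int flags)
{
#if defined(__NR_userfaultfd)
    if (demo_mode == DEMO_UNINITIALIZED) {
        /* Prefer the device node; remember the result for later calls. */
        demo_dev_fd = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);
        demo_mode = demo_dev_fd >= 0 ? DEMO_USE_DEV_PATH : DEMO_USE_SYSCALL;
    }

    if (demo_mode == DEMO_USE_DEV_PATH) {
        assert(demo_dev_fd >= 0);
        return ioctl(demo_dev_fd, USERFAULTFD_IOC_NEW, flags);
    }

    /* Device node missing or not accessible: use the system call. */
    return (int)syscall(__NR_userfaultfd, flags);
#else
    errno = ENOSYS;
    return -1;
#endif
}
```

Calling demo_uffd_open(O_CLOEXEC | O_NONBLOCK) from a small test program and
then inspecting demo_mode shows which path the host allows. Note that the
choice is made once: if the device node is unavailable on the first call,
later calls keep using the system call even if permissions change.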
util/userfaultfd.c | 32 ++++++++++++++++++++++++++++++++
util/trace-events | 1 +
2 files changed, 33 insertions(+)
diff --git a/util/userfaultfd.c b/util/userfaultfd.c
index 4953b3137d..fdff4867e8 100644
--- a/util/userfaultfd.c
+++ b/util/userfaultfd.c
@@ -18,10 +18,42 @@
#include <poll.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
+#include <fcntl.h>
+
+typedef enum {
+ UFFD_UNINITIALIZED = 0,
+ UFFD_USE_DEV_PATH,
+ UFFD_USE_SYSCALL,
+} uffd_open_mode;
int uffd_open(int flags)
{
#if defined(__NR_userfaultfd)
+ static uffd_open_mode open_mode;
+ static int uffd_dev;
+
+ /* Detect how to create the uffd descriptor on the first call */
+ if (open_mode == UFFD_UNINITIALIZED) {
+ /*
+ * Make /dev/userfaultfd the default approach because it has better
+ * permission controls and also allows trapping kernel faults without
+ * any privilege requirement (e.g. CAP_SYS_PTRACE).
+ */
+ uffd_dev = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);
+ if (uffd_dev >= 0) {
+ open_mode = UFFD_USE_DEV_PATH;
+ } else {
+ /* Fall back to the system call */
+ open_mode = UFFD_USE_SYSCALL;
+ }
+ trace_uffd_detect_open_mode(open_mode);
+ }
+
+ if (open_mode == UFFD_USE_DEV_PATH) {
+ assert(uffd_dev >= 0);
+ return ioctl(uffd_dev, USERFAULTFD_IOC_NEW, flags);
+ }
+
return syscall(__NR_userfaultfd, flags);
#else
return -EINVAL;
diff --git a/util/trace-events b/util/trace-events
index c8f53d7d9f..16f78d8fe5 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -93,6 +93,7 @@ qemu_vfio_region_info(const char *desc, uint64_t region_ofs, uint64_t region_siz
qemu_vfio_pci_map_bar(int index, uint64_t region_ofs, uint64_t region_size, int ofs, void *host) "map region bar#%d addr 0x%"PRIx64" size 0x%"PRIx64" ofs 0x%x host %p"
#userfaultfd.c
+uffd_detect_open_mode(int mode) "%d"
uffd_query_features_nosys(int err) "errno: %i"
uffd_query_features_api_failed(int err) "errno: %i"
uffd_create_fd_nosys(int err) "errno: %i"
--
2.39.1