* [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups
@ 2023-09-05 21:42 Peter Xu
  2023-09-05 21:42 ` [PATCH 1/7] mm/userfaultfd: Make uffd read() wait event exclusive Peter Xu
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Peter Xu @ 2023-09-05 21:42 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Anish Moorthy, Axel Rasmussen, Alexander Viro, Mike Kravetz,
	Peter Zijlstra, Andrew Morton, Mike Rapoport, Christian Brauner,
	peterx, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
	James Houghton, Nadav Amit

Userfaultfd is the type of file that doesn't need wake-all semantics: if
there is a message enqueued (for either a fault address, or an event), we
only need to wake up one service thread to handle it.  Waking up more
threads is normally a waste of CPU cycles.  Besides that, and more
importantly, it simply doesn't scale.

Andrea used to have a patch that made read() wakeups O(1), but it never
hit upstream.  This is my effort to upstream that (which is a one-liner..);
on top of it I also made poll() O(1) on wakeup (more or less bringing
EPOLLEXCLUSIVE semantics to poll()), with some tests showing the effect.
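
For context, below is a minimal userspace sketch of the pattern being
optimized: several service threads all blocked in read() on a single
userfaultfd, where each fault message only needs one of them to wake up
and resolve it with UFFDIO_COPY.  This is not part of the series and the
error handling is trimmed; it only illustrates the scenario.

#include <fcntl.h>
#include <pthread.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

#define NR_SERVICE_THREADS  8
#define NR_PAGES            64

static int uffd;
static long page_size;

/* One of many threads blocked in read() on the same uffd */
static void *service_thread(void *arg)
{
	struct uffd_msg msg;
	void *zero = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	for (;;) {
		/* Blocking read(); this is the wait that patch 1 makes exclusive */
		if (read(uffd, &msg, sizeof(msg)) != sizeof(msg))
			continue;
		if (msg.event != UFFD_EVENT_PAGEFAULT)
			continue;

		struct uffdio_copy copy = {
			.dst = msg.arg.pagefault.address & ~(page_size - 1),
			.src = (unsigned long)zero,
			.len = page_size,
		};
		ioctl(uffd, UFFDIO_COPY, &copy);
	}
	return NULL;
}

int main(void)
{
	pthread_t threads[NR_SERVICE_THREADS];
	struct uffdio_api api = { .api = UFFD_API };
	char *area;
	int i;

	page_size = sysconf(_SC_PAGESIZE);

	uffd = syscall(__NR_userfaultfd, O_CLOEXEC);	/* blocking reads */
	ioctl(uffd, UFFDIO_API, &api);

	area = mmap(NULL, NR_PAGES * page_size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	struct uffdio_register reg = {
		.range = { .start = (unsigned long)area,
			   .len = NR_PAGES * page_size },
		.mode = UFFDIO_REGISTER_MODE_MISSING,
	};
	ioctl(uffd, UFFDIO_REGISTER, &reg);

	for (i = 0; i < NR_SERVICE_THREADS; i++)
		pthread_create(&threads[i], NULL, service_thread, NULL);

	/* Each access enqueues one message; waking one reader is enough */
	for (i = 0; i < NR_PAGES; i++)
		*(volatile char *)(area + (long)i * page_size) = 1;

	return 0;
}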

To verify this, I added a test called uffd-perf (leveraging the refactored
uffd selftest suite) that measures the message-channel latency on wakeups;
the effect of the waitqueue optimizations shows up in the new test:

        Constants: 40 uffd threads, on N_CPUS=40, memsize=512M
        Units: milliseconds (time taken to finish the test)
        |-----------------+--------+-------+------------|
        | test case       | before | after |   diff (%) |
        |-----------------+--------+-------+------------|
        | workers=8,poll  |   1762 |  1133 | -55.516328 |
        | workers=8,read  |   1437 |   585 | -145.64103 |
        | workers=16,poll |   1117 |  1097 | -1.8231541 |
        | workers=16,read |   1159 |   759 | -52.700922 |
        | workers=32,poll |   1001 |   973 | -2.8776978 |
        | workers=32,read |    866 |   713 | -21.458626 |
        |-----------------+--------+-------+------------|

The more threads hanging on fd_wqh, the bigger the difference shown in the
numbers.  "8 worker threads" is the worst case here because it means up to
40-8=32 uffd threads can be hanging idle on the fd_wqh queue.

In real life there can be more workers, but a small number of active
worker threads will cause a similar effect.

This is currently based on Andrew's mm-unstable branch, but it should
apply to most not-so-old trees.

Comments welcome, thanks.

Andrea Arcangeli (1):
  mm/userfaultfd: Make uffd read() wait event exclusive

Peter Xu (6):
  poll: Add a poll_flags for poll_queue_proc()
  poll: POLL_ENQUEUE_EXCLUSIVE
  fs/userfaultfd: Use exclusive waitqueue for poll()
  selftests/mm: Replace uffd_read_mutex with a semaphore
  selftests/mm: Create uffd_fault_thread_create|join()
  selftests/mm: uffd perf test

 drivers/vfio/virqfd.c                    |   4 +-
 drivers/vhost/vhost.c                    |   2 +-
 drivers/virt/acrn/irqfd.c                |   2 +-
 fs/aio.c                                 |   2 +-
 fs/eventpoll.c                           |   2 +-
 fs/select.c                              |   9 +-
 fs/userfaultfd.c                         |   8 +-
 include/linux/poll.h                     |  25 ++-
 io_uring/poll.c                          |   4 +-
 mm/memcontrol.c                          |   4 +-
 net/9p/trans_fd.c                        |   3 +-
 tools/testing/selftests/mm/Makefile      |   2 +
 tools/testing/selftests/mm/uffd-common.c |  65 +++++++
 tools/testing/selftests/mm/uffd-common.h |   7 +
 tools/testing/selftests/mm/uffd-perf.c   | 207 +++++++++++++++++++++++
 tools/testing/selftests/mm/uffd-stress.c |  53 +-----
 virt/kvm/eventfd.c                       |   2 +-
 17 files changed, 337 insertions(+), 64 deletions(-)
 create mode 100644 tools/testing/selftests/mm/uffd-perf.c

-- 
2.41.0



* [PATCH 1/7] mm/userfaultfd: Make uffd read() wait event exclusive
  2023-09-05 21:42 [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Peter Xu
@ 2023-09-05 21:42 ` Peter Xu
  2023-09-05 21:42 ` [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc() Peter Xu
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2023-09-05 21:42 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Anish Moorthy, Axel Rasmussen, Alexander Viro, Mike Kravetz,
	Peter Zijlstra, Andrew Morton, Mike Rapoport, Christian Brauner,
	peterx, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
	James Houghton, Nadav Amit

From: Andrea Arcangeli <aarcange@redhat.com>

When a new message is generated for a userfaultfd, instead of waking up
all the readers we can wake up only one exclusive reader to process the
event.  Waking up more than one reader for a single message is a waste of
resources: the rest of the readers will see nothing and re-queue again.

This should make userfaultfd read() O(1) on wakeups.

Note that queuing at the head (rather than the tail) is intentional, so
that readers are woken up in LIFO fashion; fairness doesn't matter much
here, but cache locality does.
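
For reference, a simplified paraphrase of the generic waitqueue behaviour
this relies on (based on <linux/wait.h> and kernel/sched/wait.c; trimmed,
not part of this patch): an exclusive entry is flagged and inserted at the
head of the list, and the wakeup side stops after waking nr_exclusive such
entries, with wake_up_poll() passing nr_exclusive == 1:

/* Enqueue side: flag the entry and insert at the list head (hence LIFO) */
static inline void __add_wait_queue_exclusive(struct wait_queue_head *wq_head,
					      struct wait_queue_entry *wq_entry)
{
	wq_entry->flags |= WQ_FLAG_EXCLUSIVE;
	__add_wait_queue(wq_head, wq_entry);
}

/* Wakeup side (simplified loop from __wake_up_common()) */
list_for_each_entry_safe(curr, next, &wq_head->head, entry) {
	ret = curr->func(curr, mode, wake_flags, key);
	if (ret < 0)
		break;
	if (ret && (curr->flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
		break;	/* only one exclusive waiter woken per event */
}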

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
[peterx: modified subjects / commit message]
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/userfaultfd.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 56eaae9dac1a..f7fda7d0c994 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1061,7 +1061,11 @@ static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
 
 	/* always take the fd_wqh lock before the fault_pending_wqh lock */
 	spin_lock_irq(&ctx->fd_wqh.lock);
-	__add_wait_queue(&ctx->fd_wqh, &wait);
+	/*
+	 * Only wake up one exclusive reader each time there's an event.
+	 * Paired with wake_up_poll() when e.g. a new page fault msg is generated.
+	 */
+	__add_wait_queue_exclusive(&ctx->fd_wqh, &wait);
 	for (;;) {
 		set_current_state(TASK_INTERRUPTIBLE);
 		spin_lock(&ctx->fault_pending_wqh.lock);
-- 
2.41.0



* [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc()
  2023-09-05 21:42 [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Peter Xu
  2023-09-05 21:42 ` [PATCH 1/7] mm/userfaultfd: Make uffd read() wait event exclusive Peter Xu
@ 2023-09-05 21:42 ` Peter Xu
  2023-09-05 23:21   ` kernel test robot
                     ` (3 more replies)
  2023-09-05 21:42 ` [PATCH 3/7] poll: POLL_ENQUEUE_EXCLUSIVE Peter Xu
                   ` (5 subsequent siblings)
  7 siblings, 4 replies; 14+ messages in thread
From: Peter Xu @ 2023-09-05 21:42 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Anish Moorthy, Axel Rasmussen, Alexander Viro, Mike Kravetz,
	Peter Zijlstra, Andrew Morton, Mike Rapoport, Christian Brauner,
	peterx, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
	James Houghton, Nadav Amit

Allow a flags argument to be passed into the poll enqueue function
(poll_queue_proc).

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 drivers/vfio/virqfd.c     | 4 ++--
 drivers/vhost/vhost.c     | 2 +-
 drivers/virt/acrn/irqfd.c | 2 +-
 fs/aio.c                  | 2 +-
 fs/eventpoll.c            | 2 +-
 fs/select.c               | 4 ++--
 include/linux/poll.h      | 7 +++++--
 io_uring/poll.c           | 4 ++--
 mm/memcontrol.c           | 4 +++-
 net/9p/trans_fd.c         | 3 ++-
 virt/kvm/eventfd.c        | 2 +-
 11 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/drivers/vfio/virqfd.c b/drivers/vfio/virqfd.c
index 29c564b7a6e1..4b817a6f4f72 100644
--- a/drivers/vfio/virqfd.c
+++ b/drivers/vfio/virqfd.c
@@ -75,8 +75,8 @@ static int virqfd_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void
 	return 0;
 }
 
-static void virqfd_ptable_queue_proc(struct file *file,
-				     wait_queue_head_t *wqh, poll_table *pt)
+static void virqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
+				     poll_table *pt, poll_flags flags)
 {
 	struct virqfd *virqfd = container_of(pt, struct virqfd, pt);
 	add_wait_queue(wqh, &virqfd->wait);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c71d573f1c94..02caad721843 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -162,7 +162,7 @@ static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh,
 }
 
 static int vhost_poll_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync,
-			     void *key)
+			     void *key, poll_flags flags)
 {
 	struct vhost_poll *poll = container_of(wait, struct vhost_poll, wait);
 	struct vhost_work *work = &poll->work;
diff --git a/drivers/virt/acrn/irqfd.c b/drivers/virt/acrn/irqfd.c
index d4ad211dce7a..9b79e4e76e49 100644
--- a/drivers/virt/acrn/irqfd.c
+++ b/drivers/virt/acrn/irqfd.c
@@ -94,7 +94,7 @@ static int hsm_irqfd_wakeup(wait_queue_entry_t *wait, unsigned int mode,
 }
 
 static void hsm_irqfd_poll_func(struct file *file, wait_queue_head_t *wqh,
-				poll_table *pt)
+				poll_table *pt, poll_flags flags)
 {
 	struct hsm_irqfd *irqfd;
 
diff --git a/fs/aio.c b/fs/aio.c
index a4c2a6bac72c..abb5b22f4fdf 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1823,7 +1823,7 @@ struct aio_poll_table {
 
 static void
 aio_poll_queue_proc(struct file *file, struct wait_queue_head *head,
-		struct poll_table_struct *p)
+		    struct poll_table_struct *p, poll_flags flags)
 {
 	struct aio_poll_table *pt = container_of(p, struct aio_poll_table, pt);
 
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 1d9a71a0c4c1..c74d6a083fd1 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1270,7 +1270,7 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
  * target file wakeup lists.
  */
 static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
-				 poll_table *pt)
+				 poll_table *pt, poll_flags flags)
 {
 	struct ep_pqueue *epq = container_of(pt, struct ep_pqueue, pt);
 	struct epitem *epi = epq->epi;
diff --git a/fs/select.c b/fs/select.c
index 0ee55af1a55c..0433448481e9 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -117,7 +117,7 @@ struct poll_table_page {
  * poll table.
  */
 static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
-		       poll_table *p);
+		       poll_table *p, poll_flags flags);
 
 void poll_initwait(struct poll_wqueues *pwq)
 {
@@ -220,7 +220,7 @@ static int pollwake(wait_queue_entry_t *wait, unsigned mode, int sync, void *key
 
 /* Add a new entry */
 static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
-				poll_table *p)
+		       poll_table *p, poll_flags flags)
 {
 	struct poll_wqueues *pwq = container_of(p, struct poll_wqueues, pt);
 	struct poll_table_entry *entry = poll_get_entry(pwq);
diff --git a/include/linux/poll.h b/include/linux/poll.h
index a9e0e1c2d1f2..cbad520fc65c 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -27,12 +27,15 @@
 
 #define DEFAULT_POLLMASK (EPOLLIN | EPOLLOUT | EPOLLRDNORM | EPOLLWRNORM)
 
+typedef unsigned int poll_flags;
+
 struct poll_table_struct;
 
 /* 
  * structures and helpers for f_op->poll implementations
  */
-typedef void (*poll_queue_proc)(struct file *, wait_queue_head_t *, struct poll_table_struct *);
+typedef void (*poll_queue_proc)(struct file *, wait_queue_head_t *,
+				struct poll_table_struct *, poll_flags);
 
 /*
  * Do not touch the structure directly, use the access functions
@@ -46,7 +49,7 @@ typedef struct poll_table_struct {
 static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_address, poll_table *p)
 {
 	if (p && p->_qproc && wait_address)
-		p->_qproc(filp, wait_address, p);
+		p->_qproc(filp, wait_address, p, 0);
 }
 
 /*
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 4c360ba8793a..c3b41e963a8d 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -533,7 +533,7 @@ static void __io_queue_proc(struct io_poll *poll, struct io_poll_table *pt,
 }
 
 static void io_poll_queue_proc(struct file *file, struct wait_queue_head *head,
-			       struct poll_table_struct *p)
+			       struct poll_table_struct *p, poll_flags flags)
 {
 	struct io_poll_table *pt = container_of(p, struct io_poll_table, pt);
 	struct io_poll *poll = io_kiocb_to_cmd(pt->req, struct io_poll);
@@ -644,7 +644,7 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
 }
 
 static void io_async_queue_proc(struct file *file, struct wait_queue_head *head,
-			       struct poll_table_struct *p)
+				struct poll_table_struct *p, poll_flags flags)
 {
 	struct io_poll_table *pt = container_of(p, struct io_poll_table, pt);
 	struct async_poll *apoll = pt->req->apoll;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ecc07b47e813..97b03ab30d5e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4877,7 +4877,9 @@ static int memcg_event_wake(wait_queue_entry_t *wait, unsigned mode,
 }
 
 static void memcg_event_ptable_queue_proc(struct file *file,
-		wait_queue_head_t *wqh, poll_table *pt)
+					  wait_queue_head_t *wqh,
+					  poll_table *pt,
+					  poll_flags flags)
 {
 	struct mem_cgroup_event *event =
 		container_of(pt, struct mem_cgroup_event, pt);
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index c4015f30f9fa..91f9f474ab01 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -550,7 +550,8 @@ static int p9_pollwake(wait_queue_entry_t *wait, unsigned int mode, int sync, vo
  */
 
 static void
-p9_pollwait(struct file *filp, wait_queue_head_t *wait_address, poll_table *p)
+p9_pollwait(struct file *filp, wait_queue_head_t *wait_address, poll_table *p,
+	    poll_flags flags)
 {
 	struct p9_conn *m = container_of(p, struct p9_conn, pt);
 	struct p9_poll_wait *pwait = NULL;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 89912a17f5d5..645b5d155386 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -246,7 +246,7 @@ irqfd_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void *key)
 
 static void
 irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
-			poll_table *pt)
+			poll_table *pt, poll_flags flags)
 {
 	struct kvm_kernel_irqfd *irqfd =
 		container_of(pt, struct kvm_kernel_irqfd, pt);
-- 
2.41.0



* [PATCH 3/7] poll: POLL_ENQUEUE_EXCLUSIVE
  2023-09-05 21:42 [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Peter Xu
  2023-09-05 21:42 ` [PATCH 1/7] mm/userfaultfd: Make uffd read() wait event exclusive Peter Xu
  2023-09-05 21:42 ` [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc() Peter Xu
@ 2023-09-05 21:42 ` Peter Xu
  2023-09-05 21:42 ` [PATCH 4/7] fs/userfaultfd: Use exclusive waitqueue for poll() Peter Xu
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2023-09-05 21:42 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Anish Moorthy, Axel Rasmussen, Alexander Viro, Mike Kravetz,
	Peter Zijlstra, Andrew Morton, Mike Rapoport, Christian Brauner,
	peterx, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
	James Houghton, Nadav Amit

Add a flag for poll_wait() indicating that the caller wants the enqueue to
be exclusive.  It is similar to EPOLLEXCLUSIVE for epoll(), but lets
in-kernel poll users opt in to the more efficient exclusive queuing where
applicable.
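
As an illustration (a made-up "foo" driver, not part of this series), an
in-kernel ->poll implementation would opt in like this:

static __poll_t foo_poll(struct file *file, poll_table *wait)
{
	struct foo_ctx *ctx = file->private_data;
	__poll_t mask = 0;

	/* Only one poller will be woken per event on ctx->wqh */
	poll_wait_exclusive(file, &ctx->wqh, wait);

	spin_lock_irq(&ctx->lock);
	if (!list_empty(&ctx->pending))
		mask |= EPOLLIN | EPOLLRDNORM;
	spin_unlock_irq(&ctx->lock);

	return mask;
}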

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/select.c          |  5 ++++-
 include/linux/poll.h | 20 ++++++++++++++++++--
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index 0433448481e9..a3c9088e8d76 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -231,7 +231,10 @@ static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
 	entry->key = p->_key;
 	init_waitqueue_func_entry(&entry->wait, pollwake);
 	entry->wait.private = pwq;
-	add_wait_queue(wait_address, &entry->wait);
+	if (flags & POLL_ENQUEUE_EXCLUSIVE)
+		add_wait_queue_exclusive(wait_address, &entry->wait);
+	else
+		add_wait_queue(wait_address, &entry->wait);
 }
 
 static int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
diff --git a/include/linux/poll.h b/include/linux/poll.h
index cbad520fc65c..11af98ae579c 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -29,6 +29,8 @@
 
 typedef unsigned int poll_flags;
 
+#define POLL_ENQUEUE_EXCLUSIVE  BIT(0)
+
 struct poll_table_struct;
 
 /* 
@@ -46,10 +48,24 @@ typedef struct poll_table_struct {
 	__poll_t _key;
 } poll_table;
 
-static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_address, poll_table *p)
+static inline void __poll_wait(struct file *filp, wait_queue_head_t *wait_address,
+			       poll_table *p, poll_flags flags)
 {
 	if (p && p->_qproc && wait_address)
-		p->_qproc(filp, wait_address, p, 0);
+		p->_qproc(filp, wait_address, p, flags);
+}
+
+static inline void poll_wait(struct file *filp, wait_queue_head_t *wait_address,
+			     poll_table *p)
+{
+	__poll_wait(filp, wait_address, p, 0);
+}
+
+static inline void poll_wait_exclusive(struct file *filp,
+				       wait_queue_head_t *wait_address,
+				       poll_table *p)
+{
+	__poll_wait(filp, wait_address, p, POLL_ENQUEUE_EXCLUSIVE);
 }
 
 /*
-- 
2.41.0



* [PATCH 4/7] fs/userfaultfd: Use exclusive waitqueue for poll()
  2023-09-05 21:42 [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Peter Xu
                   ` (2 preceding siblings ...)
  2023-09-05 21:42 ` [PATCH 3/7] poll: POLL_ENQUEUE_EXCLUSIVE Peter Xu
@ 2023-09-05 21:42 ` Peter Xu
  2023-09-05 21:42 ` [PATCH 5/7] selftests/mm: Replace uffd_read_mutex with a semaphore Peter Xu
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2023-09-05 21:42 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Anish Moorthy, Axel Rasmussen, Alexander Viro, Mike Kravetz,
	Peter Zijlstra, Andrew Morton, Mike Rapoport, Christian Brauner,
	peterx, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
	James Houghton, Nadav Amit

Userfaultfd is the kind of fd that does not need wake-all semantics on
wakeup.  Enqueue using the new POLL_ENQUEUE_EXCLUSIVE flag.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/userfaultfd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index f7fda7d0c994..9c39adc398fc 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -994,7 +994,7 @@ static __poll_t userfaultfd_poll(struct file *file, poll_table *wait)
 	struct userfaultfd_ctx *ctx = file->private_data;
 	__poll_t ret;
 
-	poll_wait(file, &ctx->fd_wqh, wait);
+	poll_wait_exclusive(file, &ctx->fd_wqh, wait);
 
 	if (!userfaultfd_is_initialized(ctx))
 		return EPOLLERR;
-- 
2.41.0



* [PATCH 5/7] selftests/mm: Replace uffd_read_mutex with a semaphore
  2023-09-05 21:42 [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Peter Xu
                   ` (3 preceding siblings ...)
  2023-09-05 21:42 ` [PATCH 4/7] fs/userfaultfd: Use exclusive waitqueue for poll() Peter Xu
@ 2023-09-05 21:42 ` Peter Xu
  2023-09-05 21:42 ` [PATCH 6/7] selftests/mm: Create uffd_fault_thread_create|join() Peter Xu
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2023-09-05 21:42 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Anish Moorthy, Axel Rasmussen, Alexander Viro, Mike Kravetz,
	Peter Zijlstra, Andrew Morton, Mike Rapoport, Christian Brauner,
	peterx, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
	James Houghton, Nadav Amit

Each uffd read thread unlocks the read mutex first, probably just to make
sure the thread has reached a stage where pthread_cancel() can always work
before the main thread moves on.

However, keeping the mutex always locked and unlocking it in the thread is
a bit hacky.  Replace it with a semaphore, which is much clearer: the main
thread wait()s and the reader thread simply post()s.  Move it to
uffd-common.* so it can be reused later.
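
In outline, the new startup handshake looks like the sketch below (a
standalone toy, not the selftest code itself; in the selftest the blocking
call is the uffd read()):

#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

static sem_t uffd_read_sem;

static void *reader_thread(void *arg)
{
	sem_post(&uffd_read_sem);	/* "I'm running, cancellation is safe" */
	for (;;)
		pause();		/* stand-in for the blocking uffd read() */
	return NULL;
}

int main(void)
{
	pthread_t tid;

	sem_init(&uffd_read_sem, 0, 0);
	pthread_create(&tid, NULL, reader_thread, NULL);
	sem_wait(&uffd_read_sem);	/* only move on once the reader is up */
	pthread_cancel(tid);
	pthread_join(tid, NULL);
	return 0;
}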

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tools/testing/selftests/mm/uffd-common.c | 1 +
 tools/testing/selftests/mm/uffd-common.h | 2 ++
 tools/testing/selftests/mm/uffd-stress.c | 8 +++-----
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/mm/uffd-common.c b/tools/testing/selftests/mm/uffd-common.c
index 02b89860e193..aded06cab285 100644
--- a/tools/testing/selftests/mm/uffd-common.c
+++ b/tools/testing/selftests/mm/uffd-common.c
@@ -17,6 +17,7 @@ bool map_shared;
 bool test_uffdio_wp = true;
 unsigned long long *count_verify;
 uffd_test_ops_t *uffd_test_ops;
+sem_t uffd_read_sem;
 
 static int uffd_mem_fd_create(off_t mem_size, bool hugetlb)
 {
diff --git a/tools/testing/selftests/mm/uffd-common.h b/tools/testing/selftests/mm/uffd-common.h
index 7c4fa964c3b0..521523baded1 100644
--- a/tools/testing/selftests/mm/uffd-common.h
+++ b/tools/testing/selftests/mm/uffd-common.h
@@ -32,6 +32,7 @@
 #include <inttypes.h>
 #include <stdint.h>
 #include <sys/random.h>
+#include <semaphore.h>
 
 #include "../kselftest.h"
 #include "vm_util.h"
@@ -97,6 +98,7 @@ extern bool map_shared;
 extern bool test_uffdio_wp;
 extern unsigned long long *count_verify;
 extern volatile bool test_uffdio_copy_eexist;
+extern sem_t uffd_read_sem;
 
 extern uffd_test_ops_t anon_uffd_test_ops;
 extern uffd_test_ops_t shmem_uffd_test_ops;
diff --git a/tools/testing/selftests/mm/uffd-stress.c b/tools/testing/selftests/mm/uffd-stress.c
index 469e0476af26..7219f55ae794 100644
--- a/tools/testing/selftests/mm/uffd-stress.c
+++ b/tools/testing/selftests/mm/uffd-stress.c
@@ -125,14 +125,12 @@ static int copy_page_retry(int ufd, unsigned long offset)
 	return __copy_page(ufd, offset, true, test_uffdio_wp);
 }
 
-pthread_mutex_t uffd_read_mutex = PTHREAD_MUTEX_INITIALIZER;
-
 static void *uffd_read_thread(void *arg)
 {
 	struct uffd_args *args = (struct uffd_args *)arg;
 	struct uffd_msg msg;
 
-	pthread_mutex_unlock(&uffd_read_mutex);
+	sem_post(&uffd_read_sem);
 	/* from here cancellation is ok */
 
 	for (;;) {
@@ -196,7 +194,7 @@ static int stress(struct uffd_args *args)
 					   uffd_read_thread,
 					   (void *)&args[cpu]))
 				return 1;
-			pthread_mutex_lock(&uffd_read_mutex);
+			sem_wait(&uffd_read_sem);
 		}
 		if (pthread_create(&background_threads[cpu], &attr,
 				   background_thread, (void *)cpu))
@@ -258,7 +256,7 @@ static int userfaultfd_stress(void)
 	zeropage = area;
 	bzero(zeropage, page_size);
 
-	pthread_mutex_lock(&uffd_read_mutex);
+	sem_init(&uffd_read_sem, 0, 0);
 
 	pthread_attr_init(&attr);
 	pthread_attr_setstacksize(&attr, 16*1024*1024);
-- 
2.41.0



* [PATCH 6/7] selftests/mm: Create uffd_fault_thread_create|join()
  2023-09-05 21:42 [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Peter Xu
                   ` (4 preceding siblings ...)
  2023-09-05 21:42 ` [PATCH 5/7] selftests/mm: Replace uffd_read_mutex with a semaphore Peter Xu
@ 2023-09-05 21:42 ` Peter Xu
  2023-09-05 21:42 ` [PATCH 7/7] selftests/mm: uffd perf test Peter Xu
  2023-09-07 19:18 ` [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Axel Rasmussen
  7 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2023-09-05 21:42 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Anish Moorthy, Axel Rasmussen, Alexander Viro, Mike Kravetz,
	Peter Zijlstra, Andrew Morton, Mike Rapoport, Christian Brauner,
	peterx, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
	James Houghton, Nadav Amit

Make them common functions to be reused.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tools/testing/selftests/mm/uffd-common.c | 46 ++++++++++++++++++++++
 tools/testing/selftests/mm/uffd-common.h |  4 ++
 tools/testing/selftests/mm/uffd-stress.c | 49 ++++--------------------
 3 files changed, 57 insertions(+), 42 deletions(-)

diff --git a/tools/testing/selftests/mm/uffd-common.c b/tools/testing/selftests/mm/uffd-common.c
index aded06cab285..851284395b29 100644
--- a/tools/testing/selftests/mm/uffd-common.c
+++ b/tools/testing/selftests/mm/uffd-common.c
@@ -555,6 +555,52 @@ void *uffd_poll_thread(void *arg)
 	return NULL;
 }
 
+void *uffd_read_thread(void *arg)
+{
+	struct uffd_args *args = (struct uffd_args *)arg;
+	struct uffd_msg msg;
+
+	sem_post(&uffd_read_sem);
+	/* from here cancellation is ok */
+
+	for (;;) {
+		if (uffd_read_msg(uffd, &msg))
+			continue;
+		uffd_handle_page_fault(&msg, args);
+	}
+
+	return NULL;
+}
+
+void uffd_fault_thread_create(pthread_t *thread, pthread_attr_t *attr,
+			      struct uffd_args *args, bool poll)
+{
+	if (poll) {
+		if (pthread_create(thread, attr, uffd_poll_thread, args))
+			err("uffd_poll_thread create");
+	} else {
+		if (pthread_create(thread, attr, uffd_read_thread, args))
+			err("uffd_read_thread create");
+		sem_wait(&uffd_read_sem);
+	}
+}
+
+void uffd_fault_thread_join(pthread_t thread, int cpu, bool poll)
+{
+	char c = 1;
+
+	if (poll) {
+		if (write(pipefd[cpu*2+1], &c, 1) != 1)
+			err("pipefd write error");
+	} else {
+		if (pthread_cancel(thread))
+			err("pthread_cancel()");
+	}
+
+	if (pthread_join(thread, NULL))
+		err("pthread_join()");
+}
+
 static void retry_copy_page(int ufd, struct uffdio_copy *uffdio_copy,
 			    unsigned long offset)
 {
diff --git a/tools/testing/selftests/mm/uffd-common.h b/tools/testing/selftests/mm/uffd-common.h
index 521523baded1..9d66ad5c52cb 100644
--- a/tools/testing/selftests/mm/uffd-common.h
+++ b/tools/testing/selftests/mm/uffd-common.h
@@ -114,6 +114,10 @@ void uffd_handle_page_fault(struct uffd_msg *msg, struct uffd_args *args);
 int __copy_page(int ufd, unsigned long offset, bool retry, bool wp);
 int copy_page(int ufd, unsigned long offset, bool wp);
 void *uffd_poll_thread(void *arg);
+void *uffd_read_thread(void *arg);
+void uffd_fault_thread_create(pthread_t *thread, pthread_attr_t *attr,
+			      struct uffd_args *args, bool poll);
+void uffd_fault_thread_join(pthread_t thread, int cpu, bool poll);
 
 int uffd_open_dev(unsigned int flags);
 int uffd_open_sys(unsigned int flags);
diff --git a/tools/testing/selftests/mm/uffd-stress.c b/tools/testing/selftests/mm/uffd-stress.c
index 7219f55ae794..915795e33432 100644
--- a/tools/testing/selftests/mm/uffd-stress.c
+++ b/tools/testing/selftests/mm/uffd-stress.c
@@ -125,23 +125,6 @@ static int copy_page_retry(int ufd, unsigned long offset)
 	return __copy_page(ufd, offset, true, test_uffdio_wp);
 }
 
-static void *uffd_read_thread(void *arg)
-{
-	struct uffd_args *args = (struct uffd_args *)arg;
-	struct uffd_msg msg;
-
-	sem_post(&uffd_read_sem);
-	/* from here cancellation is ok */
-
-	for (;;) {
-		if (uffd_read_msg(uffd, &msg))
-			continue;
-		uffd_handle_page_fault(&msg, args);
-	}
-
-	return NULL;
-}
-
 static void *background_thread(void *arg)
 {
 	unsigned long cpu = (unsigned long) arg;
@@ -186,16 +169,10 @@ static int stress(struct uffd_args *args)
 		if (pthread_create(&locking_threads[cpu], &attr,
 				   locking_thread, (void *)cpu))
 			return 1;
-		if (bounces & BOUNCE_POLL) {
-			if (pthread_create(&uffd_threads[cpu], &attr, uffd_poll_thread, &args[cpu]))
-				err("uffd_poll_thread create");
-		} else {
-			if (pthread_create(&uffd_threads[cpu], &attr,
-					   uffd_read_thread,
-					   (void *)&args[cpu]))
-				return 1;
-			sem_wait(&uffd_read_sem);
-		}
+
+		uffd_fault_thread_create(&uffd_threads[cpu], &attr,
+					 &args[cpu], bounces & BOUNCE_POLL);
+
 		if (pthread_create(&background_threads[cpu], &attr,
 				   background_thread, (void *)cpu))
 			return 1;
@@ -220,21 +197,9 @@ static int stress(struct uffd_args *args)
 		if (pthread_join(locking_threads[cpu], NULL))
 			return 1;
 
-	for (cpu = 0; cpu < nr_cpus; cpu++) {
-		char c;
-		if (bounces & BOUNCE_POLL) {
-			if (write(pipefd[cpu*2+1], &c, 1) != 1)
-				err("pipefd write error");
-			if (pthread_join(uffd_threads[cpu],
-					 (void *)&args[cpu]))
-				return 1;
-		} else {
-			if (pthread_cancel(uffd_threads[cpu]))
-				return 1;
-			if (pthread_join(uffd_threads[cpu], NULL))
-				return 1;
-		}
-	}
+	for (cpu = 0; cpu < nr_cpus; cpu++)
+		uffd_fault_thread_join(uffd_threads[cpu], cpu,
+				       bounces & BOUNCE_POLL);
 
 	return 0;
 }
-- 
2.41.0



* [PATCH 7/7] selftests/mm: uffd perf test
  2023-09-05 21:42 [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Peter Xu
                   ` (5 preceding siblings ...)
  2023-09-05 21:42 ` [PATCH 6/7] selftests/mm: Create uffd_fault_thread_create|join() Peter Xu
@ 2023-09-05 21:42 ` Peter Xu
  2023-09-07 19:18 ` [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Axel Rasmussen
  7 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2023-09-05 21:42 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Anish Moorthy, Axel Rasmussen, Alexander Viro, Mike Kravetz,
	Peter Zijlstra, Andrew Morton, Mike Rapoport, Christian Brauner,
	peterx, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
	James Houghton, Nadav Amit

Add a simple perf test for userfaultfd missing mode, on private anon only.
It mostly tests the messaging path, so the memory type / fault type may
not matter that much yet.
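
As a usage example, one way to reproduce the cover letter setup (40 uffd
threads, 8 workers, 512M default mem size) would be:

  ./uffd-perf -u 40 -w 8 -r    # read() message mode
  ./uffd-perf -u 40 -w 8 -p    # poll() message mode (the default)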

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tools/testing/selftests/mm/Makefile      |   2 +
 tools/testing/selftests/mm/uffd-common.c |  18 ++
 tools/testing/selftests/mm/uffd-common.h |   1 +
 tools/testing/selftests/mm/uffd-perf.c   | 207 +++++++++++++++++++++++
 4 files changed, 228 insertions(+)
 create mode 100644 tools/testing/selftests/mm/uffd-perf.c

diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 6a9fc5693145..acb22517d37e 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -64,6 +64,7 @@ TEST_GEN_FILES += thuge-gen
 TEST_GEN_FILES += transhuge-stress
 TEST_GEN_FILES += uffd-stress
 TEST_GEN_FILES += uffd-unit-tests
+TEST_GEN_FILES += uffd-perf
 TEST_GEN_FILES += split_huge_page_test
 TEST_GEN_FILES += ksm_tests
 TEST_GEN_FILES += ksm_functional_tests
@@ -120,6 +121,7 @@ $(TEST_GEN_FILES): vm_util.c
 
 $(OUTPUT)/uffd-stress: uffd-common.c
 $(OUTPUT)/uffd-unit-tests: uffd-common.c
+$(OUTPUT)/uffd-perf: uffd-common.c
 
 ifeq ($(ARCH),x86_64)
 BINARIES_32 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_32))
diff --git a/tools/testing/selftests/mm/uffd-common.c b/tools/testing/selftests/mm/uffd-common.c
index 851284395b29..afbf2f7add56 100644
--- a/tools/testing/selftests/mm/uffd-common.c
+++ b/tools/testing/selftests/mm/uffd-common.c
@@ -725,3 +725,21 @@ int uffd_get_features(uint64_t *features)
 
 	return 0;
 }
+
+uint64_t get_usec(void)
+{
+	uint64_t val = 0;
+	struct timespec t;
+	int ret = clock_gettime(CLOCK_MONOTONIC, &t);
+
+	if (ret == -1) {
+		perror("clock_gettime() failed");
+		/* should never happen */
+		exit(-1);
+	}
+
+	val = t.tv_nsec / 1000;		/* ns -> us */
+	val += t.tv_sec * 1000000;	/* s -> us */
+
+	return val;
+}
diff --git a/tools/testing/selftests/mm/uffd-common.h b/tools/testing/selftests/mm/uffd-common.h
index 9d66ad5c52cb..4273201ae19f 100644
--- a/tools/testing/selftests/mm/uffd-common.h
+++ b/tools/testing/selftests/mm/uffd-common.h
@@ -123,6 +123,7 @@ int uffd_open_dev(unsigned int flags);
 int uffd_open_sys(unsigned int flags);
 int uffd_open(unsigned int flags);
 int uffd_get_features(uint64_t *features);
+uint64_t get_usec(void);
 
 #define TEST_ANON	1
 #define TEST_HUGETLB	2
diff --git a/tools/testing/selftests/mm/uffd-perf.c b/tools/testing/selftests/mm/uffd-perf.c
new file mode 100644
index 000000000000..eda99718311a
--- /dev/null
+++ b/tools/testing/selftests/mm/uffd-perf.c
@@ -0,0 +1,207 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Userfaultfd performance tests.
+ *
+ *  Copyright (C) 2023  Red Hat, Inc.
+ */
+
+#include "uffd-common.h"
+
+#ifdef __NR_userfaultfd
+
+#define  DEF_MEM_SIZE_MB  (512)
+#define  MB(x)  ((x) * 1024 * 1024)
+#define  DEF_N_TESTS  5
+
+static volatile bool perf_test_started;
+static unsigned int n_uffd_threads, n_worker_threads;
+static uint64_t nr_pages_per_worker;
+static unsigned long n_tests = DEF_N_TESTS;
+
+static void setup_env(unsigned long mem_size_mb)
+{
+	/* Test private anon only for now */
+	map_shared = false;
+	uffd_test_ops = &anon_uffd_test_ops;
+	page_size = psize();
+	nr_cpus = n_uffd_threads;
+	nr_pages = MB(mem_size_mb) / page_size;
+	nr_pages_per_worker = nr_pages / n_worker_threads;
+	if (nr_pages_per_worker == 0)
+		err("each worker should at least own one page");
+}
+
+void *worker_fn(void *opaque)
+{
+	unsigned long i = (unsigned long) opaque;
+	unsigned long page_nr, start_nr, end_nr;
+	int v = 0;
+
+	start_nr = i * nr_pages_per_worker;
+	end_nr = (i + 1) * nr_pages_per_worker;
+
+	while (!perf_test_started);
+
+	for (page_nr = start_nr; page_nr < end_nr; page_nr++)
+		v += *(volatile int *)(area_dst + page_nr * page_size);
+
+	return NULL;
+}
+
+static uint64_t run_perf(uint64_t mem_size_mb, bool poll)
+{
+	pthread_t worker_threads[n_worker_threads];
+	pthread_t uffd_threads[n_uffd_threads];
+	const char *errmsg = NULL;
+	struct uffd_args *args;
+	uint64_t start, end;
+	int i, ret;
+
+	if (uffd_test_ctx_init(0, &errmsg))
+		err("%s", errmsg);
+
+	/*
+	 * By default, uffd is opened in NONBLOCK mode; switch to blocking
+	 * mode when testing read().
+	 */
+	if (!poll) {
+		int flags = fcntl(uffd, F_GETFL);
+
+		if (flags < 0)
+			err("fcntl(F_GETFL) failed");
+
+		if (flags & O_NONBLOCK)
+			flags &= ~O_NONBLOCK;
+
+		if (fcntl(uffd, F_SETFL, flags))
+			err("fcntl(F_SETFL) failed");
+	}
+
+	ret = uffd_register(uffd, area_dst, MB(mem_size_mb),
+			    true, false, false);
+	if (ret)
+		err("uffd_register() failed");
+
+	args = calloc(nr_cpus, sizeof(struct uffd_args));
+	if (!args)
+		err("calloc()");
+
+	for (i = 0; i < n_uffd_threads; i++) {
+		args[i].cpu = i;
+		uffd_fault_thread_create(&uffd_threads[i], NULL,
+					 &args[i], poll);
+	}
+
+	for (i = 0; i < n_worker_threads; i++) {
+		if (pthread_create(&worker_threads[i], NULL,
+				   worker_fn, (void *)(uintptr_t)i))
+			err("create worker threads");
+	}
+
+	start = get_usec();
+	perf_test_started = true;
+	for (i = 0; i < n_worker_threads; i++)
+		pthread_join(worker_threads[i], NULL);
+	end = get_usec();
+
+	for (i = 0; i < n_uffd_threads; i++) {
+		struct uffd_args *p = &args[i];
+
+		uffd_fault_thread_join(uffd_threads[i], i, poll);
+
+		assert(p->wp_faults == 0 && p->minor_faults == 0);
+	}
+
+	free(args);
+
+	ret = uffd_unregister(uffd, area_dst, MB(mem_size_mb));
+	if (ret)
+		err("uffd_unregister() failed");
+
+	return end - start;
+}
+
+static void usage(const char *prog)
+{
+	printf("usage: %s <options>\n", prog);
+	puts("");
+	printf("  -m: size of memory to test (in MB, default: %u)\n",
+	       DEF_MEM_SIZE_MB);
+	puts("  -p: use poll() (the default)");
+	puts("  -r: use read()");
+	printf("  -t: test rounds (default: %u)\n", DEF_N_TESTS);
+	puts("  -u: number of uffd threads (default: n_cpus)");
+	puts("  -w: number of worker threads (default: n_cpus)");
+	puts("");
+	exit(KSFT_FAIL);
+}
+
+int main(int argc, char *argv[])
+{
+	unsigned long mem_size_mb = DEF_MEM_SIZE_MB;
+	uint64_t result, sum = 0;
+	bool use_poll = true;
+	int opt, count;
+
+	n_uffd_threads = n_worker_threads = sysconf(_SC_NPROCESSORS_ONLN);
+
+	while ((opt = getopt(argc, argv, "hm:prt:u:w:")) != -1) {
+		switch (opt) {
+		case 'm':
+			mem_size_mb = strtoul(optarg, NULL, 10);
+			break;
+		case 'p':
+			use_poll = true;
+			break;
+		case 'r':
+			use_poll = false;
+			break;
+		case 't':
+			n_tests = strtoul(optarg, NULL, 10);
+			break;
+		case 'u':
+			n_uffd_threads = strtoul(optarg, NULL, 10);
+			break;
+		case 'w':
+			n_worker_threads = strtoul(optarg, NULL, 10);
+			break;
+		case 'h':
+		default:
+			/* Unknown */
+			usage(argv[0]);
+			break;
+		}
+	}
+
+	setup_env(mem_size_mb);
+
+	printf("Message mode: \t\t%s\n", use_poll ? "poll" : "read");
+	printf("Mem size: \t\t%lu (MB)\n", mem_size_mb);
+	printf("Uffd threads: \t\t%u\n", n_uffd_threads);
+	printf("Worker threads: \t%u\n", n_worker_threads);
+	printf("Test rounds: \t\t%lu\n", n_tests);
+	printf("Time used (us): \t");
+
+	for (count = 0; count < n_tests; count++) {
+		result = run_perf(mem_size_mb, use_poll);
+		sum += result;
+		printf("%" PRIu64 ", ", result);
+		fflush(stdout);
+	}
+	printf("\b\b \n");
+	printf("Average (us): \t\t%"PRIu64"\n", sum / n_tests);
+
+	return KSFT_PASS;
+}
+
+#else /* __NR_userfaultfd */
+
+#warning "missing __NR_userfaultfd definition"
+
+int main(void)
+{
+	printf("Skipping %s (missing __NR_userfaultfd)\n", __FILE__);
+	return KSFT_SKIP;
+}
+
+#endif /* __NR_userfaultfd */
-- 
2.41.0



* Re: [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc()
  2023-09-05 21:42 ` [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc() Peter Xu
@ 2023-09-05 23:21   ` kernel test robot
  2023-09-06 17:31   ` kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2023-09-05 23:21 UTC (permalink / raw)
  To: Peter Xu, linux-kernel, linux-mm
  Cc: oe-kbuild-all, Anish Moorthy, Axel Rasmussen, Alexander Viro,
	Mike Kravetz, Peter Zijlstra, Andrew Morton,
	Linux Memory Management List, Mike Rapoport, Christian Brauner,
	peterx, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
	James Houghton, Nadav Amit

Hi Peter,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Peter-Xu/mm-userfaultfd-Make-uffd-read-wait-event-exclusive/20230906-054430
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20230905214235.320571-3-peterx%40redhat.com
patch subject: [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc()
config: m68k-allyesconfig (https://download.01.org/0day-ci/archive/20230906/202309060752.FQUjUmGA-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230906/202309060752.FQUjUmGA-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309060752.FQUjUmGA-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> net/9p/trans_fd.c:555: warning: Function parameter or member 'flags' not described in 'p9_pollwait'


vim +555 net/9p/trans_fd.c

8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06  542  
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06  543  /**
5503ac565998837 Eric Van Hensbergen 2008-10-13  544   * p9_pollwait - add poll task to the wait queue
5503ac565998837 Eric Van Hensbergen 2008-10-13  545   * @filp: file pointer being polled
5503ac565998837 Eric Van Hensbergen 2008-10-13  546   * @wait_address: wait_q to block on
5503ac565998837 Eric Van Hensbergen 2008-10-13  547   * @p: poll state
ee443996a35c1e0 Eric Van Hensbergen 2008-03-05  548   *
5503ac565998837 Eric Van Hensbergen 2008-10-13  549   * called by files poll operation to add v9fs-poll task to files wait queue
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06  550   */
ee443996a35c1e0 Eric Van Hensbergen 2008-03-05  551  
5503ac565998837 Eric Van Hensbergen 2008-10-13  552  static void
14649d7d7806aba Peter Xu            2023-09-05  553  p9_pollwait(struct file *filp, wait_queue_head_t *wait_address, poll_table *p,
14649d7d7806aba Peter Xu            2023-09-05  554  	    poll_flags flags)
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06 @555  {
5503ac565998837 Eric Van Hensbergen 2008-10-13  556  	struct p9_conn *m = container_of(p, struct p9_conn, pt);
5503ac565998837 Eric Van Hensbergen 2008-10-13  557  	struct p9_poll_wait *pwait = NULL;
5503ac565998837 Eric Van Hensbergen 2008-10-13  558  	int i;
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06  559  
5503ac565998837 Eric Van Hensbergen 2008-10-13  560  	for (i = 0; i < ARRAY_SIZE(m->poll_wait); i++) {
5503ac565998837 Eric Van Hensbergen 2008-10-13  561  		if (m->poll_wait[i].wait_addr == NULL) {
5503ac565998837 Eric Van Hensbergen 2008-10-13  562  			pwait = &m->poll_wait[i];
5503ac565998837 Eric Van Hensbergen 2008-10-13  563  			break;
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06  564  		}
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06  565  	}
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06  566  
5503ac565998837 Eric Van Hensbergen 2008-10-13  567  	if (!pwait) {
5d3851530d6d685 Joe Perches         2011-11-28  568  		p9_debug(P9_DEBUG_ERROR, "not enough wait_address slots\n");
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06  569  		return;
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06  570  	}
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06  571  
5503ac565998837 Eric Van Hensbergen 2008-10-13  572  	pwait->conn = m;
5503ac565998837 Eric Van Hensbergen 2008-10-13  573  	pwait->wait_addr = wait_address;
5503ac565998837 Eric Van Hensbergen 2008-10-13  574  	init_waitqueue_func_entry(&pwait->wait, p9_pollwake);
5503ac565998837 Eric Van Hensbergen 2008-10-13  575  	add_wait_queue(wait_address, &pwait->wait);
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06  576  }
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06  577  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc()
  2023-09-05 21:42 ` [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc() Peter Xu
  2023-09-05 23:21   ` kernel test robot
@ 2023-09-06 17:31   ` kernel test robot
  2023-09-06 20:53   ` kernel test robot
  2023-09-11 20:00   ` Peter Xu
  3 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2023-09-06 17:31 UTC (permalink / raw)
  To: Peter Xu, linux-kernel, linux-mm
  Cc: oe-kbuild-all, Anish Moorthy, Axel Rasmussen, Alexander Viro,
	Mike Kravetz, Peter Zijlstra, Andrew Morton,
	Linux Memory Management List, Mike Rapoport, Christian Brauner,
	peterx, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
	James Houghton, Nadav Amit

Hi Peter,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Peter-Xu/mm-userfaultfd-Make-uffd-read-wait-event-exclusive/20230906-054430
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20230905214235.320571-3-peterx%40redhat.com
patch subject: [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc()
config: x86_64-buildonly-randconfig-005-20230906 (https://download.01.org/0day-ci/archive/20230907/202309070146.47KrWvAH-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230907/202309070146.47KrWvAH-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309070146.47KrWvAH-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/xen/privcmd.c: In function 'privcmd_irqfd_assign':
>> drivers/xen/privcmd.c:965:40: error: passing argument 2 of 'init_poll_funcptr' from incompatible pointer type [-Werror=incompatible-pointer-types]
     965 |         init_poll_funcptr(&kirqfd->pt, irqfd_poll_func);
         |                                        ^~~~~~~~~~~~~~~
         |                                        |
         |                                        void (*)(struct file *, wait_queue_head_t *, poll_table *) {aka void (*)(struct file *, struct wait_queue_head *, struct poll_table_struct *)}
   In file included from drivers/xen/privcmd.c:17:
   include/linux/poll.h:76:70: note: expected 'poll_queue_proc' {aka 'void (*)(struct file *, struct wait_queue_head *, struct poll_table_struct *, unsigned int)'} but argument is of type 'void (*)(struct file *, wait_queue_head_t *, poll_table *)' {aka 'void (*)(struct file *, struct wait_queue_head *, struct poll_table_struct *)'}
      76 | static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
         |                                                      ~~~~~~~~~~~~~~~~^~~~~
   cc1: some warnings being treated as errors


vim +/init_poll_funcptr +965 drivers/xen/privcmd.c

f8941e6c4c7129 Viresh Kumar 2023-08-22   924  
f8941e6c4c7129 Viresh Kumar 2023-08-22   925  static int privcmd_irqfd_assign(struct privcmd_irqfd *irqfd)
f8941e6c4c7129 Viresh Kumar 2023-08-22   926  {
f8941e6c4c7129 Viresh Kumar 2023-08-22   927  	struct privcmd_kernel_irqfd *kirqfd, *tmp;
f8941e6c4c7129 Viresh Kumar 2023-08-22   928  	__poll_t events;
f8941e6c4c7129 Viresh Kumar 2023-08-22   929  	struct fd f;
f8941e6c4c7129 Viresh Kumar 2023-08-22   930  	void *dm_op;
f8941e6c4c7129 Viresh Kumar 2023-08-22   931  	int ret;
f8941e6c4c7129 Viresh Kumar 2023-08-22   932  
f8941e6c4c7129 Viresh Kumar 2023-08-22   933  	kirqfd = kzalloc(sizeof(*kirqfd) + irqfd->size, GFP_KERNEL);
f8941e6c4c7129 Viresh Kumar 2023-08-22   934  	if (!kirqfd)
f8941e6c4c7129 Viresh Kumar 2023-08-22   935  		return -ENOMEM;
f8941e6c4c7129 Viresh Kumar 2023-08-22   936  	dm_op = kirqfd + 1;
f8941e6c4c7129 Viresh Kumar 2023-08-22   937  
f8941e6c4c7129 Viresh Kumar 2023-08-22   938  	if (copy_from_user(dm_op, irqfd->dm_op, irqfd->size)) {
f8941e6c4c7129 Viresh Kumar 2023-08-22   939  		ret = -EFAULT;
f8941e6c4c7129 Viresh Kumar 2023-08-22   940  		goto error_kfree;
f8941e6c4c7129 Viresh Kumar 2023-08-22   941  	}
f8941e6c4c7129 Viresh Kumar 2023-08-22   942  
f8941e6c4c7129 Viresh Kumar 2023-08-22   943  	kirqfd->xbufs.size = irqfd->size;
f8941e6c4c7129 Viresh Kumar 2023-08-22   944  	set_xen_guest_handle(kirqfd->xbufs.h, dm_op);
f8941e6c4c7129 Viresh Kumar 2023-08-22   945  	kirqfd->dom = irqfd->dom;
f8941e6c4c7129 Viresh Kumar 2023-08-22   946  	INIT_WORK(&kirqfd->shutdown, irqfd_shutdown);
f8941e6c4c7129 Viresh Kumar 2023-08-22   947  
f8941e6c4c7129 Viresh Kumar 2023-08-22   948  	f = fdget(irqfd->fd);
f8941e6c4c7129 Viresh Kumar 2023-08-22   949  	if (!f.file) {
f8941e6c4c7129 Viresh Kumar 2023-08-22   950  		ret = -EBADF;
f8941e6c4c7129 Viresh Kumar 2023-08-22   951  		goto error_kfree;
f8941e6c4c7129 Viresh Kumar 2023-08-22   952  	}
f8941e6c4c7129 Viresh Kumar 2023-08-22   953  
f8941e6c4c7129 Viresh Kumar 2023-08-22   954  	kirqfd->eventfd = eventfd_ctx_fileget(f.file);
f8941e6c4c7129 Viresh Kumar 2023-08-22   955  	if (IS_ERR(kirqfd->eventfd)) {
f8941e6c4c7129 Viresh Kumar 2023-08-22   956  		ret = PTR_ERR(kirqfd->eventfd);
f8941e6c4c7129 Viresh Kumar 2023-08-22   957  		goto error_fd_put;
f8941e6c4c7129 Viresh Kumar 2023-08-22   958  	}
f8941e6c4c7129 Viresh Kumar 2023-08-22   959  
f8941e6c4c7129 Viresh Kumar 2023-08-22   960  	/*
f8941e6c4c7129 Viresh Kumar 2023-08-22   961  	 * Install our own custom wake-up handling so we are notified via a
f8941e6c4c7129 Viresh Kumar 2023-08-22   962  	 * callback whenever someone signals the underlying eventfd.
f8941e6c4c7129 Viresh Kumar 2023-08-22   963  	 */
f8941e6c4c7129 Viresh Kumar 2023-08-22   964  	init_waitqueue_func_entry(&kirqfd->wait, irqfd_wakeup);
f8941e6c4c7129 Viresh Kumar 2023-08-22  @965  	init_poll_funcptr(&kirqfd->pt, irqfd_poll_func);
f8941e6c4c7129 Viresh Kumar 2023-08-22   966  
f8941e6c4c7129 Viresh Kumar 2023-08-22   967  	mutex_lock(&irqfds_lock);
f8941e6c4c7129 Viresh Kumar 2023-08-22   968  
f8941e6c4c7129 Viresh Kumar 2023-08-22   969  	list_for_each_entry(tmp, &irqfds_list, list) {
f8941e6c4c7129 Viresh Kumar 2023-08-22   970  		if (kirqfd->eventfd == tmp->eventfd) {
f8941e6c4c7129 Viresh Kumar 2023-08-22   971  			ret = -EBUSY;
f8941e6c4c7129 Viresh Kumar 2023-08-22   972  			mutex_unlock(&irqfds_lock);
f8941e6c4c7129 Viresh Kumar 2023-08-22   973  			goto error_eventfd;
f8941e6c4c7129 Viresh Kumar 2023-08-22   974  		}
f8941e6c4c7129 Viresh Kumar 2023-08-22   975  	}
f8941e6c4c7129 Viresh Kumar 2023-08-22   976  
f8941e6c4c7129 Viresh Kumar 2023-08-22   977  	list_add_tail(&kirqfd->list, &irqfds_list);
f8941e6c4c7129 Viresh Kumar 2023-08-22   978  	mutex_unlock(&irqfds_lock);
f8941e6c4c7129 Viresh Kumar 2023-08-22   979  
f8941e6c4c7129 Viresh Kumar 2023-08-22   980  	/*
f8941e6c4c7129 Viresh Kumar 2023-08-22   981  	 * Check if there was an event already pending on the eventfd before we
f8941e6c4c7129 Viresh Kumar 2023-08-22   982  	 * registered, and trigger it as if we didn't miss it.
f8941e6c4c7129 Viresh Kumar 2023-08-22   983  	 */
f8941e6c4c7129 Viresh Kumar 2023-08-22   984  	events = vfs_poll(f.file, &kirqfd->pt);
f8941e6c4c7129 Viresh Kumar 2023-08-22   985  	if (events & EPOLLIN)
f8941e6c4c7129 Viresh Kumar 2023-08-22   986  		irqfd_inject(kirqfd);
f8941e6c4c7129 Viresh Kumar 2023-08-22   987  
f8941e6c4c7129 Viresh Kumar 2023-08-22   988  	/*
f8941e6c4c7129 Viresh Kumar 2023-08-22   989  	 * Do not drop the file until the kirqfd is fully initialized, otherwise
f8941e6c4c7129 Viresh Kumar 2023-08-22   990  	 * we might race against the EPOLLHUP.
f8941e6c4c7129 Viresh Kumar 2023-08-22   991  	 */
f8941e6c4c7129 Viresh Kumar 2023-08-22   992  	fdput(f);
f8941e6c4c7129 Viresh Kumar 2023-08-22   993  	return 0;
f8941e6c4c7129 Viresh Kumar 2023-08-22   994  
f8941e6c4c7129 Viresh Kumar 2023-08-22   995  error_eventfd:
f8941e6c4c7129 Viresh Kumar 2023-08-22   996  	eventfd_ctx_put(kirqfd->eventfd);
f8941e6c4c7129 Viresh Kumar 2023-08-22   997  
f8941e6c4c7129 Viresh Kumar 2023-08-22   998  error_fd_put:
f8941e6c4c7129 Viresh Kumar 2023-08-22   999  	fdput(f);
f8941e6c4c7129 Viresh Kumar 2023-08-22  1000  
f8941e6c4c7129 Viresh Kumar 2023-08-22  1001  error_kfree:
f8941e6c4c7129 Viresh Kumar 2023-08-22  1002  	kfree(kirqfd);
f8941e6c4c7129 Viresh Kumar 2023-08-22  1003  	return ret;
f8941e6c4c7129 Viresh Kumar 2023-08-22  1004  }
f8941e6c4c7129 Viresh Kumar 2023-08-22  1005  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc()
  2023-09-05 21:42 ` [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc() Peter Xu
  2023-09-05 23:21   ` kernel test robot
  2023-09-06 17:31   ` kernel test robot
@ 2023-09-06 20:53   ` kernel test robot
  2023-09-11 20:00   ` Peter Xu
  3 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2023-09-06 20:53 UTC (permalink / raw)
  To: Peter Xu, linux-kernel, linux-mm
  Cc: llvm, oe-kbuild-all, Anish Moorthy, Axel Rasmussen,
	Alexander Viro, Mike Kravetz, Peter Zijlstra, Andrew Morton,
	Linux Memory Management List, Mike Rapoport, Christian Brauner,
	peterx, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
	James Houghton, Nadav Amit

Hi Peter,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Peter-Xu/mm-userfaultfd-Make-uffd-read-wait-event-exclusive/20230906-054430
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20230905214235.320571-3-peterx%40redhat.com
patch subject: [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc()
config: hexagon-randconfig-002-20230906 (https://download.01.org/0day-ci/archive/20230907/202309070440.eKzBcg8X-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project.git f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230907/202309070440.eKzBcg8X-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309070440.eKzBcg8X-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from drivers/vhost/vhost.c:14:
   In file included from include/uapi/linux/vhost.h:14:
   In file included from include/uapi/linux/vhost_types.h:16:
   In file included from include/linux/virtio_config.h:7:
   In file included from include/linux/virtio.h:7:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/hexagon/include/asm/io.h:337:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __raw_readb(PCI_IOBASE + addr);
                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
                                                     ^
   In file included from drivers/vhost/vhost.c:14:
   In file included from include/uapi/linux/vhost.h:14:
   In file included from include/uapi/linux/vhost_types.h:16:
   In file included from include/linux/virtio_config.h:7:
   In file included from include/linux/virtio.h:7:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/hexagon/include/asm/io.h:337:
   include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
                                                     ^
   In file included from drivers/vhost/vhost.c:14:
   In file included from include/uapi/linux/vhost.h:14:
   In file included from include/uapi/linux/vhost_types.h:16:
   In file included from include/linux/virtio_config.h:7:
   In file included from include/linux/virtio.h:7:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/hexagon/include/asm/io.h:337:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
>> drivers/vhost/vhost.c:193:41: error: incompatible function pointer types passing 'int (wait_queue_entry_t *, unsigned int, int, void *, poll_flags)' (aka 'int (struct wait_queue_entry *, unsigned int, int, void *, unsigned int)') to parameter of type 'wait_queue_func_t' (aka 'int (*)(struct wait_queue_entry *, unsigned int, int, void *)') [-Werror,-Wincompatible-function-pointer-types]
           init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup);
                                                  ^~~~~~~~~~~~~~~~~
   include/linux/wait.h:90:80: note: passing argument to parameter 'func' here
   init_waitqueue_func_entry(struct wait_queue_entry *wq_entry, wait_queue_func_t func)
                                                                                  ^
>> drivers/vhost/vhost.c:194:34: error: incompatible function pointer types passing 'void (struct file *, wait_queue_head_t *, poll_table *)' (aka 'void (struct file *, struct wait_queue_head *, struct poll_table_struct *)') to parameter of type 'poll_queue_proc' (aka 'void (*)(struct file *, struct wait_queue_head *, struct poll_table_struct *, unsigned int)') [-Werror,-Wincompatible-function-pointer-types]
           init_poll_funcptr(&poll->table, vhost_poll_func);
                                           ^~~~~~~~~~~~~~~
   include/linux/poll.h:76:70: note: passing argument to parameter 'qproc' here
   static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
                                                                        ^
>> drivers/vhost/vhost.c:215:57: error: too few arguments to function call, expected 5, have 4
                   vhost_poll_wakeup(&poll->wait, 0, 0, poll_to_key(mask));
                   ~~~~~~~~~~~~~~~~~                                     ^
   drivers/vhost/vhost.c:164:12: note: 'vhost_poll_wakeup' declared here
   static int vhost_poll_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync,
              ^
   6 warnings and 3 errors generated.


vim +193 drivers/vhost/vhost.c

87d6a412bd1ed8 Michael S. Tsirkin 2010-09-02  187  
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14  188  /* Init poll structure */
c23f3445e68e1d Tejun Heo          2010-06-02  189  void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
493b94bf5ae0f6 Mike Christie      2023-06-26  190  		     __poll_t mask, struct vhost_dev *dev,
493b94bf5ae0f6 Mike Christie      2023-06-26  191  		     struct vhost_virtqueue *vq)
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14  192  {
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14 @193  	init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup);
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14 @194  	init_poll_funcptr(&poll->table, vhost_poll_func);
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14  195  	poll->mask = mask;
c23f3445e68e1d Tejun Heo          2010-06-02  196  	poll->dev = dev;
2b8b328b61c799 Jason Wang         2013-01-28  197  	poll->wqh = NULL;
493b94bf5ae0f6 Mike Christie      2023-06-26  198  	poll->vq = vq;
c23f3445e68e1d Tejun Heo          2010-06-02  199  
87d6a412bd1ed8 Michael S. Tsirkin 2010-09-02  200  	vhost_work_init(&poll->work, fn);
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14  201  }
6ac1afbf6132df Asias He           2013-05-06  202  EXPORT_SYMBOL_GPL(vhost_poll_init);
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14  203  
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14  204  /* Start polling a file. We add ourselves to file's wait queue. The caller must
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14  205   * keep a reference to a file until after vhost_poll_stop is called. */
2b8b328b61c799 Jason Wang         2013-01-28  206  int vhost_poll_start(struct vhost_poll *poll, struct file *file)
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14  207  {
e6c8adca20ba45 Al Viro            2017-07-03  208  	__poll_t mask;
d47effe1be0c4f Krishna Kumar      2011-03-01  209  
70181d51209cbc Jason Wang         2013-04-10  210  	if (poll->wqh)
70181d51209cbc Jason Wang         2013-04-10  211  		return 0;
70181d51209cbc Jason Wang         2013-04-10  212  
9965ed174e7d38 Christoph Hellwig  2018-03-05  213  	mask = vfs_poll(file, &poll->table);
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14  214  	if (mask)
3ad6f93e98d6df Al Viro            2017-07-03 @215  		vhost_poll_wakeup(&poll->wait, 0, 0, poll_to_key(mask));
a9a08845e9acbd Linus Torvalds     2018-02-11  216  	if (mask & EPOLLERR) {
dc6455a71c7fc5 Jason Wang         2018-03-27  217  		vhost_poll_stop(poll);
896fc242bc1d26 Yunsheng Lin       2019-08-20  218  		return -EINVAL;
2b8b328b61c799 Jason Wang         2013-01-28  219  	}
2b8b328b61c799 Jason Wang         2013-01-28  220  
896fc242bc1d26 Yunsheng Lin       2019-08-20  221  	return 0;
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14  222  }
6ac1afbf6132df Asias He           2013-05-06  223  EXPORT_SYMBOL_GPL(vhost_poll_start);
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14  224  
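
To make the mismatch concrete, these are the callback prototypes the
diagnostics above imply (a sketch for reference only; per the "aka"
expansions in the error messages, poll_flags is assumed to be a typedef
for unsigned int):

        typedef unsigned int poll_flags;        /* assumption, per "aka" */

        /* Unchanged: wait queue wakeup callbacks keep four arguments. */
        typedef int (*wait_queue_func_t)(struct wait_queue_entry *wq_entry,
                                         unsigned int mode, int sync,
                                         void *key);

        /* New in this series: poll enqueue callbacks gain a flags argument. */
        typedef void (*poll_queue_proc)(struct file *file,
                                        struct wait_queue_head *wqh,
                                        struct poll_table_struct *pt,
                                        poll_flags flags);

vhost_poll_wakeup() was given the extra poll_flags argument (so it no
longer matches wait_queue_func_t), while vhost_poll_func() was left
without one (so it no longer matches poll_queue_proc); the fixup later in
this thread brings both back into line.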

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups
  2023-09-05 21:42 [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Peter Xu
                   ` (6 preceding siblings ...)
  2023-09-05 21:42 ` [PATCH 7/7] selftests/mm: uffd perf test Peter Xu
@ 2023-09-07 19:18 ` Axel Rasmussen
  2023-09-08 22:01   ` Peter Xu
  7 siblings, 1 reply; 14+ messages in thread
From: Axel Rasmussen @ 2023-09-07 19:18 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, linux-mm, Anish Moorthy, Alexander Viro,
	Mike Kravetz, Peter Zijlstra, Andrew Morton, Mike Rapoport,
	Christian Brauner, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
	James Houghton, Nadav Amit

On Tue, Sep 5, 2023 at 2:42 PM Peter Xu <peterx@redhat.com> wrote:
>
> Userfaultfd is the type of file that doesn't need wake-all semantics: if
> there is a message enqueued (for either a fault address, or an event), we
> only need to wake up one service thread to handle it.  Waking up more
> normally means a waste of cpu cycles.  Besides that, and more importantly,
> that just doesn't scale.

Hi Peter,

I took a quick look over the series and didn't see anything
objectionable. I was planning to actually test the series out and then
send out R-b's, but it will take some additional time (next week).

In the meantime, I was curious about the use case. A design I've seen
for VM live migration is to have 1 thread reading events off the uffd,
and then have many threads actually resolving the fault events that
come in (e.g. fetching pages over the network, issuing UFFDIO_COPY or
UFFDIO_CONTINUE, or whatever). In that design, since we only have a
single reader anyway, I think this series doesn't help.
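
To make that design concrete, here is a minimal userspace sketch (not code
from this series; page_size, src_page, and the pipe used as a work queue
are illustrative assumptions).  One thread blocks in read() on the uffd
and hands each fault address to a pool of resolver threads:

#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed to exist elsewhere in the program: the page size and a source
 * page holding the data to install (e.g. fetched over the network). */
extern unsigned long page_size;
extern char *src_page;

struct dispatcher {
        int uffd;
        int workq[2];   /* pipe(2) used as a trivial work queue */
};

/* The single reader thread: drain messages off the uffd and push the
 * fault addresses to the workers. */
static void *reader_thread(void *arg)
{
        struct dispatcher *d = arg;
        struct uffd_msg msg;

        while (read(d->uffd, &msg, sizeof(msg)) == sizeof(msg)) {
                if (msg.event != UFFD_EVENT_PAGEFAULT)
                        continue;
                unsigned long addr = msg.arg.pagefault.address;
                write(d->workq[1], &addr, sizeof(addr));
        }
        return NULL;
}

/* N worker threads: resolve each fault, here with UFFDIO_COPY (it could
 * equally be UFFDIO_CONTINUE, or a network fetch first). */
static void *worker_thread(void *arg)
{
        struct dispatcher *d = arg;
        unsigned long addr;

        while (read(d->workq[0], &addr, sizeof(addr)) == sizeof(addr)) {
                struct uffdio_copy copy = {
                        .dst = addr & ~(page_size - 1),
                        .src = (unsigned long)src_page,
                        .len = page_size,
                };
                ioctl(d->uffd, UFFDIO_COPY, &copy);
        }
        return NULL;
}

In this arrangement only the one reader ever sleeps on the uffd waitqueue,
which is why the wakeup change should make little difference here.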

But, I'm curious if you have data indicating that > 1 reader is more
performant overall? I suspect it might be the case that, with "enough"
vCPUs, it makes sense to do so, but I don't have benchmark data to
tell me what that tipping point is yet.

OTOH, if one reader is plenty in ~all cases, optimizing this path is
less important.




>
> Andrea used to have one patch that made read() to be O(1) but never hit
> upstream.  This is my effort to try upstreaming that (which is a
> oneliner..), meanwhile on top of that I also made poll() O(1) on wakeup,
> too (more or less bring EPOLLEXCLUSIVE to poll()), with some tests showing
> that effect.
>
> To verify this, I added a test called uffd-perf (leveraging the refactored
> uffd selftest suite) that will measure the messaging channel latencies on
> wakeups, and the waitqueue optimizations can be reflected by the new test:
>
>         Constants: 40 uffd threads, on N_CPUS=40, memsize=512M
>         Units: milliseconds (to finish the test)
>         |-----------------+--------+-------+------------|
>         | test case       | before | after |   diff (%) |
>         |-----------------+--------+-------+------------|
>         | workers=8,poll  |   1762 |  1133 | -55.516328 |
>         | workers=8,read  |   1437 |   585 | -145.64103 |
>         | workers=16,poll |   1117 |  1097 | -1.8231541 |
>         | workers=16,read |   1159 |   759 | -52.700922 |
>         | workers=32,poll |   1001 |   973 | -2.8776978 |
>         | workers=32,read |    866 |   713 | -21.458626 |
>         |-----------------+--------+-------+------------|
>
> The more threads hanging on the fd_wqh, a bigger difference will be there
> shown in the numbers.  "8 worker threads" is the worst case here because it
> means there can be a worst case of 40-8=32 threads hanging idle on fd_wqh
> queue.
>
> In real life, workers can be more than this, but small number of active
> worker threads will cause similar effect.
>
> This is currently based on Andrew's mm-unstable branch, but assuming this
> is applicable to most of the not-so-old trees.
>
> Comments welcomed, thanks.
>
> Andrea Arcangeli (1):
>   mm/userfaultfd: Make uffd read() wait event exclusive
>
> Peter Xu (6):
>   poll: Add a poll_flags for poll_queue_proc()
>   poll: POLL_ENQUEUE_EXCLUSIVE
>   fs/userfaultfd: Use exclusive waitqueue for poll()
>   selftests/mm: Replace uffd_read_mutex with a semaphore
>   selftests/mm: Create uffd_fault_thread_create|join()
>   selftests/mm: uffd perf test
>
>  drivers/vfio/virqfd.c                    |   4 +-
>  drivers/vhost/vhost.c                    |   2 +-
>  drivers/virt/acrn/irqfd.c                |   2 +-
>  fs/aio.c                                 |   2 +-
>  fs/eventpoll.c                           |   2 +-
>  fs/select.c                              |   9 +-
>  fs/userfaultfd.c                         |   8 +-
>  include/linux/poll.h                     |  25 ++-
>  io_uring/poll.c                          |   4 +-
>  mm/memcontrol.c                          |   4 +-
>  net/9p/trans_fd.c                        |   3 +-
>  tools/testing/selftests/mm/Makefile      |   2 +
>  tools/testing/selftests/mm/uffd-common.c |  65 +++++++
>  tools/testing/selftests/mm/uffd-common.h |   7 +
>  tools/testing/selftests/mm/uffd-perf.c   | 207 +++++++++++++++++++++++
>  tools/testing/selftests/mm/uffd-stress.c |  53 +-----
>  virt/kvm/eventfd.c                       |   2 +-
>  17 files changed, 337 insertions(+), 64 deletions(-)
>  create mode 100644 tools/testing/selftests/mm/uffd-perf.c
>
> --
> 2.41.0
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups
  2023-09-07 19:18 ` [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Axel Rasmussen
@ 2023-09-08 22:01   ` Peter Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2023-09-08 22:01 UTC (permalink / raw)
  To: Axel Rasmussen
  Cc: linux-kernel, linux-mm, Anish Moorthy, Alexander Viro,
	Mike Kravetz, Peter Zijlstra, Andrew Morton, Mike Rapoport,
	Christian Brauner, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
	James Houghton, Nadav Amit

On Thu, Sep 07, 2023 at 12:18:29PM -0700, Axel Rasmussen wrote:
> On Tue, Sep 5, 2023 at 2:42 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > Userfaultfd is the type of file that doesn't need wake-all semantics: if
> > there is a message enqueued (for either a fault address, or an event), we
> > only need to wake up one service thread to handle it.  Waking up more
> > normally means a waste of cpu cycles.  Besides that, and more importantly,
> > that just doesn't scale.
> 
> Hi Peter,

Hi, Axel,

Sorry to respond late.

> 
> I took a quick look over the series and didn't see anything
> objectionable. I was planning to actually test the series out and then
> send out R-b's, but it will take some additional time (next week).

Thanks.  The 2nd patch definitely needs a fixup for a few functions (I must
have overlooked them because not enough CONFIG_* options were enabled; I am
surprised vhost was even compiled out in my test build..).  Hope that won't
bring you too much trouble.  I'll send a fixup soon on top of patch 2.

> 
> In the meantime, I was curious about the use case. A design I've seen
> for VM live migration is to have 1 thread reading events off the uffd,
> and then have many threads actually resolving the fault events that
> come in (e.g. fetching pages over the network, issuing UFFDIO_COPY or
> UFFDIO_CONTINUE, or whatever). In that design, since we only have a
> single reader anyway, I think this series doesn't help.

Yes.  If the workload only ever uses 1 reader thread, this series shouldn't
make much of a difference.

> 
> But, I'm curious if you have data indicating that > 1 reader is more
> performant overall? I suspect it might be the case that, with "enough"
> vCPUs, it makes sense to do so, but I don't have benchmark data to
> tell me what that tipping point is yet.
> 
> OTOH, if one reader is plenty in ~all cases, optimizing this path is
> less important.

For myself I don't yet have an application that can leverage this much
indeed, because QEMU so far only uses 1 reader thread.

IIRC Anish was proposing some kvm-specific solutions exactly to make a
single uffd scale, and this series might be suitable for any use case like
that, where we want a single uffd and try to make it scale with threads.
Using 1 reader + N workers is also a solution, but an app that uses N
readers (which also do the work) will hit this problem.
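
A sketch of that N-reader model (illustrative names only, not code from
the series):

#include <linux/userfaultfd.h>
#include <unistd.h>

/* Assumed helper: resolves one fault, e.g. via UFFDIO_COPY. */
extern void resolve_fault(int uffd, struct uffd_msg *msg);

/* Each of the N threads runs this loop: it reads the uffd and also
 * resolves the fault itself.  All N threads block in read() on the same
 * fd, so with wake-all semantics every queued message wakes every idle
 * thread even though only one of them will dequeue it; that is the
 * overhead the exclusive wait in patch 1 avoids. */
static void *reader_worker_thread(void *arg)
{
        int uffd = *(int *)arg;
        struct uffd_msg msg;

        while (read(uffd, &msg, sizeof(msg)) == sizeof(msg))
                resolve_fault(uffd, &msg);

        return NULL;
}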

I am also aware that some apps use more than 1 reader thread (umap, for
example), but I don't really know much more than that.

The point is that I don't think an app should pay that wake-all overhead
just because it creates more than one reader, and it also doesn't make much
sense to wake up all readers for a single userfault event.  So this should
always be a good thing to have.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc()
  2023-09-05 21:42 ` [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc() Peter Xu
                     ` (2 preceding siblings ...)
  2023-09-06 20:53   ` kernel test robot
@ 2023-09-11 20:00   ` Peter Xu
  3 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2023-09-11 20:00 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Anish Moorthy, Axel Rasmussen, Alexander Viro, Mike Kravetz,
	Peter Zijlstra, Andrew Morton, Mike Rapoport, Christian Brauner,
	linux-fsdevel, Andrea Arcangeli, Ingo Molnar, James Houghton,
	Nadav Amit

On Tue, Sep 05, 2023 at 05:42:30PM -0400, Peter Xu wrote:
> Allows the poll enqueue function to pass over a flag into it.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

A fixup for this patch (should fix all syzbot errors):

===8<===
From ba55ee0539a7a80b98e7a5e3942c0ee8cabe5f73 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Tue, 5 Sep 2023 20:05:33 -0400
Subject: [PATCH] fixup! poll: Add a poll_flags for poll_queue_proc()

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 drivers/vhost/vhost.c | 4 ++--
 drivers/xen/privcmd.c | 3 ++-
 net/9p/trans_fd.c     | 1 +
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 02caad721843..00813db53ff1 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -152,7 +152,7 @@ static void vhost_flush_work(struct vhost_work *work)
 }
 
 static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh,
-			    poll_table *pt)
+			    poll_table *pt, poll_flags flags)
 {
 	struct vhost_poll *poll;
 
@@ -162,7 +162,7 @@ static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh,
 }
 
 static int vhost_poll_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync,
-			     void *key, poll_flags flags)
+			     void *key)
 {
 	struct vhost_poll *poll = container_of(wait, struct vhost_poll, wait);
 	struct vhost_work *work = &poll->work;
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index f00ad5f5f1d4..43e65186f25d 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -914,7 +914,8 @@ irqfd_wakeup(wait_queue_entry_t *wait, unsigned int mode, int sync, void *key)
 }
 
 static void
-irqfd_poll_func(struct file *file, wait_queue_head_t *wqh, poll_table *pt)
+irqfd_poll_func(struct file *file, wait_queue_head_t *wqh, poll_table *pt,
+		poll_flags flags)
 {
 	struct privcmd_kernel_irqfd *kirqfd =
 		container_of(pt, struct privcmd_kernel_irqfd, pt);
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 91f9f474ab01..2912c4b086a2 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -545,6 +545,7 @@ static int p9_pollwake(wait_queue_entry_t *wait, unsigned int mode, int sync, vo
  * @filp: file pointer being polled
  * @wait_address: wait_q to block on
  * @p: poll state
+ * @flags: poll flags
  *
  * called by files poll operation to add v9fs-poll task to files wait queue
  */
-- 
2.41.0
===8<===
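
For anyone converting further callers, the pattern of the fixup is
mechanical; a hypothetical (not in-tree) driver callback would change like
this, simply ignoring the new argument if it has no use for it:

#include <linux/poll.h>
#include <linux/wait.h>

struct mydrv_poll {
        poll_table table;
        wait_queue_entry_t wait;
};

/* Hypothetical example only: the enqueue proc grows a poll_flags argument,
 * which callers that do not request exclusive wakeups can ignore. */
static void mydrv_poll_func(struct file *file, wait_queue_head_t *wqh,
                            poll_table *pt, poll_flags flags)
{
        struct mydrv_poll *poll = container_of(pt, struct mydrv_poll, table);

        add_wait_queue(wqh, &poll->wait);
}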

Thanks,

-- 
Peter Xu


^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-09-11 20:53 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-05 21:42 [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Peter Xu
2023-09-05 21:42 ` [PATCH 1/7] mm/userfaultfd: Make uffd read() wait event exclusive Peter Xu
2023-09-05 21:42 ` [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc() Peter Xu
2023-09-05 23:21   ` kernel test robot
2023-09-06 17:31   ` kernel test robot
2023-09-06 20:53   ` kernel test robot
2023-09-11 20:00   ` Peter Xu
2023-09-05 21:42 ` [PATCH 3/7] poll: POLL_ENQUEUE_EXCLUSIVE Peter Xu
2023-09-05 21:42 ` [PATCH 4/7] fs/userfaultfd: Use exclusive waitqueue for poll() Peter Xu
2023-09-05 21:42 ` [PATCH 5/7] selftests/mm: Replace uffd_read_mutex with a semaphore Peter Xu
2023-09-05 21:42 ` [PATCH 6/7] selftests/mm: Create uffd_fault_thread_create|join() Peter Xu
2023-09-05 21:42 ` [PATCH 7/7] selftests/mm: uffd perf test Peter Xu
2023-09-07 19:18 ` [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Axel Rasmussen
2023-09-08 22:01   ` Peter Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).