* [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups
@ 2023-09-05 21:42 Peter Xu
2023-09-05 21:42 ` [PATCH 1/7] mm/userfaultfd: Make uffd read() wait event exclusive Peter Xu
` (7 more replies)
0 siblings, 8 replies; 14+ messages in thread
From: Peter Xu @ 2023-09-05 21:42 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Anish Moorthy, Axel Rasmussen, Alexander Viro, Mike Kravetz,
Peter Zijlstra, Andrew Morton, Mike Rapoport, Christian Brauner,
peterx, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
James Houghton, Nadav Amit
Userfaultfd is the type of file that doesn't need wake-all semantics: if
there is a message enqueued (for either a fault address, or an event), we
only need to wake up one service thread to handle it. Waking up more than
one normally just wastes cpu cycles. Besides that, and more importantly,
it simply doesn't scale.
Andrea used to have a patch that made read() O(1), but it never hit
upstream. This is my effort to upstream it (it is a one-liner..);
meanwhile on top of that I also made poll() O(1) on wakeup (more or less
bringing EPOLLEXCLUSIVE to poll()), with some tests showing the effect.
To verify this, I added a test called uffd-perf (leveraging the refactored
uffd selftest suite) that measures the latency of the messaging channel on
wakeups; the waitqueue optimizations are reflected in the new test's numbers:
Constants: 40 uffd threads, on N_CPUS=40, memsize=512M
Units: milliseconds (to finish the test)
|-----------------+--------+-------+------------|
| test case | before | after | diff (%) |
|-----------------+--------+-------+------------|
| workers=8,poll | 1762 | 1133 | -55.516328 |
| workers=8,read | 1437 | 585 | -145.64103 |
| workers=16,poll | 1117 | 1097 | -1.8231541 |
| workers=16,read | 1159 | 759 | -52.700922 |
| workers=32,poll | 1001 | 973 | -2.8776978 |
| workers=32,read | 866 | 713 | -21.458626 |
|-----------------+--------+-------+------------|
The more threads hanging on fd_wqh, the bigger the difference shown in the
numbers. "8 worker threads" is the worst case here, because it means up to
40-8=32 threads can hang idle on the fd_wqh queue.
In real life there can be more workers than this, but a small number of
active worker threads will cause a similar effect.
This is currently based on Andrew's mm-unstable branch, but it should
apply to most not-so-old trees.
Comments welcomed, thanks.
Andrea Arcangeli (1):
mm/userfaultfd: Make uffd read() wait event exclusive
Peter Xu (6):
poll: Add a poll_flags for poll_queue_proc()
poll: POLL_ENQUEUE_EXCLUSIVE
fs/userfaultfd: Use exclusive waitqueue for poll()
selftests/mm: Replace uffd_read_mutex with a semaphore
selftests/mm: Create uffd_fault_thread_create|join()
selftests/mm: uffd perf test
drivers/vfio/virqfd.c | 4 +-
drivers/vhost/vhost.c | 2 +-
drivers/virt/acrn/irqfd.c | 2 +-
fs/aio.c | 2 +-
fs/eventpoll.c | 2 +-
fs/select.c | 9 +-
fs/userfaultfd.c | 8 +-
include/linux/poll.h | 25 ++-
io_uring/poll.c | 4 +-
mm/memcontrol.c | 4 +-
net/9p/trans_fd.c | 3 +-
tools/testing/selftests/mm/Makefile | 2 +
tools/testing/selftests/mm/uffd-common.c | 65 +++++++
tools/testing/selftests/mm/uffd-common.h | 7 +
tools/testing/selftests/mm/uffd-perf.c | 207 +++++++++++++++++++++++
tools/testing/selftests/mm/uffd-stress.c | 53 +-----
virt/kvm/eventfd.c | 2 +-
17 files changed, 337 insertions(+), 64 deletions(-)
create mode 100644 tools/testing/selftests/mm/uffd-perf.c
--
2.41.0
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/7] mm/userfaultfd: Make uffd read() wait event exclusive
2023-09-05 21:42 [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Peter Xu
@ 2023-09-05 21:42 ` Peter Xu
2023-09-05 21:42 ` [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc() Peter Xu
From: Peter Xu @ 2023-09-05 21:42 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Anish Moorthy, Axel Rasmussen, Alexander Viro, Mike Kravetz,
Peter Zijlstra, Andrew Morton, Mike Rapoport, Christian Brauner,
peterx, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
James Houghton, Nadav Amit
From: Andrea Arcangeli <aarcange@redhat.com>
When a new message is generated for a userfaultfd, instead of waking up
all the readers, we can wake up only one exclusive reader to process the
event. Waking up more than one reader for a single message is a waste of
resources: the rest of the readers will see nothing and re-queue.
This should make userfaultfd read() O(1) on wakeups.
Note that queuing on the head (rather than the tail) is intended, to make
sure the readers are woken up in LIFO fashion; fairness doesn't matter
much here, but caching does.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
[peterx: modified subjects / commit message]
Signed-off-by: Peter Xu <peterx@redhat.com>
---
fs/userfaultfd.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 56eaae9dac1a..f7fda7d0c994 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1061,7 +1061,11 @@ static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
/* always take the fd_wqh lock before the fault_pending_wqh lock */
spin_lock_irq(&ctx->fd_wqh.lock);
- __add_wait_queue(&ctx->fd_wqh, &wait);
+ /*
+ * Only wake up one exclusive reader each time there's an event.
+ * Paired with wake_up_poll() when e.g. a new page fault msg is generated.
+ */
+ __add_wait_queue_exclusive(&ctx->fd_wqh, &wait);
for (;;) {
set_current_state(TASK_INTERRUPTIBLE);
spin_lock(&ctx->fault_pending_wqh.lock);
--
2.41.0
* [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc()
2023-09-05 21:42 [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Peter Xu
2023-09-05 21:42 ` [PATCH 1/7] mm/userfaultfd: Make uffd read() wait event exclusive Peter Xu
@ 2023-09-05 21:42 ` Peter Xu
2023-09-05 23:21 ` kernel test robot
2023-09-05 21:42 ` [PATCH 3/7] poll: POLL_ENQUEUE_EXCLUSIVE Peter Xu
From: Peter Xu @ 2023-09-05 21:42 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Anish Moorthy, Axel Rasmussen, Alexander Viro, Mike Kravetz,
Peter Zijlstra, Andrew Morton, Mike Rapoport, Christian Brauner,
peterx, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
James Houghton, Nadav Amit
Allow passing a flag into the poll enqueue function (poll_queue_proc).
Signed-off-by: Peter Xu <peterx@redhat.com>
---
drivers/vfio/virqfd.c | 4 ++--
drivers/vhost/vhost.c | 2 +-
drivers/virt/acrn/irqfd.c | 2 +-
fs/aio.c | 2 +-
fs/eventpoll.c | 2 +-
fs/select.c | 4 ++--
include/linux/poll.h | 7 +++++--
io_uring/poll.c | 4 ++--
mm/memcontrol.c | 4 +++-
net/9p/trans_fd.c | 3 ++-
virt/kvm/eventfd.c | 2 +-
11 files changed, 21 insertions(+), 15 deletions(-)
diff --git a/drivers/vfio/virqfd.c b/drivers/vfio/virqfd.c
index 29c564b7a6e1..4b817a6f4f72 100644
--- a/drivers/vfio/virqfd.c
+++ b/drivers/vfio/virqfd.c
@@ -75,8 +75,8 @@ static int virqfd_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void
return 0;
}
-static void virqfd_ptable_queue_proc(struct file *file,
- wait_queue_head_t *wqh, poll_table *pt)
+static void virqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
+ poll_table *pt, poll_flags flags)
{
struct virqfd *virqfd = container_of(pt, struct virqfd, pt);
add_wait_queue(wqh, &virqfd->wait);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c71d573f1c94..02caad721843 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -162,7 +162,7 @@ static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh,
-			    poll_table *pt)
+			    poll_table *pt, poll_flags flags)
{
	struct vhost_poll *poll;
	poll = container_of(pt, struct vhost_poll, table);
	__add_wait_queue(wqh, &poll->wait);
}
diff --git a/drivers/virt/acrn/irqfd.c b/drivers/virt/acrn/irqfd.c
index d4ad211dce7a..9b79e4e76e49 100644
--- a/drivers/virt/acrn/irqfd.c
+++ b/drivers/virt/acrn/irqfd.c
@@ -94,7 +94,7 @@ static int hsm_irqfd_wakeup(wait_queue_entry_t *wait, unsigned int mode,
}
static void hsm_irqfd_poll_func(struct file *file, wait_queue_head_t *wqh,
- poll_table *pt)
+ poll_table *pt, poll_flags flags)
{
struct hsm_irqfd *irqfd;
diff --git a/fs/aio.c b/fs/aio.c
index a4c2a6bac72c..abb5b22f4fdf 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1823,7 +1823,7 @@ struct aio_poll_table {
static void
aio_poll_queue_proc(struct file *file, struct wait_queue_head *head,
- struct poll_table_struct *p)
+ struct poll_table_struct *p, poll_flags flags)
{
struct aio_poll_table *pt = container_of(p, struct aio_poll_table, pt);
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 1d9a71a0c4c1..c74d6a083fd1 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1270,7 +1270,7 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
* target file wakeup lists.
*/
static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
- poll_table *pt)
+ poll_table *pt, poll_flags flags)
{
struct ep_pqueue *epq = container_of(pt, struct ep_pqueue, pt);
struct epitem *epi = epq->epi;
diff --git a/fs/select.c b/fs/select.c
index 0ee55af1a55c..0433448481e9 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -117,7 +117,7 @@ struct poll_table_page {
* poll table.
*/
static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
- poll_table *p);
+ poll_table *p, poll_flags flags);
void poll_initwait(struct poll_wqueues *pwq)
{
@@ -220,7 +220,7 @@ static int pollwake(wait_queue_entry_t *wait, unsigned mode, int sync, void *key
/* Add a new entry */
static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
- poll_table *p)
+ poll_table *p, poll_flags flags)
{
struct poll_wqueues *pwq = container_of(p, struct poll_wqueues, pt);
struct poll_table_entry *entry = poll_get_entry(pwq);
diff --git a/include/linux/poll.h b/include/linux/poll.h
index a9e0e1c2d1f2..cbad520fc65c 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -27,12 +27,15 @@
#define DEFAULT_POLLMASK (EPOLLIN | EPOLLOUT | EPOLLRDNORM | EPOLLWRNORM)
+typedef unsigned int poll_flags;
+
struct poll_table_struct;
/*
* structures and helpers for f_op->poll implementations
*/
-typedef void (*poll_queue_proc)(struct file *, wait_queue_head_t *, struct poll_table_struct *);
+typedef void (*poll_queue_proc)(struct file *, wait_queue_head_t *,
+ struct poll_table_struct *, poll_flags);
/*
* Do not touch the structure directly, use the access functions
@@ -46,7 +49,7 @@ typedef struct poll_table_struct {
static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_address, poll_table *p)
{
if (p && p->_qproc && wait_address)
- p->_qproc(filp, wait_address, p);
+ p->_qproc(filp, wait_address, p, 0);
}
/*
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 4c360ba8793a..c3b41e963a8d 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -533,7 +533,7 @@ static void __io_queue_proc(struct io_poll *poll, struct io_poll_table *pt,
}
static void io_poll_queue_proc(struct file *file, struct wait_queue_head *head,
- struct poll_table_struct *p)
+ struct poll_table_struct *p, poll_flags flags)
{
struct io_poll_table *pt = container_of(p, struct io_poll_table, pt);
struct io_poll *poll = io_kiocb_to_cmd(pt->req, struct io_poll);
@@ -644,7 +644,7 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
}
static void io_async_queue_proc(struct file *file, struct wait_queue_head *head,
- struct poll_table_struct *p)
+ struct poll_table_struct *p, poll_flags flags)
{
struct io_poll_table *pt = container_of(p, struct io_poll_table, pt);
struct async_poll *apoll = pt->req->apoll;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ecc07b47e813..97b03ab30d5e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4877,7 +4877,9 @@ static int memcg_event_wake(wait_queue_entry_t *wait, unsigned mode,
}
static void memcg_event_ptable_queue_proc(struct file *file,
- wait_queue_head_t *wqh, poll_table *pt)
+ wait_queue_head_t *wqh,
+ poll_table *pt,
+ poll_flags flags)
{
struct mem_cgroup_event *event =
container_of(pt, struct mem_cgroup_event, pt);
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index c4015f30f9fa..91f9f474ab01 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -550,7 +550,8 @@ static int p9_pollwake(wait_queue_entry_t *wait, unsigned int mode, int sync, vo
*/
static void
-p9_pollwait(struct file *filp, wait_queue_head_t *wait_address, poll_table *p)
+p9_pollwait(struct file *filp, wait_queue_head_t *wait_address, poll_table *p,
+ poll_flags flags)
{
struct p9_conn *m = container_of(p, struct p9_conn, pt);
struct p9_poll_wait *pwait = NULL;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 89912a17f5d5..645b5d155386 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -246,7 +246,7 @@ irqfd_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void *key)
static void
irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
- poll_table *pt)
+ poll_table *pt, poll_flags flags)
{
struct kvm_kernel_irqfd *irqfd =
container_of(pt, struct kvm_kernel_irqfd, pt);
--
2.41.0
* [PATCH 3/7] poll: POLL_ENQUEUE_EXCLUSIVE
2023-09-05 21:42 [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Peter Xu
2023-09-05 21:42 ` [PATCH 1/7] mm/userfaultfd: Make uffd read() wait event exclusive Peter Xu
2023-09-05 21:42 ` [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc() Peter Xu
@ 2023-09-05 21:42 ` Peter Xu
2023-09-05 21:42 ` [PATCH 4/7] fs/userfaultfd: Use exclusive waitqueue for poll() Peter Xu
From: Peter Xu @ 2023-09-05 21:42 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Anish Moorthy, Axel Rasmussen, Alexander Viro, Mike Kravetz,
Peter Zijlstra, Andrew Morton, Mike Rapoport, Christian Brauner,
peterx, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
James Houghton, Nadav Amit
Add a flag for poll_wait() showing that the caller wants the enqueue to be
exclusive. It is similar to EPOLLEXCLUSIVE for epoll(), but allows kernel
poll users to opt in to more efficient exclusive queuing where applicable.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
fs/select.c | 5 ++++-
include/linux/poll.h | 20 ++++++++++++++++++--
2 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/fs/select.c b/fs/select.c
index 0433448481e9..a3c9088e8d76 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -231,7 +231,10 @@ static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
entry->key = p->_key;
init_waitqueue_func_entry(&entry->wait, pollwake);
entry->wait.private = pwq;
- add_wait_queue(wait_address, &entry->wait);
+ if (flags & POLL_ENQUEUE_EXCLUSIVE)
+ add_wait_queue_exclusive(wait_address, &entry->wait);
+ else
+ add_wait_queue(wait_address, &entry->wait);
}
static int poll_schedule_timeout(struct poll_wqueues *pwq, int state,
diff --git a/include/linux/poll.h b/include/linux/poll.h
index cbad520fc65c..11af98ae579c 100644
--- a/include/linux/poll.h
+++ b/include/linux/poll.h
@@ -29,6 +29,8 @@
typedef unsigned int poll_flags;
+#define POLL_ENQUEUE_EXCLUSIVE BIT(0)
+
struct poll_table_struct;
/*
@@ -46,10 +48,24 @@ typedef struct poll_table_struct {
__poll_t _key;
} poll_table;
-static inline void poll_wait(struct file * filp, wait_queue_head_t * wait_address, poll_table *p)
+static inline void __poll_wait(struct file *filp, wait_queue_head_t *wait_address,
+ poll_table *p, poll_flags flags)
{
if (p && p->_qproc && wait_address)
- p->_qproc(filp, wait_address, p, 0);
+ p->_qproc(filp, wait_address, p, flags);
+}
+
+static inline void poll_wait(struct file *filp, wait_queue_head_t *wait_address,
+ poll_table *p)
+{
+ __poll_wait(filp, wait_address, p, 0);
+}
+
+static inline void poll_wait_exclusive(struct file *filp,
+ wait_queue_head_t *wait_address,
+ poll_table *p)
+{
+ __poll_wait(filp, wait_address, p, POLL_ENQUEUE_EXCLUSIVE);
}
/*
--
2.41.0
* [PATCH 4/7] fs/userfaultfd: Use exclusive waitqueue for poll()
2023-09-05 21:42 [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Peter Xu
2023-09-05 21:42 ` [PATCH 3/7] poll: POLL_ENQUEUE_EXCLUSIVE Peter Xu
@ 2023-09-05 21:42 ` Peter Xu
2023-09-05 21:42 ` [PATCH 5/7] selftests/mm: Replace uffd_read_mutex with a semaphore Peter Xu
From: Peter Xu @ 2023-09-05 21:42 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Anish Moorthy, Axel Rasmussen, Alexander Viro, Mike Kravetz,
Peter Zijlstra, Andrew Morton, Mike Rapoport, Christian Brauner,
peterx, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
James Houghton, Nadav Amit
Userfaultfd is the kind of fd that does not need wake-all semantics on
wakeup. Enqueue using the new POLL_ENQUEUE_EXCLUSIVE flag.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
fs/userfaultfd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index f7fda7d0c994..9c39adc398fc 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -994,7 +994,7 @@ static __poll_t userfaultfd_poll(struct file *file, poll_table *wait)
struct userfaultfd_ctx *ctx = file->private_data;
__poll_t ret;
- poll_wait(file, &ctx->fd_wqh, wait);
+ poll_wait_exclusive(file, &ctx->fd_wqh, wait);
if (!userfaultfd_is_initialized(ctx))
return EPOLLERR;
--
2.41.0
* [PATCH 5/7] selftests/mm: Replace uffd_read_mutex with a semaphore
2023-09-05 21:42 [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Peter Xu
2023-09-05 21:42 ` [PATCH 4/7] fs/userfaultfd: Use exclusive waitqueue for poll() Peter Xu
@ 2023-09-05 21:42 ` Peter Xu
2023-09-05 21:42 ` [PATCH 6/7] selftests/mm: Create uffd_fault_thread_create|join() Peter Xu
From: Peter Xu @ 2023-09-05 21:42 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Anish Moorthy, Axel Rasmussen, Alexander Viro, Mike Kravetz,
Peter Zijlstra, Andrew Morton, Mike Rapoport, Christian Brauner,
peterx, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
James Houghton, Nadav Amit
Each uffd read thread unlocks the read mutex first, probably just to make
sure the thread has reached a stage where pthread_cancel() can always work
before the main thread moves on.
However, keeping the mutex locked by default and unlocking it in the
thread is a bit hacky. Replace it with a semaphore, which is much clearer:
the main thread wait()s and the reader thread simply post()s. Move it to
uffd-common.* so it can be reused later.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
tools/testing/selftests/mm/uffd-common.c | 1 +
tools/testing/selftests/mm/uffd-common.h | 2 ++
tools/testing/selftests/mm/uffd-stress.c | 8 +++-----
3 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/mm/uffd-common.c b/tools/testing/selftests/mm/uffd-common.c
index 02b89860e193..aded06cab285 100644
--- a/tools/testing/selftests/mm/uffd-common.c
+++ b/tools/testing/selftests/mm/uffd-common.c
@@ -17,6 +17,7 @@ bool map_shared;
bool test_uffdio_wp = true;
unsigned long long *count_verify;
uffd_test_ops_t *uffd_test_ops;
+sem_t uffd_read_sem;
static int uffd_mem_fd_create(off_t mem_size, bool hugetlb)
{
diff --git a/tools/testing/selftests/mm/uffd-common.h b/tools/testing/selftests/mm/uffd-common.h
index 7c4fa964c3b0..521523baded1 100644
--- a/tools/testing/selftests/mm/uffd-common.h
+++ b/tools/testing/selftests/mm/uffd-common.h
@@ -32,6 +32,7 @@
#include <inttypes.h>
#include <stdint.h>
#include <sys/random.h>
+#include <semaphore.h>
#include "../kselftest.h"
#include "vm_util.h"
@@ -97,6 +98,7 @@ extern bool map_shared;
extern bool test_uffdio_wp;
extern unsigned long long *count_verify;
extern volatile bool test_uffdio_copy_eexist;
+extern sem_t uffd_read_sem;
extern uffd_test_ops_t anon_uffd_test_ops;
extern uffd_test_ops_t shmem_uffd_test_ops;
diff --git a/tools/testing/selftests/mm/uffd-stress.c b/tools/testing/selftests/mm/uffd-stress.c
index 469e0476af26..7219f55ae794 100644
--- a/tools/testing/selftests/mm/uffd-stress.c
+++ b/tools/testing/selftests/mm/uffd-stress.c
@@ -125,14 +125,12 @@ static int copy_page_retry(int ufd, unsigned long offset)
return __copy_page(ufd, offset, true, test_uffdio_wp);
}
-pthread_mutex_t uffd_read_mutex = PTHREAD_MUTEX_INITIALIZER;
-
static void *uffd_read_thread(void *arg)
{
struct uffd_args *args = (struct uffd_args *)arg;
struct uffd_msg msg;
- pthread_mutex_unlock(&uffd_read_mutex);
+ sem_post(&uffd_read_sem);
/* from here cancellation is ok */
for (;;) {
@@ -196,7 +194,7 @@ static int stress(struct uffd_args *args)
uffd_read_thread,
(void *)&args[cpu]))
return 1;
- pthread_mutex_lock(&uffd_read_mutex);
+ sem_wait(&uffd_read_sem);
}
if (pthread_create(&background_threads[cpu], &attr,
background_thread, (void *)cpu))
@@ -258,7 +256,7 @@ static int userfaultfd_stress(void)
zeropage = area;
bzero(zeropage, page_size);
- pthread_mutex_lock(&uffd_read_mutex);
+ sem_init(&uffd_read_sem, 0, 0);
pthread_attr_init(&attr);
pthread_attr_setstacksize(&attr, 16*1024*1024);
--
2.41.0
* [PATCH 6/7] selftests/mm: Create uffd_fault_thread_create|join()
2023-09-05 21:42 [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Peter Xu
2023-09-05 21:42 ` [PATCH 5/7] selftests/mm: Replace uffd_read_mutex with a semaphore Peter Xu
@ 2023-09-05 21:42 ` Peter Xu
2023-09-05 21:42 ` [PATCH 7/7] selftests/mm: uffd perf test Peter Xu
2023-09-07 19:18 ` [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Axel Rasmussen
From: Peter Xu @ 2023-09-05 21:42 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Anish Moorthy, Axel Rasmussen, Alexander Viro, Mike Kravetz,
Peter Zijlstra, Andrew Morton, Mike Rapoport, Christian Brauner,
peterx, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
James Houghton, Nadav Amit
Make them common functions to be reused.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
tools/testing/selftests/mm/uffd-common.c | 46 ++++++++++++++++++++++
tools/testing/selftests/mm/uffd-common.h | 4 ++
tools/testing/selftests/mm/uffd-stress.c | 49 ++++--------------------
3 files changed, 57 insertions(+), 42 deletions(-)
diff --git a/tools/testing/selftests/mm/uffd-common.c b/tools/testing/selftests/mm/uffd-common.c
index aded06cab285..851284395b29 100644
--- a/tools/testing/selftests/mm/uffd-common.c
+++ b/tools/testing/selftests/mm/uffd-common.c
@@ -555,6 +555,52 @@ void *uffd_poll_thread(void *arg)
return NULL;
}
+void *uffd_read_thread(void *arg)
+{
+ struct uffd_args *args = (struct uffd_args *)arg;
+ struct uffd_msg msg;
+
+ sem_post(&uffd_read_sem);
+ /* from here cancellation is ok */
+
+ for (;;) {
+ if (uffd_read_msg(uffd, &msg))
+ continue;
+ uffd_handle_page_fault(&msg, args);
+ }
+
+ return NULL;
+}
+
+void uffd_fault_thread_create(pthread_t *thread, pthread_attr_t *attr,
+ struct uffd_args *args, bool poll)
+{
+ if (poll) {
+ if (pthread_create(thread, attr, uffd_poll_thread, args))
+ err("uffd_poll_thread create");
+ } else {
+ if (pthread_create(thread, attr, uffd_read_thread, args))
+ err("uffd_read_thread create");
+ sem_wait(&uffd_read_sem);
+ }
+}
+
+void uffd_fault_thread_join(pthread_t thread, int cpu, bool poll)
+{
+ char c = 1;
+
+ if (poll) {
+ if (write(pipefd[cpu*2+1], &c, 1) != 1)
+ err("pipefd write error");
+ } else {
+ if (pthread_cancel(thread))
+ err("pthread_cancel()");
+ }
+
+ if (pthread_join(thread, NULL))
+ err("pthread_join()");
+}
+
static void retry_copy_page(int ufd, struct uffdio_copy *uffdio_copy,
unsigned long offset)
{
diff --git a/tools/testing/selftests/mm/uffd-common.h b/tools/testing/selftests/mm/uffd-common.h
index 521523baded1..9d66ad5c52cb 100644
--- a/tools/testing/selftests/mm/uffd-common.h
+++ b/tools/testing/selftests/mm/uffd-common.h
@@ -114,6 +114,10 @@ void uffd_handle_page_fault(struct uffd_msg *msg, struct uffd_args *args);
int __copy_page(int ufd, unsigned long offset, bool retry, bool wp);
int copy_page(int ufd, unsigned long offset, bool wp);
void *uffd_poll_thread(void *arg);
+void *uffd_read_thread(void *arg);
+void uffd_fault_thread_create(pthread_t *thread, pthread_attr_t *attr,
+ struct uffd_args *args, bool poll);
+void uffd_fault_thread_join(pthread_t thread, int cpu, bool poll);
int uffd_open_dev(unsigned int flags);
int uffd_open_sys(unsigned int flags);
diff --git a/tools/testing/selftests/mm/uffd-stress.c b/tools/testing/selftests/mm/uffd-stress.c
index 7219f55ae794..915795e33432 100644
--- a/tools/testing/selftests/mm/uffd-stress.c
+++ b/tools/testing/selftests/mm/uffd-stress.c
@@ -125,23 +125,6 @@ static int copy_page_retry(int ufd, unsigned long offset)
return __copy_page(ufd, offset, true, test_uffdio_wp);
}
-static void *uffd_read_thread(void *arg)
-{
- struct uffd_args *args = (struct uffd_args *)arg;
- struct uffd_msg msg;
-
- sem_post(&uffd_read_sem);
- /* from here cancellation is ok */
-
- for (;;) {
- if (uffd_read_msg(uffd, &msg))
- continue;
- uffd_handle_page_fault(&msg, args);
- }
-
- return NULL;
-}
-
static void *background_thread(void *arg)
{
unsigned long cpu = (unsigned long) arg;
@@ -186,16 +169,10 @@ static int stress(struct uffd_args *args)
if (pthread_create(&locking_threads[cpu], &attr,
locking_thread, (void *)cpu))
return 1;
- if (bounces & BOUNCE_POLL) {
- if (pthread_create(&uffd_threads[cpu], &attr, uffd_poll_thread, &args[cpu]))
- err("uffd_poll_thread create");
- } else {
- if (pthread_create(&uffd_threads[cpu], &attr,
- uffd_read_thread,
- (void *)&args[cpu]))
- return 1;
- sem_wait(&uffd_read_sem);
- }
+
+ uffd_fault_thread_create(&uffd_threads[cpu], &attr,
+ &args[cpu], bounces & BOUNCE_POLL);
+
if (pthread_create(&background_threads[cpu], &attr,
background_thread, (void *)cpu))
return 1;
@@ -220,21 +197,9 @@ static int stress(struct uffd_args *args)
if (pthread_join(locking_threads[cpu], NULL))
return 1;
- for (cpu = 0; cpu < nr_cpus; cpu++) {
- char c;
- if (bounces & BOUNCE_POLL) {
- if (write(pipefd[cpu*2+1], &c, 1) != 1)
- err("pipefd write error");
- if (pthread_join(uffd_threads[cpu],
- (void *)&args[cpu]))
- return 1;
- } else {
- if (pthread_cancel(uffd_threads[cpu]))
- return 1;
- if (pthread_join(uffd_threads[cpu], NULL))
- return 1;
- }
- }
+ for (cpu = 0; cpu < nr_cpus; cpu++)
+ uffd_fault_thread_join(uffd_threads[cpu], cpu,
+ bounces & BOUNCE_POLL);
return 0;
}
--
2.41.0
* [PATCH 7/7] selftests/mm: uffd perf test
2023-09-05 21:42 [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Peter Xu
2023-09-05 21:42 ` [PATCH 6/7] selftests/mm: Create uffd_fault_thread_create|join() Peter Xu
@ 2023-09-05 21:42 ` Peter Xu
2023-09-07 19:18 ` [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Axel Rasmussen
From: Peter Xu @ 2023-09-05 21:42 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Anish Moorthy, Axel Rasmussen, Alexander Viro, Mike Kravetz,
Peter Zijlstra, Andrew Morton, Mike Rapoport, Christian Brauner,
peterx, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
James Houghton, Nadav Amit
Add a simple perf test for userfaultfd missing mode, on private anon only.
It mostly tests only the messaging, so memory type / fault type may not
matter that much yet.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
tools/testing/selftests/mm/Makefile | 2 +
tools/testing/selftests/mm/uffd-common.c | 18 ++
tools/testing/selftests/mm/uffd-common.h | 1 +
tools/testing/selftests/mm/uffd-perf.c | 207 +++++++++++++++++++++++
4 files changed, 228 insertions(+)
create mode 100644 tools/testing/selftests/mm/uffd-perf.c
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 6a9fc5693145..acb22517d37e 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -64,6 +64,7 @@ TEST_GEN_FILES += thuge-gen
TEST_GEN_FILES += transhuge-stress
TEST_GEN_FILES += uffd-stress
TEST_GEN_FILES += uffd-unit-tests
+TEST_GEN_FILES += uffd-perf
TEST_GEN_FILES += split_huge_page_test
TEST_GEN_FILES += ksm_tests
TEST_GEN_FILES += ksm_functional_tests
@@ -120,6 +121,7 @@ $(TEST_GEN_FILES): vm_util.c
$(OUTPUT)/uffd-stress: uffd-common.c
$(OUTPUT)/uffd-unit-tests: uffd-common.c
+$(OUTPUT)/uffd-perf: uffd-common.c
ifeq ($(ARCH),x86_64)
BINARIES_32 := $(patsubst %,$(OUTPUT)/%,$(BINARIES_32))
diff --git a/tools/testing/selftests/mm/uffd-common.c b/tools/testing/selftests/mm/uffd-common.c
index 851284395b29..afbf2f7add56 100644
--- a/tools/testing/selftests/mm/uffd-common.c
+++ b/tools/testing/selftests/mm/uffd-common.c
@@ -725,3 +725,21 @@ int uffd_get_features(uint64_t *features)
return 0;
}
+
+uint64_t get_usec(void)
+{
+ uint64_t val = 0;
+ struct timespec t;
+ int ret = clock_gettime(CLOCK_MONOTONIC, &t);
+
+ if (ret == -1) {
+ perror("clock_gettime() failed");
+ /* should never happen */
+ exit(-1);
+ }
+
+ val = t.tv_nsec / 1000; /* ns -> us */
+ val += t.tv_sec * 1000000; /* s -> us */
+
+ return val;
+}
diff --git a/tools/testing/selftests/mm/uffd-common.h b/tools/testing/selftests/mm/uffd-common.h
index 9d66ad5c52cb..4273201ae19f 100644
--- a/tools/testing/selftests/mm/uffd-common.h
+++ b/tools/testing/selftests/mm/uffd-common.h
@@ -123,6 +123,7 @@ int uffd_open_dev(unsigned int flags);
int uffd_open_sys(unsigned int flags);
int uffd_open(unsigned int flags);
int uffd_get_features(uint64_t *features);
+uint64_t get_usec(void);
#define TEST_ANON 1
#define TEST_HUGETLB 2
diff --git a/tools/testing/selftests/mm/uffd-perf.c b/tools/testing/selftests/mm/uffd-perf.c
new file mode 100644
index 000000000000..eda99718311a
--- /dev/null
+++ b/tools/testing/selftests/mm/uffd-perf.c
@@ -0,0 +1,207 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Userfaultfd performance tests.
+ *
+ * Copyright (C) 2023 Red Hat, Inc.
+ */
+
+#include "uffd-common.h"
+
+#ifdef __NR_userfaultfd
+
+#define DEF_MEM_SIZE_MB (512)
+#define MB(x) ((x) * 1024 * 1024)
+#define DEF_N_TESTS 5
+
+static volatile bool perf_test_started;
+static unsigned int n_uffd_threads, n_worker_threads;
+static uint64_t nr_pages_per_worker;
+static unsigned long n_tests = DEF_N_TESTS;
+
+static void setup_env(unsigned long mem_size_mb)
+{
+ /* Test private anon only for now */
+ map_shared = false;
+ uffd_test_ops = &anon_uffd_test_ops;
+ page_size = psize();
+ nr_cpus = n_uffd_threads;
+ nr_pages = MB(mem_size_mb) / page_size;
+ nr_pages_per_worker = nr_pages / n_worker_threads;
+ if (nr_pages_per_worker == 0)
+ err("each worker should at least own one page");
+}
+
+void *worker_fn(void *opaque)
+{
+ unsigned long i = (unsigned long) opaque;
+ unsigned long page_nr, start_nr, end_nr;
+ int v = 0;
+
+ start_nr = i * nr_pages_per_worker;
+ end_nr = (i + 1) * nr_pages_per_worker;
+
+ while (!perf_test_started);
+
+ for (page_nr = start_nr; page_nr < end_nr; page_nr++)
+ v += *(volatile int *)(area_dst + page_nr * page_size);
+
+ return NULL;
+}
+
+static uint64_t run_perf(uint64_t mem_size_mb, bool poll)
+{
+ pthread_t worker_threads[n_worker_threads];
+ pthread_t uffd_threads[n_uffd_threads];
+ const char *errmsg = NULL;
+ struct uffd_args *args;
+ uint64_t start, end;
+ int i, ret;
+
+ if (uffd_test_ctx_init(0, &errmsg))
+ err("%s", errmsg);
+
+ /*
+ * By default, uffd is opened with NONBLOCK mode; use block mode
+ * when test read()
+ */
+ if (!poll) {
+ int flags = fcntl(uffd, F_GETFL);
+
+ if (flags < 0)
+ err("fcntl(F_GETFL) failed");
+
+ if (flags & O_NONBLOCK)
+ flags &= ~O_NONBLOCK;
+
+ if (fcntl(uffd, F_SETFL, flags))
+ err("fcntl(F_SETFL) failed");
+ }
+
+ ret = uffd_register(uffd, area_dst, MB(mem_size_mb),
+ true, false, false);
+ if (ret)
+ err("uffd_register() failed");
+
+ args = calloc(nr_cpus, sizeof(struct uffd_args));
+ if (!args)
+ err("calloc()");
+
+ for (i = 0; i < n_uffd_threads; i++) {
+ args[i].cpu = i;
+ uffd_fault_thread_create(&uffd_threads[i], NULL,
+ &args[i], poll);
+ }
+
+ for (i = 0; i < n_worker_threads; i++) {
+ if (pthread_create(&worker_threads[i], NULL,
+ worker_fn, (void *)(uintptr_t)i))
+ err("create uffd threads");
+ }
+
+ start = get_usec();
+ perf_test_started = true;
+ for (i = 0; i < n_worker_threads; i++)
+ pthread_join(worker_threads[i], NULL);
+ end = get_usec();
+
+ for (i = 0; i < n_uffd_threads; i++) {
+ struct uffd_args *p = &args[i];
+
+ uffd_fault_thread_join(uffd_threads[i], i, poll);
+
+ assert(p->wp_faults == 0 && p->minor_faults == 0);
+ }
+
+ free(args);
+
+ ret = uffd_unregister(uffd, area_dst, MB(mem_size_mb));
+ if (ret)
+ err("uffd_unregister() failed");
+
+ return end - start;
+}
+
+static void usage(const char *prog)
+{
+ printf("usage: %s <options>\n", prog);
+ puts("");
+ printf(" -m: size of memory to test (in MB, default: %u)\n",
+ DEF_MEM_SIZE_MB);
+ puts(" -p: use poll() (the default)");
+ puts(" -r: use read()");
+ printf(" -t: test rounds (default: %u)\n", DEF_N_TESTS);
+ puts(" -u: number of uffd threads (default: n_cpus)");
+ puts(" -w: number of worker threads (default: n_cpus)");
+ puts("");
+ exit(KSFT_FAIL);
+}
+
+int main(int argc, char *argv[])
+{
+ unsigned long mem_size_mb = DEF_MEM_SIZE_MB;
+ uint64_t result, sum = 0;
+ bool use_poll = true;
+ int opt, count;
+
+ n_uffd_threads = n_worker_threads = sysconf(_SC_NPROCESSORS_ONLN);
+
+ while ((opt = getopt(argc, argv, "hm:prt:u:w:")) != -1) {
+ switch (opt) {
+ case 'm':
+ mem_size_mb = strtoul(optarg, NULL, 10);
+ break;
+ case 'p':
+ use_poll = true;
+ break;
+ case 'r':
+ use_poll = false;
+ break;
+ case 't':
+ n_tests = strtoul(optarg, NULL, 10);
+ break;
+ case 'u':
+ n_uffd_threads = strtoul(optarg, NULL, 10);
+ break;
+ case 'w':
+ n_worker_threads = strtoul(optarg, NULL, 10);
+ break;
+ case 'h':
+ default:
+ /* Unknown */
+ usage(argv[0]);
+ break;
+ }
+ }
+
+ setup_env(mem_size_mb);
+
+ printf("Message mode: \t\t%s\n", use_poll ? "poll" : "read");
+ printf("Mem size: \t\t%lu (MB)\n", mem_size_mb);
+ printf("Uffd threads: \t\t%u\n", n_uffd_threads);
+ printf("Worker threads: \t%u\n", n_worker_threads);
+ printf("Test rounds: \t\t%lu\n", n_tests);
+ printf("Time used (us): \t");
+
+ for (count = 0; count < n_tests; count++) {
+ result = run_perf(mem_size_mb, use_poll);
+ sum += result;
+ printf("%" PRIu64 ", ", result);
+ fflush(stdout);
+ }
+ printf("\b\b \n");
+ printf("Average (us): \t\t%"PRIu64"\n", sum / n_tests);
+
+ return KSFT_PASS;
+}
+
+#else /* __NR_userfaultfd */
+
+#warning "missing __NR_userfaultfd definition"
+
+int main(void)
+{
+ printf("Skipping %s (missing __NR_userfaultfd)\n", __FILE__);
+ return KSFT_SKIP;
+}
+
+#endif /* __NR_userfaultfd */
--
2.41.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc()
2023-09-05 21:42 ` [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc() Peter Xu
@ 2023-09-05 23:21 ` kernel test robot
2023-09-06 17:31 ` kernel test robot
` (2 subsequent siblings)
3 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2023-09-05 23:21 UTC (permalink / raw)
To: Peter Xu, linux-kernel, linux-mm
Cc: oe-kbuild-all, Anish Moorthy, Axel Rasmussen, Alexander Viro,
Mike Kravetz, Peter Zijlstra, Andrew Morton,
Linux Memory Management List, Mike Rapoport, Christian Brauner,
peterx, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
James Houghton, Nadav Amit
Hi Peter,
kernel test robot noticed the following build warnings:
[auto build test WARNING on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Peter-Xu/mm-userfaultfd-Make-uffd-read-wait-event-exclusive/20230906-054430
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20230905214235.320571-3-peterx%40redhat.com
patch subject: [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc()
config: m68k-allyesconfig (https://download.01.org/0day-ci/archive/20230906/202309060752.FQUjUmGA-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230906/202309060752.FQUjUmGA-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309060752.FQUjUmGA-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> net/9p/trans_fd.c:555: warning: Function parameter or member 'flags' not described in 'p9_pollwait'
vim +555 net/9p/trans_fd.c
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06 542
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06 543 /**
5503ac565998837 Eric Van Hensbergen 2008-10-13 544 * p9_pollwait - add poll task to the wait queue
5503ac565998837 Eric Van Hensbergen 2008-10-13 545 * @filp: file pointer being polled
5503ac565998837 Eric Van Hensbergen 2008-10-13 546 * @wait_address: wait_q to block on
5503ac565998837 Eric Van Hensbergen 2008-10-13 547 * @p: poll state
ee443996a35c1e0 Eric Van Hensbergen 2008-03-05 548 *
5503ac565998837 Eric Van Hensbergen 2008-10-13 549 * called by files poll operation to add v9fs-poll task to files wait queue
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06 550 */
ee443996a35c1e0 Eric Van Hensbergen 2008-03-05 551
5503ac565998837 Eric Van Hensbergen 2008-10-13 552 static void
14649d7d7806aba Peter Xu 2023-09-05 553 p9_pollwait(struct file *filp, wait_queue_head_t *wait_address, poll_table *p,
14649d7d7806aba Peter Xu 2023-09-05 554 poll_flags flags)
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06 @555 {
5503ac565998837 Eric Van Hensbergen 2008-10-13 556 struct p9_conn *m = container_of(p, struct p9_conn, pt);
5503ac565998837 Eric Van Hensbergen 2008-10-13 557 struct p9_poll_wait *pwait = NULL;
5503ac565998837 Eric Van Hensbergen 2008-10-13 558 int i;
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06 559
5503ac565998837 Eric Van Hensbergen 2008-10-13 560 for (i = 0; i < ARRAY_SIZE(m->poll_wait); i++) {
5503ac565998837 Eric Van Hensbergen 2008-10-13 561 if (m->poll_wait[i].wait_addr == NULL) {
5503ac565998837 Eric Van Hensbergen 2008-10-13 562 pwait = &m->poll_wait[i];
5503ac565998837 Eric Van Hensbergen 2008-10-13 563 break;
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06 564 }
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06 565 }
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06 566
5503ac565998837 Eric Van Hensbergen 2008-10-13 567 if (!pwait) {
5d3851530d6d685 Joe Perches 2011-11-28 568 p9_debug(P9_DEBUG_ERROR, "not enough wait_address slots\n");
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06 569 return;
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06 570 }
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06 571
5503ac565998837 Eric Van Hensbergen 2008-10-13 572 pwait->conn = m;
5503ac565998837 Eric Van Hensbergen 2008-10-13 573 pwait->wait_addr = wait_address;
5503ac565998837 Eric Van Hensbergen 2008-10-13 574 init_waitqueue_func_entry(&pwait->wait, p9_pollwake);
5503ac565998837 Eric Van Hensbergen 2008-10-13 575 add_wait_queue(wait_address, &pwait->wait);
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06 576 }
8a0dc95fd976a05 Eric Van Hensbergen 2008-02-06 577
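For completeness: the warning above is kernel-doc only; the new `flags` parameter of p9_pollwait() simply has no matching line in the comment block. A sketch of the comment fixup (hypothetical wording; the real fixup may describe the parameter differently):

```c
/**
 * p9_pollwait - add poll task to the wait queue
 * @filp: file pointer being polled
 * @wait_address: wait_q to block on
 * @p: poll state
 * @flags: poll request flags
 *
 * called by files poll operation to add v9fs-poll task to files wait queue
 */
```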
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc()
2023-09-05 21:42 ` [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc() Peter Xu
2023-09-05 23:21 ` kernel test robot
@ 2023-09-06 17:31 ` kernel test robot
2023-09-06 20:53 ` kernel test robot
2023-09-11 20:00 ` Peter Xu
3 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2023-09-06 17:31 UTC (permalink / raw)
To: Peter Xu, linux-kernel, linux-mm
Cc: oe-kbuild-all, Anish Moorthy, Axel Rasmussen, Alexander Viro,
Mike Kravetz, Peter Zijlstra, Andrew Morton,
Linux Memory Management List, Mike Rapoport, Christian Brauner,
peterx, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
James Houghton, Nadav Amit
Hi Peter,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Peter-Xu/mm-userfaultfd-Make-uffd-read-wait-event-exclusive/20230906-054430
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20230905214235.320571-3-peterx%40redhat.com
patch subject: [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc()
config: x86_64-buildonly-randconfig-005-20230906 (https://download.01.org/0day-ci/archive/20230907/202309070146.47KrWvAH-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230907/202309070146.47KrWvAH-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309070146.47KrWvAH-lkp@intel.com/
All errors (new ones prefixed by >>):
drivers/xen/privcmd.c: In function 'privcmd_irqfd_assign':
>> drivers/xen/privcmd.c:965:40: error: passing argument 2 of 'init_poll_funcptr' from incompatible pointer type [-Werror=incompatible-pointer-types]
965 | init_poll_funcptr(&kirqfd->pt, irqfd_poll_func);
| ^~~~~~~~~~~~~~~
| |
| void (*)(struct file *, wait_queue_head_t *, poll_table *) {aka void (*)(struct file *, struct wait_queue_head *, struct poll_table_struct *)}
In file included from drivers/xen/privcmd.c:17:
include/linux/poll.h:76:70: note: expected 'poll_queue_proc' {aka 'void (*)(struct file *, struct wait_queue_head *, struct poll_table_struct *, unsigned int)'} but argument is of type 'void (*)(struct file *, wait_queue_head_t *, poll_table *)' {aka 'void (*)(struct file *, struct wait_queue_head *, struct poll_table_struct *)'}
76 | static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
| ~~~~~~~~~~~~~~~~^~~~~
cc1: some warnings being treated as errors
vim +/init_poll_funcptr +965 drivers/xen/privcmd.c
f8941e6c4c7129 Viresh Kumar 2023-08-22 924
f8941e6c4c7129 Viresh Kumar 2023-08-22 925 static int privcmd_irqfd_assign(struct privcmd_irqfd *irqfd)
f8941e6c4c7129 Viresh Kumar 2023-08-22 926 {
f8941e6c4c7129 Viresh Kumar 2023-08-22 927 struct privcmd_kernel_irqfd *kirqfd, *tmp;
f8941e6c4c7129 Viresh Kumar 2023-08-22 928 __poll_t events;
f8941e6c4c7129 Viresh Kumar 2023-08-22 929 struct fd f;
f8941e6c4c7129 Viresh Kumar 2023-08-22 930 void *dm_op;
f8941e6c4c7129 Viresh Kumar 2023-08-22 931 int ret;
f8941e6c4c7129 Viresh Kumar 2023-08-22 932
f8941e6c4c7129 Viresh Kumar 2023-08-22 933 kirqfd = kzalloc(sizeof(*kirqfd) + irqfd->size, GFP_KERNEL);
f8941e6c4c7129 Viresh Kumar 2023-08-22 934 if (!kirqfd)
f8941e6c4c7129 Viresh Kumar 2023-08-22 935 return -ENOMEM;
f8941e6c4c7129 Viresh Kumar 2023-08-22 936 dm_op = kirqfd + 1;
f8941e6c4c7129 Viresh Kumar 2023-08-22 937
f8941e6c4c7129 Viresh Kumar 2023-08-22 938 if (copy_from_user(dm_op, irqfd->dm_op, irqfd->size)) {
f8941e6c4c7129 Viresh Kumar 2023-08-22 939 ret = -EFAULT;
f8941e6c4c7129 Viresh Kumar 2023-08-22 940 goto error_kfree;
f8941e6c4c7129 Viresh Kumar 2023-08-22 941 }
f8941e6c4c7129 Viresh Kumar 2023-08-22 942
f8941e6c4c7129 Viresh Kumar 2023-08-22 943 kirqfd->xbufs.size = irqfd->size;
f8941e6c4c7129 Viresh Kumar 2023-08-22 944 set_xen_guest_handle(kirqfd->xbufs.h, dm_op);
f8941e6c4c7129 Viresh Kumar 2023-08-22 945 kirqfd->dom = irqfd->dom;
f8941e6c4c7129 Viresh Kumar 2023-08-22 946 INIT_WORK(&kirqfd->shutdown, irqfd_shutdown);
f8941e6c4c7129 Viresh Kumar 2023-08-22 947
f8941e6c4c7129 Viresh Kumar 2023-08-22 948 f = fdget(irqfd->fd);
f8941e6c4c7129 Viresh Kumar 2023-08-22 949 if (!f.file) {
f8941e6c4c7129 Viresh Kumar 2023-08-22 950 ret = -EBADF;
f8941e6c4c7129 Viresh Kumar 2023-08-22 951 goto error_kfree;
f8941e6c4c7129 Viresh Kumar 2023-08-22 952 }
f8941e6c4c7129 Viresh Kumar 2023-08-22 953
f8941e6c4c7129 Viresh Kumar 2023-08-22 954 kirqfd->eventfd = eventfd_ctx_fileget(f.file);
f8941e6c4c7129 Viresh Kumar 2023-08-22 955 if (IS_ERR(kirqfd->eventfd)) {
f8941e6c4c7129 Viresh Kumar 2023-08-22 956 ret = PTR_ERR(kirqfd->eventfd);
f8941e6c4c7129 Viresh Kumar 2023-08-22 957 goto error_fd_put;
f8941e6c4c7129 Viresh Kumar 2023-08-22 958 }
f8941e6c4c7129 Viresh Kumar 2023-08-22 959
f8941e6c4c7129 Viresh Kumar 2023-08-22 960 /*
f8941e6c4c7129 Viresh Kumar 2023-08-22 961 * Install our own custom wake-up handling so we are notified via a
f8941e6c4c7129 Viresh Kumar 2023-08-22 962 * callback whenever someone signals the underlying eventfd.
f8941e6c4c7129 Viresh Kumar 2023-08-22 963 */
f8941e6c4c7129 Viresh Kumar 2023-08-22 964 init_waitqueue_func_entry(&kirqfd->wait, irqfd_wakeup);
f8941e6c4c7129 Viresh Kumar 2023-08-22 @965 init_poll_funcptr(&kirqfd->pt, irqfd_poll_func);
f8941e6c4c7129 Viresh Kumar 2023-08-22 966
f8941e6c4c7129 Viresh Kumar 2023-08-22 967 mutex_lock(&irqfds_lock);
f8941e6c4c7129 Viresh Kumar 2023-08-22 968
f8941e6c4c7129 Viresh Kumar 2023-08-22 969 list_for_each_entry(tmp, &irqfds_list, list) {
f8941e6c4c7129 Viresh Kumar 2023-08-22 970 if (kirqfd->eventfd == tmp->eventfd) {
f8941e6c4c7129 Viresh Kumar 2023-08-22 971 ret = -EBUSY;
f8941e6c4c7129 Viresh Kumar 2023-08-22 972 mutex_unlock(&irqfds_lock);
f8941e6c4c7129 Viresh Kumar 2023-08-22 973 goto error_eventfd;
f8941e6c4c7129 Viresh Kumar 2023-08-22 974 }
f8941e6c4c7129 Viresh Kumar 2023-08-22 975 }
f8941e6c4c7129 Viresh Kumar 2023-08-22 976
f8941e6c4c7129 Viresh Kumar 2023-08-22 977 list_add_tail(&kirqfd->list, &irqfds_list);
f8941e6c4c7129 Viresh Kumar 2023-08-22 978 mutex_unlock(&irqfds_lock);
f8941e6c4c7129 Viresh Kumar 2023-08-22 979
f8941e6c4c7129 Viresh Kumar 2023-08-22 980 /*
f8941e6c4c7129 Viresh Kumar 2023-08-22 981 * Check if there was an event already pending on the eventfd before we
f8941e6c4c7129 Viresh Kumar 2023-08-22 982 * registered, and trigger it as if we didn't miss it.
f8941e6c4c7129 Viresh Kumar 2023-08-22 983 */
f8941e6c4c7129 Viresh Kumar 2023-08-22 984 events = vfs_poll(f.file, &kirqfd->pt);
f8941e6c4c7129 Viresh Kumar 2023-08-22 985 if (events & EPOLLIN)
f8941e6c4c7129 Viresh Kumar 2023-08-22 986 irqfd_inject(kirqfd);
f8941e6c4c7129 Viresh Kumar 2023-08-22 987
f8941e6c4c7129 Viresh Kumar 2023-08-22 988 /*
f8941e6c4c7129 Viresh Kumar 2023-08-22 989 * Do not drop the file until the kirqfd is fully initialized, otherwise
f8941e6c4c7129 Viresh Kumar 2023-08-22 990 * we might race against the EPOLLHUP.
f8941e6c4c7129 Viresh Kumar 2023-08-22 991 */
f8941e6c4c7129 Viresh Kumar 2023-08-22 992 fdput(f);
f8941e6c4c7129 Viresh Kumar 2023-08-22 993 return 0;
f8941e6c4c7129 Viresh Kumar 2023-08-22 994
f8941e6c4c7129 Viresh Kumar 2023-08-22 995 error_eventfd:
f8941e6c4c7129 Viresh Kumar 2023-08-22 996 eventfd_ctx_put(kirqfd->eventfd);
f8941e6c4c7129 Viresh Kumar 2023-08-22 997
f8941e6c4c7129 Viresh Kumar 2023-08-22 998 error_fd_put:
f8941e6c4c7129 Viresh Kumar 2023-08-22 999 fdput(f);
f8941e6c4c7129 Viresh Kumar 2023-08-22 1000
f8941e6c4c7129 Viresh Kumar 2023-08-22 1001 error_kfree:
f8941e6c4c7129 Viresh Kumar 2023-08-22 1002 kfree(kirqfd);
f8941e6c4c7129 Viresh Kumar 2023-08-22 1003 return ret;
f8941e6c4c7129 Viresh Kumar 2023-08-22 1004 }
f8941e6c4c7129 Viresh Kumar 2023-08-22 1005
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc()
2023-09-05 21:42 ` [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc() Peter Xu
2023-09-05 23:21 ` kernel test robot
2023-09-06 17:31 ` kernel test robot
@ 2023-09-06 20:53 ` kernel test robot
2023-09-11 20:00 ` Peter Xu
3 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2023-09-06 20:53 UTC (permalink / raw)
To: Peter Xu, linux-kernel, linux-mm
Cc: llvm, oe-kbuild-all, Anish Moorthy, Axel Rasmussen,
Alexander Viro, Mike Kravetz, Peter Zijlstra, Andrew Morton,
Linux Memory Management List, Mike Rapoport, Christian Brauner,
peterx, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
James Houghton, Nadav Amit
Hi Peter,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Peter-Xu/mm-userfaultfd-Make-uffd-read-wait-event-exclusive/20230906-054430
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20230905214235.320571-3-peterx%40redhat.com
patch subject: [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc()
config: hexagon-randconfig-002-20230906 (https://download.01.org/0day-ci/archive/20230907/202309070440.eKzBcg8X-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project.git f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230907/202309070440.eKzBcg8X-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309070440.eKzBcg8X-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from drivers/vhost/vhost.c:14:
In file included from include/uapi/linux/vhost.h:14:
In file included from include/uapi/linux/vhost_types.h:16:
In file included from include/linux/virtio_config.h:7:
In file included from include/linux/virtio.h:7:
In file included from include/linux/scatterlist.h:9:
In file included from arch/hexagon/include/asm/io.h:337:
include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __raw_readb(PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
#define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
^
In file included from drivers/vhost/vhost.c:14:
In file included from include/uapi/linux/vhost.h:14:
In file included from include/uapi/linux/vhost_types.h:16:
In file included from include/linux/virtio_config.h:7:
In file included from include/linux/virtio.h:7:
In file included from include/linux/scatterlist.h:9:
In file included from arch/hexagon/include/asm/io.h:337:
include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
#define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
^
In file included from drivers/vhost/vhost.c:14:
In file included from include/uapi/linux/vhost.h:14:
In file included from include/uapi/linux/vhost_types.h:16:
In file included from include/linux/virtio_config.h:7:
In file included from include/linux/virtio.h:7:
In file included from include/linux/scatterlist.h:9:
In file included from arch/hexagon/include/asm/io.h:337:
include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writeb(value, PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
>> drivers/vhost/vhost.c:193:41: error: incompatible function pointer types passing 'int (wait_queue_entry_t *, unsigned int, int, void *, poll_flags)' (aka 'int (struct wait_queue_entry *, unsigned int, int, void *, unsigned int)') to parameter of type 'wait_queue_func_t' (aka 'int (*)(struct wait_queue_entry *, unsigned int, int, void *)') [-Werror,-Wincompatible-function-pointer-types]
init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup);
^~~~~~~~~~~~~~~~~
include/linux/wait.h:90:80: note: passing argument to parameter 'func' here
init_waitqueue_func_entry(struct wait_queue_entry *wq_entry, wait_queue_func_t func)
^
>> drivers/vhost/vhost.c:194:34: error: incompatible function pointer types passing 'void (struct file *, wait_queue_head_t *, poll_table *)' (aka 'void (struct file *, struct wait_queue_head *, struct poll_table_struct *)') to parameter of type 'poll_queue_proc' (aka 'void (*)(struct file *, struct wait_queue_head *, struct poll_table_struct *, unsigned int)') [-Werror,-Wincompatible-function-pointer-types]
init_poll_funcptr(&poll->table, vhost_poll_func);
^~~~~~~~~~~~~~~
include/linux/poll.h:76:70: note: passing argument to parameter 'qproc' here
static inline void init_poll_funcptr(poll_table *pt, poll_queue_proc qproc)
^
>> drivers/vhost/vhost.c:215:57: error: too few arguments to function call, expected 5, have 4
vhost_poll_wakeup(&poll->wait, 0, 0, poll_to_key(mask));
~~~~~~~~~~~~~~~~~ ^
drivers/vhost/vhost.c:164:12: note: 'vhost_poll_wakeup' declared here
static int vhost_poll_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync,
^
6 warnings and 3 errors generated.
vim +193 drivers/vhost/vhost.c
87d6a412bd1ed8 Michael S. Tsirkin 2010-09-02 187
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14 188 /* Init poll structure */
c23f3445e68e1d Tejun Heo 2010-06-02 189 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
493b94bf5ae0f6 Mike Christie 2023-06-26 190 __poll_t mask, struct vhost_dev *dev,
493b94bf5ae0f6 Mike Christie 2023-06-26 191 struct vhost_virtqueue *vq)
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14 192 {
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14 @193 init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup);
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14 @194 init_poll_funcptr(&poll->table, vhost_poll_func);
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14 195 poll->mask = mask;
c23f3445e68e1d Tejun Heo 2010-06-02 196 poll->dev = dev;
2b8b328b61c799 Jason Wang 2013-01-28 197 poll->wqh = NULL;
493b94bf5ae0f6 Mike Christie 2023-06-26 198 poll->vq = vq;
c23f3445e68e1d Tejun Heo 2010-06-02 199
87d6a412bd1ed8 Michael S. Tsirkin 2010-09-02 200 vhost_work_init(&poll->work, fn);
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14 201 }
6ac1afbf6132df Asias He 2013-05-06 202 EXPORT_SYMBOL_GPL(vhost_poll_init);
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14 203
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14 204 /* Start polling a file. We add ourselves to file's wait queue. The caller must
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14 205 * keep a reference to a file until after vhost_poll_stop is called. */
2b8b328b61c799 Jason Wang 2013-01-28 206 int vhost_poll_start(struct vhost_poll *poll, struct file *file)
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14 207 {
e6c8adca20ba45 Al Viro 2017-07-03 208 __poll_t mask;
d47effe1be0c4f Krishna Kumar 2011-03-01 209
70181d51209cbc Jason Wang 2013-04-10 210 if (poll->wqh)
70181d51209cbc Jason Wang 2013-04-10 211 return 0;
70181d51209cbc Jason Wang 2013-04-10 212
9965ed174e7d38 Christoph Hellwig 2018-03-05 213 mask = vfs_poll(file, &poll->table);
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14 214 if (mask)
3ad6f93e98d6df Al Viro 2017-07-03 @215 vhost_poll_wakeup(&poll->wait, 0, 0, poll_to_key(mask));
a9a08845e9acbd Linus Torvalds 2018-02-11 216 if (mask & EPOLLERR) {
dc6455a71c7fc5 Jason Wang 2018-03-27 217 vhost_poll_stop(poll);
896fc242bc1d26 Yunsheng Lin 2019-08-20 218 return -EINVAL;
2b8b328b61c799 Jason Wang 2013-01-28 219 }
2b8b328b61c799 Jason Wang 2013-01-28 220
896fc242bc1d26 Yunsheng Lin 2019-08-20 221 return 0;
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14 222 }
6ac1afbf6132df Asias He 2013-05-06 223 EXPORT_SYMBOL_GPL(vhost_poll_start);
3a4d5c94e95935 Michael S. Tsirkin 2010-01-14 224
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups
2023-09-05 21:42 [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Peter Xu
` (6 preceding siblings ...)
2023-09-05 21:42 ` [PATCH 7/7] selftests/mm: uffd perf test Peter Xu
@ 2023-09-07 19:18 ` Axel Rasmussen
2023-09-08 22:01 ` Peter Xu
7 siblings, 1 reply; 14+ messages in thread
From: Axel Rasmussen @ 2023-09-07 19:18 UTC (permalink / raw)
To: Peter Xu
Cc: linux-kernel, linux-mm, Anish Moorthy, Alexander Viro,
Mike Kravetz, Peter Zijlstra, Andrew Morton, Mike Rapoport,
Christian Brauner, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
James Houghton, Nadav Amit
On Tue, Sep 5, 2023 at 2:42 PM Peter Xu <peterx@redhat.com> wrote:
>
> Userfaultfd is the type of file that doesn't need wake-all semantics: if
> there is a message enqueued (for either a fault address, or an event), we
> only need to wake up one service thread to handle it. Waking up more
> normally means a waste of cpu cycles. Besides that, and more importantly,
> that just doesn't scale.
Hi Peter,
I took a quick look over the series and didn't see anything
objectionable. I was planning to actually test the series out and then
send out R-b's, but it will take some additional time (next week).
In the meantime, I was curious about the use case. A design I've seen
for VM live migration is to have 1 thread reading events off the uffd,
and then have many threads actually resolving the fault events that
come in (e.g. fetching pages over the network, issuing UFFDIO_COPY or
UFFDIO_CONTINUE, or whatever). In that design, since we only have a
single reader anyway, I think this series doesn't help.
But, I'm curious if you have data indicating that > 1 reader is more
performant overall? I suspect it might be the case that, with "enough"
vCPUs, it makes sense to do so, but I don't have benchmark data to
tell me what that tipping point is yet.
OTOH, if one reader is plenty in ~all cases, optimizing this path is
less important.
>
> Andrea used to have one patch that made read() to be O(1) but never hit
> upstream. This is my effort to try upstreaming that (which is a
> oneliner..), meanwhile on top of that I also made poll() O(1) on wakeup,
> too (more or less bring EPOLLEXCLUSIVE to poll()), with some tests showing
> that effect.
>
> To verify this, I added a test called uffd-perf (leveraging the refactored
> uffd selftest suite) that will measure the messaging channel latencies on
> wakeups, and the waitqueue optimizations can be reflected by the new test:
>
> Constants: 40 uffd threads, on N_CPUS=40, memsize=512M
> Units: milliseconds (to finish the test)
> |-----------------+--------+-------+------------|
> | test case | before | after | diff (%) |
> |-----------------+--------+-------+------------|
> | workers=8,poll | 1762 | 1133 | -55.516328 |
> | workers=8,read | 1437 | 585 | -145.64103 |
> | workers=16,poll | 1117 | 1097 | -1.8231541 |
> | workers=16,read | 1159 | 759 | -52.700922 |
> | workers=32,poll | 1001 | 973 | -2.8776978 |
> | workers=32,read | 866 | 713 | -21.458626 |
> |-----------------+--------+-------+------------|
>
> The more threads hanging on the fd_wqh, a bigger difference will be there
> shown in the numbers. "8 worker threads" is the worst case here because it
> means there can be a worst case of 40-8=32 threads hanging idle on fd_wqh
> queue.
>
> In real life, workers can be more than this, but small number of active
> worker threads will cause similar effect.
>
> This is currently based on Andrew's mm-unstable branch, but assuming this
> is applicable to most of the not-so-old trees.
>
> Comments welcomed, thanks.
>
> Andrea Arcangeli (1):
> mm/userfaultfd: Make uffd read() wait event exclusive
>
> Peter Xu (6):
> poll: Add a poll_flags for poll_queue_proc()
> poll: POLL_ENQUEUE_EXCLUSIVE
> fs/userfaultfd: Use exclusive waitqueue for poll()
> selftests/mm: Replace uffd_read_mutex with a semaphore
> selftests/mm: Create uffd_fault_thread_create|join()
> selftests/mm: uffd perf test
>
> drivers/vfio/virqfd.c | 4 +-
> drivers/vhost/vhost.c | 2 +-
> drivers/virt/acrn/irqfd.c | 2 +-
> fs/aio.c | 2 +-
> fs/eventpoll.c | 2 +-
> fs/select.c | 9 +-
> fs/userfaultfd.c | 8 +-
> include/linux/poll.h | 25 ++-
> io_uring/poll.c | 4 +-
> mm/memcontrol.c | 4 +-
> net/9p/trans_fd.c | 3 +-
> tools/testing/selftests/mm/Makefile | 2 +
> tools/testing/selftests/mm/uffd-common.c | 65 +++++++
> tools/testing/selftests/mm/uffd-common.h | 7 +
> tools/testing/selftests/mm/uffd-perf.c | 207 +++++++++++++++++++++++
> tools/testing/selftests/mm/uffd-stress.c | 53 +-----
> virt/kvm/eventfd.c | 2 +-
> 17 files changed, 337 insertions(+), 64 deletions(-)
> create mode 100644 tools/testing/selftests/mm/uffd-perf.c
>
> --
> 2.41.0
>
* Re: [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups
2023-09-07 19:18 ` [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Axel Rasmussen
@ 2023-09-08 22:01 ` Peter Xu
0 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2023-09-08 22:01 UTC (permalink / raw)
To: Axel Rasmussen
Cc: linux-kernel, linux-mm, Anish Moorthy, Alexander Viro,
Mike Kravetz, Peter Zijlstra, Andrew Morton, Mike Rapoport,
Christian Brauner, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
James Houghton, Nadav Amit
On Thu, Sep 07, 2023 at 12:18:29PM -0700, Axel Rasmussen wrote:
> On Tue, Sep 5, 2023 at 2:42 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > Userfaultfd is the type of file that doesn't need wake-all semantics: if
> > there is a message enqueued (for either a fault address, or an event), we
> > only need to wake up one service thread to handle it. Waking up more
> > normally means a waste of cpu cycles. Besides that, and more importantly,
> > that just doesn't scale.
>
> Hi Peter,
Hi, Axel,
Sorry to respond late.
>
> I took a quick look over the series and didn't see anything
> objectionable. I was planning to actually test the series out and then
> send out R-b's, but it will take some additional time (next week).
Thanks. The 2nd patch definitely needs fixups for a few functions (I
overlooked them because not enough CONFIG_* options were enabled in my
build; I'm surprised I even had vhost compiled out when testing..). Hope
that won't bring you too much trouble. I'll send a fixup soon on top of
patch 2.
>
> In the meantime, I was curious about the use case. A design I've seen
> for VM live migration is to have 1 thread reading events off the uffd,
> and then have many threads actually resolving the fault events that
> come in (e.g. fetching pages over the network, issuing UFFDIO_COPY or
> UFFDIO_CONTINUE, or whatever). In that design, since we only have a
> single reader anyway, I think this series doesn't help.
Yes. If the workload only uses one reader thread, this series shouldn't
make much difference.
>
> But, I'm curious if you have data indicating that > 1 reader is more
> performant overall? I suspect it might be the case that, with "enough"
> vCPUs, it makes sense to do so, but I don't have benchmark data to
> tell me what that tipping point is yet.
>
> OTOH, if one reader is plenty in ~all cases, optimizing this path is
> less important.
For myself I don't yet have an application that can leverage this,
because QEMU so far only uses one reader thread.
IIRC Anish was proposing some KVM-specific solutions to make a single
uffd scale, and this series might suit any use case like that, where we
want one uffd to scale with the number of threads. Using 1 reader + N
workers is also a solution, but an app that uses N readers (which also do
the work) will hit this problem.
I am also aware that some apps use more than one reader thread (umap),
but I don't really know more than that.
The problem is I think we shouldn't have that overhead easily just because
an app invokes >1 readers, meanwhile it also doesn't make much sense to
wake up all readers for a single event for userfaults. So it should always
be something good to have.
Thanks,
--
Peter Xu
* Re: [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc()
2023-09-05 21:42 ` [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc() Peter Xu
` (2 preceding siblings ...)
2023-09-06 20:53 ` kernel test robot
@ 2023-09-11 20:00 ` Peter Xu
3 siblings, 0 replies; 14+ messages in thread
From: Peter Xu @ 2023-09-11 20:00 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Anish Moorthy, Axel Rasmussen, Alexander Viro, Mike Kravetz,
Peter Zijlstra, Andrew Morton, Mike Rapoport, Christian Brauner,
linux-fsdevel, Andrea Arcangeli, Ingo Molnar, James Houghton,
Nadav Amit
On Tue, Sep 05, 2023 at 05:42:30PM -0400, Peter Xu wrote:
> Allows the poll enqueue function to pass over a flag into it.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
A fixup for this patch (should fix all syzbot errors):
===8<===
From ba55ee0539a7a80b98e7a5e3942c0ee8cabe5f73 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Tue, 5 Sep 2023 20:05:33 -0400
Subject: [PATCH] fixup! poll: Add a poll_flags for poll_queue_proc()
Signed-off-by: Peter Xu <peterx@redhat.com>
---
drivers/vhost/vhost.c | 4 ++--
drivers/xen/privcmd.c | 3 ++-
net/9p/trans_fd.c | 1 +
3 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 02caad721843..00813db53ff1 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -152,7 +152,7 @@ static void vhost_flush_work(struct vhost_work *work)
 }
 
 static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh,
-			    poll_table *pt)
+			    poll_table *pt, poll_flags flags)
 {
 	struct vhost_poll *poll;
 
@@ -162,7 +162,7 @@ static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh,
 }
 
 static int vhost_poll_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync,
-			     void *key, poll_flags flags)
+			     void *key)
 {
 	struct vhost_poll *poll = container_of(wait, struct vhost_poll, wait);
 	struct vhost_work *work = &poll->work;
 
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index f00ad5f5f1d4..43e65186f25d 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -914,7 +914,8 @@ irqfd_wakeup(wait_queue_entry_t *wait, unsigned int mode, int sync, void *key)
 }
 
 static void
-irqfd_poll_func(struct file *file, wait_queue_head_t *wqh, poll_table *pt)
+irqfd_poll_func(struct file *file, wait_queue_head_t *wqh, poll_table *pt,
+		poll_flags flags)
 {
 	struct privcmd_kernel_irqfd *kirqfd =
 		container_of(pt, struct privcmd_kernel_irqfd, pt);
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 91f9f474ab01..2912c4b086a2 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -545,6 +545,7 @@ static int p9_pollwake(wait_queue_entry_t *wait, unsigned int mode, int sync, vo
  * @filp: file pointer being polled
  * @wait_address: wait_q to block on
  * @p: poll state
+ * @flags: poll flags
  *
  * called by files poll operation to add v9fs-poll task to files wait queue
  */
--
2.41.0
===8<===
Thanks,
--
Peter Xu
Thread overview: 14+ messages
2023-09-05 21:42 [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Peter Xu
2023-09-05 21:42 ` [PATCH 1/7] mm/userfaultfd: Make uffd read() wait event exclusive Peter Xu
2023-09-05 21:42 ` [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc() Peter Xu
2023-09-05 23:21 ` kernel test robot
2023-09-06 17:31 ` kernel test robot
2023-09-06 20:53 ` kernel test robot
2023-09-11 20:00 ` Peter Xu
2023-09-05 21:42 ` [PATCH 3/7] poll: POLL_ENQUEUE_EXCLUSIVE Peter Xu
2023-09-05 21:42 ` [PATCH 4/7] fs/userfaultfd: Use exclusive waitqueue for poll() Peter Xu
2023-09-05 21:42 ` [PATCH 5/7] selftests/mm: Replace uffd_read_mutex with a semaphore Peter Xu
2023-09-05 21:42 ` [PATCH 6/7] selftests/mm: Create uffd_fault_thread_create|join() Peter Xu
2023-09-05 21:42 ` [PATCH 7/7] selftests/mm: uffd perf test Peter Xu
2023-09-07 19:18 ` [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Axel Rasmussen
2023-09-08 22:01 ` Peter Xu