[PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups

From: Peter Xu @ 2023-09-05 21:42 UTC
  To: linux-kernel, linux-mm
  Cc: Anish Moorthy, Axel Rasmussen, Alexander Viro, Mike Kravetz,
	Peter Zijlstra, Andrew Morton, Mike Rapoport, Christian Brauner,
	peterx, linux-fsdevel, Andrea Arcangeli, Ingo Molnar,
	James Houghton, Nadav Amit

Userfaultfd is the type of file that doesn't need wake-all semantics: if
there is a message enqueued (for either a fault address, or an event), we
only need to wake up one service thread to handle it.  Waking up more
threads than needed wastes CPU cycles and, more importantly, simply
doesn't scale.
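
For reference, the kernel waitqueue API already distinguishes the two
behaviors: wake_up() wakes every non-exclusive waiter, but stops after
waking the first waiter queued with an _exclusive helper.  A minimal
sketch of the two enqueue flavours (generic waitqueue code, not from
this series):

  /* Sketch only: wake-all vs wake-one enqueue on a waitqueue */
  DECLARE_WAIT_QUEUE_HEAD(wq);
  DEFINE_WAIT(wait);

  /* Non-exclusive waiter: wake_up(&wq) wakes every such entry */
  add_wait_queue(&wq, &wait);

  /* Exclusive waiter: queued at the tail; wake_up(&wq) stops
   * after waking the first exclusive entry, so only one thread
   * runs per event */
  add_wait_queue_exclusive(&wq, &wait);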

Andrea used to have a patch that made read() wakeups O(1), but it never
hit upstream.  This is my effort to upstream that (it's a oneliner..); on
top of it I also made poll() O(1) on wakeup (more or less bringing
EPOLLEXCLUSIVE semantics to poll()), with some tests showing the effect.
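
Conceptually, the read() side change is just switching the enqueue in
userfaultfd_ctx_read() from the non-exclusive to the exclusive helper
(a sketch of the idea; see patch 1/7 for the actual change):

  /* fs/userfaultfd.c, userfaultfd_ctx_read() -- idea only */
  -	__add_wait_queue(&ctx->fd_wqh, &wait);
  +	__add_wait_queue_exclusive(&ctx->fd_wqh, &wait);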

To verify this, I added a test called uffd-perf (leveraging the refactored
uffd selftest suite) that measures the message channel latency on
wakeups; the new test reflects the effect of the waitqueue optimizations:

        Constants: 40 uffd threads, on N_CPUS=40, memsize=512M
        Units: milliseconds to finish the test (lower is better)
        |-----------------+--------+-------+------------|
        | test case       | before | after |   diff (%) |
        |-----------------+--------+-------+------------|
        | workers=8,poll  |   1762 |  1133 | -55.516328 |
        | workers=8,read  |   1437 |   585 | -145.64103 |
        | workers=16,poll |   1117 |  1097 | -1.8231541 |
        | workers=16,read |   1159 |   759 | -52.700922 |
        | workers=32,poll |   1001 |   973 | -2.8776978 |
        | workers=32,read |    866 |   713 | -21.458626 |
        |-----------------+--------+-------+------------|
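
        (The diff column is relative to the patched result: e.g. the
        first row is (1133-1762)/1133*100 ~= -55.5%, i.e. the unpatched
        kernel takes ~55% longer to finish.)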

The more threads hanging on fd_wqh, the bigger the difference shown in
the numbers.  "8 worker threads" is the worst case here because it means
that, in the worst case, 40-8=32 threads are hanging idle on the fd_wqh
queue.

In real life there can be more workers than this, but a small number of
active worker threads will show a similar effect.
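
To picture the userspace side, a service thread loop looks roughly like
below (hypothetical code, not the actual selftest): every worker blocks
on the same uffd, so with the old wake-all semantics each enqueued
message woke all of them while only one could win the read().

  #include <poll.h>
  #include <unistd.h>
  #include <linux/userfaultfd.h>

  /* Hypothetical worker: many threads run this against one uffd */
  static void *uffd_worker(void *arg)
  {
          int uffd = *(int *)arg;
          struct uffd_msg msg;

          for (;;) {
                  struct pollfd pfd = { .fd = uffd, .events = POLLIN };

                  if (poll(&pfd, 1, -1) < 0)
                          break;
                  /* only one thread gets each message... */
                  if (read(uffd, &msg, sizeof(msg)) != sizeof(msg))
                          continue;
                  /* ...handle it, e.g. UFFDIO_COPY for a page fault */
          }
          return NULL;
  }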

This is currently based on Andrew's mm-unstable branch, but it should
apply to most not-so-old trees.

Comments welcome, thanks.

Andrea Arcangeli (1):
  mm/userfaultfd: Make uffd read() wait event exclusive

Peter Xu (6):
  poll: Add a poll_flags for poll_queue_proc()
  poll: POLL_ENQUEUE_EXCLUSIVE
  fs/userfaultfd: Use exclusive waitqueue for poll()
  selftests/mm: Replace uffd_read_mutex with a semaphore
  selftests/mm: Create uffd_fault_thread_create|join()
  selftests/mm: uffd perf test
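
For comparison with the poll() patches above, epoll has offered this
kind of wake-one registration since Linux 4.5 via EPOLLEXCLUSIVE; a
userspace sketch of that existing flag (assumes uffd is an open
userfaultfd descriptor):

  #include <sys/epoll.h>

  /* Sketch: the wake-one registration epoll already offers.  Each
   * worker owns its own epoll instance watching the same uffd;
   * EPOLLEXCLUSIVE tells the kernel to wake only one of the
   * waiters per event instead of all of them. */
  static int watch_uffd_exclusive(int uffd)
  {
          struct epoll_event ev = {
                  .events = EPOLLIN | EPOLLEXCLUSIVE,
                  .data.fd = uffd,
          };
          int epfd = epoll_create1(0);

          if (epfd < 0)
                  return -1;
          return epoll_ctl(epfd, EPOLL_CTL_ADD, uffd, &ev) ? -1 : epfd;
  }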

 drivers/vfio/virqfd.c                    |   4 +-
 drivers/vhost/vhost.c                    |   2 +-
 drivers/virt/acrn/irqfd.c                |   2 +-
 fs/aio.c                                 |   2 +-
 fs/eventpoll.c                           |   2 +-
 fs/select.c                              |   9 +-
 fs/userfaultfd.c                         |   8 +-
 include/linux/poll.h                     |  25 ++-
 io_uring/poll.c                          |   4 +-
 mm/memcontrol.c                          |   4 +-
 net/9p/trans_fd.c                        |   3 +-
 tools/testing/selftests/mm/Makefile      |   2 +
 tools/testing/selftests/mm/uffd-common.c |  65 +++++++
 tools/testing/selftests/mm/uffd-common.h |   7 +
 tools/testing/selftests/mm/uffd-perf.c   | 207 +++++++++++++++++++++++
 tools/testing/selftests/mm/uffd-stress.c |  53 +-----
 virt/kvm/eventfd.c                       |   2 +-
 17 files changed, 337 insertions(+), 64 deletions(-)
 create mode 100644 tools/testing/selftests/mm/uffd-perf.c

-- 
2.41.0

