[PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups

From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Anish Moorthy <amoorthy@google.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Rapoport <rppt@kernel.org>,
	Christian Brauner <brauner@kernel.org>,
	peterx@redhat.com, linux-fsdevel@vger.kernel.org,
	Andrea Arcangeli <aarcange@redhat.com>,
	Ingo Molnar <mingo@redhat.com>,
	James Houghton <jthoughton@google.com>,
	Nadav Amit <nadav.amit@gmail.com>
Subject: [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups
Date: Tue,  5 Sep 2023 17:42:28 -0400
Message-ID: <20230905214235.320571-1-peterx@redhat.com>

Userfaultfd is a type of file that does not need wake-all semantics: when a
message is enqueued (either a fault address or an event), only one service
thread needs to be woken up to handle it.  Waking up more threads normally
wastes CPU cycles.  More importantly, it does not scale.

Andrea once had a patch that made read() O(1) on wakeup, but it never landed
upstream.  This series is my attempt to upstream it (it is a one-liner).  On
top of that, I also made poll() O(1) on wakeup (more or less bringing
EPOLLEXCLUSIVE to poll()), with some tests showing the effect.
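
For readers less familiar with EPOLLEXCLUSIVE, here is a minimal userspace
sketch of what it looks like with epoll today.  It is only an illustration:
an eventfd stands in for the userfaultfd, and register_exclusive() and
post_and_wait() are hypothetical helper names, not part of this series.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Register `fd` on a fresh epoll instance with an exclusive wait entry.
 * Returns the epoll fd, or -1 on error.  Hypothetical helper for
 * illustration only.
 */
static int register_exclusive(int fd)
{
	struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE };
	int epfd;

	assert(fd >= 0);
	epfd = epoll_create1(0);
	if (epfd < 0)
		return -1;
	ev.data.fd = fd;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
		close(epfd);
		return -1;
	}
	return epfd;
}

/*
 * Post one event on the eventfd (assumption: it stands in for a uffd
 * message) and return how many events the waiter sees within one second.
 */
static int post_and_wait(int efd, int epfd)
{
	struct epoll_event out;
	uint64_t one = 1;

	if (write(efd, &one, sizeof(one)) != sizeof(one))
		return -1;
	return epoll_wait(epfd, &out, 1, 1000);
}
```

With several epoll instances registered this way on the same fd, the kernel
avoids waking all of them for a single event, which is the behavior this
series tries to give plain poll() on userfaultfd.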

To verify this, I added a test called uffd-perf (leveraging the refactored
uffd selftest suite) that measures the latency of the messaging channel on
wakeups.  The new test reflects the waitqueue optimizations:

        Constants: 40 uffd threads, on N_CPUS=40, memsize=512M
        Units: milliseconds (to finish the test)
        diff (%) = (after - before) / after * 100
        |-----------------+--------+-------+------------|
        | test case       | before | after |   diff (%) |
        |-----------------+--------+-------+------------|
        | workers=8,poll  |   1762 |  1133 | -55.516328 |
        | workers=8,read  |   1437 |   585 | -145.64103 |
        | workers=16,poll |   1117 |  1097 | -1.8231541 |
        | workers=16,read |   1159 |   759 | -52.700922 |
        | workers=32,poll |   1001 |   973 | -2.8776978 |
        | workers=32,read |    866 |   713 | -21.458626 |
        |-----------------+--------+-------+------------|
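
The percentages in the table are relative to the "after" time, which is why
one of them exceeds 100%.  The sketch below reproduces that column and also
computes the more conventional reduction relative to the "before" time; by
that measure the workers=8 rows come out at roughly 35.7% (poll) and 59.3%
(read).  The function names are made up for illustration.

```c
#include <assert.h>

/* Percentage as printed in the table: change relative to the "after" time. */
static double diff_vs_after(double before_ms, double after_ms)
{
	assert(after_ms > 0);
	return (after_ms - before_ms) / after_ms * 100.0;
}

/* Alternative view: time saved relative to the "before" time. */
static double reduction_vs_before(double before_ms, double after_ms)
{
	assert(before_ms > 0);
	return (before_ms - after_ms) / before_ms * 100.0;
}
```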

The more threads are waiting on fd_wqh, the bigger the difference in the
numbers.  "8 worker threads" is the worst case here because up to 40-8=32
threads can be sitting idle on the fd_wqh queue.
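
To make the wake-all versus wake-one difference concrete, here is a small
userspace model of a wait queue walk.  It is a simplified sketch, not kernel
code: struct waiter, model_wake_up() and wakeups_for_one_message() are
invented for illustration, and the model only mirrors the general rule that
non-exclusive waiters are always woken while the walk stops after the
requested number of exclusive waiters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a kernel wait queue entry (not the real struct). */
struct waiter {
	bool exclusive;	/* models WQ_FLAG_EXCLUSIVE */
	bool woken;
};

/*
 * Walk the queue in order and wake waiters: every non-exclusive waiter
 * visited is woken, and the walk stops once nr_exclusive exclusive
 * waiters have been woken.  Returns the number of waiters woken.
 */
static size_t model_wake_up(struct waiter *q, size_t n, int nr_exclusive)
{
	size_t woken = 0;

	assert(nr_exclusive > 0);
	for (size_t i = 0; i < n; i++) {
		q[i].woken = true;
		woken++;
		if (q[i].exclusive && --nr_exclusive == 0)
			break;
	}
	return woken;
}

/* Queue n idle waiters, post one message, and count the wakeups. */
static size_t wakeups_for_one_message(size_t n, bool exclusive)
{
	struct waiter q[64] = { 0 };

	assert(n <= 64);
	for (size_t i = 0; i < n; i++)
		q[i].exclusive = exclusive;
	return model_wake_up(q, n, 1);
}
```

In this model, 32 idle non-exclusive waiters cost 32 wakeups for a single
message, while 32 exclusive waiters cost one.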

In real life there can be more workers than this, but a small number of
active worker threads will cause a similar effect.

This is currently based on Andrew's mm-unstable branch, but I assume it
applies to most not-so-old trees.

Comments welcome, thanks.

Andrea Arcangeli (1):
  mm/userfaultfd: Make uffd read() wait event exclusive

Peter Xu (6):
  poll: Add a poll_flags for poll_queue_proc()
  poll: POLL_ENQUEUE_EXCLUSIVE
  fs/userfaultfd: Use exclusive waitqueue for poll()
  selftests/mm: Replace uffd_read_mutex with a semaphore
  selftests/mm: Create uffd_fault_thread_create|join()
  selftests/mm: uffd perf test

 drivers/vfio/virqfd.c                    |   4 +-
 drivers/vhost/vhost.c                    |   2 +-
 drivers/virt/acrn/irqfd.c                |   2 +-
 fs/aio.c                                 |   2 +-
 fs/eventpoll.c                           |   2 +-
 fs/select.c                              |   9 +-
 fs/userfaultfd.c                         |   8 +-
 include/linux/poll.h                     |  25 ++-
 io_uring/poll.c                          |   4 +-
 mm/memcontrol.c                          |   4 +-
 net/9p/trans_fd.c                        |   3 +-
 tools/testing/selftests/mm/Makefile      |   2 +
 tools/testing/selftests/mm/uffd-common.c |  65 +++++++
 tools/testing/selftests/mm/uffd-common.h |   7 +
 tools/testing/selftests/mm/uffd-perf.c   | 207 +++++++++++++++++++++++
 tools/testing/selftests/mm/uffd-stress.c |  53 +-----
 virt/kvm/eventfd.c                       |   2 +-
 17 files changed, 337 insertions(+), 64 deletions(-)
 create mode 100644 tools/testing/selftests/mm/uffd-perf.c

-- 
2.41.0

Thread overview: 14+ messages
2023-09-05 21:42 Peter Xu [this message]
2023-09-05 21:42 ` [PATCH 1/7] mm/userfaultfd: Make uffd read() wait event exclusive Peter Xu
2023-09-05 21:42 ` [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc() Peter Xu
2023-09-05 23:21   ` kernel test robot
2023-09-06 17:31   ` kernel test robot
2023-09-06 20:53   ` kernel test robot
2023-09-11 20:00   ` Peter Xu
2023-09-05 21:42 ` [PATCH 3/7] poll: POLL_ENQUEUE_EXCLUSIVE Peter Xu
2023-09-05 21:42 ` [PATCH 4/7] fs/userfaultfd: Use exclusive waitqueue for poll() Peter Xu
2023-09-05 21:42 ` [PATCH 5/7] selftests/mm: Replace uffd_read_mutex with a semaphore Peter Xu
2023-09-05 21:42 ` [PATCH 6/7] selftests/mm: Create uffd_fault_thread_create|join() Peter Xu
2023-09-05 21:42 ` [PATCH 7/7] selftests/mm: uffd perf test Peter Xu
2023-09-07 19:18 ` [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Axel Rasmussen
2023-09-08 22:01   ` Peter Xu
