From: Peter Xu <peterx@redhat.com>
To: Axel Rasmussen <axelrasmussen@google.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Anish Moorthy <amoorthy@google.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Rapoport <rppt@kernel.org>,
	Christian Brauner <brauner@kernel.org>,
	linux-fsdevel@vger.kernel.org,
	Andrea Arcangeli <aarcange@redhat.com>,
	Ingo Molnar <mingo@redhat.com>,
	James Houghton <jthoughton@google.com>,
	Nadav Amit <nadav.amit@gmail.com>
Subject: Re: [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups
Date: Fri, 8 Sep 2023 18:01:26 -0400	[thread overview]
Message-ID: <ZPuZtm244zhMteqc@x1n> (raw)
In-Reply-To: <CAJHvVcjQR95KVfu2qv3hepkLWkH5J8qRG_BazHKSXoGoGnUATg@mail.gmail.com>

On Thu, Sep 07, 2023 at 12:18:29PM -0700, Axel Rasmussen wrote:
> On Tue, Sep 5, 2023 at 2:42 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > Userfaultfd is the type of file that doesn't need wake-all semantics: if
> > there is a message enqueued (for either a fault address, or an event), we
> > only need to wake up one service thread to handle it.  Waking up more
> > normally means a waste of cpu cycles.  Besides that, and more importantly,
> > that just doesn't scale.
> 
> Hi Peter,

Hi, Axel,

Sorry for the late response.

> 
> I took a quick look over the series and didn't see anything
> objectionable. I was planning to actually test the series out and then
> send out R-b's, but it will take some additional time (next week).

Thanks.  The 2nd patch definitely needs a fixup for some functions (I
overlooked them without enough CONFIG_* options enabled; I am surprised I
even had vhost compiled out when testing..).  I hope that won't bring you
too much trouble.  I'll send a fixup on top of patch 2 soon.

> 
> In the meantime, I was curious about the use case. A design I've seen
> for VM live migration is to have 1 thread reading events off the uffd,
> and then have many threads actually resolving the fault events that
> come in (e.g. fetching pages over the network, issuing UFFDIO_COPY or
> UFFDIO_CONTINUE, or whatever). In that design, since we only have a
> single reader anyway, I think this series doesn't help.

Yes.  If the workload only uses 1 reader thread, this series shouldn't make
much of a difference.
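
(For reference only, a rough userspace sketch of that 1-reader + N-workers
split, not code from this series: it assumes "uffd" was already created
with userfaultfd(2) and registered via UFFDIO_REGISTER, error handling is
dropped, and enqueue_work()/dequeue_work()/fetch_page() are made-up helpers
standing in for the work queue and the network page source.)

/*
 * Sketch only: one thread reads events off the uffd, N workers resolve
 * them.  Helpers marked "hypothetical" below do not exist anywhere.
 */
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define PAGE_SIZE 4096UL                        /* assume 4K pages here */

void enqueue_work(unsigned long addr);          /* hypothetical work queue */
unsigned long dequeue_work(void);
void *fetch_page(unsigned long addr);           /* hypothetical page source */

/* The single reader: the only thread that ever read()s the uffd. */
void *reader_thread(void *arg)
{
        int uffd = *(int *)arg;
        struct uffd_msg msg;

        for (;;) {
                if (read(uffd, &msg, sizeof(msg)) != sizeof(msg))
                        continue;
                if (msg.event == UFFD_EVENT_PAGEFAULT)
                        enqueue_work((unsigned long)msg.arg.pagefault.address);
        }
        return NULL;
}

/* One of N workers: fetch the page and resolve the fault with UFFDIO_COPY. */
void *worker_thread(void *arg)
{
        int uffd = *(int *)arg;

        for (;;) {
                unsigned long addr = dequeue_work() & ~(PAGE_SIZE - 1);
                struct uffdio_copy copy = {
                        .dst = addr,
                        .src = (unsigned long)fetch_page(addr),
                        .len = PAGE_SIZE,
                        .mode = 0,
                };
                ioctl(uffd, UFFDIO_COPY, &copy);
        }
        return NULL;
}

Each function would be started with pthread_create(); since only one thread
ever sleeps in read() in this layout, the wake-one vs wake-all behaviour
barely shows up here.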

> 
> But, I'm curious if you have data indicating that > 1 reader is more
> performant overall? I suspect it might be the case that, with "enough"
> vCPUs, it makes sense to do so, but I don't have benchmark data to
> tell me what that tipping point is yet.
> 
> OTOH, if one reader is plenty in ~all cases, optimizing this path is
> less important.

For myself, I don't yet have an application that can leverage this much,
since QEMU so far only uses 1 reader thread.

IIRC Anish was proposing some kvm-specific solutions to make a single uffd
scale, and this series might suit any use case like that, where we want to
keep a single uffd and make it scale with threads.  Using 1 reader + N
workers is also a solution, but when using N readers (which also do the
work themselves) the app will hit this problem.

I am also aware that some apps use more than 1 reader thread (umap, for
example), but I don't know much more than that.

The point is, I don't think we should incur that overhead just because an
app uses >1 readers; meanwhile it also doesn't make much sense to wake up
all readers for a single userfault event.  So this should always be a good
thing to have.
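
(Roughly, what the series does on the kernel side: readers queue themselves
as exclusive waiters on the fd waitqueue, so a wake_up() stops after waking
one of them instead of all.  A hand-written sketch of the generic
<linux/wait.h> API, not the actual fs/userfaultfd.c code, and
event_pending() is a made-up predicate:)

#include <linux/wait.h>
#include <linux/sched.h>

/* reader side: each thread blocked in read() queues itself exclusively */
DEFINE_WAIT(wait);

prepare_to_wait_exclusive(&ctx->fd_wqh, &wait, TASK_INTERRUPTIBLE);
if (!event_pending(ctx))                /* made-up predicate, sketch only */
        schedule();
finish_wait(&ctx->fd_wqh, &wait);

/*
 * fault side: wake_up() wakes all non-exclusive waiters but at most ONE
 * exclusive waiter, so a single queued message no longer wakes every
 * reader thread.
 */
wake_up(&ctx->fd_wqh);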

Thanks,

-- 
Peter Xu



Thread overview: 14+ messages
2023-09-05 21:42 [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Peter Xu
2023-09-05 21:42 ` [PATCH 1/7] mm/userfaultfd: Make uffd read() wait event exclusive Peter Xu
2023-09-05 21:42 ` [PATCH 2/7] poll: Add a poll_flags for poll_queue_proc() Peter Xu
2023-09-05 23:21   ` kernel test robot
2023-09-06 17:31   ` kernel test robot
2023-09-06 20:53   ` kernel test robot
2023-09-11 20:00   ` Peter Xu
2023-09-05 21:42 ` [PATCH 3/7] poll: POLL_ENQUEUE_EXCLUSIVE Peter Xu
2023-09-05 21:42 ` [PATCH 4/7] fs/userfaultfd: Use exclusive waitqueue for poll() Peter Xu
2023-09-05 21:42 ` [PATCH 5/7] selftests/mm: Replace uffd_read_mutex with a semaphore Peter Xu
2023-09-05 21:42 ` [PATCH 6/7] selftests/mm: Create uffd_fault_thread_create|join() Peter Xu
2023-09-05 21:42 ` [PATCH 7/7] selftests/mm: uffd perf test Peter Xu
2023-09-07 19:18 ` [PATCH 0/7] mm/userfaultfd/poll: Scale userfaultfd wakeups Axel Rasmussen
2023-09-08 22:01   ` Peter Xu [this message]
