Re: [PATCH 1/1] mm: handle swap page faults if the faulting page can be locked

From: Alistair Popple <apopple@nvidia.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>,
	akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@suse.com,
	josef@toxicpanda.com, jack@suse.cz, ldufour@linux.ibm.com,
	laurent.dufour@fr.ibm.com, michel@lespinasse.org,
	liam.howlett@oracle.com, jglisse@google.com, vbabka@suse.cz,
	minchan@google.com, dave@stgolabs.net,
	punit.agrawal@bytedance.com, lstoakes@gmail.com,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH 1/1] mm: handle swap page faults if the faulting page can be locked
Date: Mon, 17 Apr 2023 10:49:31 +1000
Message-ID: <87sfczuxkc.fsf@nvidia.com>
In-Reply-To: <ZDmetaUdmlEz/W8Q@casper.infradead.org>


Matthew Wilcox <willy@infradead.org> writes:

> On Fri, Apr 14, 2023 at 11:00:43AM -0700, Suren Baghdasaryan wrote:
>> When page fault is handled under VMA lock protection, all swap page
>> faults are retried with mmap_lock because folio_lock_or_retry
>> implementation has to drop and reacquire mmap_lock if folio could
>> not be immediately locked.
>> Instead of retrying all swapped page faults, retry only when folio
>> locking fails.
>
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>
> Let's just review what can now be handled under the VMA lock instead of
> the mmap_lock, in case somebody knows better than me that it's not safe.
>
>  - We can call migration_entry_wait().  This will wait for PG_locked to
>    become clear (in migration_entry_wait_on_locked()).  As previously
>    discussed offline, I think this is safe to do while holding the VMA
>    locked.

Do we even need to be holding the VMA locked while in
migration_entry_wait()? My understanding is we're just waiting for
PG_locked to be cleared so we can return with a reasonable chance the
migration entry is gone. If, for example, it has been unmapped or its
protections downgraded, we will simply refault.
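
To illustrate the "wait, return, refault" behaviour described above, here
is a minimal userspace sketch. It is a toy model, not kernel code: every
toy_* name and the three PTE states are assumptions invented for the
example, and the wait on PG_locked is reduced to a plain state change.
The point it tries to show is that the wait itself consults no VMA state,
and that a stale result is handled by taking another fault.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical PTE states for this toy model; not kernel definitions. */
enum toy_pte { TOY_PTE_MIGRATION, TOY_PTE_PRESENT, TOY_PTE_NONE };

enum toy_access { TOY_ACCESS_OK, TOY_ACCESS_FAULT };

/*
 * Stand-in for waiting on PG_locked. When the wait ends, the entry holds
 * whatever the migrating side (or a concurrent unmap) left behind.
 * Nothing about the VMA is read here.
 */
static void toy_migration_entry_wait(enum toy_pte *pte,
				     enum toy_pte after_migration)
{
	if (*pte == TOY_PTE_MIGRATION)
		*pte = after_migration;
}

static enum toy_access toy_access_page(const enum toy_pte *pte)
{
	return *pte == TOY_PTE_PRESENT ? TOY_ACCESS_OK : TOY_ACCESS_FAULT;
}

/*
 * Touch the page, faulting as often as needed. Returns the number of
 * faults taken, or -1 if the access is still failing after max_faults.
 */
static int toy_touch(enum toy_pte *pte, enum toy_pte after_migration,
		     int max_faults)
{
	int faults = 0;

	assert(pte != NULL);
	while (toy_access_page(pte) == TOY_ACCESS_FAULT) {
		if (faults == max_faults)
			return -1;
		faults++;
		if (*pte == TOY_PTE_MIGRATION)
			toy_migration_entry_wait(pte, after_migration);
		else
			*pte = TOY_PTE_PRESENT;	/* ordinary fault populates it */
	}
	return faults;
}
```

In this model a migration that completes normally costs one fault, while
a page that was unmapped in the meantime costs a second, ordinary fault.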

>  - We can call remove_device_exclusive_entry().  That calls
>    folio_lock_or_retry(), which will fail if it can't get the VMA lock.

Looks ok to me.

>  - We can call pgmap->ops->migrate_to_ram().  Perhaps somebody familiar
>    with Nouveau and amdkfd could comment on how safe this is?

Currently this won't work because drivers assume mmap_lock is held during
pgmap->ops->migrate_to_ram(). Primarily this is because
migrate_vma_setup()/migrate_vma_pages() are used to handle the fault, and
they assert that mmap_lock is held in walk_page_range() and also in
migrate_vma_insert_page().

So I don't think we can handle that case without mmap_lock.
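
As a rough illustration of that fallback, the userspace sketch below
models a fault on a device-private entry that is first attempted under
the VMA lock and then retried under mmap_lock. It is a toy under stated
assumptions: the toy_* types and functions are hypothetical, the locks
are plain booleans, and the assert() only stands in for the lock
assertions reached through migrate_vma_setup(). It is not the patch's
actual code.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_fault { TOY_FAULT_DONE, TOY_FAULT_RETRY };

/* Hypothetical stand-ins; locks are modelled as simple flags. */
struct toy_mm { bool mmap_lock_held; };
struct toy_vma { struct toy_mm *mm; bool vma_lock_held; };

/*
 * Stand-in for a driver's migrate_to_ram() callback. It relies on
 * mmap_lock, so calling it with only the VMA lock would trip the assert.
 */
static void toy_migrate_to_ram(const struct toy_vma *vma)
{
	assert(vma->mm->mmap_lock_held);
}

/* Device-private entry: ask for a retry unless mmap_lock is held. */
static enum toy_fault toy_device_private_fault(const struct toy_vma *vma)
{
	if (!vma->mm->mmap_lock_held)
		return TOY_FAULT_RETRY;
	toy_migrate_to_ram(vma);
	return TOY_FAULT_DONE;
}

/*
 * Try the fault under the VMA lock first, then fall back to mmap_lock.
 * Returns the number of attempts made.
 */
static int toy_handle_fault(struct toy_vma *vma)
{
	enum toy_fault ret;

	vma->vma_lock_held = true;
	ret = toy_device_private_fault(vma);
	vma->vma_lock_held = false;
	if (ret == TOY_FAULT_DONE)
		return 1;

	vma->mm->mmap_lock_held = true;
	ret = toy_device_private_fault(vma);
	vma->mm->mmap_lock_held = false;
	return ret == TOY_FAULT_DONE ? 2 : -1;
}
```

With neither lock held on entry, toy_handle_fault() needs two attempts:
the first bails out with a retry, and the second succeeds under the
modelled mmap_lock.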

At a glance it seems it should be relatively easy to move to using
lock_vma_under_rcu(). Drivers will need updating as well though because
migrate_vma_setup() is called outside of fault handling paths so drivers
will currently take mmap_lock rather than vma lock when looking up the
vma. See for example nouveau_svmm_bind().

>  - I believe we can't call handle_pte_marker() because we exclude UFFD
>    VMAs earlier.
>  - We can call swap_readpage() if we allocate a new folio.  I haven't
>    traced through all this code to tell if it's OK.
>
> So ... I believe this is all OK, but we're definitely now willing to
> wait for I/O from the swap device while holding the VMA lock when we
> weren't before.  And maybe we should make a bigger deal of it in the
> changelog.
>
> And maybe we shouldn't just be failing the folio_lock_or_retry(),
> maybe we should be waiting for the folio lock with the VMA locked.
