From: Peter Xu <peterx@redhat.com>
To: Suren Baghdasaryan <surenb@google.com>
Cc: akpm@linux-foundation.org, lokeshgidra@google.com,
aarcange@redhat.com, 21cnbao@gmail.com, v-songbaohua@oppo.com,
david@redhat.com, willy@infradead.org, Liam.Howlett@oracle.com,
lorenzo.stoakes@oracle.com, hughd@google.com, jannh@google.com,
kaleshsingh@google.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/1] userfaultfd: do not block on locking a large folio with raised refcount
Date: Tue, 25 Feb 2025 16:32:01 -0500 [thread overview]
Message-ID: <Z7420bbHoz3y73xh@x1.local> (raw)
In-Reply-To: <20250225204613.2316092-1-surenb@google.com>
On Tue, Feb 25, 2025 at 12:46:13PM -0800, Suren Baghdasaryan wrote:
> Lokesh recently raised an issue about UFFDIO_MOVE getting into a deadlock
> state when it goes into split_folio() with raised folio refcount.
> split_folio() expects the reference count to be exactly
> mapcount + num_pages_in_folio + 1 (see can_split_folio()) and fails with
> EAGAIN otherwise. If multiple processes are trying to move the same
> large folio, they raise the refcount (all tasks succeed in that) then
> one of them succeeds in locking the folio, while others will block in
> folio_lock() while keeping the refcount raised. The winner of this
> race will proceed with calling split_folio() and will fail returning
> EAGAIN to the caller and unlocking the folio. The next competing process
> will get the folio locked and will go through the same flow. In the
> meantime the original winner will be retried and will block in
> folio_lock(), getting into the queue of waiting processes only to repeat
> the same path. All this results in a livelock.
> An easy fix would be to avoid waiting for the folio lock while holding
> folio refcount, similar to madvise_free_huge_pmd() where folio lock is
> acquired before raising the folio refcount.
> Modify move_pages_pte() to try locking the folio first and if that fails
> and the folio is large then return EAGAIN without touching the folio
> refcount. If the folio is single-page then split_folio() is not called,
> so we don't have this issue.
> Lokesh has a reproducer [1] and I verified that this change fixes the
> issue.
>
> [1] https://github.com/lokeshgidra/uffd_move_ioctl_deadlock
>
> Reported-by: Lokesh Gidra <lokeshgidra@google.com>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
One question irrelevant of this change below..
> ---
> mm/userfaultfd.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 867898c4e30b..f17f8290c523 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -1236,6 +1236,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
> */
> if (!src_folio) {
> struct folio *folio;
> + bool locked;
>
> /*
> * Pin the page while holding the lock to be sure the
> @@ -1255,12 +1256,26 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
> goto out;
> }
>
> + locked = folio_trylock(folio);
> + /*
> + * We avoid waiting for folio lock with a raised refcount
> + * for large folios because extra refcounts will result in
> + * split_folio() failing later and retrying. If multiple
> + * tasks are trying to move a large folio we can end
> + * livelocking.
> + */
> + if (!locked && folio_test_large(folio)) {
> + spin_unlock(src_ptl);
> + err = -EAGAIN;
> + goto out;
> + }
> +
> folio_get(folio);
> src_folio = folio;
> src_folio_pte = orig_src_pte;
> spin_unlock(src_ptl);
>
> - if (!folio_trylock(src_folio)) {
> + if (!locked) {
> pte_unmap(&orig_src_pte);
> pte_unmap(&orig_dst_pte);
.. just notice this. Are these problematic? I mean, orig_*_pte are stack
variables, afaict. I'd expect these things blow on HIGHPTE..
> src_pte = dst_pte = NULL;
>
> base-commit: 801d47bd96ce22acd43809bc09e004679f707c39
> --
> 2.48.1.658.g4767266eb4-goog
>
--
Peter Xu
next prev parent reply other threads:[~2025-02-25 21:32 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-25 20:46 [PATCH 1/1] userfaultfd: do not block on locking a large folio with raised refcount Suren Baghdasaryan
2025-02-25 20:56 ` Lokesh Gidra
2025-02-25 21:32 ` Peter Xu [this message]
2025-02-25 22:12 ` Suren Baghdasaryan
2025-02-25 22:21 ` Suren Baghdasaryan
2025-02-25 22:48 ` Peter Xu
2025-02-25 23:01 ` Suren Baghdasaryan
2025-02-26 14:59 ` Liam R. Howlett
2025-02-26 16:11 ` Suren Baghdasaryan
2025-02-26 16:16 ` Matthew Wilcox
2025-02-26 16:22 ` Suren Baghdasaryan
2025-02-26 18:59 ` Suren Baghdasaryan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z7420bbHoz3y73xh@x1.local \
--to=peterx@redhat.com \
--cc=21cnbao@gmail.com \
--cc=Liam.Howlett@oracle.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=hughd@google.com \
--cc=jannh@google.com \
--cc=kaleshsingh@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lokeshgidra@google.com \
--cc=lorenzo.stoakes@oracle.com \
--cc=surenb@google.com \
--cc=v-songbaohua@oppo.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.