From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Zi Yan <ziy@nvidia.com>, Andrew Morton <akpm@linux-foundation.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Hugh Dickins <hughd@google.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Nico Pache <npache@redhat.com>,
Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
Matthew Wilcox <willy@infradead.org>,
Bas van Dijk <bas@dfinity.org>,
Eero Kelly <eero.kelly@dfinity.org>,
Andrew Battat <andrew.battat@dfinity.org>,
Adam Bratschi-Kaye <adam.bratschikaye@dfinity.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH v2] mm/huge_memory: fix a folio_split() race condition with folio_try_get()
Date: Tue, 3 Mar 2026 09:48:19 +0100 [thread overview]
Message-ID: <56a23a23-cbed-4ace-acef-3ada41bc182d@kernel.org> (raw)
In-Reply-To: <20260302203159.3208341-1-ziy@nvidia.com>
On 3/2/26 21:31, Zi Yan wrote:
> During a pagecache folio split, the values in the related xarray should not
> be changed from the original folio at xarray split time until all
> after-split folios are well formed and stored in the xarray. Current use
> of xas_try_split() in __split_unmapped_folio() lets some after-split folios
> show up at wrong indices in the xarray. When these misplaced after-split
> folios are unfrozen, before correct folios are stored via __xa_store(), and
> grabbed by folio_try_get(), they are returned to userspace at wrong file
> indices, causing data corruption. More detailed explanation is at the
> bottom.
>
> The reproducer is at: https://github.com/dfinity/thp-madv-remove-test
> It
> 1. creates a memfd,
> 2. forks,
> 3. in the child process, maps the file with large folios (via shmem code
> path) and reads the mapped file continuously with 16 threads,
> 4. in the parent process, uses madvise(MADV_REMOVE) to punch poles in the
> large folio.
>
> Data corruption can be observed without the fix. Basically, data from a
> wrong page->index is returned.
>
> Fix it by using the original folio in xas_try_split() calls, so that
> folio_try_get() can get the right after-split folios after the original
> folio is unfrozen.
>
> Uniform split, split_huge_page*(), is not affected, since it uses
> xas_split_alloc() and xas_split() only once and stores the original folio
> in the xarray. Change xas_split() used in uniform split branch to use
> the original folio to avoid confusion.
>
> Fixes below points to the commit introduces the code, but folio_split() is
> used in a later commit 7460b470a131f ("mm/truncate: use folio_split() in
> truncate operation").
>
> More details:
>
> For example, a folio f is split non-uniformly into f, f2, f3, f4 like
> below:
> +----------------+---------+----+----+
> | f | f2 | f3 | f4 |
> +----------------+---------+----+----+
> but the xarray would look like below after __split_unmapped_folio() is
> done:
> +----------------+---------+----+----+
> | f | f2 | f3 | f3 |
> +----------------+---------+----+----+
>
> After __split_unmapped_folio(), the code changes the xarray and unfreezes
> after-split folios:
>
> 1. unfreezes f2, __xa_store(f2)
> 2. unfreezes f3, __xa_store(f3)
> 3. unfreezes f4, __xa_store(f4), which overwrites the second f3 to f4.
> 4. unfreezes f.
>
> Meanwhile, a parallel filemap_get_entry() can read the second f3 from the
> xarray and use folio_try_get() on it at step 2 when f3 is unfrozen. Then,
> f3 is wrongly returned to user.
>
> After the fix, the xarray looks like below after __split_unmapped_folio():
> +----------------+---------+----+----+
> | f | f | f | f |
> +----------------+---------+----+----+
> so that the race window no longer exists.
>
> Fixes: 00527733d0dc8 ("mm/huge_memory: add two new (not yet used) functions for folio_split()")
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Reported-by: Bas van Dijk <bas@dfinity.org>
> Closes: https://lore.kernel.org/all/CAKNNEtw5_kZomhkugedKMPOG-sxs5Q5OLumWJdiWXv+C9Yct0w@mail.gmail.com/
> Tested-by: Lance Yang <lance.yang@linux.dev>
> Cc: <stable@vger.kernel.org>
> ---
> mm/huge_memory.c | 15 +++++++++++----
> 1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 56db54fa48181..f0bdac3f270b5 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3647,6 +3647,7 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
> const bool is_anon = folio_test_anon(folio);
> int old_order = folio_order(folio);
> int start_order = split_type == SPLIT_TYPE_UNIFORM ? new_order : old_order - 1;
> + struct folio *old_folio = folio;
> int split_order;
>
> /*
> @@ -3668,11 +3669,17 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
> * irq is disabled to allocate enough memory, whereas
> * non-uniform split can handle ENOMEM.
> */
> - if (split_type == SPLIT_TYPE_UNIFORM)
> - xas_split(xas, folio, old_order);
> - else {
Just wondering whether we should no move the comment over here now, so
it just covers both cases.
> + if (split_type == SPLIT_TYPE_UNIFORM) {
> + xas_split(xas, old_folio, old_order);
> + } else {
> xas_set_order(xas, folio->index, split_order);
> - xas_try_split(xas, folio, old_order);
> + /*
> + * use the to-be-split folio, so that a parallel
> + * folio_try_get() waits on it until xarray is
> + * updated with after-split folios and
> + * the original one is unfrozen.
> + */
> + xas_try_split(xas, old_folio, old_order);
> if (xas_error(xas))
> return xas_error(xas);
> }
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
next prev parent reply other threads:[~2026-03-03 8:48 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-02 20:31 [PATCH v2] mm/huge_memory: fix a folio_split() race condition with folio_try_get() Zi Yan
2026-03-03 6:09 ` Baolin Wang
2026-03-03 8:28 ` Wei Yang
2026-03-03 8:48 ` David Hildenbrand (Arm) [this message]
2026-03-03 16:31 ` Zi Yan
2026-03-03 10:01 ` Lorenzo Stoakes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=56a23a23-cbed-4ace-acef-3ada41bc182d@kernel.org \
--to=david@kernel.org \
--cc=Liam.Howlett@oracle.com \
--cc=adam.bratschikaye@dfinity.org \
--cc=akpm@linux-foundation.org \
--cc=andrew.battat@dfinity.org \
--cc=baohua@kernel.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=bas@dfinity.org \
--cc=dev.jain@arm.com \
--cc=eero.kelly@dfinity.org \
--cc=hughd@google.com \
--cc=lance.yang@linux.dev \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=npache@redhat.com \
--cc=ryan.roberts@arm.com \
--cc=stable@vger.kernel.org \
--cc=willy@infradead.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.