From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 89BB83EAC6F; Tue, 17 Mar 2026 16:54:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773766453; cv=none; b=Y5h4BVR5+V0F4BxAUPngayPWjSciTCgxTNwx11peX2hJNDpW9ORJlr1vJZcRHx5SdU5fCNGs+WhS+f7x93Kyio8JI/R5cnD4F+QJMczSd/G1vRfnJrgOnNa6a7QVrzkjI98SXy+pPAMn4N2J4ojl/sbRahm1icLkSLW4raCrvqQ= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773766453; c=relaxed/simple; bh=/mRxv3ngUcaOhejzejuPvPvwDZ8J+0jsM8JaEnyGivY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=TzC2J3zm31WnRPmRil/WTly34O9HKU+6LY7opN6zlAH78iHdIHfgzDqZ7ouafas13Q6h4Z6noYGm7femh17BUoGjlkjlApmrFJJKGi3zG8VN66wrmGUzM3GyAQP85lT8LlrLVol/6iL+wzoifcuQ125CkScSEtIGgiusEIq/xIQ= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linuxfoundation.org header.i=@linuxfoundation.org header.b=mPtC1mbK; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linuxfoundation.org header.i=@linuxfoundation.org header.b="mPtC1mbK" Received: by smtp.kernel.org (Postfix) with ESMTPSA id BBA5AC4CEF7; Tue, 17 Mar 2026 16:54:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1773766453; bh=/mRxv3ngUcaOhejzejuPvPvwDZ8J+0jsM8JaEnyGivY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=mPtC1mbK+FMrukN/iQKjUd5V5xxceLg4LAO/KLCBtMZHsK5kh7hcOdRUUZo4YOr5H RU9L3blLsCpm2H5spcSuMGPRU/3OiJGPRAKTNEHx9oB1+6tywWAr6mOP0FF77af0Jf xkseiPHoU1tXRofQK/RSaiDNQIvbrPvXWpoGahN8= From: Greg Kroah-Hartman To: stable@vger.kernel.org Cc: Greg Kroah-Hartman , patches@lists.linux.dev, Zi Yan , Bas van Dijk , Lance Yang , Lorenzo Stoakes , Wei Yang , Baolin Wang , Barry Song , David Hildenbrand , Dev Jain , Hugh Dickins , Liam Howlett , "Matthew Wilcox (Oracle)" , Nico Pache , Ryan Roberts , Andrew Morton Subject: [PATCH 6.19 250/378] mm/huge_memory: fix a folio_split() race condition with folio_try_get() Date: Tue, 17 Mar 2026 17:33:27 +0100 Message-ID: <20260317163016.217245867@linuxfoundation.org> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260317163006.959177102@linuxfoundation.org> References: <20260317163006.959177102@linuxfoundation.org> User-Agent: quilt/0.69 X-stable: review X-Patchwork-Hint: ignore Precedence: bulk X-Mailing-List: stable@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit 6.19-stable review patch. If anyone has any objections, please let me know. ------------------ From: Zi Yan commit 577a1f495fd78d8fb61b67ac3d3b595b01f6fcb0 upstream. During a pagecache folio split, the values in the related xarray should not be changed from the original folio at xarray split time until all after-split folios are well formed and stored in the xarray. Current use of xas_try_split() in __split_unmapped_folio() lets some after-split folios show up at wrong indices in the xarray. When these misplaced after-split folios are unfrozen, before correct folios are stored via __xa_store(), and grabbed by folio_try_get(), they are returned to userspace at wrong file indices, causing data corruption. More detailed explanation is at the bottom. The reproducer is at: https://github.com/dfinity/thp-madv-remove-test It 1. creates a memfd, 2. forks, 3. in the child process, maps the file with large folios (via shmem code path) and reads the mapped file continuously with 16 threads, 4. in the parent process, uses madvise(MADV_REMOVE) to punch poles in the large folio. Data corruption can be observed without the fix. Basically, data from a wrong page->index is returned. Fix it by using the original folio in xas_try_split() calls, so that folio_try_get() can get the right after-split folios after the original folio is unfrozen. Uniform split, split_huge_page*(), is not affected, since it uses xas_split_alloc() and xas_split() only once and stores the original folio in the xarray. Change xas_split() used in uniform split branch to use the original folio to avoid confusion. Fixes below points to the commit introduces the code, but folio_split() is used in a later commit 7460b470a131f ("mm/truncate: use folio_split() in truncate operation"). More details: For example, a folio f is split non-uniformly into f, f2, f3, f4 like below: +----------------+---------+----+----+ | f | f2 | f3 | f4 | +----------------+---------+----+----+ but the xarray would look like below after __split_unmapped_folio() is done: +----------------+---------+----+----+ | f | f2 | f3 | f3 | +----------------+---------+----+----+ After __split_unmapped_folio(), the code changes the xarray and unfreezes after-split folios: 1. unfreezes f2, __xa_store(f2) 2. unfreezes f3, __xa_store(f3) 3. unfreezes f4, __xa_store(f4), which overwrites the second f3 to f4. 4. unfreezes f. Meanwhile, a parallel filemap_get_entry() can read the second f3 from the xarray and use folio_try_get() on it at step 2 when f3 is unfrozen. Then, f3 is wrongly returned to user. After the fix, the xarray looks like below after __split_unmapped_folio(): +----------------+---------+----+----+ | f | f | f | f | +----------------+---------+----+----+ so that the race window no longer exists. [ziy@nvidia.com: move comment, per David] Link: https://lkml.kernel.org/r/5C9FA053-A4C6-4615-BE05-74E47A6462B3@nvidia.com Link: https://lkml.kernel.org/r/20260302203159.3208341-1-ziy@nvidia.com Fixes: 00527733d0dc ("mm/huge_memory: add two new (not yet used) functions for folio_split()") Signed-off-by: Zi Yan Reported-by: Bas van Dijk Closes: https://lore.kernel.org/all/CAKNNEtw5_kZomhkugedKMPOG-sxs5Q5OLumWJdiWXv+C9Yct0w@mail.gmail.com/ Tested-by: Lance Yang Reviewed-by: Lorenzo Stoakes Reviewed-by: Wei Yang Reviewed-by: Baolin Wang Cc: Barry Song Cc: David Hildenbrand Cc: Dev Jain Cc: Hugh Dickins Cc: Liam Howlett Cc: Matthew Wilcox (Oracle) Cc: Nico Pache Cc: Ryan Roberts Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- mm/huge_memory.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3631,6 +3631,7 @@ static int __split_unmapped_folio(struct const bool is_anon = folio_test_anon(folio); int old_order = folio_order(folio); int start_order = split_type == SPLIT_TYPE_UNIFORM ? new_order : old_order - 1; + struct folio *old_folio = folio; int split_order; /* @@ -3651,12 +3652,16 @@ static int __split_unmapped_folio(struct * uniform split has xas_split_alloc() called before * irq is disabled to allocate enough memory, whereas * non-uniform split can handle ENOMEM. + * Use the to-be-split folio, so that a parallel + * folio_try_get() waits on it until xarray is updated + * with after-split folios and the original one is + * unfrozen. */ - if (split_type == SPLIT_TYPE_UNIFORM) - xas_split(xas, folio, old_order); - else { + if (split_type == SPLIT_TYPE_UNIFORM) { + xas_split(xas, old_folio, old_order); + } else { xas_set_order(xas, folio->index, split_order); - xas_try_split(xas, folio, old_order); + xas_try_split(xas, old_folio, old_order); if (xas_error(xas)) return xas_error(xas); }