From: Zi Yan <ziy@nvidia.com>
To: "Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Song Liu <songliubraving@fb.com>
Cc: Chris Mason <clm@fb.com>, David Sterba <dsterba@suse.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <ljs@kernel.org>, Zi Yan <ziy@nvidia.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Nico Pache <npache@redhat.com>,
	Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
	Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Shuah Khan <shuah@kernel.org>,
	linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kselftest@vger.kernel.org
Subject: [PATCH 7.2 v3 02/12] mm/khugepaged: add folio dirty check after try_to_unmap()
Date: Fri, 17 Apr 2026 22:44:19 -0400
Message-ID: <20260418024429.4055056-3-ziy@nvidia.com>
In-Reply-To: <20260418024429.4055056-1-ziy@nvidia.com>

This check ensures that collapsing read-only THPs stays correct once the
READ_ONLY_THP_FOR_FS functionality is enabled by default for all FSes that
support PMD THP pagecache.

READ_ONLY_THP_FOR_FS only supports read-only fds and uses mapping->nr_thps
and inode->i_writecount to prevent any write to read-only, to-be-collapsed
folios. In upcoming commits, READ_ONLY_THP_FOR_FS will be removed, and the
aforementioned mechanism will go away with it. To ensure khugepaged still
functions as expected after these changes, skip the collapse if any folio
is dirty after try_to_unmap(). A dirty folio at that point means the folio
received writes via mmap, and such writes can still happen between
try_to_unmap() and try_to_unmap_flush() through cached TLB entries.
khugepaged does not support collapsing writable pagecache folios yet.
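
A minimal user-space sketch of why the dirty check after try_to_unmap() is
needed on top of the earlier folio_test_dirty() checks, assuming a writable
shared mapping can exist once the i_writecount-based protection is gone. In
this simplified model, a store through the mapping only sets the PTE dirty
bit, and the unmap step moves that bit to the folio flag, mirroring how the
kernel's unmap path calls folio_mark_dirty() for a dirty PTE. All struct and
function names (model_pte, model_folio, model_can_collapse(), ...) are
hypothetical and are not kernel APIs.

```c
/*
 * Hypothetical model (not kernel code): the pre-unmap check only sees the
 * folio flag, while a write through a shared mapping dirties the PTE.
 */
#include <assert.h>
#include <stdbool.h>

struct model_pte {
	bool present;
	bool dirty;	/* set by hardware on a store through this PTE */
};

struct model_folio {
	bool dirty;	/* software dirty flag, as tested by folio_test_dirty() */
	bool writeback;
};

/* A store through the mapping: only the PTE dirty bit changes. */
static void model_write_via_mmap(struct model_pte *pte)
{
	assert(pte->present);
	pte->dirty = true;
}

/* Unmap: transfer a dirty PTE bit to the folio, then clear the PTE. */
static void model_try_to_unmap(struct model_pte *pte,
			       struct model_folio *folio)
{
	if (pte->dirty)
		folio->dirty = true;
	pte->present = false;
	pte->dirty = false;
}

/* Earlier checks in collapse_file(): reject dirty or writeback folios. */
static bool model_check_before_unmap(const struct model_folio *folio)
{
	return !folio->dirty && !folio->writeback;
}

/* Check added by this patch for the non-shmem case. */
static bool model_check_after_unmap(const struct model_folio *folio)
{
	return !folio->dirty;
}

/* Returns true if the modelled collapse may proceed. */
static bool model_can_collapse(struct model_pte *pte,
			       struct model_folio *folio)
{
	if (!model_check_before_unmap(folio))
		return false;
	model_try_to_unmap(pte, folio);
	return model_check_after_unmap(folio);
}
```

With a clean PTE, both checks pass. After model_write_via_mmap(), the
pre-unmap check still passes because the folio flag is clean, and only the
post-unmap check refuses the collapse.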

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/khugepaged.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 3eb5d982d3d3..1c0fdc81d276 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1979,8 +1979,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 				}
 			} else if (folio_test_dirty(folio)) {
 				/*
-				 * khugepaged only works on read-only fd,
-				 * so this page is dirty because it hasn't
+				 * This page is dirty because it hasn't
 				 * been flushed since first write. There
 				 * won't be new dirty pages.
 				 *
@@ -2038,8 +2037,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 		if (!is_shmem && (folio_test_dirty(folio) ||
 				  folio_test_writeback(folio))) {
 			/*
-			 * khugepaged only works on read-only fd, so this
-			 * folio is dirty because it hasn't been flushed
+			 * khugepaged only works on clean file-backed folios,
+			 * so this folio is dirty because it hasn't been flushed
 			 * since first write.
 			 */
 			result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
@@ -2083,6 +2082,24 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 			goto out_unlock;
 		}
 
+		/*
+		 * At this point, the folio is locked and unmapped. Make sure
+		 * the folio is clean, so that no one else can write to it,
+		 * since that would require taking the folio lock first.
+		 * Otherwise, the folio was pointed to by a dirty PTE, and
+		 * some CPU might still have a valid TLB entry with the dirty
+		 * bit set pointing to this folio, so writes can happen without
+		 * causing a page table walk and folio lock acquisition before
+		 * the try_to_unmap_flush() below is done. After the collapse,
+		 * the file-backed folio is not marked dirty and can be
+		 * discarded before any new write marks it dirty, causing data
+		 * corruption.
+		 */
+		if (!is_shmem && folio_test_dirty(folio)) {
+			result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
+			goto out_unlock;
+		}
+
 		/*
 		 * Accumulate the folios that are being collapsed.
 		 */
-- 
2.43.0
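
The comment in the last hunk argues that a clean folio after try_to_unmap()
rules out silent writes through stale TLB entries. The sketch below checks
that argument exhaustively over a tiny state space. It assumes, as the
comment implies, that a CPU can skip the page table walk only when its
cached TLB entry already has the dirty bit set, and that a TLB entry can
only be dirty if the PTE it was filled from was dirty. All names are
hypothetical; this is a model, not kernel code.

```c
/*
 * Hypothetical model (not kernel code) of the window between
 * try_to_unmap() and try_to_unmap_flush() in collapse_file().
 * Assumption: a store can bypass the page table walk, and thus the
 * fault path and folio lock, only through a cached TLB entry whose
 * dirty bit is already set.
 */
#include <assert.h>
#include <stdbool.h>

struct model_tlb_entry {
	bool valid;	/* entry still cached by some CPU */
	bool dirty;	/* dirty bit copied from the PTE when cached */
};

struct model_mapping {
	bool pte_dirty;			/* PTE dirty bit, cleared by unmap */
	struct model_tlb_entry tlb;	/* a remote CPU's cached copy */
};

/* A TLB entry is filled from the PTE, so it is dirty only if the PTE was. */
static void model_check_invariant(const struct model_mapping *m)
{
	assert(!m->tlb.dirty || m->pte_dirty);
}

/* True if a store after the PTE is cleared can skip the fault path. */
static bool model_store_bypasses_folio_lock(const struct model_tlb_entry *tlb)
{
	return tlb->valid && tlb->dirty;
}

/*
 * try_to_unmap(): clear the PTE and move its dirty bit to the folio,
 * leaving the TLB entry alone until the later flush. Then apply the
 * post-unmap dirty check; returns true if the collapse may proceed.
 */
static bool model_unmap_and_check(struct model_mapping *m, bool *folio_dirty)
{
	model_check_invariant(m);
	if (m->pte_dirty)
		*folio_dirty = true;
	m->pte_dirty = false;
	return !*folio_dirty;
}

/*
 * Enumerate every consistent starting state (folio initially clean) and
 * confirm that whenever the check lets the collapse continue, no CPU can
 * still write to the folio without faulting.
 */
static bool model_dirty_check_is_sufficient(void)
{
	for (int pte_dirty = 0; pte_dirty <= 1; pte_dirty++) {
		for (int valid = 0; valid <= 1; valid++) {
			for (int tlb_dirty = 0; tlb_dirty <= 1; tlb_dirty++) {
				struct model_mapping m = {
					.pte_dirty = pte_dirty,
					.tlb = { .valid = valid, .dirty = tlb_dirty },
				};
				bool folio_dirty = false;

				if (tlb_dirty && !pte_dirty)
					continue;	/* impossible state */
				if (model_unmap_and_check(&m, &folio_dirty) &&
				    model_store_bypasses_folio_lock(&m.tlb))
					return false;
			}
		}
	}
	return true;
}
```

In this model, model_dirty_check_is_sufficient() returns true: every state
the check lets through has no dirty TLB entry left. The only state with a
silent-write path (a dirty PTE cached in a dirty TLB entry) is exactly the
one the check rejects.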



Thread overview: 29+ messages
2026-04-18  2:44 [PATCH 7.2 v3 00/12] Remove read-only THP support for FSes without large folio support Zi Yan
2026-04-18  2:44 ` [PATCH 7.2 v3 01/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
2026-04-20  6:07   ` Baolin Wang
2026-04-23  2:43   ` Lance Yang
2026-04-23  2:51     ` Zi Yan
2026-04-23  4:47       ` Lance Yang
2026-04-18  2:44 ` Zi Yan [this message]
2026-04-20  6:28   ` [PATCH 7.2 v3 02/12] mm/khugepaged: add folio dirty check after try_to_unmap() Baolin Wang
2026-04-23  8:30   ` Lance Yang
2026-04-23 13:14     ` Zi Yan
2026-04-18  2:44 ` [PATCH 7.2 v3 03/12] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Zi Yan
2026-04-20  6:31   ` Baolin Wang
2026-04-25 12:00   ` Zi Yan
2026-04-27  8:06     ` David Hildenbrand (Arm)
2026-04-27 11:52       ` Zi Yan
2026-04-18  2:44 ` [PATCH 7.2 v3 04/12] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_pmd_enabled() Zi Yan
2026-04-20  6:55   ` Baolin Wang
2026-04-20 14:57     ` Zi Yan
2026-04-21  2:12       ` Baolin Wang
2026-04-18  2:44 ` [PATCH 7.2 v3 05/12] mm: remove READ_ONLY_THP_FOR_FS Kconfig option Zi Yan
2026-04-18  2:44 ` [PATCH 7.2 v3 06/12] mm: fs: remove filemap_nr_thps*() functions and their users Zi Yan
2026-04-18  2:44 ` [PATCH 7.2 v3 07/12] fs: remove nr_thps from struct address_space Zi Yan
2026-04-18  2:44 ` [PATCH 7.2 v3 08/12] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS Zi Yan
2026-04-18  2:44 ` [PATCH 7.2 v3 09/12] mm/truncate: use folio_split() in truncate_inode_partial_folio() Zi Yan
2026-04-18  2:44 ` [PATCH 7.2 v3 10/12] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS Zi Yan
2026-04-18  2:44 ` [PATCH 7.2 v3 11/12] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged Zi Yan
2026-04-20  7:56   ` Baolin Wang
2026-04-18  2:44 ` [PATCH 7.2 v3 12/12] selftests/mm: remove READ_ONLY_THP_FOR_FS code from guard-regions Zi Yan
2026-04-18  9:27 ` [PATCH 7.2 v3 00/12] Remove read-only THP support for FSes without large folio support Lorenzo Stoakes
