public inbox for linux-mm@kvack.org
From: Lance Yang <lance.yang@linux.dev>
To: ziy@nvidia.com
Cc: willy@infradead.org, songliubraving@fb.com, clm@fb.com,
	dsterba@suse.com, viro@zeniv.linux.org.uk, brauner@kernel.org,
	jack@suse.cz, akpm@linux-foundation.org, david@kernel.org,
	ljs@kernel.org, baolin.wang@linux.alibaba.com,
	Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
	dev.jain@arm.com, baohua@kernel.org, lance.yang@linux.dev,
	vbabka@kernel.org, rppt@kernel.org, surenb@google.com,
	mhocko@suse.com, shuah@kernel.org, linux-btrfs@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH 7.2 v3 02/12] mm/khugepaged: add folio dirty check after try_to_unmap()
Date: Thu, 23 Apr 2026 16:30:50 +0800
Message-ID: <20260423083050.68509-1-lance.yang@linux.dev>
In-Reply-To: <20260418024429.4055056-3-ziy@nvidia.com>


On Fri, Apr 17, 2026 at 10:44:19PM -0400, Zi Yan wrote:
>This check ensures the correctness of collapsing read-only THPs for FSes
>after READ_ONLY_THP_FOR_FS is enabled by default for all FSes supporting
>PMD THP pagecache.
>
>READ_ONLY_THP_FOR_FS only supports read-only fd and uses mapping->nr_thps
>and inode->i_writecount to prevent any write to read-only to-be-collapsed
>folios. In upcoming commits, READ_ONLY_THP_FOR_FS will be removed and the
>aforementioned mechanism will go away too. To ensure khugepaged functions
>as expected after the changes, skip if any folio is dirty after
>try_to_unmap(). A dirty folio means this read-only folio received writes
>via mmap, which can happen between try_to_unmap() and
>try_to_unmap_flush() through cached TLB entries, and khugepaged does not
>support collapsing writable pagecache folios yet.
>
>Signed-off-by: Zi Yan <ziy@nvidia.com>
>---
> mm/khugepaged.c | 25 +++++++++++++++++++++----
> 1 file changed, 21 insertions(+), 4 deletions(-)
>
>diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>index 3eb5d982d3d3..1c0fdc81d276 100644
>--- a/mm/khugepaged.c
>+++ b/mm/khugepaged.c
>@@ -1979,8 +1979,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> 				}
> 			} else if (folio_test_dirty(folio)) {
> 				/*
>-				 * khugepaged only works on read-only fd,
>-				 * so this page is dirty because it hasn't
>+				 * This page is dirty because it hasn't
> 				 * been flushed since first write. There
> 				 * won't be new dirty pages.
> 				 *
>@@ -2038,8 +2037,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> 		if (!is_shmem && (folio_test_dirty(folio) ||
> 				  folio_test_writeback(folio))) {
> 			/*
>-			 * khugepaged only works on read-only fd, so this
>-			 * folio is dirty because it hasn't been flushed
>+			 * khugepaged only works on clean file-backed folios,
>+			 * so this folio is dirty because it hasn't been flushed
> 			 * since first write.
> 			 */
> 			result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
>@@ -2083,6 +2082,24 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> 			goto out_unlock;
> 		}
> 
>+		/*
>+		 * At this point, the folio is locked, unmapped. Make sure the
>+		 * folio is clean, so that no one else is able to write to it,
>+		 * since that would require taking the folio lock first.
>+		 * Otherwise that means the folio was pointed by a dirty PTE and
>+		 * some CPU might have a valid TLB entry with dirty bit set
>+		 * still pointing to this folio and writes can happen without
>+		 * causing a page table walk and folio lock acquisition before
>+		 * the try_to_unmap_flush() below is done. After the collapse,
>+		 * file-backed folio is not set as dirty and can be discarded
>+		 * before any new write marks the folio dirty, causing data
>+		 * corruption.
>+		 */
>+		if (!is_shmem && folio_test_dirty(folio)) {
>+			result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
>+			goto out_unlock;

Looks buggy :)

This runs after folio_isolate_lru() and after xas_lock_irq(&xas) ...

Unless I'm missing something, "goto out_unlock" here would leave the
xarray lock held and the folio off the LRU :)

Note that the block right above does call xas_unlock_irq(&xas), and it
also does call folio_putback_lru(folio):

---8<---
		if (folio_ref_count(folio) != 2 + folio_nr_pages(folio)) {
			result = SCAN_PAGE_COUNT;
			xas_unlock_irq(&xas);     <-
			folio_putback_lru(folio); <-
			goto out_unlock;
		}
---

So we should follow the same cleanup as that block here, right?
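
To make the concern concrete, here is a toy userspace model of the two
exit paths. It is only a sketch: struct model_state and every model_*
name are hypothetical stand-ins invented for illustration, not kernel
APIs, and it assumes (as described above) that out_unlock itself only
drops the folio lock. The comment at the top shows the shape the
suggested cleanup might take in the patch; it is untested.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Shape of the cleanup suggested for the new check (untested sketch,
 * mirroring the folio_ref_count() block quoted above):
 *
 *	if (!is_shmem && folio_test_dirty(folio)) {
 *		result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
 *		xas_unlock_irq(&xas);
 *		folio_putback_lru(folio);
 *		goto out_unlock;
 *	}
 */

/* Hypothetical model of the state held at the new dirty check. */
struct model_state {
	bool xa_locked;    /* models xas_lock_irq(&xas) being held */
	bool folio_on_lru; /* false once folio_isolate_lru() has run */
	bool folio_locked; /* models the folio lock */
};

enum model_result { MODEL_CONTINUE, MODEL_PAGE_DIRTY_OR_WRITEBACK };

/* State on reaching the check: folio locked, isolated, xarray locked. */
void model_enter(struct model_state *s)
{
	s->folio_locked = true;
	s->folio_on_lru = false;
	s->xa_locked = true;
}

/* Assumption: out_unlock only releases the folio lock. */
void model_out_unlock(struct model_state *s)
{
	s->folio_locked = false;
}

/*
 * Run the dirty check. With full_cleanup == false this models the patch
 * as posted; with full_cleanup == true it models the suggested cleanup.
 */
enum model_result model_dirty_check(struct model_state *s, bool dirty,
				    bool full_cleanup)
{
	model_enter(s);
	if (!dirty)
		return MODEL_CONTINUE; /* collapse keeps going, locks held */

	if (full_cleanup) {
		s->xa_locked = false;   /* stands in for xas_unlock_irq() */
		s->folio_on_lru = true; /* stands in for folio_putback_lru() */
	}
	model_out_unlock(s);
	return MODEL_PAGE_DIRTY_OR_WRITEBACK;
}
```

Calling model_dirty_check() with dirty set and full_cleanup clear leaves
xa_locked set and folio_on_lru clear, which is the leak described above;
with full_cleanup set, both are restored before the folio lock is dropped.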

