From: Lance Yang <lance.yang@linux.dev>
To: ziy@nvidia.com
Cc: akpm@linux-foundation.org, david@kernel.org, willy@infradead.org,
	songliubraving@fb.com, clm@fb.com, dsterba@suse.com,
	viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
	ljs@kernel.org, baolin.wang@linux.alibaba.com,
	Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
	dev.jain@arm.com, baohua@kernel.org, lance.yang@linux.dev,
	vbabka@kernel.org, rppt@kernel.org, surenb@google.com,
	mhocko@suse.com, shuah@kernel.org, linux-btrfs@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v5 02/14] mm/khugepaged: add folio dirty check after try_to_unmap()
Date: Wed,  6 May 2026 13:23:57 +0800
Message-ID: <20260506052357.91716-1-lance.yang@linux.dev>
In-Reply-To: <20260429152924.727124-3-ziy@nvidia.com>


On Wed, Apr 29, 2026 at 11:29:12AM -0400, Zi Yan wrote:
>This check ensures the correctness of read-only PMD folio collapse
>after it is enabled for all FSes supporting PMD pagecache folios and
>replaces READ_ONLY_THP_FOR_FS.
>
>READ_ONLY_THP_FOR_FS only supports read-only fd and uses mapping->nr_thps
>and inode->i_writecount to prevent any write to read-only to-be-collapsed
>folios. In upcoming commits, READ_ONLY_THP_FOR_FS will be removed and the
>aforementioned mechanism will go away too. To ensure khugepaged functions
>as expected after the changes, skip if any folio is dirty after
>try_to_unmap(), since a dirty folio at that point means this read-only
>folio can get writes between try_to_unmap() and try_to_unmap_flush() via
>cached TLB entries and khugepaged does not support writable pagecache folio
>collapse yet.
>
>Signed-off-by: Zi Yan <ziy@nvidia.com>
>Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>Acked-by: David Hildenbrand (Arm) <david@kernel.org>
>---
> mm/khugepaged.c | 28 ++++++++++++++++++++++++----
> 1 file changed, 24 insertions(+), 4 deletions(-)
>
>diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>index 6808f2b48d864..71209a72195ab 100644
>--- a/mm/khugepaged.c
>+++ b/mm/khugepaged.c
>@@ -2327,8 +2327,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> 				}
> 			} else if (folio_test_dirty(folio)) {
> 				/*
>-				 * khugepaged only works on read-only fd,
>-				 * so this page is dirty because it hasn't
>+				 * This page is dirty because it hasn't
> 				 * been flushed since first write. There
> 				 * won't be new dirty pages.
> 				 *
>@@ -2386,8 +2385,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> 		if (!is_shmem && (folio_test_dirty(folio) ||
> 				  folio_test_writeback(folio))) {
> 			/*
>-			 * khugepaged only works on read-only fd, so this
>-			 * folio is dirty because it hasn't been flushed
>+			 * khugepaged only works on clean file-backed folios,
>+			 * so this folio is dirty because it hasn't been flushed
> 			 * since first write.
> 			 */
> 			result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
>@@ -2431,6 +2430,27 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> 			goto out_unlock;
> 		}
> 
>+		/*
>+		 * At this point, the folio is locked and unmapped. If the PTE
>+		 * was dirty, try_to_unmap() has transferred the dirty bit to
>+		 * the folio and we must not collapse it into a clean
>+		 * file-backed folio.
>+		 *
>+		 * If the folio is clean here, no one can write it until we
>+		 * drop the folio lock. A write through a stale TLB entry came
>+		 * from a clean PTE and must fault because the PTE has been
>+		 * cleared; the fault path has to take the folio lock before

Yeah, try_to_unmap_one() already documents the arch guarantee this
relies on for a clean cached TLB entry after the PTE is cleared:

			/*
			 * We clear the PTE but do not flush so potentially
			 * a remote CPU could still be writing to the folio.
			 * If the entry was previously clean then the
			 * architecture must guarantee that a clear->dirty
			 * transition on a cached TLB entry is written through
			 * and traps if the PTE is unmapped.
			 */

Lesson learned :)

>+		 * installing a writable mapping. Buffered write paths also
>+		 * have to take the folio lock before modifying file contents
>+		 * without a mapping, typically via write_begin_get_folio().
>+		 */
>+		if (!is_shmem && folio_test_dirty(folio)) {
>+			result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
>+			xas_unlock_irq(&xas);
>+			folio_putback_lru(folio);
>+			goto out_unlock;
>+		}

LGTM.
Reviewed-by: Lance Yang <lance.yang@linux.dev>
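
For anyone following along, the ordering argument in the new comment can
be modelled in user space. This is only a toy sketch, not kernel code:
toy_pte, toy_folio, toy_unmap() and toy_collapse_one() are made-up names,
and the model assumes that unmapping carries a dirty PTE bit over to the
folio, as the patch comment describes. It does not model locking or TLB
flushing.

```c
#include <stdbool.h>

/* Toy stand-ins; these are NOT the kernel's PTE or folio types. */
struct toy_pte {
	bool present;
	bool dirty;
};

struct toy_folio {
	bool dirty;
};

enum toy_result {
	TOY_COLLAPSE_OK,
	TOY_PAGE_DIRTY,
};

/* Models the unmap step: clear the PTE, carry its dirty bit to the folio. */
void toy_unmap(struct toy_pte *pte, struct toy_folio *folio)
{
	if (pte->present && pte->dirty)
		folio->dirty = true;
	pte->present = false;
	pte->dirty = false;
}

/*
 * Models the order of checks: the folio dirty test is repeated after
 * the unmap, because only then has a dirty PTE bit become visible on
 * the folio.
 */
enum toy_result toy_collapse_one(struct toy_pte *pte,
				 struct toy_folio *folio, bool is_shmem)
{
	/* Existing check before unmapping. */
	if (!is_shmem && folio->dirty)
		return TOY_PAGE_DIRTY;

	toy_unmap(pte, folio);

	/* Check added by this patch, after unmapping. */
	if (!is_shmem && folio->dirty)
		return TOY_PAGE_DIRTY;

	return TOY_COLLAPSE_OK;
}
```

In this model a folio that looks clean but is mapped by a dirty PTE
passes the first check and is rejected by the second one, which is the
case the patch is meant to catch. A clean PTE leaves the folio clean and
the collapse proceeds.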
