Linux Btrfs filesystem development
From: Zi Yan <ziy@nvidia.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Song Liu <songliubraving@fb.com>
Cc: Chris Mason <clm@fb.com>, David Sterba <dsterba@suse.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	Lorenzo Stoakes <ljs@kernel.org>, Zi Yan <ziy@nvidia.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Nico Pache <npache@redhat.com>,
	Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
	Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Shuah Khan <shuah@kernel.org>,
	<linux-btrfs@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-fsdevel@vger.kernel.org>, <linux-mm@kvack.org>,
	<linux-kselftest@vger.kernel.org>
Subject: Re: [PATCH v5 02/14] mm/khugepaged: add folio dirty check after try_to_unmap()
Date: Thu, 30 Apr 2026 11:11:09 -0400
Message-ID: <CCA78A7B-95FB-4B25-BD3C-FC690D2B1343@nvidia.com>
In-Reply-To: <20260429152924.727124-3-ziy@nvidia.com>

On 29 Apr 2026, at 11:29, Zi Yan wrote:

> This check ensures that read-only PMD folio collapse stays correct once
> it is enabled for all FSes supporting PMD pagecache folios and replaces
> READ_ONLY_THP_FOR_FS.
>
> READ_ONLY_THP_FOR_FS only supports read-only fds and uses mapping->nr_thps
> and inode->i_writecount to prevent any write to the read-only folios being
> collapsed. In upcoming commits, READ_ONLY_THP_FOR_FS will be removed and the
> aforementioned mechanism will go away too. To ensure khugepaged functions
> as expected after the changes, skip the collapse if any folio is dirty after
> try_to_unmap(): a dirty folio at that point means this read-only folio can
> still receive writes between try_to_unmap() and try_to_unmap_flush() via
> cached TLB entries, and khugepaged does not support writable pagecache folio
> collapse yet.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> ---
>  mm/khugepaged.c | 28 ++++++++++++++++++++++++----
>  1 file changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 6808f2b48d864..71209a72195ab 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2327,8 +2327,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>  				}
>  			} else if (folio_test_dirty(folio)) {
>  				/*
> -				 * khugepaged only works on read-only fd,
> -				 * so this page is dirty because it hasn't
> +				 * This page is dirty because it hasn't
>  				 * been flushed since first write. There
>  				 * won't be new dirty pages.
>  				 *
> @@ -2386,8 +2385,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>  		if (!is_shmem && (folio_test_dirty(folio) ||
>  				  folio_test_writeback(folio))) {
>  			/*
> -			 * khugepaged only works on read-only fd, so this
> -			 * folio is dirty because it hasn't been flushed
> +			 * khugepaged only works on clean file-backed folios,
> +			 * so this folio is dirty because it hasn't been flushed
>  			 * since first write.
>  			 */
>  			result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
> @@ -2431,6 +2430,27 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>  			goto out_unlock;
>  		}
>
> +		/*
> +		 * At this point, the folio is locked and unmapped. If the PTE
> +		 * was dirty, try_to_unmap() has transferred the dirty bit to
> +		 * the folio and we must not collapse it into a clean
> +		 * file-backed folio.
> +		 *
> +		 * If the folio is clean here, no one can write it until we
> +		 * drop the folio lock. A write through a stale TLB entry came
> +		 * from a clean PTE and must fault because the PTE has been
> +		 * cleared; the fault path has to take the folio lock before
> +		 * installing a writable mapping. Buffered write paths also
> +		 * have to take the folio lock before modifying file contents
> +		 * without a mapping, typically via write_begin_get_folio().
> +		 */
> +		if (!is_shmem && folio_test_dirty(folio)) {
> +			result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
> +			xas_unlock_irq(&xas);
> +			folio_putback_lru(folio);
> +			goto out_unlock;

Sashiko asked:

Could a concurrent operation, such as truncate(), lock the folio, remove it
from the page cache, and drop the final reference while we are jumping to
xa_unlocked?
If the page is freed back to the buddy allocator before try_to_unmap_flush()
completes, could this leave a stale TLB entry pointing to the freed page,
potentially allowing memory corruption if the page is reallocated?

Answer:

Before try_to_unmap_flush(), the folio still holds its pagecache and LRU
references, and truncate cannot finish removing and freeing it in that small
window.

> +		}
> +
>  		/*
>  		 * Accumulate the folios that are being collapsed.
>  		 */
> -- 
> 2.53.0
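
The decision made by the new hunk can be illustrated with a small userspace
C model. This is a sketch under simplifying assumptions, not kernel code:
struct toy_pte, struct toy_folio and the toy_* helpers are hypothetical
stand-ins, and TLB batching via try_to_unmap_flush() is not modeled. It only
shows the property the check relies on: try_to_unmap() moves a PTE's dirty
bit onto the folio, so a dirty non-shmem folio after unmap means the collapse
has to be skipped.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (not kernel code) of the dirty check added after try_to_unmap()
 * in collapse_file(). All toy_* names are hypothetical stand-ins.
 */
enum toy_result { TOY_SUCCEED, TOY_PAGE_DIRTY_OR_WRITEBACK };

struct toy_pte {
	bool present;
	bool dirty;	/* hardware dirty bit, set by a write through this PTE */
};

struct toy_folio {
	bool locked;	/* collapse_file() holds the folio lock here */
	bool dirty;	/* models the folio dirty flag */
	bool is_shmem;
};

/* Models try_to_unmap(): clear the PTE and move its dirty bit to the folio. */
static void toy_try_to_unmap(struct toy_folio *folio, struct toy_pte *pte)
{
	assert(folio->locked);
	if (pte->present && pte->dirty)
		folio->dirty = true;	/* stands in for folio_mark_dirty() */
	pte->present = false;
	pte->dirty = false;
}

/* Models the new check: never collapse a dirty non-shmem folio. */
static enum toy_result toy_check_after_unmap(const struct toy_folio *folio)
{
	if (!folio->is_shmem && folio->dirty)
		return TOY_PAGE_DIRTY_OR_WRITEBACK;
	return TOY_SUCCEED;
}
```

A folio whose PTE was written ends up dirty after toy_try_to_unmap(), so
toy_check_after_unmap() returns TOY_PAGE_DIRTY_OR_WRITEBACK; a clean PTE
leaves the folio clean and the collapse may proceed. Shmem folios are not
rejected, mirroring the !is_shmem condition in the hunk.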


Best Regards,
Yan, Zi

Thread overview: 59+ messages
2026-04-29 15:29 [PATCH v5 00/14] Remove CONFIG_READ_ONLY_THP_FOR_FS and enable file THP for writable files Zi Yan
2026-04-29 15:29 ` [PATCH v5 01/14] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check Zi Yan
2026-04-30 14:37   ` Zi Yan
2026-04-30 15:04     ` Andrew Morton
2026-05-04  3:48   ` Nico Pache
2026-05-07  3:29     ` Lance Yang
2026-05-07  5:52       ` Zi Yan
2026-05-07  6:08   ` Zi Yan
2026-05-07  6:57     ` Zi Yan
2026-05-08 19:39   ` David Hildenbrand (Arm)
2026-04-29 15:29 ` [PATCH v5 02/14] mm/khugepaged: add folio dirty check after try_to_unmap() Zi Yan
2026-04-30 15:11   ` Zi Yan [this message]
2026-05-04  3:53   ` Nico Pache
2026-05-06  5:23   ` Lance Yang
2026-04-29 15:29 ` [PATCH v5 03/14] mm/huge_memory: remove READ_ONLY_THP_FOR_FS from file_thp_enabled() Zi Yan
2026-05-04  3:57   ` Nico Pache
2026-05-07  4:29   ` Lance Yang
2026-05-08 19:43   ` David Hildenbrand (Arm)
2026-04-29 15:29 ` [PATCH v5 04/14] mm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_enabled() Zi Yan
2026-05-04  4:00   ` Nico Pache
2026-05-07  4:49   ` Lance Yang
2026-05-08 18:54     ` Andrew Morton
2026-04-29 15:35 ` [PATCH v5 05/14] mm: remove READ_ONLY_THP_FOR_FS Kconfig option Zi Yan
2026-05-04  4:02   ` Nico Pache
2026-05-07 12:48   ` Lance Yang
2026-05-08  2:52   ` Wei Yang
2026-05-08  3:22     ` Lance Yang
2026-04-29 15:35 ` [PATCH v5 06/14] mm: fs: remove filemap_nr_thps*() functions and their users Zi Yan
2026-05-07 12:59   ` Lance Yang
2026-04-29 15:35 ` [PATCH v5 07/14] fs: remove nr_thps from struct address_space Zi Yan
2026-05-04  4:11   ` Nico Pache
2026-04-29 15:35 ` [PATCH v5 08/14] mm/huge_memory: remove folio split check for READ_ONLY_THP_FOR_FS Zi Yan
2026-04-29 15:35 ` [PATCH v5 09/14] mm/truncate: use folio_split() in truncate_inode_partial_folio() Zi Yan
2026-04-30 15:12   ` Zi Yan
2026-05-08  7:01   ` Lance Yang
2026-05-08 19:46   ` David Hildenbrand (Arm)
2026-04-29 15:35 ` [PATCH v5 10/14] fs/btrfs: remove a comment referring to READ_ONLY_THP_FOR_FS Zi Yan
2026-04-29 15:35 ` [PATCH v5 11/14] selftests/mm: remove READ_ONLY_THP_FOR_FS in khugepaged Zi Yan
2026-04-30 15:16   ` Zi Yan
2026-04-30 15:27     ` Zi Yan
2026-05-08 19:48       ` David Hildenbrand (Arm)
2026-05-04  4:23   ` Nico Pache
2026-05-06 13:11     ` Zi Yan
2026-05-08 19:51       ` David Hildenbrand (Arm)
2026-05-04 10:11   ` Nico Pache
2026-05-06 13:15     ` Zi Yan
2026-05-07  6:35       ` Nico Pache
2026-05-07  7:21         ` Zi Yan
2026-05-07  7:24   ` Zi Yan
2026-05-08 20:06   ` David Hildenbrand (Arm)
2026-04-29 15:35 ` [PATCH v5 12/14] selftests/mm: remove READ_ONLY_THP_FOR_FS code from guard-regions Zi Yan
2026-04-29 15:35 ` [PATCH v5 13/14] mm/khugepaged: enable clean pagecache folio collapse for writable files Zi Yan
2026-04-30 15:18   ` Zi Yan
2026-05-08 20:09     ` David Hildenbrand (Arm)
2026-05-08  7:46   ` Lance Yang
2026-05-08 20:13   ` David Hildenbrand (Arm)
2026-04-29 15:35 ` [PATCH v5 14/14] selftests/mm: add writable-file collapse tests for khugepaged Zi Yan
2026-04-29 16:13 ` [PATCH v5 00/14] Remove CONFIG_READ_ONLY_THP_FOR_FS and enable file THP for writable files Andrew Morton
2026-05-09 22:10   ` Zi Yan
