From: Lance Yang <lance.yang@linux.dev>
To: ziy@nvidia.com
Cc: willy@infradead.org, songliubraving@fb.com, clm@fb.com,
dsterba@suse.com, viro@zeniv.linux.org.uk, brauner@kernel.org,
jack@suse.cz, akpm@linux-foundation.org, david@kernel.org,
ljs@kernel.org, baolin.wang@linux.alibaba.com,
Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
dev.jain@arm.com, baohua@kernel.org, lance.yang@linux.dev,
vbabka@kernel.org, rppt@kernel.org, surenb@google.com,
mhocko@suse.com, shuah@kernel.org, linux-btrfs@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH 7.2 v3 02/12] mm/khugepaged: add folio dirty check after try_to_unmap()
Date: Thu, 23 Apr 2026 16:30:50 +0800
Message-ID: <20260423083050.68509-1-lance.yang@linux.dev>
In-Reply-To: <20260418024429.4055056-3-ziy@nvidia.com>

On Fri, Apr 17, 2026 at 10:44:19PM -0400, Zi Yan wrote:
>This check ensures the correctness of collapsing read-only THPs for FSes
>after READ_ONLY_THP_FOR_FS is enabled by default for all FSes supporting
>PMD THP pagecache.
>
>READ_ONLY_THP_FOR_FS only supports read-only fd and uses mapping->nr_thps
>and inode->i_writecount to prevent any write to read-only to-be-collapsed
>folios. In upcoming commits, READ_ONLY_THP_FOR_FS will be removed and the
>aforementioned mechanism will go away too. To ensure khugepaged functions
>as expected after the changes, skip if any folio is dirty after
>try_to_unmap(). A dirty folio means this read-only folio got some writes
>via mmap, which can happen between try_to_unmap() and
>try_to_unmap_flush() via cached TLB entries, and khugepaged does not
>support writable pagecache folio collapse yet.
>
>Signed-off-by: Zi Yan <ziy@nvidia.com>
>---
> mm/khugepaged.c | 25 +++++++++++++++++++++----
> 1 file changed, 21 insertions(+), 4 deletions(-)
>
>diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>index 3eb5d982d3d3..1c0fdc81d276 100644
>--- a/mm/khugepaged.c
>+++ b/mm/khugepaged.c
>@@ -1979,8 +1979,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> }
> } else if (folio_test_dirty(folio)) {
> /*
>- * khugepaged only works on read-only fd,
>- * so this page is dirty because it hasn't
>+ * This page is dirty because it hasn't
> * been flushed since first write. There
> * won't be new dirty pages.
> *
>@@ -2038,8 +2037,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> if (!is_shmem && (folio_test_dirty(folio) ||
> folio_test_writeback(folio))) {
> /*
>- * khugepaged only works on read-only fd, so this
>- * folio is dirty because it hasn't been flushed
>+ * khugepaged only works on clean file-backed folios,
>+ * so this folio is dirty because it hasn't been flushed
> * since first write.
> */
> result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
>@@ -2083,6 +2082,24 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> goto out_unlock;
> }
>
>+ /*
>+ * At this point, the folio is locked, unmapped. Make sure the
>+ * folio is clean, so that no one else is able to write to it,
>+ * since that would require taking the folio lock first.
>+ * Otherwise that means the folio was pointed by a dirty PTE and
>+ * some CPU might have a valid TLB entry with dirty bit set
>+ * still pointing to this folio and writes can happen without
>+ * causing a page table walk and folio lock acquisition before
>+ * the try_to_unmap_flush() below is done. After the collapse,
>+ * file-backed folio is not set as dirty and can be discarded
>+ * before any new write marks the folio dirty, causing data
>+ * corruption.
>+ */
>+ if (!is_shmem && folio_test_dirty(folio)) {
>+ result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
>+ goto out_unlock;

Looks buggy :)

This runs after folio_isolate_lru() and after xas_lock_irq(&xas) ...

Unless I'm missing something, "goto out_unlock" here would leave the
xarray lock held and the folio off the LRU :)

Note that the block right above does call xas_unlock_irq(&xas), and it
also calls folio_putback_lru(folio):

---8<---
		if (folio_ref_count(folio) != 2 + folio_nr_pages(folio)) {
			result = SCAN_PAGE_COUNT;
			xas_unlock_irq(&xas);		<-
			folio_putback_lru(folio);	<-
			goto out_unlock;
		}
---

So we should follow the same cleanup as that block here, right?
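
To make the cleanup ordering concrete, here is a minimal userspace sketch
of the error path. It is not kernel code: struct mock_state and every
mock_* name are hypothetical stand-ins for the xarray lock, the LRU
isolation state and folio_test_dirty(), and it only assumes that the new
dirty check is reached with the xarray lock held and the folio isolated,
as described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of this is real kernel API. */
struct mock_state {
	bool xa_locked;	/* models xas_lock_irq() / xas_unlock_irq() */
	bool on_lru;	/* models folio_isolate_lru() / folio_putback_lru() */
	bool dirty;	/* models folio_test_dirty() */
};

enum mock_result { MOCK_SUCCEED, MOCK_PAGE_DIRTY_OR_WRITEBACK };

static void mock_xas_unlock(struct mock_state *s)
{
	assert(s->xa_locked);	/* unlocking a lock we do not hold is a bug */
	s->xa_locked = false;
}

static void mock_putback_lru(struct mock_state *s)
{
	assert(!s->on_lru);	/* only an isolated folio can be put back */
	s->on_lru = true;
}

/*
 * Models the new dirty check, using the same cleanup as the refcount
 * check right above it: drop the lock, put the folio back, then bail.
 */
enum mock_result mock_dirty_check(struct mock_state *s)
{
	/* Assumed state on entry: folio isolated, xarray lock held. */
	s->on_lru = false;
	s->xa_locked = true;

	if (s->dirty) {
		mock_xas_unlock(s);
		mock_putback_lru(s);
		return MOCK_PAGE_DIRTY_OR_WRITEBACK;	/* the "goto out_unlock" */
	}

	/* Clean folio: the collapse continues with the lock still held. */
	return MOCK_SUCCEED;
}
```

With the two cleanup calls in place, the dirty path returns with the lock
released and the folio back on the LRU; without them, both stay in the
state they had on entry.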