linux-trace-kernel.vger.kernel.org archive mirror
From: Shivank Garg <shivankg@amd.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Zi Yan <ziy@nvidia.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Nico Pache <npache@redhat.com>,
	Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
	Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
	Steven Rostedt <rostedt@goodmis.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	Zach O'Keefe <zokeefe@google.com>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>,
	<linux-trace-kernel@vger.kernel.org>, <shivankg@amd.com>,
	Branden Moore <Branden.Moore@amd.com>
Subject: [PATCH 1/2] mm/khugepaged: do synchronous writeback for MADV_COLLAPSE
Date: Mon, 10 Nov 2025 11:32:53 +0000
Message-ID: <20251110113254.77822-1-shivankg@amd.com>

When MADV_COLLAPSE is called on file-backed mappings (e.g., executable
text sections), the pages may still be dirty from recent writes. The
current code triggers an async flush via filemap_flush() and returns
SCAN_FAIL, requiring userspace to retry the operation.

This is problematic for userspace that wants to collapse text pages into
THPs to reduce ITLB pressure. When the pages are still dirty, the first
madvise() call fails with EINVAL, and only subsequent calls succeed once
writeback has completed.

For direct MADV_COLLAPSE calls (!cc->is_khugepaged), perform a synchronous
writeback using filemap_write_and_wait_range() before scanning the folios.
This is best-effort: it is meant to leave the folios clean for the first
attempt, so that the caller does not have to retry.

Reported-by: Branden Moore <Branden.Moore@amd.com>
Closes: https://lore.kernel.org/all/4e26fe5e-7374-467c-a333-9dd48f85d7cc@amd.com
Fixes: 34488399fa08 ("mm/madvise: add file and shmem support to MADV_COLLAPSE")
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Shivank Garg <shivankg@amd.com>
---
Applies cleanly on:
6.18-rc5
mm-stable:e9a6fb0bc
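
For context, the retry that userspace currently has to do can be
sketched roughly as below. This is only an illustration, not code from
the original report: collapse_should_retry() and collapse_with_retry()
are hypothetical names, treating EAGAIN as retryable is an assumption,
and the MADV_COLLAPSE fallback is the asm-generic uapi value for libc
headers that do not define it yet.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* for the madvise() prototype */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* asm-generic uapi value */
#endif

/*
 * Hypothetical policy: EINVAL is what a collapse of a still-dirty range
 * returns without this patch; EAGAIN is assumed to be transient as well.
 */
static int collapse_should_retry(int err)
{
	return err == EINVAL || err == EAGAIN;
}

/*
 * Hypothetical helper: try MADV_COLLAPSE up to max_tries times.
 * Returns 0 on success, -1 otherwise (errno is left from the last
 * madvise() call that was made).
 */
static int collapse_with_retry(void *addr, size_t len, int max_tries)
{
	for (int attempt = 0; attempt < max_tries; attempt++) {
		if (madvise(addr, len, MADV_COLLAPSE) == 0)
			return 0;
		if (!collapse_should_retry(errno))
			return -1;	/* not considered transient, give up */
	}
	return -1;
}
```

A caller would pass a PMD-aligned text range, e.g.
collapse_with_retry(text_start, text_len, 3). Since the flush is
asynchronous without this patch, a real caller would likely also want a
delay between attempts; that is left out here for brevity.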


 mm/khugepaged.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index abe54f0043c7..d08ed6eb9ce1 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -21,6 +21,7 @@
 #include <linux/shmem_fs.h>
 #include <linux/dax.h>
 #include <linux/ksm.h>
+#include <linux/backing-dev.h>
 
 #include <asm/tlb.h>
 #include <asm/pgalloc.h>
@@ -1845,6 +1846,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	struct page *dst;
 	struct folio *folio, *tmp, *new_folio;
 	pgoff_t index = 0, end = start + HPAGE_PMD_NR;
+	loff_t range_start, range_end;
 	LIST_HEAD(pagelist);
 	XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
 	int nr_none = 0, result = SCAN_SUCCEED;
@@ -1853,6 +1855,21 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
 	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));
 
+	/*
+	 * For MADV_COLLAPSE on regular files, do a synchronous writeback
+	 * to ensure dirty folios are flushed before we attempt collapse.
+	 * This is a best-effort approach to avoid failing on the first
+	 * attempt when freshly-written executable text is still dirty.
+	 */
+	if (!is_shmem && cc && !cc->is_khugepaged && mapping_can_writeback(mapping)) {
+		range_start = (loff_t)start << PAGE_SHIFT;
+		range_end = ((loff_t)end << PAGE_SHIFT) - 1;
+		if (filemap_write_and_wait_range(mapping, range_start, range_end)) {
+			result = SCAN_FAIL;
+			goto out;
+		}
+	}
+
 	result = alloc_charge_folio(&new_folio, mm, cc);
 	if (result != SCAN_SUCCEED)
 		goto out;

base-commit: e9a6fb0bcdd7609be6969112f3fbfcce3b1d4a7c
-- 
2.43.0
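
As a worked example of the range computation in the hunk above, the
userspace sketch below redoes the same arithmetic. It assumes 4 KiB base
pages and 2 MiB PMD-sized THPs (EX_PAGE_SHIFT = 12, EX_HPAGE_PMD_NR =
512); the EX_* names, struct byte_range and collapse_byte_range() are
made up for this illustration and are not kernel symbols. The end of the
range is inclusive, matching what filemap_write_and_wait_range() expects.

```c
#include <stdint.h>

/* Assumed configuration: 4 KiB pages, 2 MiB PMD-sized huge pages. */
#define EX_PAGE_SHIFT	12
#define EX_HPAGE_PMD_NR	512

struct byte_range {
	int64_t first;	/* first byte of the range */
	int64_t last;	/* last byte of the range, inclusive */
};

/* Convert the page-index range [start, start + HPAGE_PMD_NR) to bytes. */
static struct byte_range collapse_byte_range(uint64_t start)
{
	uint64_t end = start + EX_HPAGE_PMD_NR;	/* exclusive page index */
	struct byte_range r = {
		.first = (int64_t)start << EX_PAGE_SHIFT,
		.last  = ((int64_t)end << EX_PAGE_SHIFT) - 1,
	};
	return r;
}
```

With start = 512 (the second 2 MiB-aligned chunk of the file) this gives
first = 2097152 and last = 4194303, i.e. exactly the bytes backing the
one huge page being collapsed.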
