Linux Trace Kernel
From: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
To: Wandun <chenwandun1@gmail.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org,
	linux-rt-devel@lists.linux.dev
Cc: akpm@linux-foundation.org, surenb@google.com, mhocko@suse.com,
	jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com,
	rostedt@goodmis.org, mhiramat@kernel.org,
	mathieu.desnoyers@efficios.com, david@kernel.org, ljs@kernel.org,
	liam@infradead.org, rppt@kernel.org, bigeasy@linutronix.de,
	clrkwllms@kernel.org, Alexander.Krabler@kuka.com,
	Hugh Dickins <hughd@google.com>
Subject: Re: [RFC PATCH 1/3] mm/compaction: skip isolate mlocked folios when compact_unevictable_allowed=0
Date: Mon, 22 Jun 2026 11:55:34 +0200
Message-ID: <c8793c0f-7156-4cb7-9e6e-7909397e2fff@kernel.org>
In-Reply-To: <040788a9-e0d5-478e-bb48-3d22b8b41020@gmail.com>

On 6/18/26 13:43, Wandun wrote:
> 
> 
> On 6/18/26 02:52, Vlastimil Babka (SUSE) wrote:
>> On 6/4/26 04:38, Wandun Chen wrote:
>>> From: Wandun Chen <chenwandun@lixiang.com>
>>>
>>> compact_unevictable_allowed defaults to 0 under PREEMPT_RT, so
>>> isolate_migratepages_block() skips folios with PG_unevictable set.
>>> However, mlock_folio() sets PG_mlocked immediately but defers
>>> PG_unevictable to mlock_folio_batch(), resulting in a folio with
>>> PG_mlocked=1 but PG_unevictable=0. Compaction will isolate such a
>>> folio.
>>>
>>> Fix by checking folio_test_mlocked() together with the existing
>>> folio_test_unevictable() check.
>>>
>>> A similar issue has been reported by Alexander Krabler on a 6.12-rt
>>> aarch64 system. Vlastimil suggested to check the mlocked flag [1].
>>>
>>> Reported-by: Alexander Krabler <Alexander.Krabler@kuka.com>
>>> Closes: https://lore.kernel.org/all/DU0PR01MB10385345F7153F334100981888259A@DU0PR01MB10385.eurprd01.prod.exchangelabs.com/
>>> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
>>> Signed-off-by: Wandun Chen <chenwandun@lixiang.com>
>>> Link: https://lore.kernel.org/all/33275585-f2db-4779-89f0-3ae24b455a67@suse.cz/ [1]
>> 
>> Well in that thread, Hugh doubted my suggestion and then it seems we didn't
>> conclude anything. Did you actually observe in practice the issue that
>> Alexander had, and that this patch fixed it, or is that theoretical?
>> 
> Yes, I wrote a test case that can reproduce it in a few seconds.
> 
> The test case has 3 steps:
> 1. mlockall
> 2. mmap a 2GB file and trigger write page faults on it;
> 3. during step 2, trigger compaction via /proc/sys/vm/compact_memory
> 
> 
> My reproduction environment is QEMU with 4GB RAM, 8 cores, aarch64,
> PREEMPT_RT, and a kernel that includes the tracepoint from patch 02.
> After running the reproduction program for a few seconds, the
> following output appears.

Ah, nice.

> repro-403     [004] ....1   101.270505: mm_compaction_isolate_folio: pfn=0x71e3a mode=0x0 flags=referenced|uptodate|mlocked
> repro-403     [004] ....1   101.270507: mm_compaction_isolate_folio: pfn=0x71e3b mode=0x0 flags=referenced|uptodate|mlocked
> repro-403     [004] ....1   101.270513: mm_compaction_isolate_folio: pfn=0x71e3c mode=0x0 flags=referenced|uptodate|mlocked
> repro-403     [004] ....1   101.270515: mm_compaction_isolate_folio: pfn=0x71e3d mode=0x0 flags=uptodate|mlocked
> repro-403     [004] ....1   101.270517: mm_compaction_isolate_folio: pfn=0x71e3e mode=0x0 flags=uptodate|mlocked
> repro-403     [004] ....1   101.270520: mm_compaction_isolate_folio: pfn=0x71e3f mode=0x0 flags=uptodate|mlocked
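
The events above can also be picked out mechanically. What follows is a minimal sketch, assuming the line layout quoted above (event name followed by a `flags=` list separated by `|`) and assuming that a set PG_unevictable would be printed as `unevictable`. The helper names has_flag() and isolated_while_mlocked() are hypothetical and not part of the series.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if `name` appears as a whole entry in a '|'-separated list. */
static bool has_flag(const char *flags, const char *name)
{
	size_t n = strlen(name);
	const char *p = flags;

	while ((p = strstr(p, name)) != NULL) {
		bool start_ok = (p == flags) || (p[-1] == '|');
		char end = p[n];
		bool end_ok = (end == '\0' || end == '|' ||
			       end == '\n' || end == ' ');

		if (start_ok && end_ok)
			return true;
		p += n;
	}
	return false;
}

/*
 * Return true for an mm_compaction_isolate_folio line whose flags contain
 * "mlocked" but not "unevictable", i.e. a folio that was isolated while in
 * the window described in the commit message.
 * Assumption: the line format matches the trace output quoted above.
 */
bool isolated_while_mlocked(const char *line)
{
	const char *flags;

	if (!strstr(line, "mm_compaction_isolate_folio:"))
		return false;

	flags = strstr(line, "flags=");
	if (!flags)
		return false;
	flags += strlen("flags=");

	return has_flag(flags, "mlocked") && !has_flag(flags, "unevictable");
}
```

Calling isolated_while_mlocked() on each line read with fgets() from the trace output and counting the hits would give a quick signal of whether a candidate fix still lets such folios through.
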
> 
> 
> Unfortunately, I recently found that there is still a bug in the
> fix patch. Setting mlocked in the mlock_folio function could happen
> even after the page is successfully isolated, so it still cannot
> prevent migration. Because of this, I need to think more about how
> to fix it.
> 
> Perhaps we should double-check whether the page is mlocked during
> the actual migration phase.

So IIUC the isolation+migration might start between the time the folio is
allocated and the time it is mlocked? In that case the check during migration
could still be racy, and if the page is isolated, it's already bad for the RT
process.

So this would only be a short-term problem after the mlockall, but we don't
have a way for the RT process to know the moment it's all settled, right?
Probably the proper solution would be for mlock[all]() itself to wait for an
isolated page, and only continue once it knows it can't be isolated anymore.
This might however go against some of the folio batching optimizations?
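
To make the remaining window concrete, here is a small user-space model of the two isolation checks, written as a sketch and not as kernel code. struct folio_model and both function names are hypothetical; the two predicates mirror the condition before and after the diff quoted below, with isolate_unevictable standing in for `mode & ISOLATE_UNEVICTABLE`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two folio flags discussed in this thread. */
struct folio_model {
	bool mlocked;      /* set immediately by mlock_folio() */
	bool unevictable;  /* set later, when the mlock batch is processed */
};

/* Condition before the patch: only PG_unevictable makes compaction skip. */
bool skipped_before_patch(const struct folio_model *f, bool isolate_unevictable)
{
	return !isolate_unevictable && f->unevictable;
}

/* Condition after the patch: PG_mlocked alone is enough to skip. */
bool skipped_after_patch(const struct folio_model *f, bool isolate_unevictable)
{
	return !isolate_unevictable && (f->unevictable || f->mlocked);
}
```

In this model a folio with mlocked=1 and unevictable=0 is skipped only by the patched check, while a folio that is allocated but not yet mlocked (both flags 0) is skipped by neither. That second state is the window in which isolation can still start.
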

> What do you think of this best-effort approach?
> 
> 
> Best regards,
> Wandun
> 
> 
> 
> 
> 
> The full reproducer is as below:
> 
> /* gcc repro.c -o repro -lpthread */
> 
> #define _GNU_SOURCE
> #include <fcntl.h>
> #include <pthread.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/mman.h>
> #include <unistd.h>
> 
> #define PAGE_SIZE       4096
> #define NR_PAGES        32
> #define FILE_SIZE       (2ULL * 1024 * 1024 * 1024)
> 
> static void *worker_fn(void *arg)
> {
> 	int fd = (long)arg;
> 	size_t len = (size_t)FILE_SIZE;
> 	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> 	if (p == MAP_FAILED)
> 		return NULL;
> 
> 	for (size_t off = 0; off + NR_PAGES * PAGE_SIZE <= len;
> 	     off += NR_PAGES * PAGE_SIZE) {
> 		for (int i = 0; i < NR_PAGES; i++)
> 			p[off + i * PAGE_SIZE] = 1;
> 		usleep(200);
> 	}
> 
> 	munmap(p, len);
> 	return NULL;
> }
> 
> static void *compact_fn(void *arg)
> {
> 	(void)arg;
> 	int fd = open("/proc/sys/vm/compact_memory", O_WRONLY);
> 	if (fd < 0)
> 		return NULL;
> 
> 	while (1) {
> 		if (write(fd, "1", 1) < 0) {}
> 		usleep(5000);
> 	}
> }
> 
> int main(void)
> {
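> 	/* Without a successful mlockall() the test cannot hit the race. */
> 	if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0) {
> 		perror("mlockall");
> 		return 1;
> 	}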
> 
> 	int fd = open("./repro_largefile.dat", O_RDWR | O_CREAT, 0600);
> 	if (fd < 0)
> 		return 1;
> 	unlink("./repro_largefile.dat");
> 	if (ftruncate(fd, (off_t)FILE_SIZE) < 0)
> 		return 1;
> 
> 	printf("repro_largefile: 1 worker, %d pages/batch, Ctrl-C to stop\n",
> 	       NR_PAGES);
> 
> 	pthread_t compact, worker;
> 	pthread_create(&compact, NULL, compact_fn, NULL);
> 	pthread_create(&worker, NULL, worker_fn, (void *)(long)fd);
> 
> 	pthread_join(worker, NULL);
> 	return 0;
> }
> 
>>> ---
>>>  mm/compaction.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index b776f35ad020..7e07b792bcb5 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -1116,7 +1116,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>>  		is_unevictable = folio_test_unevictable(folio);
>>>  
>>>  		/* Compaction might skip unevictable pages but CMA takes them */
>>> -		if (!(mode & ISOLATE_UNEVICTABLE) && is_unevictable)
>>> +		if (!(mode & ISOLATE_UNEVICTABLE) &&
>>> +		    (is_unevictable || folio_test_mlocked(folio)))
>>>  			goto isolate_fail_put;
>>>  
>>>  		/*
>> 
> 



Thread overview: 10+ messages
2026-06-04  2:38 [RFC PATCH 0/3] mm/compaction: honour compact_unevictable_allowed in mlock race and alloc_contig path Wandun Chen
2026-06-04  2:38 ` [RFC PATCH 1/3] mm/compaction: skip isolate mlocked folios when compact_unevictable_allowed=0 Wandun Chen
2026-06-17 18:52   ` Vlastimil Babka (SUSE)
2026-06-18 11:43     ` Wandun
2026-06-22  9:55       ` Vlastimil Babka (SUSE) [this message]
2026-06-04  2:38 ` [RFC PATCH 2/3] mm/compaction: add per-folio isolation tracepoint Wandun Chen
2026-06-04  2:38 ` [RFC PATCH 3/3] mm/compaction: respect compact_unevictable_allowed in alloc_contig path Wandun Chen
2026-06-17 18:57   ` Vlastimil Babka (SUSE)
2026-06-18 11:47     ` Wandun
2026-06-15  8:28 ` [RFC PATCH 0/3] mm/compaction: honour compact_unevictable_allowed in mlock race and " Wandun
