From: Wandun <chenwandun1@gmail.com>
To: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org,
linux-rt-devel@lists.linux.dev
Cc: akpm@linux-foundation.org, surenb@google.com, mhocko@suse.com,
jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com,
rostedt@goodmis.org, mhiramat@kernel.org,
mathieu.desnoyers@efficios.com, david@kernel.org, ljs@kernel.org,
liam@infradead.org, rppt@kernel.org, bigeasy@linutronix.de,
clrkwllms@kernel.org, Alexander.Krabler@kuka.com,
Hugh Dickins <hughd@google.com>
Subject: Re: [RFC PATCH 1/3] mm/compaction: skip isolate mlocked folios when compact_unevictable_allowed=0
Date: Wed, 24 Jun 2026 19:08:06 +0800
Message-ID: <ca1115c0-1509-453a-8235-08e381a3da6f@gmail.com>
In-Reply-To: <c8793c0f-7156-4cb7-9e6e-7909397e2fff@kernel.org>

On 6/22/26 17:55, Vlastimil Babka (SUSE) wrote:
> On 6/18/26 13:43, Wandun wrote:
>>
>>
>> On 6/18/26 02:52, Vlastimil Babka (SUSE) wrote:
>>> On 6/4/26 04:38, Wandun Chen wrote:
>>>> From: Wandun Chen <chenwandun@lixiang.com>
>>>>
>>>> compact_unevictable_allowed is default 0 under PREEMPT_RT,
>>>> isolate_migratepages_block() skips folios with PG_unevictable set.
>>>> However, mlock_folio() sets PG_mlocked immediately but defers
>>>> PG_unevictable to mlock_folio_batch(), result in a folio with
>>>> PG_mlocked=1 but PG_unevictable=0. Compaction will isolate such a
>>>> folio.
>>>>
>>>> Fix by checking folio_test_mlocked() together with the existing
>>>> folio_test_unevictable() check.
>>>>
>>>> A similar issue has been reported by Alexander Krabler on a 6.12-rt
>>>> aarch64 system. Vlastimil suggested to check the mlocked flag [1].
>>>>
>>>> Reported-by: Alexander Krabler <Alexander.Krabler@kuka.com>
>>>> Closes: https://lore.kernel.org/all/DU0PR01MB10385345F7153F334100981888259A@DU0PR01MB10385.eurprd01.prod.exchangelabs.com/
>>>> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
>>>> Signed-off-by: Wandun Chen <chenwandun@lixiang.com>
>>>> Link: https://lore.kernel.org/all/33275585-f2db-4779-89f0-3ae24b455a67@suse.cz/ [1]
>>>
>>> Well in that thread, Hugh doubted my suggestion and then it seems we didn't
>>> conclude anything. Did you actually observe in practice the issue that
>>> Alexander had, and that this patch fixed it, or is that theoretical?
>>>
>> Yes, I wrote a test case that can reproduce it in a few seconds.
>>
>> The test case contains 3 steps:
>> 1. mlockall
>> 2. mmap file(2GB) + trigger file write page fault;
>> 3. during step 2, trigger compaction via /proc/sys/vm/compact_memory
>>
>>
>> My reproduction environment is qemu with 4GB RAM, 8 cores, aarch64,
>> PREEMPT_RT, and includes the tracepoint from patch 02.
>> After running the reproduction program for a few seconds, the
>> following output appears.
>
> Ah, nice.
>
>> repro-403 [004] ....1 101.270505: mm_compaction_isolate_folio: pfn=0x71e3a mode=0x0 flags=referenced|uptodate|mlocked
>> repro-403 [004] ....1 101.270507: mm_compaction_isolate_folio: pfn=0x71e3b mode=0x0 flags=referenced|uptodate|mlocked
>> repro-403 [004] ....1 101.270513: mm_compaction_isolate_folio: pfn=0x71e3c mode=0x0 flags=referenced|uptodate|mlocked
>> repro-403 [004] ....1 101.270515: mm_compaction_isolate_folio: pfn=0x71e3d mode=0x0 flags=uptodate|mlocked
>> repro-403 [004] ....1 101.270517: mm_compaction_isolate_folio: pfn=0x71e3e mode=0x0 flags=uptodate|mlocked
>> repro-403 [004] ....1 101.270520: mm_compaction_isolate_folio: pfn=0x71e3f mode=0x0 flags=uptodate|mlocked
>>
>>
>> Unfortunately, I recently found that there is still a bug in the
>> fix patch. Setting mlocked in the mlock_folio function could happen
>> even after the page is successfully isolated, so it still cannot
>> prevent migration. Because of this, I need to think more about how
>> to fix it.
>>
>> Perhaps we should double-check whether the page is mlocked during
>> the actual migration phase.
>
> So IIUC the isolation+migration might be started between the folio is
> allocated, and mlocked? In that case the check during migration could still
Yes, in that case it would still be racy, so checking page flags is not a good idea.
> be racy, and if the page is isolated, it's already bad for the RT process.
IIUC, more accurately, it is the migration entry in the page table that is
really bad for the RT process. Isolating a page doesn't modify the page table,
so memory access continues as usual. That leads to a new idea:
S1. In the mlock[all] syscall, if mlock_vma_pages_range() hits a migration entry,
    it should wait for the migration to complete.
S2. During the unmap phase of memory migration, do not unmap a page whose
    associated vma is marked VM_LOCKED, similar to how reclaim is disabled for
    pages in a VM_LOCKED vma (try_to_unmap_one()).
For a page handled during the mlock[all] syscall:
- if migration has already finished, there is nothing to do;
- if migration is in progress and the migration entry is already installed, we
  wait (S1);
- if the page is in flight, about to be isolated/migrated, S2 prevents the unmap.
For a page handled during a page fault, VM_LOCKED is already set on the vma,
so S2 guarantees it will not be unmapped, hence no migration entry.
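
To make the case analysis above concrete, here is a rough userspace model of
the two rules. It is only a sketch of the intended decisions, not kernel code:
the enum and all three function names are made up for illustration, and it
ignores locking and the ordering between setting VM_LOCKED and the unmap check.

```c
/* Userspace model of the S1/S2 rules; none of these names are kernel APIs. */
#include <stdbool.h>

/* Hypothetical view of a PTE as seen by the mlock[all]() page walk. */
enum pte_state {
	PTE_PRESENT,		/* page mapped; it may or may not be isolated */
	PTE_MIGRATION_ENTRY,	/* unmap phase already replaced the PTE */
};

/* S1: mlock[all]() waits only when it finds a migration entry. */
bool mlock_must_wait(enum pte_state pte)
{
	return pte == PTE_MIGRATION_ENTRY;
}

/* S2: the unmap phase of migration leaves pages of a VM_LOCKED vma mapped. */
bool migration_may_unmap(bool vma_is_locked)
{
	return !vma_is_locked;
}

/*
 * Can a migration entry still exist once mlock[all]() has walked this PTE?
 * The vma is VM_LOCKED by then, so S2 blocks any later unmap, and S1 has
 * waited out any entry that was installed before the walk.
 */
bool migration_entry_possible_after_mlock(enum pte_state seen_by_walk)
{
	bool entry_left = seen_by_walk == PTE_MIGRATION_ENTRY &&
			  !mlock_must_wait(seen_by_walk);
	bool entry_later = migration_may_unmap(true);

	return entry_left || entry_later;
}
```

In this model migration_entry_possible_after_mlock() is false for both PTE
states, which is the property the RT process needs once mlock[all]() returns.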
Thanks a lot for the detailed feedback, Vlastimil.
Best regards,
Wandun
>
> So this would only be a short-term problem after the mlockall, but we don't
> have a way for the RT process to know the moment it's all settled, right?
Yes, some pages may already have been isolated and will still be migrated.
> Probably the proper solution would be for mlock[all]() itself to wait for an
> isolated page, and only continue once it knows it can't be isolated anymore.
> This however might go against some of the folio batching optimizations?
>
>> What do you think of this best-effort approach?
>>
>>
>> Best regards,
>> Wandun
>>
>>
>>
>>
>>
>> The full reproducer is as below:
>>
>> /* gcc repro.c -o repro -lpthread */
>>
>> #define _GNU_SOURCE
>> #include <fcntl.h>
>> #include <pthread.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <sys/mman.h>
>> #include <unistd.h>
>>
>> #define PAGE_SIZE 4096
>> #define NR_PAGES 32
>> #define FILE_SIZE (2ULL * 1024 * 1024 * 1024)
>>
>> static void *worker_fn(void *arg)
>> {
>> 	int fd = (long)arg;
>> 	size_t len = (size_t)FILE_SIZE;
>> 	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>> 	if (p == MAP_FAILED)
>> 		return NULL;
>>
>> 	for (size_t off = 0; off + NR_PAGES * PAGE_SIZE <= len;
>> 	     off += NR_PAGES * PAGE_SIZE) {
>> 		for (int i = 0; i < NR_PAGES; i++)
>> 			p[off + i * PAGE_SIZE] = 1;
>> 		usleep(200);
>> 	}
>>
>> 	munmap(p, len);
>> 	return NULL;
>> }
>>
>> static void *compact_fn(void *arg)
>> {
>> 	(void)arg;
>> 	int fd = open("/proc/sys/vm/compact_memory", O_WRONLY);
>> 	if (fd < 0)
>> 		return NULL;
>>
>> 	while (1) {
>> 		if (write(fd, "1", 1) < 0) {}
>> 		usleep(5000);
>> 	}
>> }
>>
>> int main(void)
>> {
>> 	mlockall(MCL_CURRENT | MCL_FUTURE);
>>
>> 	int fd = open("./repro_largefile.dat", O_RDWR | O_CREAT, 0600);
>> 	if (fd < 0)
>> 		return 1;
>> 	unlink("./repro_largefile.dat");
>> 	if (ftruncate(fd, (off_t)FILE_SIZE) < 0)
>> 		return 1;
>>
>> 	printf("repro_largefile: 1 worker, %d pages/batch, Ctrl-C to stop\n",
>> 	       NR_PAGES);
>>
>> 	pthread_t compact, worker;
>> 	pthread_create(&compact, NULL, compact_fn, NULL);
>> 	pthread_create(&worker, NULL, worker_fn, (void *)(long)fd);
>>
>> 	pthread_join(worker, NULL);
>> 	return 0;
>> }
>>
>>>> ---
>>>> mm/compaction.c | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>> index b776f35ad020..7e07b792bcb5 100644
>>>> --- a/mm/compaction.c
>>>> +++ b/mm/compaction.c
>>>> @@ -1116,7 +1116,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>>>  		is_unevictable = folio_test_unevictable(folio);
>>>>
>>>>  		/* Compaction might skip unevictable pages but CMA takes them */
>>>> -		if (!(mode & ISOLATE_UNEVICTABLE) && is_unevictable)
>>>> +		if (!(mode & ISOLATE_UNEVICTABLE) &&
>>>> +		    (is_unevictable || folio_test_mlocked(folio)))
>>>>  			goto isolate_fail_put;
>>>>
>>>>  		/*
>>>
>>
>