From: Lance Yang <lance.yang@linux.dev>
To: dev.jain@arm.com
Cc: Liam.Howlett@oracle.com, akpm@linux-foundation.org,
anshuman.khandual@arm.com, baohua@kernel.org, david@kernel.org,
harry.yoo@oracle.com, jannh@google.com,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
lorenzo.stoakes@oracle.com, riel@surriel.com,
ryan.roberts@arm.com, stable@kernel.org, vbabka@kernel.org,
Lance Yang <lance.yang@linux.dev>
Subject: Re: [PATCH v3] mm/rmap: fix incorrect pte restoration for lazyfree folios
Date: Tue, 10 Mar 2026 20:13:04 +0800
Message-ID: <20260310121304.17173-1-lance.yang@linux.dev>
In-Reply-To: <20260303061528.2429162-1-dev.jain@arm.com>
On Tue, 3 Mar 2026 11:45:28 +0530, Dev Jain wrote:
>We batch unmap anonymous lazyfree folios by folio_unmap_pte_batch.
>If the batch has a mix of writable and non-writable bits, we may end up
>setting the entire batch writable. Fix this by respecting writable bit
>during batching.
>Although on a successful unmap of a lazyfree folio, the soft-dirty bit is
>lost, preserve it on pte restoration by respecting the bit during batching,
>to make the fix consistent w.r.t both writable bit and soft-dirty bit.
>
>I was able to write the below reproducer and crash the kernel.
>Explanation of reproducer (set 64K mTHP to always):
>
>Fault in a 64K large folio. Split the VMA at mid-point with MADV_DONTFORK.
>fork() - parent points to the folio with 8 writable ptes and 8 non-writable
>ptes. Merge the VMAs with MADV_DOFORK so that folio_unmap_pte_batch() can
>determine all the 16 ptes as a batch. Do MADV_FREE on the range to mark
>the folio as lazyfree. Write to the memory to dirty the pte, eventually
>rmap will dirty the folio. Then trigger reclaim, we will hit the pte
>restoration path, and the kernel will crash with the following trace:
>
>[ 21.134473] kernel BUG at mm/page_table_check.c:118!
>[ 21.134497] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>[ 21.135917] Modules linked in:
>[ 21.136085] CPU: 1 UID: 0 PID: 1735 Comm: dup-lazyfree Not tainted 7.0.0-rc1-00116-g018018a17770 #1028 PREEMPT
>[ 21.136858] Hardware name: linux,dummy-virt (DT)
>[ 21.137019] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
>[ 21.137308] pc : page_table_check_set+0x28c/0x2a8
>[ 21.137607] lr : page_table_check_set+0x134/0x2a8
>[ 21.137885] sp : ffff80008a3b3340
>[ 21.138124] x29: ffff80008a3b3340 x28: fffffdffc3d14400 x27: ffffd1a55e03d000
>[ 21.138623] x26: 0040000000000040 x25: ffffd1a55f7dd000 x24: 0000000000000001
>[ 21.139045] x23: 0000000000000001 x22: 0000000000000001 x21: ffffd1a55f217f30
>[ 21.139629] x20: 0000000000134521 x19: 0000000000134519 x18: 005c43e000040000
>[ 21.140027] x17: 0001400000000000 x16: 0001700000000000 x15: 000000000000ffff
>[ 21.140578] x14: 000000000000000c x13: 005c006000000000 x12: 0000000000000020
>[ 21.140828] x11: 0000000000000000 x10: 005c000000000000 x9 : ffffd1a55c079ee0
>[ 21.141077] x8 : 0000000000000001 x7 : 005c03e000040000 x6 : 000000004000ffff
>[ 21.141490] x5 : ffff00017fffce00 x4 : 0000000000000001 x3 : 0000000000000002
>[ 21.141741] x2 : 0000000000134510 x1 : 0000000000000000 x0 : ffff0000c08228c0
>[ 21.141991] Call trace:
>[ 21.142093] page_table_check_set+0x28c/0x2a8 (P)
>[ 21.142265] __page_table_check_ptes_set+0x144/0x1e8
>[ 21.142441] __set_ptes_anysz.constprop.0+0x160/0x1a8
>[ 21.142766] contpte_set_ptes+0xe8/0x140
>[ 21.142907] try_to_unmap_one+0x10c4/0x10d0
>[ 21.143177] rmap_walk_anon+0x100/0x250
>[ 21.143315] try_to_unmap+0xa0/0xc8
>[ 21.143441] shrink_folio_list+0x59c/0x18a8
>[ 21.143759] shrink_lruvec+0x664/0xbf0
>[ 21.144043] shrink_node+0x218/0x878
>[ 21.144285] __node_reclaim.constprop.0+0x98/0x338
>[ 21.144763] user_proactive_reclaim+0x2a4/0x340
>[ 21.145056] reclaim_store+0x3c/0x60
>[ 21.145216] dev_attr_store+0x20/0x40
>[ 21.145585] sysfs_kf_write+0x84/0xa8
>[ 21.145835] kernfs_fop_write_iter+0x130/0x1c8
>[ 21.145994] vfs_write+0x2b8/0x368
>[ 21.146119] ksys_write+0x70/0x110
>[ 21.146240] __arm64_sys_write+0x24/0x38
>[ 21.146380] invoke_syscall+0x50/0x120
>[ 21.146513] el0_svc_common.constprop.0+0x48/0xf8
>[ 21.146679] do_el0_svc+0x28/0x40
>[ 21.146798] el0_svc+0x34/0x110
>[ 21.146926] el0t_64_sync_handler+0xa0/0xe8
>[ 21.147074] el0t_64_sync+0x198/0x1a0
>[ 21.147225] Code: f9400441 b4fff241 17ffff94 d4210000 (d4210000)
>[ 21.147440] ---[ end trace 0000000000000000 ]---
>
>
>#define _GNU_SOURCE
>#include <stdio.h>
>#include <unistd.h>
>#include <stdlib.h>
>#include <sys/mman.h>
>#include <string.h>
>#include <sys/wait.h>
>#include <sched.h>
>#include <fcntl.h>
>
>void write_to_reclaim() {
> const char *path = "/sys/devices/system/node/node0/reclaim";
I wasn't able to get this reproducer working with node reclaim, but using
memcg v1 memory.force_empty worked fine for me.
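
For anyone who wants to try the same route, here is a minimal sketch of the
memcg v1 trigger. It is not part of the original reproducer: the helper names
write_ctl() and memcg_force_empty() are made up for illustration, and it
assumes the cgroup v1 memory controller is mounted and that the test process
has already been moved into the group being emptied.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a short string to an existing control file.
 * Returns 0 on success, -1 if the file cannot be opened or fully written.
 */
static int write_ctl(const char *path, const char *val)
{
	ssize_t len = (ssize_t)strlen(val);
	ssize_t ret;
	int fd;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	ret = write(fd, val, (size_t)len);
	close(fd);

	return ret == len ? 0 : -1;
}

/*
 * Ask memcg v1 to reclaim as much as it can from the given group by
 * writing "0" to its memory.force_empty file.
 *
 * memcg_dir is an assumption supplied by the caller, e.g. a group
 * created under the cgroup v1 memory controller mount point.
 */
static int memcg_force_empty(const char *memcg_dir)
{
	char path[512];
	int n;

	n = snprintf(path, sizeof(path), "%s/memory.force_empty", memcg_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	return write_ctl(path, "0");
}
```

One way to use it would be to call memcg_force_empty() with the test's memcg
directory at the point where the reproducer calls write_to_reclaim().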
[...]
>
>Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
>Cc: stable <stable@kernel.org>
>Signed-off-by: Dev Jain <dev.jain@arm.com>
>---
Thanks!
Tested-by: Lance Yang <lance.yang@linux.dev>