[syzbot] [mm?] kernel BUG in try_to_unmap_one (2)

linux-mm.kvack.org archive mirror

* [syzbot] [mm?] kernel BUG in try_to_unmap_one (2)
@ 2025-06-05  5:38 syzbot
  2025-06-05  6:11 ` David Hildenbrand
  0 siblings, 1 reply; 10+ messages in thread
From: syzbot @ 2025-06-05  5:38 UTC (permalink / raw)
  To: Liam.Howlett, akpm, david, harry.yoo, linux-kernel, linux-mm,
	lorenzo.stoakes, riel, syzkaller-bugs, vbabka

Hello,

syzbot found the following issue on:

HEAD commit:    d7fa1af5b33e Merge branch 'for-next/core' into for-kernelci
git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=1757300c580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=89c13de706fbf07a
dashboard link: https://syzkaller.appspot.com/bug?extid=3b220254df55d8ca8a61
compiler:       Debian clang version 20.1.6 (++20250514063057+1e4d39e07757-1~exp1~20250514183223.118), Debian LLD 20.1.6
userspace arch: arm64
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=150f7ed4580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=13745970580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/da97ad659b2c/disk-d7fa1af5.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/659e123552a8/vmlinux-d7fa1af5.xz
kernel image: https://storage.googleapis.com/syzbot-assets/6ec5dbf4643e/Image-d7fa1af5.gz.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+3b220254df55d8ca8a61@syzkaller.appspotmail.com

head: 05ffc00000000309 fffffdffc6628001 0080000000000000 0000000100000000
head: ffffffff00000000 0000000000000024 00000000ffffffff 0000000000000200
page dumped because: VM_BUG_ON_FOLIO(!pvmw.pte)
------------[ cut here ]------------
kernel BUG at mm/rmap.c:1955!
Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP
Modules linked in:
CPU: 1 UID: 0 PID: 9503 Comm: syz-executor315 Not tainted 6.15.0-rc7-syzkaller-gd7fa1af5b33e #0 PREEMPT 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : try_to_unmap_one+0x2c54/0x2d40 mm/rmap.c:1955
lr : try_to_unmap_one+0x2c54/0x2d40 mm/rmap.c:1955
sp : ffff80009e906380
x29: ffff80009e9065e0 x28: 0000000000000038
 x27: ffff0000c9dbee80
x26: fffffdffc6628018 x25: fffffdffc6628030 x24: dfff800000000000
x23: ffff0000d84efdc0 x22: ffff0000d84efde0 x21: 0000000000000001
x20: fffffdffc6628000 x19: 05ffc00000020849 x18: 00000000ffffffff
x17: 0000000000000000 x16: ffff80008adbe9e4 x15: 0000000000000001
x14: 1fffe0003386f2e2 x13: 0000000000000000 x12: 0000000000000000
x11: ffff60003386f2e3 x10: 0000000000ff0100 x9 : 664e624a89365e00
x8 : 664e624a89365e00 x7 : 0000000000000001 x6 : 0000000000000001
x5 : ffff80009e905a98 x4 : ffff80008f415ba0 x3 : ffff8000807b4b68
x2 : 0000000000000001 x1 : 0000000100000001 x0 : 000000000000002f
Call trace:
 try_to_unmap_one+0x2c54/0x2d40 mm/rmap.c:1955 (P)
 rmap_walk_anon+0x47c/0x640 mm/rmap.c:2834
 rmap_walk+0x128/0x1e8 mm/rmap.c:2939
 try_to_unmap+0xc4/0x120 mm/rmap.c:2263
 unmap_poisoned_folio+0x278/0x4a4 mm/memory-failure.c:1610
 shrink_folio_list+0x608/0x4410 mm/vmscan.c:1131
 reclaim_folio_list+0xdc/0x5d0 mm/vmscan.c:2217
 reclaim_pages+0x420/0x544 mm/vmscan.c:2254
 madvise_cold_or_pageout_pte_range+0x1d38/0x20d4 mm/madvise.c:434
 walk_pmd_range mm/pagewalk.c:130 [inline]
 walk_pud_range mm/pagewalk.c:226 [inline]
 walk_p4d_range mm/pagewalk.c:264 [inline]
 walk_pgd_range+0xb4c/0x16bc mm/pagewalk.c:305
 __walk_page_range+0x13c/0x654 mm/pagewalk.c:412
 walk_page_range_mm+0x4fc/0x7dc mm/pagewalk.c:505
 walk_page_range+0x80/0x98 mm/pagewalk.c:584
 madvise_pageout_page_range mm/madvise.c:617 [inline]
 madvise_pageout mm/madvise.c:644 [inline]
 madvise_vma_behavior mm/madvise.c:1269 [inline]
 madvise_walk_vmas mm/madvise.c:1530 [inline]
 madvise_do_behavior+0x1940/0x2908 mm/madvise.c:1695
 do_madvise mm/madvise.c:1782 [inline]
 __do_sys_madvise mm/madvise.c:1790 [inline]
 __se_sys_madvise mm/madvise.c:1788 [inline]
 __arm64_sys_madvise+0x10c/0x154 mm/madvise.c:1788
 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
 invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
 el0_svc+0x58/0x17c arch/arm64/kernel/entry-common.c:767
 el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:786
 el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600
Code: f9404be0 b0051fc1 910c8021 97fdefe4 (d4210000) 
---[ end trace 0000000000000000 ]---


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup



* Re: [syzbot] [mm?] kernel BUG in try_to_unmap_one (2)
  2025-06-05  5:38 [syzbot] [mm?] kernel BUG in try_to_unmap_one (2) syzbot
@ 2025-06-05  6:11 ` David Hildenbrand
  2025-06-05  6:27   ` David Hildenbrand
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2025-06-05  6:11 UTC (permalink / raw)
  To: syzbot, Liam.Howlett, akpm, harry.yoo, linux-kernel, linux-mm,
	lorenzo.stoakes, riel, syzkaller-bugs, vbabka, Jens Axboe,
	Catalin Marinas

On 05.06.25 07:38, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    d7fa1af5b33e Merge branch 'for-next/core' into for-kernelci

Hmmm, another very odd page-table mapping related problem on that tree 
found on arm64 only:

https://lore.kernel.org/all/f031d35b-13e3-4dec-a89c-f221331be735@kernel.dk/T/#mef6b1f00bd47724e3ba756d9c898128ab010ed34


Are we maybe corrupting ptes/pfns etc?



-- 
Cheers,

David / dhildenb




* Re: [syzbot] [mm?] kernel BUG in try_to_unmap_one (2)
  2025-06-05  6:11 ` David Hildenbrand
@ 2025-06-05  6:27   ` David Hildenbrand
  2025-06-05  6:37     ` David Hildenbrand
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2025-06-05  6:27 UTC (permalink / raw)
  To: syzbot, Liam.Howlett, akpm, harry.yoo, linux-kernel, linux-mm,
	lorenzo.stoakes, riel, syzkaller-bugs, vbabka, Jens Axboe,
	Catalin Marinas

On 05.06.25 08:11, David Hildenbrand wrote:
> On 05.06.25 07:38, syzbot wrote:
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit:    d7fa1af5b33e Merge branch 'for-next/core' into for-kernelci
> 
> Hmmm, another very odd page-table mapping related problem on that tree
> found on arm64 only:

In this particular reproducer we seem to be having MADV_HUGEPAGE and 
io_uring_setup() be racing with MADV_HWPOISON, MADV_PAGEOUT and 
io_uring_register(IORING_REGISTER_BUFFERS).

I assume the issue is related to MADV_HWPOISON, MADV_PAGEOUT and 
io_uring_register racing, only. I suspect MADV_HWPOISON is trying to 
split a THP, while MADV_PAGEOUT tries paging it out.

IORING_REGISTER_BUFFERS ends up in 
io_sqe_buffers_register->io_sqe_buffer_register where we GUP-fast and 
try coalescing buffers.

And something about THPs is not particularly happy :)

-- 
Cheers,

David / dhildenb


* Re: [syzbot] [mm?] kernel BUG in try_to_unmap_one (2)
  2025-06-05  6:27   ` David Hildenbrand
@ 2025-06-05  6:37     ` David Hildenbrand
  2025-06-05  7:18       ` Jinjiang Tu
  2025-06-05  7:37       ` Jinjiang Tu
  0 siblings, 2 replies; 10+ messages in thread
From: David Hildenbrand @ 2025-06-05  6:37 UTC (permalink / raw)
  To: syzbot, Liam.Howlett, akpm, harry.yoo, linux-kernel, linux-mm,
	lorenzo.stoakes, riel, syzkaller-bugs, vbabka, Jens Axboe,
	Catalin Marinas, Jinjiang Tu

On 05.06.25 08:27, David Hildenbrand wrote:
> On 05.06.25 08:11, David Hildenbrand wrote:
>> On 05.06.25 07:38, syzbot wrote:
>>> Hello,
>>>
>>> syzbot found the following issue on:
>>>
>>> HEAD commit:    d7fa1af5b33e Merge branch 'for-next/core' into for-kernelci
>>
>> Hmmm, another very odd page-table mapping related problem on that tree
>> found on arm64 only:
> 
> In this particular reproducer we seem to be having MADV_HUGEPAGE and
> io_uring_setup() be racing with MADV_HWPOISON, MADV_PAGEOUT and
> io_uring_register(IORING_REGISTER_BUFFERS).
> 
> I assume the issue is related to MADV_HWPOISON, MADV_PAGEOUT and
> io_uring_register racing, only. I suspect MADV_HWPOISON is trying to
> split a THP, while MADV_PAGEOUT tries paging it out.
> 
> IORING_REGISTER_BUFFERS ends up in
> io_sqe_buffers_register->io_sqe_buffer_register where we GUP-fast and
> try coalescing buffers.
> 
> And something about THPs is not particularly happy :)
> 

Not sure if related to io_uring.

unmap_poisoned_folio() calls try_to_unmap() without TTU_SPLIT_HUGE_PMD.

When called from memory_failure(), we make sure to never call it on a large folio: WARN_ON(folio_test_large(folio));

However, from shrink_folio_list() we might call unmap_poisoned_folio() on a large folio, which doesn't work if it is still PMD-mapped. Maybe passing TTU_SPLIT_HUGE_PMD would fix it.
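
A minimal userspace sketch of the condition described above, assuming (as the report suggests) that the BUG fires when the rmap walk reaches a PMD-mapped folio and nothing splits the PMD first. The `model_*` types, the `MODEL_TTU_SPLIT_HUGE_PMD` value and `model_hits_bug()` are hypothetical stand-ins for illustration, not kernel definitions.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these are the kernel's types. */
enum model_mapping { MODEL_MAPPED_PTE, MODEL_MAPPED_PMD };

struct model_folio {
	bool large;                 /* stands in for folio_test_large() */
	enum model_mapping mapping; /* how the folio is currently mapped */
};

#define MODEL_TTU_SPLIT_HUGE_PMD 0x1u /* illustrative value only */

/*
 * Returns true when the walk would reach the PTE-level code with no PTE,
 * i.e. the folio is PMD-mapped and the caller did not ask for a PMD split.
 * This is the state VM_BUG_ON_FOLIO(!pvmw.pte) complains about.
 */
bool model_hits_bug(const struct model_folio *f, unsigned int flags)
{
	if (f->mapping != MODEL_MAPPED_PMD)
		return false;
	return (flags & MODEL_TTU_SPLIT_HUGE_PMD) == 0;
}
```

In this model a PMD-mapped large folio walked with flags 0 reports the BUG condition, while the same folio with the split flag set does not. It says nothing about what happens after the split.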


Likely the relevant commit is:

commit 1b0449544c6482179ac84530b61fc192a6527bfd
Author: Jinjiang Tu <tujinjiang@huawei.com>
Date:   Tue Mar 18 16:39:39 2025 +0800

     mm/vmscan: don't try to reclaim hwpoison folio
     
     Syzkaller reports a bug as follows:
     
     Injecting memory failure for pfn 0x18b00e at process virtual address 0x20ffd000
     Memory failure: 0x18b00e: dirty swapcache page still referenced by 2 users
     Memory failure: 0x18b00e: recovery action for dirty swapcache page: Failed
     page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x20ffd pfn:0x18b00e
     memcg:ffff0000dd6d9000
     anon flags: 0x5ffffe00482011(locked|dirty|arch_1|swapbacked|hwpoison|node=0|zone=2|lastcpupid=0xfffff)
     raw: 005ffffe00482011 dead000000000100 dead000000000122 ffff0000e232a7c9
     raw: 0000000000020ffd 0000000000000000 00000002ffffffff ffff0000dd6d9000
     page dumped because: VM_BUG_ON_FOLIO(!folio_test_uptodate(folio))

CCing Jinjiang Tu

-- 
Cheers,

David / dhildenb




* Re: [syzbot] [mm?] kernel BUG in try_to_unmap_one (2)
  2025-06-05  6:37     ` David Hildenbrand
@ 2025-06-05  7:18       ` Jinjiang Tu
  2025-06-06  7:56         ` David Hildenbrand
  2025-06-05  7:37       ` Jinjiang Tu
  1 sibling, 1 reply; 10+ messages in thread
From: Jinjiang Tu @ 2025-06-05  7:18 UTC (permalink / raw)
  To: David Hildenbrand, syzbot, Liam.Howlett, akpm, harry.yoo,
	linux-kernel, linux-mm, lorenzo.stoakes, riel, syzkaller-bugs,
	vbabka, Jens Axboe, Catalin Marinas


On 2025/6/5 14:37, David Hildenbrand wrote:
> On 05.06.25 08:27, David Hildenbrand wrote:
>> On 05.06.25 08:11, David Hildenbrand wrote:
>>> On 05.06.25 07:38, syzbot wrote:
>>>> Hello,
>>>>
>>>> syzbot found the following issue on:
>>>>
>>>> HEAD commit:    d7fa1af5b33e Merge branch 'for-next/core' into 
>>>> for-kernelci
>>>
>>> Hmmm, another very odd page-table mapping related problem on that tree
>>> found on arm64 only:
>>
>> In this particular reproducer we seem to be having MADV_HUGEPAGE and
>> io_uring_setup() be racing with MADV_HWPOISON, MADV_PAGEOUT and
>> io_uring_register(IORING_REGISTER_BUFFERS).
>>
>> I assume the issue is related to MADV_HWPOISON, MADV_PAGEOUT and
>> io_uring_register racing, only. I suspect MADV_HWPOISON is trying to
>> split a THP, while MADV_PAGEOUT tries paging it out.
>>
>> IORING_REGISTER_BUFFERS ends up in
>> io_sqe_buffers_register->io_sqe_buffer_register where we GUP-fast and
>> try coalescing buffers.
>>
>> And something about THPs is not particularly happy :)
>>
>
> Not sure if related to io_uring.
>
> unmap_poisoned_folio() calls try_to_unmap() without TTU_SPLIT_HUGE_PMD.
>
> When called from memory_failure(), we make sure to never call it on a 
> large folio: WARN_ON(folio_test_large(folio));
>
> However, from shrink_folio_list() we might call unmap_poisoned_folio() 
> on a large folio, which doesn't work if it is still PMD-mapped. Maybe 
> passing TTU_SPLIT_HUGE_PMD would fix it.
>
TTU_SPLIT_HUGE_PMD only converts the PMD-mapped THP into a PTE-mapped THP, and may then trigger the WARN_ON_ONCE below in try_to_unmap_one().

	if (PageHWPoison(subpage) && (flags & TTU_HWPOISON)) {
		...
	} else if (likely(pte_present(pteval)) && pte_unused(pteval) &&
		   !userfaultfd_armed(vma)) {
		...
	} else if (folio_test_anon(folio)) {
		swp_entry_t entry = page_swap_entry(subpage);
		pte_t swp_pte;
		/*
		 * Store the swap location in the pte.
		 * See handle_pte_fault() ...
		 */
		if (unlikely(folio_test_swapbacked(folio) !=
			     folio_test_swapcache(folio))) {
			/*
			 * Triggers here if the subpage isn't hwpoisoned and
			 * we haven't called add_to_swap() for the THP.
			 */
			WARN_ON_ONCE(1);
			goto walk_abort;
		}
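
The branch order in the excerpt above can be modelled in userspace as follows. This is only a sketch under assumptions: the `swapchk_*` names are hypothetical, the `pte_unused()` branch is left out, and everything after the swapbacked/swapcache comparison is collapsed into a single "continue" result.

```c
#include <stdbool.h>

/* Hypothetical model of the two folio flags compared in the excerpt. */
struct swapchk_folio {
	bool swapbacked; /* stands in for folio_test_swapbacked() */
	bool swapcache;  /* stands in for folio_test_swapcache()  */
};

enum swapchk_result {
	SWAPCHK_POISON_ENTRY, /* first branch: poisoned subpage, TTU_HWPOISON set */
	SWAPCHK_WALK_ABORT,   /* flags disagree: WARN_ON_ONCE(1); goto walk_abort */
	SWAPCHK_CONTINUE      /* flags agree: the walk carries on past the check */
};

/* Decide which path one PTE of an anon folio takes in this model. */
enum swapchk_result swapchk_unmap_anon_pte(const struct swapchk_folio *f,
					   bool subpage_hwpoison,
					   bool ttu_hwpoison)
{
	if (subpage_hwpoison && ttu_hwpoison)
		return SWAPCHK_POISON_ENTRY;
	if (f->swapbacked != f->swapcache)
		return SWAPCHK_WALK_ABORT;
	return SWAPCHK_CONTINUE;
}
```

For a PTE-mapped anon THP that is swap-backed but not yet in the swap cache, a poisoned subpage takes the first branch while a healthy subpage of the same folio reaches the abort path, which is the mixed case the warning is about.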

If we want to unmap in shrink_folio_list(), we have to call try_to_split_thp_page() like memory_failure() does. But that's too complicated; maybe just skipping the
hwpoisoned folio is enough? If the folio is accessed again, memory_failure() will be triggered again and kill the accessing process, since the folio
has been hwpoisoned.

>
> Likely the relevant commit is:
>
> commit 1b0449544c6482179ac84530b61fc192a6527bfd
> Author: Jinjiang Tu <tujinjiang@huawei.com>
> Date:   Tue Mar 18 16:39:39 2025 +0800

Yes, it is caused by this commit.

>
>     mm/vmscan: don't try to reclaim hwpoison folio
>         Syzkaller reports a bug as follows:
>         Injecting memory failure for pfn 0x18b00e at process virtual 
> address 0x20ffd000
>     Memory failure: 0x18b00e: dirty swapcache page still referenced by 
> 2 users
>     Memory failure: 0x18b00e: recovery action for dirty swapcache 
> page: Failed
>     page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x20ffd 
> pfn:0x18b00e
>     memcg:ffff0000dd6d9000
>     anon flags: 
> 0x5ffffe00482011(locked|dirty|arch_1|swapbacked|hwpoison|node=0|zone=2|lastcpupid=0xfffff)
>     raw: 005ffffe00482011 dead000000000100 dead000000000122 
> ffff0000e232a7c9
>     raw: 0000000000020ffd 0000000000000000 00000002ffffffff 
> ffff0000dd6d9000
>     page dumped because: VM_BUG_ON_FOLIO(!folio_test_uptodate(folio))
>
> CCing Jinjiang Tu
>



* Re: [syzbot] [mm?] kernel BUG in try_to_unmap_one (2)
  2025-06-05  6:37     ` David Hildenbrand
  2025-06-05  7:18       ` Jinjiang Tu
@ 2025-06-05  7:37       ` Jinjiang Tu
  2025-06-06  7:40         ` David Hildenbrand
  1 sibling, 1 reply; 10+ messages in thread
From: Jinjiang Tu @ 2025-06-05  7:37 UTC (permalink / raw)
  To: David Hildenbrand, syzbot, Liam.Howlett, akpm, harry.yoo,
	linux-kernel, linux-mm, lorenzo.stoakes, riel, syzkaller-bugs,
	vbabka, Jens Axboe, Catalin Marinas


On 2025/6/5 14:37, David Hildenbrand wrote:
> On 05.06.25 08:27, David Hildenbrand wrote:
>> On 05.06.25 08:11, David Hildenbrand wrote:
>>> On 05.06.25 07:38, syzbot wrote:
>>>> Hello,
>>>>
>>>> syzbot found the following issue on:
>>>>
>>>> HEAD commit:    d7fa1af5b33e Merge branch 'for-next/core' into 
>>>> for-kernelci
>>>
>>> Hmmm, another very odd page-table mapping related problem on that tree
>>> found on arm64 only:
>>
>> In this particular reproducer we seem to be having MADV_HUGEPAGE and
>> io_uring_setup() be racing with MADV_HWPOISON, MADV_PAGEOUT and
>> io_uring_register(IORING_REGISTER_BUFFERS).
>>
>> I assume the issue is related to MADV_HWPOISON, MADV_PAGEOUT and
>> io_uring_register racing, only. I suspect MADV_HWPOISON is trying to
>> split a THP, while MADV_PAGEOUT tries paging it out.
>>
>> IORING_REGISTER_BUFFERS ends up in
>> io_sqe_buffers_register->io_sqe_buffer_register where we GUP-fast and
>> try coalescing buffers.
>>
>> And something about THPs is not particularly happy :)
>>
>
> Not sure if related to io_uring.
>
> unmap_poisoned_folio() calls try_to_unmap() without TTU_SPLIT_HUGE_PMD.
>
> When called from memory_failure(), we make sure to never call it on a 
> large folio: WARN_ON(folio_test_large(folio));
>
> However, from shrink_folio_list() we might call unmap_poisoned_folio() 
> on a large folio, which doesn't work if it is still PMD-mapped. Maybe 
> passing TTU_SPLIT_HUGE_PMD would fix it.
>
>
> Likely the relevant commit is:
>
> commit 1b0449544c6482179ac84530b61fc192a6527bfd
> Author: Jinjiang Tu <tujinjiang@huawei.com>
> Date:   Tue Mar 18 16:39:39 2025 +0800
>
>     mm/vmscan: don't try to reclaim hwpoison folio
>         Syzkaller reports a bug as follows:
>         Injecting memory failure for pfn 0x18b00e at process virtual 
> address 0x20ffd000
>     Memory failure: 0x18b00e: dirty swapcache page still referenced by 
> 2 users
>     Memory failure: 0x18b00e: recovery action for dirty swapcache 
> page: Failed
>     page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x20ffd 
> pfn:0x18b00e
>     memcg:ffff0000dd6d9000
>     anon flags: 
> 0x5ffffe00482011(locked|dirty|arch_1|swapbacked|hwpoison|node=0|zone=2|lastcpupid=0xfffff)
>     raw: 005ffffe00482011 dead000000000100 dead000000000122 
> ffff0000e232a7c9
>     raw: 0000000000020ffd 0000000000000000 00000002ffffffff 
> ffff0000dd6d9000
>     page dumped because: VM_BUG_ON_FOLIO(!folio_test_uptodate(folio))
>
> CCing Jinjiang Tu

By the way, unmap_poisoned_folio() is also called from do_migrate_range(), where the folio may be on the LRU and may be a large folio.




* Re: [syzbot] [mm?] kernel BUG in try_to_unmap_one (2)
  2025-06-05  7:37       ` Jinjiang Tu
@ 2025-06-06  7:40         ` David Hildenbrand
  0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2025-06-06  7:40 UTC (permalink / raw)
  To: Jinjiang Tu, syzbot, Liam.Howlett, akpm, harry.yoo, linux-kernel,
	linux-mm, lorenzo.stoakes, riel, syzkaller-bugs, vbabka,
	Jens Axboe, Catalin Marinas

On 05.06.25 09:37, Jinjiang Tu wrote:
> 
> On 2025/6/5 14:37, David Hildenbrand wrote:
>> On 05.06.25 08:27, David Hildenbrand wrote:
>>> On 05.06.25 08:11, David Hildenbrand wrote:
>>>> On 05.06.25 07:38, syzbot wrote:
>>>>> Hello,
>>>>>
>>>>> syzbot found the following issue on:
>>>>>
>>>>> HEAD commit:    d7fa1af5b33e Merge branch 'for-next/core' into
>>>>> for-kernelci
>>>>
>>>> Hmmm, another very odd page-table mapping related problem on that tree
>>>> found on arm64 only:
>>>
>>> In this particular reproducer we seem to be having MADV_HUGEPAGE and
>>> io_uring_setup() be racing with MADV_HWPOISON, MADV_PAGEOUT and
>>> io_uring_register(IORING_REGISTER_BUFFERS).
>>>
>>> I assume the issue is related to MADV_HWPOISON, MADV_PAGEOUT and
>>> io_uring_register racing, only. I suspect MADV_HWPOISON is trying to
>>> split a THP, while MADV_PAGEOUT tries paging it out.
>>>
>>> IORING_REGISTER_BUFFERS ends up in
>>> io_sqe_buffers_register->io_sqe_buffer_register where we GUP-fast and
>>> try coalescing buffers.
>>>
>>> And something about THPs is not particularly happy :)
>>>
>>
>> Not sure if related to io_uring.
>>
>> unmap_poisoned_folio() calls try_to_unmap() without TTU_SPLIT_HUGE_PMD.
>>
>> When called from memory_failure(), we make sure to never call it on a
>> large folio: WARN_ON(folio_test_large(folio));
>>
>> However, from shrink_folio_list() we might call unmap_poisoned_folio()
>> on a large folio, which doesn't work if it is still PMD-mapped. Maybe
>> passing TTU_SPLIT_HUGE_PMD would fix it.
>>
>>
>> Likely the relevant commit is:
>>
>> commit 1b0449544c6482179ac84530b61fc192a6527bfd
>> Author: Jinjiang Tu <tujinjiang@huawei.com>
>> Date:   Tue Mar 18 16:39:39 2025 +0800
>>
>>      mm/vmscan: don't try to reclaim hwpoison folio
>>          Syzkaller reports a bug as follows:
>>          Injecting memory failure for pfn 0x18b00e at process virtual
>> address 0x20ffd000
>>      Memory failure: 0x18b00e: dirty swapcache page still referenced by
>> 2 users
>>      Memory failure: 0x18b00e: recovery action for dirty swapcache
>> page: Failed
>>      page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x20ffd
>> pfn:0x18b00e
>>      memcg:ffff0000dd6d9000
>>      anon flags:
>> 0x5ffffe00482011(locked|dirty|arch_1|swapbacked|hwpoison|node=0|zone=2|lastcpupid=0xfffff)
>>      raw: 005ffffe00482011 dead000000000100 dead000000000122
>> ffff0000e232a7c9
>>      raw: 0000000000020ffd 0000000000000000 00000002ffffffff
>> ffff0000dd6d9000
>>      page dumped because: VM_BUG_ON_FOLIO(!folio_test_uptodate(folio))
>>
>> CCing Jinjiang Tu
> 
> By the way, unmap_poisoned_folio() is also called from do_migrate_range(), where the folio may be on the LRU and may be a large folio.

Indeed.

-- 
Cheers,

David / dhildenb




* Re: [syzbot] [mm?] kernel BUG in try_to_unmap_one (2)
  2025-06-05  7:18       ` Jinjiang Tu
@ 2025-06-06  7:56         ` David Hildenbrand
  2025-06-07  1:29           ` Jinjiang Tu
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2025-06-06  7:56 UTC (permalink / raw)
  To: Jinjiang Tu, syzbot, Liam.Howlett, akpm, harry.yoo, linux-kernel,
	linux-mm, lorenzo.stoakes, riel, syzkaller-bugs, vbabka,
	Jens Axboe, Catalin Marinas

On 05.06.25 09:18, Jinjiang Tu wrote:
> 
> On 2025/6/5 14:37, David Hildenbrand wrote:
>> On 05.06.25 08:27, David Hildenbrand wrote:
>>> On 05.06.25 08:11, David Hildenbrand wrote:
>>>> On 05.06.25 07:38, syzbot wrote:
>>>>> Hello,
>>>>>
>>>>> syzbot found the following issue on:
>>>>>
>>>>> HEAD commit:    d7fa1af5b33e Merge branch 'for-next/core' into
>>>>> for-kernelci
>>>>
>>>> Hmmm, another very odd page-table mapping related problem on that tree
>>>> found on arm64 only:
>>>
>>> In this particular reproducer we seem to be having MADV_HUGEPAGE and
>>> io_uring_setup() be racing with MADV_HWPOISON, MADV_PAGEOUT and
>>> io_uring_register(IORING_REGISTER_BUFFERS).
>>>
>>> I assume the issue is related to MADV_HWPOISON, MADV_PAGEOUT and
>>> io_uring_register racing, only. I suspect MADV_HWPOISON is trying to
>>> split a THP, while MADV_PAGEOUT tries paging it out.
>>>
>>> IORING_REGISTER_BUFFERS ends up in
>>> io_sqe_buffers_register->io_sqe_buffer_register where we GUP-fast and
>>> try coalescing buffers.
>>>
>>> And something about THPs is not particularly happy :)
>>>
>>
>> Not sure if related to io_uring.
>>
>> unmap_poisoned_folio() calls try_to_unmap() without TTU_SPLIT_HUGE_PMD.
>>
>> When called from memory_failure(), we make sure to never call it on a
>> large folio: WARN_ON(folio_test_large(folio));
>>
>> However, from shrink_folio_list() we might call unmap_poisoned_folio()
>> on a large folio, which doesn't work if it is still PMD-mapped. Maybe
>> passing TTU_SPLIT_HUGE_PMD would fix it.
>>
> TTU_SPLIT_HUGE_PMD only converts the PMD-mapped THP into a PTE-mapped THP, and may then trigger the WARN_ON_ONCE below in try_to_unmap_one().
> 
> 	if (PageHWPoison(subpage) && (flags & TTU_HWPOISON)) {
> 		...
> 	} else if (likely(pte_present(pteval)) && pte_unused(pteval) &&
> 		   !userfaultfd_armed(vma)) {
> 		...
> 	} else if (folio_test_anon(folio)) {
> 		swp_entry_t entry = page_swap_entry(subpage);
> 		pte_t swp_pte;
> 		/*
> 		 * Store the swap location in the pte.
> 		 * See handle_pte_fault() ...
> 		 */
> 		if (unlikely(folio_test_swapbacked(folio) !=
> 			     folio_test_swapcache(folio))) {
> 			/*
> 			 * Triggers here if the subpage isn't hwpoisoned and
> 			 * we haven't called add_to_swap() for the THP.
> 			 */
> 			WARN_ON_ONCE(1);
> 			goto walk_abort;
> 		}

This makes me wonder if we should start splitting up try_to_unmap(), to handle the individual cases more cleanly at some point ...

Maybe for now something like:

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b91a33fb6c694..995486a3ff4d2 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1566,6 +1566,14 @@ int unmap_poisoned_folio(struct folio *folio, unsigned long pfn, bool must_kill)
         enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_SYNC | TTU_HWPOISON;
         struct address_space *mapping;
  
+       /*
+        * try_to_unmap() cannot deal with some subpages of an anon folio
+        * not being hwpoisoned: we cannot unmap them without swap.
+        */
+       if (folio_test_large(folio) && !folio_test_hugetlb(folio) &&
+           folio_test_anon(folio) && !folio_test_swapcache(folio))
+               return -EBUSY;
+
         if (folio_test_swapcache(folio)) {
                 pr_err("%#lx: keeping poisoned page in swap cache\n", pfn);
                 ttu &= ~TTU_HWPOISON;
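
Read as a predicate, the guard in the hunk above rejects exactly one combination of folio properties. A small userspace sketch of that reading, with a hypothetical `guard_folio` struct standing in for the real folio tests (the field names mirror the helpers used in the hunk, nothing else is taken from the kernel):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical snapshot of the four properties the guard looks at. */
struct guard_folio {
	bool large;     /* folio_test_large()     */
	bool hugetlb;   /* folio_test_hugetlb()   */
	bool anon;      /* folio_test_anon()      */
	bool swapcache; /* folio_test_swapcache() */
};

/*
 * Mirrors the early return in the hunk: a large, non-hugetlb anon folio
 * that is not in the swap cache is refused with -EBUSY; everything else
 * is allowed to proceed (modelled here as 0).
 */
int guard_check(const struct guard_folio *f)
{
	if (f->large && !f->hugetlb && f->anon && !f->swapcache)
		return -EBUSY;
	return 0;
}
```

Small folios, hugetlb folios, file-backed folios and anon THPs already in the swap cache all pass the guard in this model; only the anon THP without swap is turned away.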



> 
> If we want to unmap in shrink_folio_list(), we have to call try_to_split_thp_page() like memory_failure() does. But that's too complicated; maybe just skipping the
> hwpoisoned folio is enough? If the folio is accessed again, memory_failure() will be triggered again and kill the accessing process, since the folio
> has been hwpoisoned.


Maybe we should try splitting in there? But staring at shrink_folio_list(), not that easy.

We could return -E2BIG and let the caller try splitting, to then retry.
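
One way the -E2BIG idea could look from the caller's side, sketched in userspace. All names here (`sketch_folio`, `sketch_unmap_poisoned()`, `sketch_split()`, `sketch_reclaim_poisoned()`) are hypothetical, and the real split and reclaim paths have locking and refcount requirements that this ignores.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical folio model: just enough state to show the retry. */
struct sketch_folio {
	bool large;      /* still a large folio */
	bool splittable; /* whether the modelled split can succeed */
};

/* Hypothetical unmap helper: refuses large folios instead of walking them. */
int sketch_unmap_poisoned(const struct sketch_folio *f)
{
	return f->large ? -E2BIG : 0;
}

/* Hypothetical split: on success the folio is no longer large. */
int sketch_split(struct sketch_folio *f)
{
	if (!f->splittable)
		return -EBUSY;
	f->large = false;
	return 0;
}

/* Caller side: on -E2BIG, try to split once and then retry the unmap. */
int sketch_reclaim_poisoned(struct sketch_folio *f)
{
	int ret = sketch_unmap_poisoned(f);

	if (ret != -E2BIG)
		return ret;
	ret = sketch_split(f);
	if (ret)
		return ret; /* could not split: leave the folio alone */
	return sketch_unmap_poisoned(f);
}
```

The point of the shape is that the unmap helper never has to know how to split; it only reports that the folio is too big, and the caller decides whether a split is possible in its context.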

-- 
Cheers,

David / dhildenb




* Re: [syzbot] [mm?] kernel BUG in try_to_unmap_one (2)
  2025-06-06  7:56         ` David Hildenbrand
@ 2025-06-07  1:29           ` Jinjiang Tu
  2025-06-09  8:35             ` Miaohe Lin
  0 siblings, 1 reply; 10+ messages in thread
From: Jinjiang Tu @ 2025-06-07  1:29 UTC (permalink / raw)
  To: David Hildenbrand, syzbot, Liam.Howlett, akpm, harry.yoo,
	linux-kernel, linux-mm, lorenzo.stoakes, riel, syzkaller-bugs,
	vbabka, Jens Axboe, Catalin Marinas, Miaohe Lin


On 2025/6/6 15:56, David Hildenbrand wrote:
> On 05.06.25 09:18, Jinjiang Tu wrote:
>>
>> On 2025/6/5 14:37, David Hildenbrand wrote:
>>> On 05.06.25 08:27, David Hildenbrand wrote:
>>>> On 05.06.25 08:11, David Hildenbrand wrote:
>>>>> On 05.06.25 07:38, syzbot wrote:
>>>>>> Hello,
>>>>>>
>>>>>> syzbot found the following issue on:
>>>>>>
>>>>>> HEAD commit:    d7fa1af5b33e Merge branch 'for-next/core' into
>>>>>> for-kernelci
>>>>>
>>>>> Hmmm, another very odd page-table mapping related problem on that 
>>>>> tree
>>>>> found on arm64 only:
>>>>
>>>> In this particular reproducer we seem to be having MADV_HUGEPAGE and
>>>> io_uring_setup() be racing with MADV_HWPOISON, MADV_PAGEOUT and
>>>> io_uring_register(IORING_REGISTER_BUFFERS).
>>>>
>>>> I assume the issue is related to MADV_HWPOISON, MADV_PAGEOUT and
>>>> io_uring_register racing, only. I suspect MADV_HWPOISON is trying to
>>>> split a THP, while MADV_PAGEOUT tries paging it out.
>>>>
>>>> IORING_REGISTER_BUFFERS ends up in
>>>> io_sqe_buffers_register->io_sqe_buffer_register where we GUP-fast and
>>>> try coalescing buffers.
>>>>
>>>> And something about THPs is not particularly happy :)
>>>>
>>>
>>> Not sure if this is related to io_uring.
>>>
>>> unmap_poisoned_folio() calls try_to_unmap() without TTU_SPLIT_HUGE_PMD.
>>>
>>> When called from memory_failure(), we make sure to never call it on a
>>> large folio: WARN_ON(folio_test_large(folio));
>>>
>>> However, from shrink_folio_list() we might call unmap_poisoned_folio()
>>> on a large folio, which doesn't work if it is still PMD-mapped. Maybe
>>> passing TTU_SPLIT_HUGE_PMD would fix it.
>>>
>> TTU_SPLIT_HUGE_PMD only converts the PMD-mapped THP to PTE-mapped 
>> THP, and may trigger the below WARN_ON_ONCE in try_to_unmap_one.
>>
>>     if (PageHWPoison(subpage) && (flags & TTU_HWPOISON)) {
>>         ...
>>     } else if (likely(pte_present(pteval)) && pte_unused(pteval) &&
>>         !userfaultfd_armed(vma)) {
>>          ....
>>     } else if (folio_test_anon(folio)) {
>>         swp_entry_t entry = page_swap_entry(subpage);
>>         pte_t swp_pte;
>>         /*
>>          * Store the swap location in the pte.
>>          * See handle_pte_fault() ...
>>         */
>>         if (unlikely(folio_test_swapbacked(folio) !=
>>             folio_test_swapcache(folio))) {
>>             WARN_ON_ONCE(1);          // here, if the subpage isn't hwpoisoned and we haven't called add_to_swap() for the THP
>>             goto walk_abort;
>>          }
>
> This makes me wonder if we should start splitting up try_to_unmap(), 
> to handle the individual cases more cleanly at some point ...
>
> Maybe for now something like:
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index b91a33fb6c694..995486a3ff4d2 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1566,6 +1566,14 @@ int unmap_poisoned_folio(struct folio *folio, unsigned long pfn, bool must_kill)
>         enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_SYNC | TTU_HWPOISON;
>         struct address_space *mapping;
>
> +       /*
> +        * try_to_unmap() cannot deal with some subpages of an anon folio
> +        * not being hwpoisoned: we cannot unmap them without swap.
> +        */
> +       if (folio_test_large(folio) && !folio_test_hugetlb(folio) &&
> +           folio_test_anon(folio) && !folio_test_swapcache(folio))
> +               return -EBUSY;
> +

If the THP is in the swapcache, we also have to split the PMD mapping into PTE mappings first.

>         if (folio_test_swapcache(folio)) {
>                 pr_err("%#lx: keeping poisoned page in swap cache\n", pfn);
>                 ttu &= ~TTU_HWPOISON;
>
>
>
>>
>> If we want to unmap in shrink_folio_list, we have to
>> try_to_split_thp_page() like memory_failure(). But it's too
>> complicated; maybe just skipping the hwpoisoned folio is enough?
>> If the folio is accessed again, memory_failure will be triggered
>> again and kill the accessing process, since the folio has been
>> hwpoisoned.
>
>
> Maybe we should try splitting in there? But staring at 
> shrink_folio_list(), not that easy.
>
> We could return -E2BIG and let the caller try splitting, to then retry.

UCEs (uncorrectable errors) are rare in the real world, and a UCE racing with another subsystem is rarer still. Spending a lot of effort on handling
UCEs in other subsystems is complicated and not worthwhile. Just skipping is enough; memory_failure() will handle the folio if the UCE is triggered again.

CC Miaohe Lin
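
As a reading aid for the try_to_unmap_one() excerpt quoted earlier in this message, the branch order for an anon folio can be modeled in userspace C. This is only a sketch under stated assumptions: the enum, the function name and its boolean parameters are hypothetical, the pte_unused branch is left out, and everything after the quoted swapbacked/swapcache check is collapsed into a single "proceed" outcome.

```c
#include <stdbool.h>

/* Hypothetical outcomes for one subpage of an anon folio. */
enum model_outcome {
	MODEL_HWPOISON_ENTRY, /* poisoned subpage unmapped via the TTU_HWPOISON branch */
	MODEL_PROCEED,        /* passes the quoted check; later steps not modeled */
	MODEL_WALK_ABORT,     /* the WARN_ON_ONCE(1) + goto walk_abort path */
};

/*
 * Simplified model of the quoted branch order:
 *   1. hwpoisoned subpage with TTU_HWPOISON set
 *   2. anon folio whose swapbacked and swapcache states disagree
 */
enum model_outcome model_unmap_subpage(bool subpage_poisoned, bool ttu_hwpoison,
				       bool swapbacked, bool swapcache)
{
	if (subpage_poisoned && ttu_hwpoison)
		return MODEL_HWPOISON_ENTRY;
	if (swapbacked != swapcache)
		return MODEL_WALK_ABORT;
	return MODEL_PROCEED;
}
```

For a PTE-mapped THP that is swap-backed but was never added to the swap cache, the poisoned subpage takes the first branch, while each healthy subpage reaches the abort path. That is the case the comment in the excerpt points at.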




* Re: [syzbot] [mm?] kernel BUG in try_to_unmap_one (2)
  2025-06-07  1:29           ` Jinjiang Tu
@ 2025-06-09  8:35             ` Miaohe Lin
  0 siblings, 0 replies; 10+ messages in thread
From: Miaohe Lin @ 2025-06-09  8:35 UTC (permalink / raw)
  To: Jinjiang Tu, David Hildenbrand
  Cc: syzbot, Liam.Howlett, akpm, harry.yoo, linux-kernel, linux-mm,
	lorenzo.stoakes, riel, syzkaller-bugs, vbabka, Jens Axboe,
	Catalin Marinas

On 2025/6/7 9:29, Jinjiang Tu wrote:
> 
> On 2025/6/6 15:56, David Hildenbrand wrote:
>> On 05.06.25 09:18, Jinjiang Tu wrote:
>>>
>>> On 2025/6/5 14:37, David Hildenbrand wrote:
>>>> On 05.06.25 08:27, David Hildenbrand wrote:
>>>>> On 05.06.25 08:11, David Hildenbrand wrote:
>>>>>> On 05.06.25 07:38, syzbot wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> syzbot found the following issue on:
>>>>>>>
>>>>>>> HEAD commit:    d7fa1af5b33e Merge branch 'for-next/core' into
>>>>>>> for-kernelci
>>>>>>
>>>>>> Hmmm, another very odd page-table mapping related problem on that tree
>>>>>> found on arm64 only:
>>>>>
>>>>> In this particular reproducer we seem to be having MADV_HUGEPAGE and
>>>>> io_uring_setup() be racing with MADV_HWPOISON, MADV_PAGEOUT and
>>>>> io_uring_register(IORING_REGISTER_BUFFERS).
>>>>>
>>>>> I assume the issue is related to MADV_HWPOISON, MADV_PAGEOUT and
>>>>> io_uring_register racing, only. I suspect MADV_HWPOISON is trying to
>>>>> split a THP, while MADV_PAGEOUT tries paging it out.
>>>>>
>>>>> IORING_REGISTER_BUFFERS ends up in
>>>>> io_sqe_buffers_register->io_sqe_buffer_register where we GUP-fast and
>>>>> try coalescing buffers.
>>>>>
>>>>> And something about THPs is not particularly happy :)
>>>>>
>>>>
>>>> Not sure if this is related to io_uring.
>>>>
>>>> unmap_poisoned_folio() calls try_to_unmap() without TTU_SPLIT_HUGE_PMD.
>>>>
>>>> When called from memory_failure(), we make sure to never call it on a
>>>> large folio: WARN_ON(folio_test_large(folio));
>>>>
>>>> However, from shrink_folio_list() we might call unmap_poisoned_folio()
>>>> on a large folio, which doesn't work if it is still PMD-mapped. Maybe
>>>> passing TTU_SPLIT_HUGE_PMD would fix it.
>>>>
>>> TTU_SPLIT_HUGE_PMD only converts the PMD-mapped THP to PTE-mapped THP, and may trigger the below WARN_ON_ONCE in try_to_unmap_one.
>>>
>>>     if (PageHWPoison(subpage) && (flags & TTU_HWPOISON)) {
>>>         ...
>>>     } else if (likely(pte_present(pteval)) && pte_unused(pteval) &&
>>>         !userfaultfd_armed(vma)) {
>>>          ....
>>>     } else if (folio_test_anon(folio)) {
>>>         swp_entry_t entry = page_swap_entry(subpage);
>>>         pte_t swp_pte;
>>>         /*
>>>          * Store the swap location in the pte.
>>>          * See handle_pte_fault() ...
>>>         */
>>>         if (unlikely(folio_test_swapbacked(folio) !=
>>>             folio_test_swapcache(folio))) {
>>>             WARN_ON_ONCE(1);          // here, if the subpage isn't hwpoisoned and we haven't called add_to_swap() for the THP
>>>             goto walk_abort;
>>>          }
>>
>> This makes me wonder if we should start splitting up try_to_unmap(), to handle the individual cases more cleanly at some point ...
>>
>> Maybe for now something like:
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index b91a33fb6c694..995486a3ff4d2 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1566,6 +1566,14 @@ int unmap_poisoned_folio(struct folio *folio, unsigned long pfn, bool must_kill)
>>         enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_SYNC | TTU_HWPOISON;
>>         struct address_space *mapping;
>>
>> +       /*
>> +        * try_to_unmap() cannot deal with some subpages of an anon folio
>> +        * not being hwpoisoned: we cannot unmap them without swap.
>> +        */
>> +       if (folio_test_large(folio) && !folio_test_hugetlb(folio) &&
>> +           folio_test_anon(folio) && !folio_test_swapcache(folio))
>> +               return -EBUSY;
>> +
> 
> If the THP is in the swapcache, we also have to split the PMD mapping into PTE mappings first.
> 
>>         if (folio_test_swapcache(folio)) {
>>                 pr_err("%#lx: keeping poisoned page in swap cache\n", pfn);
>>                 ttu &= ~TTU_HWPOISON;
>>
>>
>>
>>>
>>> If we want to unmap in shrink_folio_list, we have to try_to_split_thp_page() like memory_failure(). But it's too complicated; maybe just skipping the
>>> hwpoisoned folio is enough? If the folio is accessed again, memory_failure will be triggered again and kill the accessing process, since the folio
>>> has been hwpoisoned.
>>
>>
>> Maybe we should try splitting in there? But staring at shrink_folio_list(), not that easy.
>>
>> We could return -E2BIG and let the caller try splitting, to then retry.
> 
> UCEs (uncorrectable errors) are rare in the real world, and a UCE racing with another subsystem is rarer still. Spending a lot of effort on handling
> UCEs in other subsystems is complicated and not worthwhile. Just skipping is enough; memory_failure() will handle the folio if the UCE is triggered again.

IMHO, unmap_poisoned_folio() is designed for base pages only, not for large folios. The race above should also be really rare in the real world,
so it might be better to ignore large folios in unmap_poisoned_folio(); memory_failure() will handle them when the UCE is re-triggered.

Thanks both.
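
The "ignore large folios" approach can be illustrated with a small userspace model of a reclaim pass. This is a sketch only: struct skip_folio and the skip_* functions are hypothetical names, the hugetlb exception is an assumption carried over from the guard proposed earlier in the thread, and nothing here is the actual kernel change.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the folio state relevant to this discussion. */
struct skip_folio {
	bool poisoned; /* folio contains a hwpoisoned page */
	bool large;    /* models folio_test_large() */
	bool hugetlb;  /* models folio_test_hugetlb() */
	bool mapped;   /* still mapped into page tables */
};

/* Hypothetical guard: large non-hugetlb folios are not handled here. */
int skip_unmap_poisoned_folio(struct skip_folio *folio)
{
	if (folio->large && !folio->hugetlb)
		return -EBUSY; /* left for memory_failure() on a later UCE */
	folio->mapped = false;
	return 0;
}

/*
 * Hypothetical reclaim pass: unmap poisoned folios where possible and
 * return how many poisoned folios were skipped and left mapped.
 */
size_t skip_reclaim_pass(struct skip_folio *folios, size_t n)
{
	size_t skipped = 0;

	for (size_t i = 0; i < n; i++) {
		if (!folios[i].poisoned)
			continue;
		if (skip_unmap_poisoned_folio(&folios[i]) == -EBUSY)
			skipped++;
	}
	return skipped;
}
```

The design trade-off matches the argument in this thread: the pass stays simple, and a skipped folio remains mapped until a later access raises the error again.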



end of thread, other threads:[~2025-06-09  8:35 UTC | newest]

Thread overview: 10+ messages
2025-06-05  5:38 [syzbot] [mm?] kernel BUG in try_to_unmap_one (2) syzbot
2025-06-05  6:11 ` David Hildenbrand
2025-06-05  6:27   ` David Hildenbrand
2025-06-05  6:37     ` David Hildenbrand
2025-06-05  7:18       ` Jinjiang Tu
2025-06-06  7:56         ` David Hildenbrand
2025-06-07  1:29           ` Jinjiang Tu
2025-06-09  8:35             ` Miaohe Lin
2025-06-05  7:37       ` Jinjiang Tu
2025-06-06  7:40         ` David Hildenbrand
