* [syzbot] [arm?] WARNING in copy_highpage
@ 2025-10-01 21:48 syzbot
2025-10-03 17:05 ` Catalin Marinas
0 siblings, 1 reply; 6+ messages in thread
From: syzbot @ 2025-10-01 21:48 UTC (permalink / raw)
To: catalin.marinas, linux-arm-kernel, linux-kernel, syzkaller-bugs,
will
Hello,
syzbot found the following issue on:
HEAD commit: fec734e8d564 Merge tag 'riscv-for-linus-v6.17-rc8' of git:..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12187d34580000
kernel config: https://syzkaller.appspot.com/x/.config?x=13bd892ec3b155a2
dashboard link: https://syzkaller.appspot.com/bug?extid=d1974fc28545a3e6218b
compiler: aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm64
Unfortunately, I don't have any reproducer for this issue yet.
Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/fa3fbcfdac58/non_bootable_disk-fec734e8.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/d7e18b408aea/vmlinux-fec734e8.xz
kernel image: https://storage.googleapis.com/syzbot-assets/9b7984f47117/Image-fec734e8.gz.xz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d1974fc28545a3e6218b@syzkaller.appspotmail.com
------------[ cut here ]------------
WARNING: CPU: 1 PID: 25189 at arch/arm64/mm/copypage.c:55 try_page_mte_tagging arch/arm64/include/asm/mte.h:93 [inline]
WARNING: CPU: 1 PID: 25189 at arch/arm64/mm/copypage.c:55 copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55
Modules linked in:
CPU: 1 UID: 0 PID: 25189 Comm: syz.2.7336 Not tainted syzkaller #0 PREEMPT
Hardware name: linux,dummy-virt (DT)
pstate: 00402009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55
lr : copy_highpage+0xb4/0x334 arch/arm64/mm/copypage.c:25
sp : ffff800088053940
x29: ffff800088053940 x28: ffffc1ffc0acf800 x27: ffff800088053b10
x26: ffffc1ffc0acf808 x25: ffffc1ffc037b1c0 x24: ffffc1ffc037b1c0
x23: ffffc1ffc0acf800 x22: ffffc1ffc0acf800 x21: fff000002b3e0000
x20: fff000000dec7000 x19: ffffc1ffc037b1c0 x18: 0000000000000000
x17: fff07ffffcffa000 x16: ffff800080008000 x15: 0000000000000001
x14: 0000000000000000 x13: 0000000000000003 x12: 000000000006d9ad
x11: 0000000000000000 x10: 0000000000000010 x9 : 0000000000000000
x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
x5 : ffff800088053b18 x4 : ffff80008032df94 x3 : 00000000ff000000
x2 : 01ffc00003000001 x1 : 01ffc00003000001 x0 : 01ffc00003000001
Call trace:
try_page_mte_tagging arch/arm64/include/asm/mte.h:93 [inline] (P)
copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55 (P)
copy_mc_highpage include/linux/highmem.h:383 [inline]
folio_mc_copy+0x44/0x6c mm/util.c:740
__migrate_folio.constprop.0+0xc4/0x23c mm/migrate.c:851
migrate_folio+0x1c/0x2c mm/migrate.c:882
move_to_new_folio+0x58/0x144 mm/migrate.c:1097
migrate_folio_move mm/migrate.c:1370 [inline]
migrate_folios_move mm/migrate.c:1719 [inline]
migrate_pages_batch+0xaf4/0x1024 mm/migrate.c:1966
migrate_pages_sync mm/migrate.c:2023 [inline]
migrate_pages+0xb9c/0xcdc mm/migrate.c:2105
do_mbind+0x20c/0x4a4 mm/mempolicy.c:1539
kernel_mbind mm/mempolicy.c:1682 [inline]
__do_sys_mbind mm/mempolicy.c:1756 [inline]
__se_sys_mbind mm/mempolicy.c:1752 [inline]
__arm64_sys_mbind+0xd0/0xd8 mm/mempolicy.c:1752
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x48/0x110 arch/arm64/kernel/syscall.c:49
el0_svc_common.constprop.0+0x40/0xe0 arch/arm64/kernel/syscall.c:132
do_el0_svc+0x1c/0x28 arch/arm64/kernel/syscall.c:151
el0_svc+0x34/0x10c arch/arm64/kernel/entry-common.c:879
el0t_64_sync_handler+0xa0/0xe4 arch/arm64/kernel/entry-common.c:898
el0t_64_sync+0x1a4/0x1a8 arch/arm64/kernel/entry.S:596
---[ end trace 0000000000000000 ]---
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title
If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)
If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report
If you want to undo deduplication, reply with:
#syz undup
* Re: [syzbot] [arm?] WARNING in copy_highpage
2025-10-01 21:48 [syzbot] [arm?] WARNING in copy_highpage syzbot
@ 2025-10-03 17:05 ` Catalin Marinas
2025-10-06 7:55 ` David Hildenbrand
0 siblings, 1 reply; 6+ messages in thread
From: Catalin Marinas @ 2025-10-03 17:05 UTC (permalink / raw)
To: syzbot
Cc: linux-arm-kernel, linux-kernel, syzkaller-bugs, will,
David Hildenbrand
Thanks for the report (for some reason, Outlook did not deliver this to
my inbox; Will pointed me at the message).
Adding David H as well, he may have some ideas. I haven't tried to
reproduce it yet.
On Wed, Oct 01, 2025 at 02:48:30PM -0700, syzbot wrote:
> syzbot found the following issue on:
>
> HEAD commit: fec734e8d564 Merge tag 'riscv-for-linus-v6.17-rc8' of git:..
So that's just before 6.17, not something that turned up during the
merge window.
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=12187d34580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=13bd892ec3b155a2
> dashboard link: https://syzkaller.appspot.com/bug?extid=d1974fc28545a3e6218b
> compiler: aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> userspace arch: arm64
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/fa3fbcfdac58/non_bootable_disk-fec734e8.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/d7e18b408aea/vmlinux-fec734e8.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/9b7984f47117/Image-fec734e8.gz.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+d1974fc28545a3e6218b@syzkaller.appspotmail.com
>
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 25189 at arch/arm64/mm/copypage.c:55 try_page_mte_tagging arch/arm64/include/asm/mte.h:93 [inline]
> WARNING: CPU: 1 PID: 25189 at arch/arm64/mm/copypage.c:55 copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55
This warning means that the destination page is already tagged
(PG_mte_tagged set) when it got to copy_page(). In general it is fine
as we copy into and override all the tags but my assumption until now
has been that such new pages are always untagged.
> Modules linked in:
> CPU: 1 UID: 0 PID: 25189 Comm: syz.2.7336 Not tainted syzkaller #0 PREEMPT
> Hardware name: linux,dummy-virt (DT)
> pstate: 00402009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55
> lr : copy_highpage+0xb4/0x334 arch/arm64/mm/copypage.c:25
> sp : ffff800088053940
> x29: ffff800088053940 x28: ffffc1ffc0acf800 x27: ffff800088053b10
> x26: ffffc1ffc0acf808 x25: ffffc1ffc037b1c0 x24: ffffc1ffc037b1c0
> x23: ffffc1ffc0acf800 x22: ffffc1ffc0acf800 x21: fff000002b3e0000
> x20: fff000000dec7000 x19: ffffc1ffc037b1c0 x18: 0000000000000000
> x17: fff07ffffcffa000 x16: ffff800080008000 x15: 0000000000000001
> x14: 0000000000000000 x13: 0000000000000003 x12: 000000000006d9ad
> x11: 0000000000000000 x10: 0000000000000010 x9 : 0000000000000000
> x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
> x5 : ffff800088053b18 x4 : ffff80008032df94 x3 : 00000000ff000000
> x2 : 01ffc00003000001 x1 : 01ffc00003000001 x0 : 01ffc00003000001
> Call trace:
> try_page_mte_tagging arch/arm64/include/asm/mte.h:93 [inline] (P)
> copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55 (P)
> copy_mc_highpage include/linux/highmem.h:383 [inline]
> folio_mc_copy+0x44/0x6c mm/util.c:740
> __migrate_folio.constprop.0+0xc4/0x23c mm/migrate.c:851
> migrate_folio+0x1c/0x2c mm/migrate.c:882
> move_to_new_folio+0x58/0x144 mm/migrate.c:1097
> migrate_folio_move mm/migrate.c:1370 [inline]
> migrate_folios_move mm/migrate.c:1719 [inline]
> migrate_pages_batch+0xaf4/0x1024 mm/migrate.c:1966
> migrate_pages_sync mm/migrate.c:2023 [inline]
> migrate_pages+0xb9c/0xcdc mm/migrate.c:2105
> do_mbind+0x20c/0x4a4 mm/mempolicy.c:1539
> kernel_mbind mm/mempolicy.c:1682 [inline]
> __do_sys_mbind mm/mempolicy.c:1756 [inline]
I don't think we ever stressed MTE with mbind before. I have a suspicion
this problem has been around for some time.
My reading of do_mbind() is that it ends up allocating pages for
migrating into via alloc_migration_target_by_mpol() ->
folio_alloc_mpol(). Pages returned should be untagged and uninitialised
unless the PG_* flags have not been cleared on a prior free. Or
migrate_pages_batch() somehow reuses some pages instead of reallocating.
--
Catalin
* Re: [syzbot] [arm?] WARNING in copy_highpage
2025-10-03 17:05 ` Catalin Marinas
@ 2025-10-06 7:55 ` David Hildenbrand
2025-10-06 9:38 ` David Hildenbrand
2025-10-06 13:17 ` Catalin Marinas
0 siblings, 2 replies; 6+ messages in thread
From: David Hildenbrand @ 2025-10-06 7:55 UTC (permalink / raw)
To: Catalin Marinas, syzbot
Cc: linux-arm-kernel, linux-kernel, syzkaller-bugs, will
>> Modules linked in:
>> CPU: 1 UID: 0 PID: 25189 Comm: syz.2.7336 Not tainted syzkaller #0 PREEMPT
>> Hardware name: linux,dummy-virt (DT)
>> pstate: 00402009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> pc : copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55
>> lr : copy_highpage+0xb4/0x334 arch/arm64/mm/copypage.c:25
>> sp : ffff800088053940
>> x29: ffff800088053940 x28: ffffc1ffc0acf800 x27: ffff800088053b10
>> x26: ffffc1ffc0acf808 x25: ffffc1ffc037b1c0 x24: ffffc1ffc037b1c0
>> x23: ffffc1ffc0acf800 x22: ffffc1ffc0acf800 x21: fff000002b3e0000
>> x20: fff000000dec7000 x19: ffffc1ffc037b1c0 x18: 0000000000000000
>> x17: fff07ffffcffa000 x16: ffff800080008000 x15: 0000000000000001
>> x14: 0000000000000000 x13: 0000000000000003 x12: 000000000006d9ad
>> x11: 0000000000000000 x10: 0000000000000010 x9 : 0000000000000000
>> x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
>> x5 : ffff800088053b18 x4 : ffff80008032df94 x3 : 00000000ff000000
>> x2 : 01ffc00003000001 x1 : 01ffc00003000001 x0 : 01ffc00003000001
>> Call trace:
>> try_page_mte_tagging arch/arm64/include/asm/mte.h:93 [inline] (P)
>> copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55 (P)
>> copy_mc_highpage include/linux/highmem.h:383 [inline]
>> folio_mc_copy+0x44/0x6c mm/util.c:740
>> __migrate_folio.constprop.0+0xc4/0x23c mm/migrate.c:851
>> migrate_folio+0x1c/0x2c mm/migrate.c:882
>> move_to_new_folio+0x58/0x144 mm/migrate.c:1097
>> migrate_folio_move mm/migrate.c:1370 [inline]
>> migrate_folios_move mm/migrate.c:1719 [inline]
>> migrate_pages_batch+0xaf4/0x1024 mm/migrate.c:1966
>> migrate_pages_sync mm/migrate.c:2023 [inline]
>> migrate_pages+0xb9c/0xcdc mm/migrate.c:2105
>> do_mbind+0x20c/0x4a4 mm/mempolicy.c:1539
>> kernel_mbind mm/mempolicy.c:1682 [inline]
>> __do_sys_mbind mm/mempolicy.c:1756 [inline]
>
> I don't think we ever stressed MTE with mbind before. I have a suspicion
> this problem has been around for some time.
>
> My reading of do_mbind() is that it ends up allocating pages for
> migrating into via alloc_migration_target_by_mpol() ->
> folio_alloc_mpol(). Pages returned should be untagged and uninitialised
> unless the PG_* flags have not been cleared on a prior free. Or
> migrate_pages_batch() somehow reuses some pages instead of reallocating.
Staring at __migrate_folio(), I assume we can end up successfully
calling folio_mc_copy(), but then failing in __folio_migrate_mapping().
Seems to be as easy as failing the folio_ref_freeze() in
__folio_migrate_mapping().
We return -EAGAIN in that case, making the caller retry, stumbling into
an already-tagged page (with the same source / destination parameters,
IIRC).
So likely this is simply us re-doing the copy after a migration failed
after the copy.
Could it happen that we are calling it with a different
source/destination combination the second time? I don't think so, but I
am not 100% sure.
The most reliable way would be to un-tag in case folio_mc_copy succeeded
but __folio_migrate_mapping() failed.
I'm also wondering whether we can simply perform the copy after the
__folio_migrate_mapping() call: the src folio is locked and unmapped,
nobody can really modify it. Same for the dst folio.
--
Cheers
David / dhildenb
* Re: [syzbot] [arm?] WARNING in copy_highpage
2025-10-06 7:55 ` David Hildenbrand
@ 2025-10-06 9:38 ` David Hildenbrand
2025-10-06 13:17 ` Catalin Marinas
1 sibling, 0 replies; 6+ messages in thread
From: David Hildenbrand @ 2025-10-06 9:38 UTC (permalink / raw)
To: Catalin Marinas, syzbot
Cc: linux-arm-kernel, linux-kernel, syzkaller-bugs, will, Kefeng Wang
>
> The most reliable way would be to un-tag in case folio_mc_copy succeeded
> but __folio_migrate_mapping() failed.
>
> I'm also wondering whether we can simply perform the copy after the
> __folio_migrate_mapping() call: the src folio is locked and unmapped,
> nobody can really modify it. Same for the dst folio.
Answering that myself: obviously we don't want to fail after migrating
the mapping, that is more expensive to recover from.
And I think that also explains how commit 060913999d7a ("mm: migrate:
support poisoned recover from migrate folio") likely introduced the
issue by moving the copy.
CCing Kefeng
--
Cheers
David / dhildenb
* Re: [syzbot] [arm?] WARNING in copy_highpage
2025-10-06 7:55 ` David Hildenbrand
2025-10-06 9:38 ` David Hildenbrand
@ 2025-10-06 13:17 ` Catalin Marinas
2025-10-06 13:25 ` David Hildenbrand
1 sibling, 1 reply; 6+ messages in thread
From: Catalin Marinas @ 2025-10-06 13:17 UTC (permalink / raw)
To: David Hildenbrand
Cc: syzbot, linux-arm-kernel, linux-kernel, syzkaller-bugs, will,
Kefeng Wang
On Mon, Oct 06, 2025 at 09:55:27AM +0200, David Hildenbrand wrote:
> > > Modules linked in:
> > > CPU: 1 UID: 0 PID: 25189 Comm: syz.2.7336 Not tainted syzkaller #0 PREEMPT
> > > Hardware name: linux,dummy-virt (DT)
> > > pstate: 00402009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > pc : copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55
> > > lr : copy_highpage+0xb4/0x334 arch/arm64/mm/copypage.c:25
> > > sp : ffff800088053940
> > > x29: ffff800088053940 x28: ffffc1ffc0acf800 x27: ffff800088053b10
> > > x26: ffffc1ffc0acf808 x25: ffffc1ffc037b1c0 x24: ffffc1ffc037b1c0
> > > x23: ffffc1ffc0acf800 x22: ffffc1ffc0acf800 x21: fff000002b3e0000
> > > x20: fff000000dec7000 x19: ffffc1ffc037b1c0 x18: 0000000000000000
> > > x17: fff07ffffcffa000 x16: ffff800080008000 x15: 0000000000000001
> > > x14: 0000000000000000 x13: 0000000000000003 x12: 000000000006d9ad
> > > x11: 0000000000000000 x10: 0000000000000010 x9 : 0000000000000000
> > > x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
> > > x5 : ffff800088053b18 x4 : ffff80008032df94 x3 : 00000000ff000000
> > > x2 : 01ffc00003000001 x1 : 01ffc00003000001 x0 : 01ffc00003000001
> > > Call trace:
> > > try_page_mte_tagging arch/arm64/include/asm/mte.h:93 [inline] (P)
> > > copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55 (P)
> > > copy_mc_highpage include/linux/highmem.h:383 [inline]
> > > folio_mc_copy+0x44/0x6c mm/util.c:740
> > > __migrate_folio.constprop.0+0xc4/0x23c mm/migrate.c:851
> > > migrate_folio+0x1c/0x2c mm/migrate.c:882
> > > move_to_new_folio+0x58/0x144 mm/migrate.c:1097
> > > migrate_folio_move mm/migrate.c:1370 [inline]
> > > migrate_folios_move mm/migrate.c:1719 [inline]
> > > migrate_pages_batch+0xaf4/0x1024 mm/migrate.c:1966
> > > migrate_pages_sync mm/migrate.c:2023 [inline]
> > > migrate_pages+0xb9c/0xcdc mm/migrate.c:2105
> > > do_mbind+0x20c/0x4a4 mm/mempolicy.c:1539
> > > kernel_mbind mm/mempolicy.c:1682 [inline]
> > > __do_sys_mbind mm/mempolicy.c:1756 [inline]
> >
> > I don't think we ever stressed MTE with mbind before. I have a suspicion
> > this problem has been around for some time.
> >
> > My reading of do_mbind() is that it ends up allocating pages for
> > migrating into via alloc_migration_target_by_mpol() ->
> > folio_alloc_mpol(). Pages returned should be untagged and uninitialised
> > unless the PG_* flags have not been cleared on a prior free. Or
> > migrate_pages_batch() somehow reuses some pages instead of reallocating.
>
> Staring at __migrate_folio(), I assume we can end up successfully calling
> folio_mc_copy(), but then failing in __folio_migrate_mapping().
>
> Seems to be as easy as failing the folio_ref_freeze() in
> __folio_migrate_mapping().
>
> We return -EAGAIN in that case, making the caller retry, stumbling into an
> already-tagged page (with the same source / destination parameters, IIRC).
>
> So likely this is simply us re-doing the copy after a migration failed after
> the copy.
>
> Could it happen that we are calling it with a different source/destination
> combination the second time? I don't think so, but I am not 100% sure.
Thanks David. I can now see how it would retry on the same pages without
reallocating. At least we know it's not causing any side-effects, only
messing up the MTE safety warnings.
> The most reliable way would be to un-tag in case folio_mc_copy succeeded but
> __folio_migrate_mapping() failed.
Clearing an MTE specific flag in the core code doesn't look great. Also
going for some generic mask like PAGE_FLAGS_CHECK_AT_PREP may have
side-effects as we don't know where the page is coming from (we have
those get_new_folio()/put_new_folio() arguments passed on by higher up
callers).
I'm tempted to just drop the warning in the arm64 copy_highpage(),
replace it with a comment about migration retrying on a potentially
tagged page. It will have to override the tags each time (as it
currently does but also warns).
--
Catalin
* Re: [syzbot] [arm?] WARNING in copy_highpage
2025-10-06 13:17 ` Catalin Marinas
@ 2025-10-06 13:25 ` David Hildenbrand
0 siblings, 0 replies; 6+ messages in thread
From: David Hildenbrand @ 2025-10-06 13:25 UTC (permalink / raw)
To: Catalin Marinas
Cc: syzbot, linux-arm-kernel, linux-kernel, syzkaller-bugs, will,
Kefeng Wang
On 06.10.25 15:17, Catalin Marinas wrote:
> On Mon, Oct 06, 2025 at 09:55:27AM +0200, David Hildenbrand wrote:
>>>> Modules linked in:
>>>> CPU: 1 UID: 0 PID: 25189 Comm: syz.2.7336 Not tainted syzkaller #0 PREEMPT
>>>> Hardware name: linux,dummy-virt (DT)
>>>> pstate: 00402009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>> pc : copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55
>>>> lr : copy_highpage+0xb4/0x334 arch/arm64/mm/copypage.c:25
>>>> sp : ffff800088053940
>>>> x29: ffff800088053940 x28: ffffc1ffc0acf800 x27: ffff800088053b10
>>>> x26: ffffc1ffc0acf808 x25: ffffc1ffc037b1c0 x24: ffffc1ffc037b1c0
>>>> x23: ffffc1ffc0acf800 x22: ffffc1ffc0acf800 x21: fff000002b3e0000
>>>> x20: fff000000dec7000 x19: ffffc1ffc037b1c0 x18: 0000000000000000
>>>> x17: fff07ffffcffa000 x16: ffff800080008000 x15: 0000000000000001
>>>> x14: 0000000000000000 x13: 0000000000000003 x12: 000000000006d9ad
>>>> x11: 0000000000000000 x10: 0000000000000010 x9 : 0000000000000000
>>>> x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
>>>> x5 : ffff800088053b18 x4 : ffff80008032df94 x3 : 00000000ff000000
>>>> x2 : 01ffc00003000001 x1 : 01ffc00003000001 x0 : 01ffc00003000001
>>>> Call trace:
>>>> try_page_mte_tagging arch/arm64/include/asm/mte.h:93 [inline] (P)
>>>> copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55 (P)
>>>> copy_mc_highpage include/linux/highmem.h:383 [inline]
>>>> folio_mc_copy+0x44/0x6c mm/util.c:740
>>>> __migrate_folio.constprop.0+0xc4/0x23c mm/migrate.c:851
>>>> migrate_folio+0x1c/0x2c mm/migrate.c:882
>>>> move_to_new_folio+0x58/0x144 mm/migrate.c:1097
>>>> migrate_folio_move mm/migrate.c:1370 [inline]
>>>> migrate_folios_move mm/migrate.c:1719 [inline]
>>>> migrate_pages_batch+0xaf4/0x1024 mm/migrate.c:1966
>>>> migrate_pages_sync mm/migrate.c:2023 [inline]
>>>> migrate_pages+0xb9c/0xcdc mm/migrate.c:2105
>>>> do_mbind+0x20c/0x4a4 mm/mempolicy.c:1539
>>>> kernel_mbind mm/mempolicy.c:1682 [inline]
>>>> __do_sys_mbind mm/mempolicy.c:1756 [inline]
>>>
>>> I don't think we ever stressed MTE with mbind before. I have a suspicion
>>> this problem has been around for some time.
>>>
>>> My reading of do_mbind() is that it ends up allocating pages for
>>> migrating into via alloc_migration_target_by_mpol() ->
>>> folio_alloc_mpol(). Pages returned should be untagged and uninitialised
>>> unless the PG_* flags have not been cleared on a prior free. Or
>>> migrate_pages_batch() somehow reuses some pages instead of reallocating.
>>
>> Staring at __migrate_folio(), I assume we can end up successfully calling
>> folio_mc_copy(), but then failing in __folio_migrate_mapping().
>>
>> Seems to be as easy as failing the folio_ref_freeze() in
>> __folio_migrate_mapping().
>>
>> We return -EAGAIN in that case, making the caller retry, stumbling into an
>> already-tagged page (with the same source / destination parameters, IIRC).
>>
>> So likely this is simply us re-doing the copy after a migration failed after
>> the copy.
>>
>> Could it happen that we are calling it with a different source/destination
>> combination the second time? I don't think so, but I am not 100% sure.
>
> Thanks David. I can now see how it would retry on the same pages without
> reallocating. At least we know it's not causing any side-effects, only
> messing up the MTE safety warnings.
As long as the folio is not getting reused elsewhere, yes.
I haven't fully understood yet if there could be cases where we use the
folio for another source. But I think it's not trivially possible,
because I think we allocate dst folios based on source-folio properties
(order, node, zone, etc).
>
>> The most reliable way would be to un-tag in case folio_mc_copy succeeded but
>> __folio_migrate_mapping() failed.
>
> Clearing an MTE specific flag in the core code doesn't look great. Also
> going for some generic mask like PAGE_FLAGS_CHECK_AT_PREP may have
> side-effects as we don't know where the page is coming from (we have
> those get_new_folio()/put_new_folio() arguments passed on by higher up
> callers).
As an alternative, I would probably have done something like providing a
simple folio_mc_copy_abort().
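Very roughly, and only as a userspace C model of the idea (not kernel
code): no folio_mc_copy_abort() exists today, so sketch_copy_abort()
and everything else named sketch_* below is hypothetical, and the
mapping_fails flag just stands in for __folio_migrate_mapping()
failing after a successful copy.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_TAGS 4	/* arbitrary number of tag granules per page */

/* Hypothetical stand-in for struct page: only the state relevant here. */
struct sketch_page {
	bool mte_tagged;			/* models PG_mte_tagged */
	unsigned char tags[SKETCH_TAGS];	/* models the MTE tags */
};

static int sketch_warnings;	/* counts would-be warnings */

/* Models the copy: warns if the destination already carries tags. */
static void sketch_copy(struct sketch_page *to,
			const struct sketch_page *from)
{
	if (!from->mte_tagged)
		return;
	if (to->mte_tagged)
		sketch_warnings++;
	memcpy(to->tags, from->tags, sizeof(to->tags));
	to->mte_tagged = true;
}

/* Hypothetical abort helper: put dst back into its untagged state. */
static void sketch_copy_abort(struct sketch_page *dst)
{
	memset(dst->tags, 0, sizeof(dst->tags));
	dst->mte_tagged = false;
}

/* Copy, then the modelled mapping migration; undo the copy on failure. */
static int sketch_migrate_folio(struct sketch_page *dst,
				const struct sketch_page *src,
				bool mapping_fails)
{
	sketch_copy(dst, src);
	if (mapping_fails) {
		sketch_copy_abort(dst);
		return -EAGAIN;
	}
	return 0;
}
```

In this model a failed attempt leaves dst untagged again, so the retry
sees a page that looks new and the existing warning stays quiet.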
>
> I'm tempted to just drop the warning in the arm64 copy_highpage(),
> replace it with a comment about migration retrying on a potentially
> tagged page. It will have to override the tags each time (as it
> currently does but also warns).
Works for me. Maybe we could warn if the tag would change, because I
think after we unmapped the folio during migration, the tag can no
longer change.
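Something along these lines, as a userspace C model only (not kernel
code and not a proposed patch). The relaxed_* names are invented for
this sketch, and comparing the tag arrays with memcmp() is an assumption
about how "the tag would change" might be checked.

```c
#include <stdbool.h>
#include <string.h>

#define RELAXED_TAGS 4	/* arbitrary number of tag granules per page */

/* Hypothetical stand-in for struct page: only the state relevant here. */
struct relaxed_page {
	bool mte_tagged;			/* models PG_mte_tagged */
	unsigned char tags[RELAXED_TAGS];	/* models the MTE tags */
};

static int relaxed_warnings;	/* counts would-be warnings */

/*
 * Models a copy that tolerates an already-tagged destination (a
 * migration retry) and only warns when the tags would actually change.
 */
static void relaxed_copy_highpage(struct relaxed_page *to,
				  const struct relaxed_page *from)
{
	if (!from->mte_tagged)
		return;
	if (to->mte_tagged &&
	    memcmp(to->tags, from->tags, sizeof(to->tags)) != 0)
		relaxed_warnings++;
	/* The tags are overwritten on every pass, as they are today. */
	memcpy(to->tags, from->tags, sizeof(to->tags));
	to->mte_tagged = true;
}
```

Copying twice from an unchanged source stays silent in this model; only
a second copy with different source tags bumps the warning counter.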
--
Cheers
David / dhildenb