[syzbot] [arm?] WARNING in copy_highpage

linux-arm-kernel.lists.infradead.org archive mirror

* [syzbot] [arm?] WARNING in copy_highpage
@ 2025-10-01 21:48 syzbot
  2025-10-03 17:05 ` Catalin Marinas
  0 siblings, 1 reply; 6+ messages in thread
From: syzbot @ 2025-10-01 21:48 UTC
  To: catalin.marinas, linux-arm-kernel, linux-kernel, syzkaller-bugs,
	will

Hello,

syzbot found the following issue on:

HEAD commit:    fec734e8d564 Merge tag 'riscv-for-linus-v6.17-rc8' of git:..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12187d34580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=13bd892ec3b155a2
dashboard link: https://syzkaller.appspot.com/bug?extid=d1974fc28545a3e6218b
compiler:       aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm64

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/fa3fbcfdac58/non_bootable_disk-fec734e8.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/d7e18b408aea/vmlinux-fec734e8.xz
kernel image: https://storage.googleapis.com/syzbot-assets/9b7984f47117/Image-fec734e8.gz.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d1974fc28545a3e6218b@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 1 PID: 25189 at arch/arm64/mm/copypage.c:55 try_page_mte_tagging arch/arm64/include/asm/mte.h:93 [inline]
WARNING: CPU: 1 PID: 25189 at arch/arm64/mm/copypage.c:55 copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55
Modules linked in:
CPU: 1 UID: 0 PID: 25189 Comm: syz.2.7336 Not tainted syzkaller #0 PREEMPT 
Hardware name: linux,dummy-virt (DT)
pstate: 00402009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55
lr : copy_highpage+0xb4/0x334 arch/arm64/mm/copypage.c:25
sp : ffff800088053940
x29: ffff800088053940 x28: ffffc1ffc0acf800 x27: ffff800088053b10
x26: ffffc1ffc0acf808 x25: ffffc1ffc037b1c0 x24: ffffc1ffc037b1c0
x23: ffffc1ffc0acf800 x22: ffffc1ffc0acf800 x21: fff000002b3e0000
x20: fff000000dec7000 x19: ffffc1ffc037b1c0 x18: 0000000000000000
x17: fff07ffffcffa000 x16: ffff800080008000 x15: 0000000000000001
x14: 0000000000000000 x13: 0000000000000003 x12: 000000000006d9ad
x11: 0000000000000000 x10: 0000000000000010 x9 : 0000000000000000
x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
x5 : ffff800088053b18 x4 : ffff80008032df94 x3 : 00000000ff000000
x2 : 01ffc00003000001 x1 : 01ffc00003000001 x0 : 01ffc00003000001
Call trace:
 try_page_mte_tagging arch/arm64/include/asm/mte.h:93 [inline] (P)
 copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55 (P)
 copy_mc_highpage include/linux/highmem.h:383 [inline]
 folio_mc_copy+0x44/0x6c mm/util.c:740
 __migrate_folio.constprop.0+0xc4/0x23c mm/migrate.c:851
 migrate_folio+0x1c/0x2c mm/migrate.c:882
 move_to_new_folio+0x58/0x144 mm/migrate.c:1097
 migrate_folio_move mm/migrate.c:1370 [inline]
 migrate_folios_move mm/migrate.c:1719 [inline]
 migrate_pages_batch+0xaf4/0x1024 mm/migrate.c:1966
 migrate_pages_sync mm/migrate.c:2023 [inline]
 migrate_pages+0xb9c/0xcdc mm/migrate.c:2105
 do_mbind+0x20c/0x4a4 mm/mempolicy.c:1539
 kernel_mbind mm/mempolicy.c:1682 [inline]
 __do_sys_mbind mm/mempolicy.c:1756 [inline]
 __se_sys_mbind mm/mempolicy.c:1752 [inline]
 __arm64_sys_mbind+0xd0/0xd8 mm/mempolicy.c:1752
 __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
 invoke_syscall+0x48/0x110 arch/arm64/kernel/syscall.c:49
 el0_svc_common.constprop.0+0x40/0xe0 arch/arm64/kernel/syscall.c:132
 do_el0_svc+0x1c/0x28 arch/arm64/kernel/syscall.c:151
 el0_svc+0x34/0x10c arch/arm64/kernel/entry-common.c:879
 el0t_64_sync_handler+0xa0/0xe4 arch/arm64/kernel/entry-common.c:898
 el0t_64_sync+0x1a4/0x1a8 arch/arm64/kernel/entry.S:596
---[ end trace 0000000000000000 ]---


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup



* Re: [syzbot] [arm?] WARNING in copy_highpage
  2025-10-01 21:48 [syzbot] [arm?] WARNING in copy_highpage syzbot
@ 2025-10-03 17:05 ` Catalin Marinas
  2025-10-06  7:55   ` David Hildenbrand
  0 siblings, 1 reply; 6+ messages in thread
From: Catalin Marinas @ 2025-10-03 17:05 UTC
  To: syzbot
  Cc: linux-arm-kernel, linux-kernel, syzkaller-bugs, will,
	David Hildenbrand

Thanks for the report (for some reason, Outlook did not deliver this to
my inbox; Will pointed me at the message).

Adding David H as well, he may have some ideas. I haven't tried to
reproduce it yet.

On Wed, Oct 01, 2025 at 02:48:30PM -0700, syzbot wrote:
> syzbot found the following issue on:
> 
> HEAD commit:    fec734e8d564 Merge tag 'riscv-for-linus-v6.17-rc8' of git:..

So that's just before 6.17, not something that turned up during the
merge window.

> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=12187d34580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=13bd892ec3b155a2
> dashboard link: https://syzkaller.appspot.com/bug?extid=d1974fc28545a3e6218b
> compiler:       aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> userspace arch: arm64
> 
> Unfortunately, I don't have any reproducer for this issue yet.
> 
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/fa3fbcfdac58/non_bootable_disk-fec734e8.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/d7e18b408aea/vmlinux-fec734e8.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/9b7984f47117/Image-fec734e8.gz.xz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+d1974fc28545a3e6218b@syzkaller.appspotmail.com
> 
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 25189 at arch/arm64/mm/copypage.c:55 try_page_mte_tagging arch/arm64/include/asm/mte.h:93 [inline]
> WARNING: CPU: 1 PID: 25189 at arch/arm64/mm/copypage.c:55 copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55

This warning means that the destination page is already tagged
(PG_mte_tagged set) when it gets to copy_highpage(). In general that is
fine, as we copy into and overwrite all the tags, but my assumption until
now has been that such new pages are always untagged.
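
A minimal user-space sketch of the check described above, assuming a
simplified model in which a page carries only two flag bits. Every
`model_`/`MODEL_` name and the flag layout are hypothetical and chosen for
illustration; the real helper in arch/arm64/include/asm/mte.h operates on
struct page flags and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified flag bits (not the kernel's PG_* values). */
#define MODEL_MTE_LOCK   (1u << 0)  /* tag initialisation has been claimed */
#define MODEL_MTE_TAGGED (1u << 1)  /* tags have been initialised */

struct model_page {
	unsigned int flags;
};

/* Returns true only for the first caller that claims the page for tagging. */
bool model_try_page_mte_tagging(struct model_page *page)
{
	assert(page != NULL);
	if (!(page->flags & MODEL_MTE_LOCK)) {
		page->flags |= MODEL_MTE_LOCK;
		return true;
	}
	return false;
}

/*
 * Models the tagged-source branch of the copy. Returns true when the
 * "a new page should not have been tagged yet" assumption is violated,
 * i.e. when the warning would fire.
 */
bool model_copy_would_warn(struct model_page *to)
{
	bool warn = !model_try_page_mte_tagging(to);

	/* In this model the tags are copied and the flag set either way. */
	to->flags |= MODEL_MTE_TAGGED;
	return warn;
}
```

Calling model_copy_would_warn() twice on the same zero-initialised page
returns false the first time and true the second, which is the situation
the report describes.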

> Modules linked in:
> CPU: 1 UID: 0 PID: 25189 Comm: syz.2.7336 Not tainted syzkaller #0 PREEMPT 
> Hardware name: linux,dummy-virt (DT)
> pstate: 00402009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55
> lr : copy_highpage+0xb4/0x334 arch/arm64/mm/copypage.c:25
> sp : ffff800088053940
> x29: ffff800088053940 x28: ffffc1ffc0acf800 x27: ffff800088053b10
> x26: ffffc1ffc0acf808 x25: ffffc1ffc037b1c0 x24: ffffc1ffc037b1c0
> x23: ffffc1ffc0acf800 x22: ffffc1ffc0acf800 x21: fff000002b3e0000
> x20: fff000000dec7000 x19: ffffc1ffc037b1c0 x18: 0000000000000000
> x17: fff07ffffcffa000 x16: ffff800080008000 x15: 0000000000000001
> x14: 0000000000000000 x13: 0000000000000003 x12: 000000000006d9ad
> x11: 0000000000000000 x10: 0000000000000010 x9 : 0000000000000000
> x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
> x5 : ffff800088053b18 x4 : ffff80008032df94 x3 : 00000000ff000000
> x2 : 01ffc00003000001 x1 : 01ffc00003000001 x0 : 01ffc00003000001
> Call trace:
>  try_page_mte_tagging arch/arm64/include/asm/mte.h:93 [inline] (P)
>  copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55 (P)
>  copy_mc_highpage include/linux/highmem.h:383 [inline]
>  folio_mc_copy+0x44/0x6c mm/util.c:740
>  __migrate_folio.constprop.0+0xc4/0x23c mm/migrate.c:851
>  migrate_folio+0x1c/0x2c mm/migrate.c:882
>  move_to_new_folio+0x58/0x144 mm/migrate.c:1097
>  migrate_folio_move mm/migrate.c:1370 [inline]
>  migrate_folios_move mm/migrate.c:1719 [inline]
>  migrate_pages_batch+0xaf4/0x1024 mm/migrate.c:1966
>  migrate_pages_sync mm/migrate.c:2023 [inline]
>  migrate_pages+0xb9c/0xcdc mm/migrate.c:2105
>  do_mbind+0x20c/0x4a4 mm/mempolicy.c:1539
>  kernel_mbind mm/mempolicy.c:1682 [inline]
>  __do_sys_mbind mm/mempolicy.c:1756 [inline]

I don't think we ever stressed MTE with mbind before. I have a suspicion
this problem has been around for some time.

My reading of do_mbind() is that it ends up allocating pages for
migrating into via alloc_migration_target_by_mpol() ->
folio_alloc_mpol(). Pages returned should be untagged and uninitialised
unless the PG_* flags have not been cleared on a prior free. Or
migrate_pages_batch() somehow reuses some pages instead of reallocating.

-- 
Catalin



* Re: [syzbot] [arm?] WARNING in copy_highpage
  2025-10-03 17:05 ` Catalin Marinas
@ 2025-10-06  7:55   ` David Hildenbrand
  2025-10-06  9:38     ` David Hildenbrand
  2025-10-06 13:17     ` Catalin Marinas
  0 siblings, 2 replies; 6+ messages in thread
From: David Hildenbrand @ 2025-10-06  7:55 UTC
  To: Catalin Marinas, syzbot
  Cc: linux-arm-kernel, linux-kernel, syzkaller-bugs, will

>> Modules linked in:
>> CPU: 1 UID: 0 PID: 25189 Comm: syz.2.7336 Not tainted syzkaller #0 PREEMPT
>> Hardware name: linux,dummy-virt (DT)
>> pstate: 00402009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> pc : copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55
>> lr : copy_highpage+0xb4/0x334 arch/arm64/mm/copypage.c:25
>> sp : ffff800088053940
>> x29: ffff800088053940 x28: ffffc1ffc0acf800 x27: ffff800088053b10
>> x26: ffffc1ffc0acf808 x25: ffffc1ffc037b1c0 x24: ffffc1ffc037b1c0
>> x23: ffffc1ffc0acf800 x22: ffffc1ffc0acf800 x21: fff000002b3e0000
>> x20: fff000000dec7000 x19: ffffc1ffc037b1c0 x18: 0000000000000000
>> x17: fff07ffffcffa000 x16: ffff800080008000 x15: 0000000000000001
>> x14: 0000000000000000 x13: 0000000000000003 x12: 000000000006d9ad
>> x11: 0000000000000000 x10: 0000000000000010 x9 : 0000000000000000
>> x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
>> x5 : ffff800088053b18 x4 : ffff80008032df94 x3 : 00000000ff000000
>> x2 : 01ffc00003000001 x1 : 01ffc00003000001 x0 : 01ffc00003000001
>> Call trace:
>>   try_page_mte_tagging arch/arm64/include/asm/mte.h:93 [inline] (P)
>>   copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55 (P)
>>   copy_mc_highpage include/linux/highmem.h:383 [inline]
>>   folio_mc_copy+0x44/0x6c mm/util.c:740
>>   __migrate_folio.constprop.0+0xc4/0x23c mm/migrate.c:851
>>   migrate_folio+0x1c/0x2c mm/migrate.c:882
>>   move_to_new_folio+0x58/0x144 mm/migrate.c:1097
>>   migrate_folio_move mm/migrate.c:1370 [inline]
>>   migrate_folios_move mm/migrate.c:1719 [inline]
>>   migrate_pages_batch+0xaf4/0x1024 mm/migrate.c:1966
>>   migrate_pages_sync mm/migrate.c:2023 [inline]
>>   migrate_pages+0xb9c/0xcdc mm/migrate.c:2105
>>   do_mbind+0x20c/0x4a4 mm/mempolicy.c:1539
>>   kernel_mbind mm/mempolicy.c:1682 [inline]
>>   __do_sys_mbind mm/mempolicy.c:1756 [inline]
> 
> I don't think we ever stressed MTE with mbind before. I have a suspicion
> this problem has been around for some time.
> 
> My reading of do_mbind() is that it ends up allocating pages for
> migrating into via alloc_migration_target_by_mpol() ->
> folio_alloc_mpol(). Pages returned should be untagged and uninitialised
> unless the PG_* flags have not been cleared on a prior free. Or
> migrate_pages_batch() somehow reuses some pages instead of reallocating.

Staring at __migrate_folio(), I assume we can end up successfully 
calling folio_mc_copy(), but then failing in __folio_migrate_mapping().

Seems to be as easy as failing the folio_ref_freeze() in 
__folio_migrate_mapping().

We return -EAGAIN in that case, making the caller retry, stumbling into
an already-tagged page (with the same source/destination parameters,
IIRC).

So likely this is simply us re-doing the copy after a migration failed 
after the copy.
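
The suspected sequence can be sketched as a small user-space model,
assuming the source is tagged and the same destination is reused on every
retry. All names here are hypothetical stand-ins, not the functions in
mm/migrate.c, and the model counts every violation whereas a once-only
kernel warning would print a single time.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_folio {
	bool tagged;  /* destination already carries tags */
	int copies;   /* number of times data and tags were copied in */
};

/* Copy step: reports whether the destination was already tagged. */
bool model_copy(struct model_folio *dst)
{
	bool was_tagged;

	assert(dst != NULL);
	was_tagged = dst->tagged;
	dst->tagged = true;
	dst->copies++;
	return was_tagged;
}

/*
 * One migration attempt: copy first, then try to move the mapping.
 * freeze_ok models whether freezing the source refcount succeeded.
 */
int model_migrate_once(struct model_folio *dst, bool freeze_ok, int *warnings)
{
	if (model_copy(dst))
		(*warnings)++;
	if (!freeze_ok)
		return -EAGAIN;
	return 0;
}

/* Retries with the same destination; returns how often the check tripped. */
int model_migrate_with_retry(int failures_before_success)
{
	struct model_folio dst = { false, 0 };
	int warnings = 0;
	int attempt = 0;

	for (;;) {
		bool freeze_ok = attempt >= failures_before_success;

		if (model_migrate_once(&dst, freeze_ok, &warnings) != -EAGAIN)
			break;
		attempt++;
	}
	return warnings;
}
```

With no failed attempt the model reports no violation; each failed attempt
before the successful one adds one.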

Could it happen that we are calling it with a different 
source/destination combination the second time? I don't think so, but I 
am not 100% sure.

The most reliable way would be to un-tag in case folio_mc_copy succeeded 
but __folio_migrate_mapping() failed.

I'm also wondering whether we can simply perform the copy after the 
__folio_migrate_mapping() call: the src folio is locked and unmapped, 
nobody can really modify it. Same for the dst folio.

-- 
Cheers

David / dhildenb




* Re: [syzbot] [arm?] WARNING in copy_highpage
  2025-10-06  7:55   ` David Hildenbrand
@ 2025-10-06  9:38     ` David Hildenbrand
  2025-10-06 13:17     ` Catalin Marinas
  1 sibling, 0 replies; 6+ messages in thread
From: David Hildenbrand @ 2025-10-06  9:38 UTC
  To: Catalin Marinas, syzbot
  Cc: linux-arm-kernel, linux-kernel, syzkaller-bugs, will, Kefeng Wang


> 
> The most reliable way would be to un-tag in case folio_mc_copy succeeded
> but __folio_migrate_mapping() failed.
> 
> I'm also wondering whether we can simply perform the copy after the
> __folio_migrate_mapping() call: the src folio is locked and unmapped,
> nobody can really modify it. Same for the dst folio.

Answering that myself: obviously we don't want to fail after migrating 
the mapping, that is more expensive to recover from.

And I think that also explains how commit 060913999d7a ("mm: migrate: 
support poisoned recover from migrate folio") likely introduced the 
issue by moving the copy.

CCing Kefeng

-- 
Cheers

David / dhildenb




* Re: [syzbot] [arm?] WARNING in copy_highpage
  2025-10-06  7:55   ` David Hildenbrand
  2025-10-06  9:38     ` David Hildenbrand
@ 2025-10-06 13:17     ` Catalin Marinas
  2025-10-06 13:25       ` David Hildenbrand
  1 sibling, 1 reply; 6+ messages in thread
From: Catalin Marinas @ 2025-10-06 13:17 UTC
  To: David Hildenbrand
  Cc: syzbot, linux-arm-kernel, linux-kernel, syzkaller-bugs, will,
	Kefeng Wang

On Mon, Oct 06, 2025 at 09:55:27AM +0200, David Hildenbrand wrote:
> > > Modules linked in:
> > > CPU: 1 UID: 0 PID: 25189 Comm: syz.2.7336 Not tainted syzkaller #0 PREEMPT
> > > Hardware name: linux,dummy-virt (DT)
> > > pstate: 00402009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > pc : copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55
> > > lr : copy_highpage+0xb4/0x334 arch/arm64/mm/copypage.c:25
> > > sp : ffff800088053940
> > > x29: ffff800088053940 x28: ffffc1ffc0acf800 x27: ffff800088053b10
> > > x26: ffffc1ffc0acf808 x25: ffffc1ffc037b1c0 x24: ffffc1ffc037b1c0
> > > x23: ffffc1ffc0acf800 x22: ffffc1ffc0acf800 x21: fff000002b3e0000
> > > x20: fff000000dec7000 x19: ffffc1ffc037b1c0 x18: 0000000000000000
> > > x17: fff07ffffcffa000 x16: ffff800080008000 x15: 0000000000000001
> > > x14: 0000000000000000 x13: 0000000000000003 x12: 000000000006d9ad
> > > x11: 0000000000000000 x10: 0000000000000010 x9 : 0000000000000000
> > > x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
> > > x5 : ffff800088053b18 x4 : ffff80008032df94 x3 : 00000000ff000000
> > > x2 : 01ffc00003000001 x1 : 01ffc00003000001 x0 : 01ffc00003000001
> > > Call trace:
> > >   try_page_mte_tagging arch/arm64/include/asm/mte.h:93 [inline] (P)
> > >   copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55 (P)
> > >   copy_mc_highpage include/linux/highmem.h:383 [inline]
> > >   folio_mc_copy+0x44/0x6c mm/util.c:740
> > >   __migrate_folio.constprop.0+0xc4/0x23c mm/migrate.c:851
> > >   migrate_folio+0x1c/0x2c mm/migrate.c:882
> > >   move_to_new_folio+0x58/0x144 mm/migrate.c:1097
> > >   migrate_folio_move mm/migrate.c:1370 [inline]
> > >   migrate_folios_move mm/migrate.c:1719 [inline]
> > >   migrate_pages_batch+0xaf4/0x1024 mm/migrate.c:1966
> > >   migrate_pages_sync mm/migrate.c:2023 [inline]
> > >   migrate_pages+0xb9c/0xcdc mm/migrate.c:2105
> > >   do_mbind+0x20c/0x4a4 mm/mempolicy.c:1539
> > >   kernel_mbind mm/mempolicy.c:1682 [inline]
> > >   __do_sys_mbind mm/mempolicy.c:1756 [inline]
> > 
> > I don't think we ever stressed MTE with mbind before. I have a suspicion
> > this problem has been around for some time.
> > 
> > My reading of do_mbind() is that it ends up allocating pages for
> > migrating into via alloc_migration_target_by_mpol() ->
> > folio_alloc_mpol(). Pages returned should be untagged and uninitialised
> > unless the PG_* flags have not been cleared on a prior free. Or
> > migrate_pages_batch() somehow reuses some pages instead of reallocating.
> 
> Staring at __migrate_folio(), I assume we can end up successfully calling
> folio_mc_copy(), but then failing in __folio_migrate_mapping().
> 
> Seems to be as easy as failing the folio_ref_freeze() in
> __folio_migrate_mapping().
> 
> We return -EAGAIN in that case, making the caller retry, stumbling into an
> already-tagged page (with the same source/destination parameters, IIRC).
> 
> So likely this is simply us re-doing the copy after a migration failed after
> the copy.
> 
> Could it happen that we are calling it with a different source/destination
> combination the second time? I don't think so, but I am not 100% sure.

Thanks David. I can now see how it would retry on the same pages without
reallocating. At least we know it's not causing any side-effects, only
messing up the MTE safety warnings.

> The most reliable way would be to un-tag in case folio_mc_copy succeeded but
> __folio_migrate_mapping() failed.

Clearing an MTE specific flag in the core code doesn't look great. Also
going for some generic mask like PAGE_FLAGS_CHECK_AT_PREP may have
side-effects as we don't know where the page is coming from (we have
those get_new_folio()/put_new_folio() arguments passed on by higher up
callers).

I'm tempted to just drop the warning in the arm64 copy_highpage() and
replace it with a comment about migration retrying on a potentially
tagged page. It will have to overwrite the tags each time (as it
currently does, but it also warns).
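
A minimal user-space sketch of that idea, assuming one tag per 16-byte
granule of a 4096-byte page stored as one byte each. The `model_` names
and the struct layout are hypothetical; this is not the arm64 code, only
an illustration of copying tags unconditionally without a warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_TAGS_PER_PAGE 256  /* 4096-byte page / 16-byte granule */

struct model_page {
	bool tagged;
	unsigned char tags[MODEL_TAGS_PER_PAGE];
};

void model_copy_page_tags(struct model_page *to, const struct model_page *from)
{
	assert(to != NULL && from != NULL);
	if (!from->tagged)
		return;

	/*
	 * A migration retry may hand us a destination that was tagged by
	 * an earlier attempt. Overwrite its tags instead of warning.
	 */
	memcpy(to->tags, from->tags, sizeof(to->tags));
	to->tagged = true;
}

/* True when both pages agree on tagged state and on every tag. */
bool model_tags_match(const struct model_page *a, const struct model_page *b)
{
	return a->tagged == b->tagged &&
	       memcmp(a->tags, b->tags, sizeof(a->tags)) == 0;
}
```

Copying twice into the same destination leaves it with exactly the source
tags, so in this model the repeat is harmless.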

-- 
Catalin



* Re: [syzbot] [arm?] WARNING in copy_highpage
  2025-10-06 13:17     ` Catalin Marinas
@ 2025-10-06 13:25       ` David Hildenbrand
  0 siblings, 0 replies; 6+ messages in thread
From: David Hildenbrand @ 2025-10-06 13:25 UTC
  To: Catalin Marinas
  Cc: syzbot, linux-arm-kernel, linux-kernel, syzkaller-bugs, will,
	Kefeng Wang

On 06.10.25 15:17, Catalin Marinas wrote:
> On Mon, Oct 06, 2025 at 09:55:27AM +0200, David Hildenbrand wrote:
>>>> Modules linked in:
>>>> CPU: 1 UID: 0 PID: 25189 Comm: syz.2.7336 Not tainted syzkaller #0 PREEMPT
>>>> Hardware name: linux,dummy-virt (DT)
>>>> pstate: 00402009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>> pc : copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55
>>>> lr : copy_highpage+0xb4/0x334 arch/arm64/mm/copypage.c:25
>>>> sp : ffff800088053940
>>>> x29: ffff800088053940 x28: ffffc1ffc0acf800 x27: ffff800088053b10
>>>> x26: ffffc1ffc0acf808 x25: ffffc1ffc037b1c0 x24: ffffc1ffc037b1c0
>>>> x23: ffffc1ffc0acf800 x22: ffffc1ffc0acf800 x21: fff000002b3e0000
>>>> x20: fff000000dec7000 x19: ffffc1ffc037b1c0 x18: 0000000000000000
>>>> x17: fff07ffffcffa000 x16: ffff800080008000 x15: 0000000000000001
>>>> x14: 0000000000000000 x13: 0000000000000003 x12: 000000000006d9ad
>>>> x11: 0000000000000000 x10: 0000000000000010 x9 : 0000000000000000
>>>> x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
>>>> x5 : ffff800088053b18 x4 : ffff80008032df94 x3 : 00000000ff000000
>>>> x2 : 01ffc00003000001 x1 : 01ffc00003000001 x0 : 01ffc00003000001
>>>> Call trace:
>>>>    try_page_mte_tagging arch/arm64/include/asm/mte.h:93 [inline] (P)
>>>>    copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55 (P)
>>>>    copy_mc_highpage include/linux/highmem.h:383 [inline]
>>>>    folio_mc_copy+0x44/0x6c mm/util.c:740
>>>>    __migrate_folio.constprop.0+0xc4/0x23c mm/migrate.c:851
>>>>    migrate_folio+0x1c/0x2c mm/migrate.c:882
>>>>    move_to_new_folio+0x58/0x144 mm/migrate.c:1097
>>>>    migrate_folio_move mm/migrate.c:1370 [inline]
>>>>    migrate_folios_move mm/migrate.c:1719 [inline]
>>>>    migrate_pages_batch+0xaf4/0x1024 mm/migrate.c:1966
>>>>    migrate_pages_sync mm/migrate.c:2023 [inline]
>>>>    migrate_pages+0xb9c/0xcdc mm/migrate.c:2105
>>>>    do_mbind+0x20c/0x4a4 mm/mempolicy.c:1539
>>>>    kernel_mbind mm/mempolicy.c:1682 [inline]
>>>>    __do_sys_mbind mm/mempolicy.c:1756 [inline]
>>>
>>> I don't think we ever stressed MTE with mbind before. I have a suspicion
>>> this problem has been around for some time.
>>>
>>> My reading of do_mbind() is that it ends up allocating pages for
>>> migrating into via alloc_migration_target_by_mpol() ->
>>> folio_alloc_mpol(). Pages returned should be untagged and uninitialised
>>> unless the PG_* flags have not been cleared on a prior free. Or
>>> migrate_pages_batch() somehow reuses some pages instead of reallocating.
>>
>> Staring at __migrate_folio(), I assume we can end up successfully calling
>> folio_mc_copy(), but then failing in __folio_migrate_mapping().
>>
>> Seems to be as easy as failing the folio_ref_freeze() in
>> __folio_migrate_mapping().
>>
>> We return -EAGAIN in that case, making the caller retry, stumbling into an
>> already-tagged page (with the same source/destination parameters, IIRC).
>>
>> So likely this is simply us re-doing the copy after a migration failed after
>> the copy.
>>
>> Could it happen that we are calling it with a different source/destination
>> combination the second time? I don't think so, but I am not 100% sure.
> 
> Thanks David. I can now see how it would retry on the same pages without
> reallocating. At least we know it's not causing any side-effects, only
> messing up the MTE safety warnings.

As long as the folio is not getting reused elsewhere, yes.

I haven't fully understood yet if there could be cases where we use the 
folio for another source. But I think it's not trivially possible, 
because I think we allocate dst folios based on source-folio properties 
(order, node, zone, etc).

> 
>> The most reliable way would be to un-tag in case folio_mc_copy succeeded but
>> __folio_migrate_mapping() failed.
> 
> Clearing an MTE specific flag in the core code doesn't look great. Also
> going for some generic mask like PAGE_FLAGS_CHECK_AT_PREP may have
> side-effects as we don't know where the page is coming from (we have
> those get_new_folio()/put_new_folio() arguments passed on by higher up
> callers).

As an alternative, I would probably have done something like providing a 
simple folio_mc_copy_abort().
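
What such a hook might do can be sketched in user space, assuming it
simply undoes the tagged state left behind by a copy whose mapping move
then failed. No folio_mc_copy_abort() exists in the thread beyond its
name; every function below is a hypothetical stand-in.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_folio {
	bool tagged;  /* destination already carries tags */
};

/* Copy step: reports whether the destination was already tagged. */
bool model_copy(struct model_folio *dst)
{
	bool was_tagged;

	assert(dst != NULL);
	was_tagged = dst->tagged;
	dst->tagged = true;
	return was_tagged;
}

/* Hypothetical abort hook: drop state the copy left on the destination. */
void model_copy_abort(struct model_folio *dst)
{
	dst->tagged = false;
}

/* One attempt that cleans up after itself when the mapping move fails. */
int model_migrate_once(struct model_folio *dst, bool freeze_ok, int *warnings)
{
	if (model_copy(dst))
		(*warnings)++;
	if (!freeze_ok) {
		model_copy_abort(dst);
		return -EAGAIN;
	}
	return 0;
}

/* Retries with the same destination; returns how often the check tripped. */
int model_retry_with_abort(int failures_before_success)
{
	struct model_folio dst = { false };
	int warnings = 0;
	int attempt = 0;

	for (;;) {
		bool freeze_ok = attempt >= failures_before_success;

		if (model_migrate_once(&dst, freeze_ok, &warnings) != -EAGAIN)
			break;
		attempt++;
	}
	return warnings;
}
```

In this model the retry always sees an untagged destination, so the check
never trips regardless of how many attempts fail first.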

> 
> I'm tempted to just drop the warning in the arm64 copy_highpage() and
> replace it with a comment about migration retrying on a potentially
> tagged page. It will have to overwrite the tags each time (as it
> currently does, but it also warns).

Works for me. Maybe we could warn if the tag would change, because I 
think after we unmapped the folio during migration, the tag can no 
longer change.
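
The "warn only if the tag would change" variant can be sketched the same
way, assuming one tag per 16-byte granule of a 4096-byte page stored as
one byte each. The `model_` names and the struct layout are hypothetical
and not taken from the arm64 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_TAGS_PER_PAGE 256  /* 4096-byte page / 16-byte granule */

struct model_page {
	bool tagged;
	unsigned char tags[MODEL_TAGS_PER_PAGE];
};

/*
 * Copies tags from a tagged source and returns true when a warning
 * would be raised: the destination was already tagged and its tags
 * differ from what is about to be written.
 */
bool model_copy_tags_checked(struct model_page *to, const struct model_page *from)
{
	bool warn = false;

	assert(to != NULL && from != NULL);
	if (!from->tagged)
		return false;

	if (to->tagged && memcmp(to->tags, from->tags, sizeof(to->tags)) != 0)
		warn = true;

	memcpy(to->tags, from->tags, sizeof(to->tags));
	to->tagged = true;
	return warn;
}
```

A retry with an unchanged source stays silent here; only a destination
whose existing tags differ from the source would be flagged.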

-- 
Cheers

David / dhildenb



