public inbox for linux-kernel@vger.kernel.org
* [syzbot] [kernel?] linux-next test error: WARNING in switch_mm_irqs_off
From: syzbot @ 2025-04-14 10:28 UTC
  To: bp, dave.hansen, hpa, linux-kernel, linux-next, luto, mingo,
	peterz, sfr, syzkaller-bugs, tglx, x86

Hello,

syzbot found the following issue on:

HEAD commit:    b425262c07a6 Add linux-next specific files for 20250414
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=10ddb398580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=cc26bd6fced8397d
dashboard link: https://syzkaller.appspot.com/bug?extid=c2537ce72a879a38113e
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/e04788e9f74f/disk-b425262c.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/32ac1bacc159/vmlinux-b425262c.xz
kernel image: https://storage.googleapis.com/syzbot-assets/99cc84c793ed/bzImage-b425262c.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+c2537ce72a879a38113e@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 0 PID: 9 at arch/x86/mm/tlb.c:919 switch_mm_irqs_off+0x686/0x810 arch/x86/mm/tlb.c:918
Modules linked in:
CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.15.0-rc2-next-20250414-syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
Workqueue: events once_deferred
RIP: 0010:switch_mm_irqs_off+0x686/0x810 arch/x86/mm/tlb.c:918
Code: 90 41 f7 c5 00 08 00 00 0f 84 ee fa ff ff 90 0f 0b 90 e9 e5 fa ff ff 90 0f 0b 90 e9 76 fe ff ff 90 0f 0b 90 e9 cc fb ff ff 90 <0f> 0b 90 4d 39 f4 0f 85 eb fb ff ff e9 31 fc ff ff 90 0f 0b 90 e9
RSP: 0000:ffffc900000e7680 EFLAGS: 00010056
RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffff816ffd4d
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88801b070940
RBP: ffffc900000e7750 R08: ffff88801b070947 R09: 1ffff1100360e128
R10: dffffc0000000000 R11: ffffed100360e129 R12: ffffffff8ee49240
R13: ffff88801b070940 R14: ffffffff8ee49240 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff888124faa000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88823ffff000 CR3: 000000001b078000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 unuse_temporary_mm+0x9f/0x100 arch/x86/mm/tlb.c:1038
 __text_poke+0x7b6/0xb40 arch/x86/kernel/alternative.c:2214
 text_poke arch/x86/kernel/alternative.c:2257 [inline]
 smp_text_poke_batch_finish+0x3e7/0x12c0 arch/x86/kernel/alternative.c:2565
 arch_jump_label_transform_apply+0x1c/0x30 arch/x86/kernel/jump_label.c:146
 static_key_disable_cpuslocked+0xd2/0x1c0 kernel/jump_label.c:240
 static_key_disable+0x1a/0x20 kernel/jump_label.c:248
 once_deferred+0x70/0xb0 lib/once.c:20
 process_one_work kernel/workqueue.c:3238 [inline]
 process_scheduled_works+0xac3/0x18e0 kernel/workqueue.c:3319
 worker_thread+0x870/0xd50 kernel/workqueue.c:3400
 kthread+0x7b7/0x940 kernel/kthread.c:464
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:153
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite the report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


* Re: [syzbot] [kernel?] linux-next test error: WARNING in switch_mm_irqs_off
From: Peter Zijlstra @ 2025-04-14 13:56 UTC
  To: syzbot, riel
  Cc: bp, dave.hansen, hpa, linux-kernel, linux-next, luto, mingo, sfr,
	syzkaller-bugs, tglx, x86

On Mon, Apr 14, 2025 at 03:28:27AM -0700, syzbot wrote:
> Call Trace:
>  <TASK>
>  unuse_temporary_mm+0x9f/0x100 arch/x86/mm/tlb.c:1038
>  __text_poke+0x7b6/0xb40 arch/x86/kernel/alternative.c:2214
>  text_poke arch/x86/kernel/alternative.c:2257 [inline]
>  smp_text_poke_batch_finish+0x3e7/0x12c0 arch/x86/kernel/alternative.c:2565
>  arch_jump_label_transform_apply+0x1c/0x30 arch/x86/kernel/jump_label.c:146
>  static_key_disable_cpuslocked+0xd2/0x1c0 kernel/jump_label.c:240
>  static_key_disable+0x1a/0x20 kernel/jump_label.c:248
>  once_deferred+0x70/0xb0 lib/once.c:20
>  process_one_work kernel/workqueue.c:3238 [inline]
>  process_scheduled_works+0xac3/0x18e0 kernel/workqueue.c:3319
>  worker_thread+0x870/0xd50 kernel/workqueue.c:3400
>  kthread+0x7b7/0x940 kernel/kthread.c:464
>  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:153
>  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
>  </TASK>

So I can reproduce, and I think I see what happens, except I'm
confused as to why the recently merged patches show this.

AFAIU what happens is that unuse_temporary_mm() clears the mm_cpumask()
for the current CPU, while switch_mm_irqs_off() then checks that the
mm_cpumask() bit is set for the current CPU.

This behaviour hasn't really changed since 209954cbc7d0 ("x86/mm/tlb:
Update mm_cpumask lazily") introduced both.
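A minimal userspace model may help pin down that ordering. It is a sketch, not kernel code: `struct model_mm`, `model_switch_mm()`, `model_unuse_temporary_mm()` and `model_text_poke_sequence()` are hypothetical names, each cpumask is reduced to one 64-bit word, and the warning is modelled as a counter rather than VM_WARN_ON_ONCE(). The `check_prev` flag turns the prev-mask check on or off so the same sequence can be compared both ways.

```c
/*
 * Hypothetical userspace model (not kernel code) of the ordering that
 * trips the warning: unuse_temporary_mm() clears this CPU's bit in the
 * temporary mm's cpumask, then switch_mm_irqs_off() checks that same bit
 * for the mm it is switching away from.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64

struct model_mm {
	uint64_t cpumask;		/* bit n set: CPU n is in the mask */
};

static struct model_mm model_init_mm;	/* stand-in for init_mm */
static struct model_mm *model_loaded_mm[MODEL_NR_CPUS];
static int model_warnings;		/* counts would-be warnings */

static uint64_t cpu_bit(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return UINT64_C(1) << cpu;
}

/* Same shape as the prev-mask check, followed by the lazy mask update. */
static void model_switch_mm(int cpu, struct model_mm *next, bool check_prev)
{
	struct model_mm *prev = model_loaded_mm[cpu];

	if (check_prev && prev != &model_init_mm && !(prev->cpumask & cpu_bit(cpu)))
		model_warnings++;

	/* Leave cpu in prev's mask; make sure it is in next's mask. */
	if (next != &model_init_mm)
		next->cpumask |= cpu_bit(cpu);
	model_loaded_mm[cpu] = next;
}

/* Models unuse_temporary_mm(): clear this CPU's bit, then switch back. */
static void model_unuse_temporary_mm(int cpu, struct model_mm *prev_mm,
				     bool check_prev)
{
	model_loaded_mm[cpu]->cpumask &= ~cpu_bit(cpu);
	model_switch_mm(cpu, prev_mm, check_prev);
}

/* One text-poke round trip on @cpu; returns the number of warnings. */
static int model_text_poke_sequence(int cpu, bool check_prev)
{
	struct model_mm user_mm = { .cpumask = 0 };
	struct model_mm poke_mm = { .cpumask = 0 };

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_warnings = 0;
	model_loaded_mm[cpu] = &model_init_mm;
	model_switch_mm(cpu, &user_mm, check_prev);	/* ordinary task mm */
	model_switch_mm(cpu, &poke_mm, check_prev);	/* use_temporary_mm() */
	model_unuse_temporary_mm(cpu, &user_mm, check_prev);
	model_loaded_mm[cpu] = &model_init_mm;		/* drop stack pointers */
	return model_warnings;
}
```

Under these assumptions, `model_text_poke_sequence(0, true)` reports one warning (the poke mm's bit was cleared before switching away from it), while `model_text_poke_sequence(0, false)` reports none.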

I'm not entirely sure what the best way forward is. We can simply
delete the warning, or make use_temporary_mm() tag the special MMs
somehow and exclude them from the warning.
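The second option could look roughly like the sketch below. The `is_temporary` field and the `model_*` helpers are hypothetical illustrations, not existing mm_struct members or kernel functions; the predicate only answers whether the prev-mask check would fire for a given switch.

```c
/*
 * Hypothetical sketch of "tag the special MMs and exclude them from the
 * warning". is_temporary is an illustrative field, not an existing
 * struct mm_struct member.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_mm {
	uint64_t cpumask;	/* bit n set: CPU n is in the mask */
	bool is_temporary;	/* would be set by use_temporary_mm() */
};

/* Return true if the prev-mask warning should fire for this switch. */
static bool model_prev_mask_warns(const struct model_mm *prev,
				  const struct model_mm *init_mm_stub, int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	if (prev == init_mm_stub || prev->is_temporary)
		return false;	/* init_mm and tagged temporary mms are exempt */
	return !(prev->cpumask & (UINT64_C(1) << cpu));
}

/* Try the predicate for a switch away from a single mm on CPU 0. */
static bool model_demo(bool is_temporary, bool cpu0_in_mask)
{
	struct model_mm prev = {
		.cpumask = cpu0_in_mask ? 1 : 0,
		.is_temporary = is_temporary,
	};

	return model_prev_mask_warns(&prev, NULL, 0);
}
```

A tagged temporary mm whose bit was cleared does not warn (`model_demo(true, false)` is false), while an untagged mm in the same state still does.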




* Re: [syzbot] [kernel?] linux-next test error: WARNING in switch_mm_irqs_off
From: Ingo Molnar @ 2025-04-14 15:10 UTC
  To: Peter Zijlstra
  Cc: syzbot, riel, bp, dave.hansen, hpa, linux-kernel, linux-next,
	luto, mingo, sfr, syzkaller-bugs, tglx, x86


* Peter Zijlstra <peterz@infradead.org> wrote:

> So I can reproduce, and I think I see what happens, except I'm
> confused as to why the recently merged patches show this.
> 
> AFAIU what happens is that unuse_temporary_mm() clears the 
> mm_cpumask() for the current CPU, while switch_mm_irqs_off() then 
> checks that the mm_cpumask() bit is set for the current CPU.
> 
> This behaviour hasn't really changed since 209954cbc7d0 ("x86/mm/tlb: 
> Update mm_cpumask lazily") introduced both.
> 
> I'm not entirely sure what the best way forward is. We can simply 
> delete the warning, or make use_temporary_mm() tag the special MMs 
> somehow and exclude them from the warning.

So, mm_cpumask is basically tracking which CPUs the MM ran on, and 
this gets cleared lazily whenever there's an opportune time, but not 
during context switches (for shared cacheline performance reasons), 
right?
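As a rough illustration of the lazy scheme described here, the model below sets a CPU's bit for the next mm at switch time (only if it is not already set), leaves the previous mm's bit alone, and drops stale bits when a flush finds the CPU no longer running that mm, matching the "removed lazily at TLB flush time" comment in switch_mm_irqs_off(). It is a simplified userspace sketch: all names are hypothetical and the cross-CPU flush is reduced to a plain loop.

```c
/*
 * Hypothetical userspace model of a lazily maintained mm_cpumask:
 * switching in sets the CPU's bit for next (if not already set),
 * switching out leaves prev's bit alone, and a later flush drops the bit
 * on CPUs that are no longer running the mm.
 */
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4

struct model_mm {
	uint64_t cpumask;		/* bit n set: CPU n is in the mask */
};

struct model_lazy_result {
	uint64_t mask_before_flush;
	uint64_t mask_after_flush;
	int cpus_flushed;
};

static struct model_mm *model_running[MODEL_NR_CPUS];
static struct model_mm model_a, model_b;

static void model_context_switch(int cpu, struct model_mm *next)
{
	uint64_t bit;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	bit = UINT64_C(1) << cpu;
	if (!(next->cpumask & bit))	/* avoid writing the shared mask */
		next->cpumask |= bit;
	model_running[cpu] = next;	/* prev's bit is left set */
}

/* Flush @mm: CPUs still running it flush, stale CPUs drop their bit. */
static int model_flush(struct model_mm *mm)
{
	int flushed = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		uint64_t bit = UINT64_C(1) << cpu;

		if (!(mm->cpumask & bit))
			continue;
		if (model_running[cpu] == mm)
			flushed++;
		else
			mm->cpumask &= ~bit;	/* lazy removal */
	}
	return flushed;
}

/* CPUs 0 and 1 run A, then CPU 0 switches to B, then A is flushed. */
static struct model_lazy_result model_lazy_demo(void)
{
	struct model_lazy_result r;

	model_a.cpumask = 0;
	model_b.cpumask = 0;
	model_context_switch(0, &model_a);
	model_context_switch(1, &model_a);
	model_context_switch(0, &model_b);
	r.mask_before_flush = model_a.cpumask;
	r.cpus_flushed = model_flush(&model_a);
	r.mask_after_flush = model_a.cpumask;
	return r;
}
```

In `model_lazy_demo()`, A's mask is still 0x3 after CPU 0 has moved on to B; the flush reaches only CPU 1, and afterwards the mask is 0x2.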

So why do we need to clear the mm_cpumask in unuse_temporary_mm() to 
begin with:

	/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
	cpumask_clear_cpu(smp_processor_id(), mm_cpumask(this_cpu_read(cpu_tlbstate.loaded_mm)));

What TLB flushing are we worried about here? Nothing much should 
trigger any TLB flushing for text_poke_mm AFAICS?

Thanks,

	Ingo


* Re: [syzbot] [kernel?] linux-next test error: WARNING in switch_mm_irqs_off
From: Ingo Molnar @ 2025-04-14 15:50 UTC
  To: Peter Zijlstra
  Cc: syzbot, riel, bp, dave.hansen, hpa, linux-kernel, linux-next,
	luto, mingo, sfr, syzkaller-bugs, tglx, x86


* Ingo Molnar <mingo@kernel.org> wrote:

> So, mm_cpumask is basically tracking which CPUs the MM ran on, and 
> this gets cleared lazily whenever there's an opportune time, but not 
> during context switches (for shared cacheline performance reasons), 
> right?
> 
> So why do we need to clear the mm_cpumask in unuse_temporary_mm() to 
> begin with:
> 
> 	/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
>         cpumask_clear_cpu(smp_processor_id(), mm_cpumask(this_cpu_read(cpu_tlbstate.loaded_mm)));
> 
> What TLB flushing are we worried about here? Nothing much should 
> trigger any TLB flushing for text_poke_mm AFAICS?

I.e., something like the patch below, but I might be missing something
here...

Thanks,

	Ingo

=================>
 arch/x86/mm/tlb.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 0ebbaab55b0a..d36d370042e2 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1032,9 +1032,6 @@ void unuse_temporary_mm(struct mm_struct *prev_mm)
 {
 	lockdep_assert_preemption_disabled();
 
-	/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
-	cpumask_clear_cpu(smp_processor_id(), mm_cpumask(this_cpu_read(cpu_tlbstate.loaded_mm)));
-
 	switch_mm_irqs_off(NULL, prev_mm, current);
 
 	/*


* Re: [syzbot] [kernel?] linux-next test error: WARNING in switch_mm_irqs_off
From: Peter Zijlstra @ 2025-04-14 18:50 UTC
  To: Ingo Molnar
  Cc: syzbot, riel, bp, dave.hansen, hpa, linux-kernel, linux-next,
	luto, mingo, sfr, syzkaller-bugs, tglx, x86

On Mon, Apr 14, 2025 at 05:10:03PM +0200, Ingo Molnar wrote:
> So, mm_cpumask is basically tracking which CPUs the MM ran on, and 
> this gets cleared lazily whenever there's an opportune time, but not 
> during context switches (for shared cacheline performance reasons), 
> right?
> 
> So why do we need to clear the mm_cpumask in unuse_temporary_mm() to 
> begin with:
> 
> 	/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
>         cpumask_clear_cpu(smp_processor_id(), mm_cpumask(this_cpu_read(cpu_tlbstate.loaded_mm)));
> 
> What TLB flushing are we worried about here? Nothing much should 
> trigger any TLB flushing for text_poke_mm AFAICS?

No, it will actually try and issue TLBI for text_poking_mm and then
things go sideways. If you look up the original thread:

  https://lkml.kernel.org/r/20241113095550.GBZzR3pg-RhJKPDazS@fat_crate.local

you'll find this was discussed. You were on Cc there.

Some of the solutions there made the TLBI not explode, but fundamentally
the whole temporary_mm thing is CPU-local, and the CR3 switch away is
sufficient.
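Under a simplifying assumption, namely that a flush of an mm has to reach every CPU still set in that mm's cpumask, the difference described above can be shown in a few lines. The helpers `model_flush_targets()` and `model_temp_mm_round_trip()` are hypothetical; the point is only that a temporary mm is loaded on the current CPU alone, so clearing that one bit on unuse leaves a later flush with nothing to target.

```c
/*
 * Hypothetical model: a flush of an mm targets every CPU left in its
 * cpumask. A temporary mm is only ever loaded on the current CPU, so
 * clearing that single bit on unuse leaves the flush with no targets.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64

/* Number of CPUs a flush of an mm with @cpumask would have to reach. */
static int model_flush_targets(uint64_t cpumask)
{
	int n = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (cpumask & (UINT64_C(1) << cpu))
			n++;
	return n;
}

/* Temporary-mm round trip on @cpu; returns the temporary mm's final mask. */
static uint64_t model_temp_mm_round_trip(int cpu, bool clear_on_unuse)
{
	uint64_t bit, temp_mask = 0;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	bit = UINT64_C(1) << cpu;
	temp_mask |= bit;		/* use: switching in sets this CPU's bit */
	/* ... write through the temporary mapping ... */
	if (clear_on_unuse)
		temp_mask &= ~bit;	/* unuse: CPU-local, so only this bit */
	return temp_mask;
}
```

With `clear_on_unuse` set, the round trip ends with an empty mask and zero flush targets; without it, the current CPU would still be targeted.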


* [tip: x86/alternatives] x86/mm: Remove the mm_cpumask(prev) warning from switch_mm_irqs_off()
From: tip-bot2 for Peter Zijlstra @ 2025-04-17 13:02 UTC
  To: linux-tip-commits
  Cc: syzbot+c2537ce72a879a38113e, Borislav Petkov, Peter Zijlstra,
	Ingo Molnar, Andy Lutomirski, Brian Gerst, Juergen Gross,
	Andrew Cooper, Rik van Riel, H. Peter Anvin, Linus Torvalds,
	linux-kernel, x86

The following commit has been merged into the x86/alternatives branch of tip:

Commit-ID:     52ebfe7412ce4b3af54fe962af58efe9b25cd9a9
Gitweb:        https://git.kernel.org/tip/52ebfe7412ce4b3af54fe962af58efe9b25cd9a9
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Thu, 17 Apr 2025 14:34:13 +02:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Thu, 17 Apr 2025 14:46:25 +02:00

x86/mm: Remove the mm_cpumask(prev) warning from switch_mm_irqs_off()

The CONFIG_DEBUG_VM=y warning in switch_mm_irqs_off() started
triggering in testing:

	VM_WARN_ON_ONCE(prev != &init_mm && !cpumask_test_cpu(cpu, mm_cpumask(prev)));

AFAIU what happens is that unuse_temporary_mm() clears the mm_cpumask()
for the current CPU, while switch_mm_irqs_off() then checks that the
mm_cpumask() bit is set for the current CPU.

This behaviour hasn't really changed since the following commit
introduced both:

  209954cbc7d0 ("x86/mm/tlb: Update mm_cpumask lazily")

but the warning is wrong, so remove it.

[ mingo: Patchified Peter's email. ]

Reported-by: syzbot+c2537ce72a879a38113e@syzkaller.appspotmail.com
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/20250414135629.GA17910@noisy.programming.kicks-ass.net
---
 arch/x86/mm/tlb.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index c9b87e5..79c124f 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -905,14 +905,6 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 		this_cpu_write(cpu_tlbstate.loaded_mm, LOADED_MM_SWITCHING);
 		barrier();
 
-		/*
-		 * Leave this CPU in prev's mm_cpumask. Atomic writes to
-		 * mm_cpumask can be expensive under contention. The CPU
-		 * will be removed lazily at TLB flush time.
-		 */
-		VM_WARN_ON_ONCE(prev != &init_mm && !cpumask_test_cpu(cpu,
-				mm_cpumask(prev)));
-
 		/* Start receiving IPIs and then read tlb_gen (and LAM below) */
 		if (next != &init_mm && !cpumask_test_cpu(cpu, mm_cpumask(next)))
 			cpumask_set_cpu(cpu, mm_cpumask(next));


* Re: [tip: x86/alternatives] x86/mm: Remove the mm_cpumask(prev) warning from switch_mm_irqs_off()
From: Aleksandr Nogikh @ 2025-05-05 16:56 UTC
  To: tip-bot2
  Cc: andrew.cooper3, bp, brgerst, hpa, jgross, linux-kernel,
	linux-tip-commits, luto, mingo, peterz, riel,
	syzbot+c2537ce72a879a38113e, torvalds, x86, syzkaller-bugs,
	dvyukov

Hi Peter, Ingo,

Thanks for addressing the problem!

It's been a couple of weeks since the commit was merged into
x86/alternatives. However, it doesn't appear to be in linux-next yet,
which unfortunately prevents syzbot from fuzzing the linux-next tree.

When could we expect the commit to reach linux-next? If it's possible
to get it there sooner, that would be much appreciated.

Thanks,
Aleksandr

On Thu, 17 Apr 2025 13:02:48 -0000 tip-bot2 for Peter Zijlstra <tip-bot2@linutronix.de> wrote:
> The following commit has been merged into the x86/alternatives branch of tip:
> 
> Commit-ID:     52ebfe7412ce4b3af54fe962af58efe9b25cd9a9
> Gitweb:        https://git.kernel.org/tip/52ebfe7412ce4b3af54fe962af58efe9b25cd9a9
> Author:        Peter Zijlstra <peterz@infradead.org>
> AuthorDate:    Thu, 17 Apr 2025 14:34:13 +02:00
> Committer:     Ingo Molnar <mingo@kernel.org>
> CommitterDate: Thu, 17 Apr 2025 14:46:25 +02:00
> 
> x86/mm: Remove the mm_cpumask(prev) warning from switch_mm_irqs_off()
> 

