Re: [PATCH] exit: skip IRQ disabled warning during power off

From: Breno Leitao <leitao@debian.org>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: akpm@linux-foundation.org, brauner@kernel.org,
	linux-kernel@vger.kernel.org, joel.granados@kernel.org,
	kernel-team@meta.com, oleg@redhat.com
Subject: Re: [PATCH] exit: skip IRQ disabled warning during power off
Date: Fri, 4 Apr 2025 05:51:48 -0700
Message-ID: <Z+/V5AzsSqY9ALqL@gmail.com>
In-Reply-To: <CAGudoHG9LWyv7-ZoO_v3W62gXCYQoYujXRQhW7SbMENeydWj=Q@mail.gmail.com>

Hello Mateusz,


On Fri, Apr 04, 2025 at 07:40:45AM +0200, Mateusz Guzik wrote:
> On Thu, Apr 3, 2025 at 8:01 PM Breno Leitao <leitao@debian.org> wrote:
> >
> > When the system is shutting down due to pid 1 exiting, which is common
> > on virtual machines, a warning message is printed.
> >
> >         WARNING: CPU: 0 PID: 1 at kernel/exit.c:897 do_exit+0x7e3/0xab0
> >
> > This occurs because do_exit() is called after kernel_power_off(), which
> > disables interrupts. native_machine_shutdown() explicitly disables
> > interrupts to avoid receiving the timer interrupt, which could
> > otherwise trigger scheduler load balancing during the power off phase.
> >
> > This is the simplified code path:
> >
> >         kernel_power_off()
> >           - native_machine_shutdown()
> >                 - local_irq_disable()
> >         do_exit()
> >
> > Modify the warning condition in do_exit() to only trigger the warning if
> > the system is not powering off, since it is expected to have the irq
> > disabled in that case.
> >
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> > ---
> >  kernel/exit.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/exit.c b/kernel/exit.c
> > index 3485e5fc499e4..97ec4f8bfd98f 100644
> > --- a/kernel/exit.c
> > +++ b/kernel/exit.c
> > @@ -878,7 +878,7 @@ void __noreturn do_exit(long code)
> >         struct task_struct *tsk = current;
> >         int group_dead;
> >
> > -       WARN_ON(irqs_disabled());
> > +       WARN_ON(irqs_disabled() && system_state != SYSTEM_POWER_OFF);
> >
> >         synchronize_group_exit(tsk, code);
> >
> >
> 
> Can you share the backtrace?

Sure. Here is the decoded stack from the latest net-next
0907e7fb35756  ("Add linux-next specific files for 20250117")

	[  254.466712] ACPI: PM: Preparing to enter system sleep state S5
	[  254.474273] reboot: Power down
	[  254.479332] ------------[ cut here ]------------
	[  254.479934] WARNING: CPU: 0 PID: 1 at kernel/exit.c:881 do_exit (kernel/exit.c:881)
	[  254.480597] Modules linked in: evdev(E) serio_raw(E) button(E) virtio_mmio(E) 9pnet_virtio(E) 9p(E) 9pnet(E) netfs(E)
	[  254.483163] Tainted: [E]=UNSIGNED_MODULE, [N]=TEST
	[  254.483736] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
	[  254.484912] RIP: 0010:do_exit (kernel/exit.c:881)
	[ 254.485348] Code: 00 00 45 31 f6 f7 c3 00 02 00 00 41 0f 94 c6 48 c7 c7 48 8b 5d 87 44 89 f6 31 d2 31 c9 e8 c8 c5 41 00 f7 c3 00 02 00 00 75 02 <0f> 0b 48 c7 c7 78 8b 5d 87 44 89 f6 31 d2 31 c9 e8 ab c5 41 00 48
	All code
	========
	0:   00 00                   add    %al,(%rax)
	2:   45 31 f6                xor    %r14d,%r14d
	5:   f7 c3 00 02 00 00       test   $0x200,%ebx
	b:   41 0f 94 c6             sete   %r14b
	f:   48 c7 c7 48 8b 5d 87    mov    $0xffffffff875d8b48,%rdi
	16:   44 89 f6                mov    %r14d,%esi
	19:   31 d2                   xor    %edx,%edx
	1b:   31 c9                   xor    %ecx,%ecx
	1d:   e8 c8 c5 41 00          call   0x41c5ea
	22:   f7 c3 00 02 00 00       test   $0x200,%ebx
	28:   75 02                   jne    0x2c
	2a:*  0f 0b                   ud2             <-- trapping instruction
	2c:   48 c7 c7 78 8b 5d 87    mov    $0xffffffff875d8b78,%rdi
	33:   44 89 f6                mov    %r14d,%esi
	36:   31 d2                   xor    %edx,%edx
	38:   31 c9                   xor    %ecx,%ecx
	3a:   e8 ab c5 41 00          call   0x41c5ea
	3f:   48                      rex.W

	Code starting with the faulting instruction
	===========================================
	0:   0f 0b                   ud2
	2:   48 c7 c7 78 8b 5d 87    mov    $0xffffffff875d8b78,%rdi
	9:   44 89 f6                mov    %r14d,%esi
	c:   31 d2                   xor    %edx,%edx
	e:   31 c9                   xor    %ecx,%ecx
	10:   e8 ab c5 41 00          call   0x41c5c0
	15:   48                      rex.W
	[  254.487377] RSP: 0018:ffa000000001fb80 EFLAGS: 00010046
	[  254.487947] RAX: 49739800a6a6bf00 RBX: 0000000000000016 RCX: 0000000000000000
	[  254.488735] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff875d8b48
	[  254.489545] RBP: ffa000000001fd10 R08: dffffc0000000000 R09: 1ffffffff13312b6
	[  254.490406] R10: dffffc0000000000 R11: fffffbfff13312b7 R12: 000000004321fedc
	[  254.491330] R13: dffffc0000000000 R14: 0000000000000001 R15: dffffc0000000000
	[  254.492200] FS:  00007f658bad6780(0000) GS:ff110004c6000000(0000) knlGS:0000000000000000
	[  254.493005] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[  254.493667] CR2: 00007f7769811000 CR3: 000000010eebe004 CR4: 0000000000771ef0
	[  254.494555] PKRU: 55555554
	[  254.495006] Call Trace:
	[  254.495305]  <TASK>
	[  254.495609] ? __warn (kernel/panic.c:242 kernel/panic.c:748)
	[  254.496046] ? do_exit (kernel/exit.c:881)
	[  254.496675] ? do_exit (kernel/exit.c:881)
	[  254.497124] ? report_bug (lib/bug.c:? lib/bug.c:219)
	[  254.497564] ? handle_bug (arch/x86/kernel/traps.c:285)
	[  254.497984] ? exc_invalid_op (arch/x86/kernel/traps.c:309)
	[  254.498394] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621)
	[  254.498874] ? do_exit (kernel/exit.c:881)
	[  254.499355] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:182)
	[  254.499892] ? __rcu_read_unlock (kernel/rcu/tree_plugin.h:445)
	[  254.500311] ? atomic_notifier_call_chain (./include/linux/rcupdate.h:337 ./include/linux/rcupdate.h:849 kernel/notifier.c:222)
	[  254.500864] ? __pfx_do_exit (kernel/exit.c:877)
	[  254.501302] ? __pfx_ftrace_likely_update (kernel/trace/trace_branch.c:203)
	[  254.501825] ? native_machine_shutdown (arch/x86/kernel/reboot.c:765)
	[  254.502380] ? atomic_notifier_call_chain (./include/linux/rcupdate.h:337 ./include/linux/rcupdate.h:849 kernel/notifier.c:222)
	[  254.502984] __x64_sys_reboot (kernel/reboot.c:?)
	[  254.503407] ? __pfx___x64_sys_reboot (kernel/reboot.c:722)
	[  254.503952] ? __pfx_ftrace_likely_update (kernel/trace/trace_branch.c:203)
	[  254.504527] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:182)
	[  254.505067] ? __pfx_ftrace_likely_update (kernel/trace/trace_branch.c:203)
	[  254.505599] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:182)
	[  254.506159] ? __pfx_ftrace_likely_update (kernel/trace/trace_branch.c:203)
	[  254.506712] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:182)
	[  254.507263] ? do_syscall_64 (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./include/linux/entry-common.h:198 arch/x86/entry/common.c:79)
	[  254.507865] do_syscall_64 (arch/x86/entry/common.c:83)
	[  254.508317] ? exc_page_fault (arch/x86/mm/fault.c:1542)
	[  254.508781] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
	[  254.509315] RIP: 0033:0x7f658b904a27
	[ 254.509879] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 89 fa be 69 19 12 28 bf ad de e1 fe b8 a9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 c1 43 0f 00 f7 d8 64 89 02 b8
	All code
	========
	0:   64 89 01                mov    %eax,%fs:(%rcx)
	3:   48 83 c8 ff             or     $0xffffffffffffffff,%rax
	7:   c3                      ret
	8:   66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
	f:   00 00 00
	12:   90                      nop
	13:   f3 0f 1e fa             endbr64
	17:   89 fa                   mov    %edi,%edx
	19:   be 69 19 12 28          mov    $0x28121969,%esi
	1e:   bf ad de e1 fe          mov    $0xfee1dead,%edi
	23:   b8 a9 00 00 00          mov    $0xa9,%eax
	28:   0f 05                   syscall
	2a:*  48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax         <-- trapping instruction
	30:   77 01                   ja     0x33
	32:   c3                      ret
	33:   48 8b 15 c1 43 0f 00    mov    0xf43c1(%rip),%rdx        # 0xf43fb
	3a:   f7 d8                   neg    %eax
	3c:   64 89 02                mov    %eax,%fs:(%rdx)
	3f:   b8                      .byte 0xb8

	Code starting with the faulting instruction
	===========================================
	0:   48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
	6:   77 01                   ja     0x9
	8:   c3                      ret
	9:   48 8b 15 c1 43 0f 00    mov    0xf43c1(%rip),%rdx        # 0xf43d1
	10:   f7 d8                   neg    %eax
	12:   64 89 02                mov    %eax,%fs:(%rdx)
	15:   b8                      .byte 0xb8
	[  254.511720] RSP: 002b:00007ffec6b608c8 EFLAGS: 00000217 ORIG_RAX: 00000000000000a9
	[  254.512535] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f658b904a27
	[  254.513332] RDX: 000000004321fedc RSI: 0000000028121969 RDI: 00000000fee1dead
	[  254.514243] RBP: 000000000000000a R08: 000055b7f5e58690 R09: 0000000000000000
	[  254.515058] R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000000011
	[  254.515833] R13: 00007ffec6b60c88 R14: 000055b7f5e3828c R15: 000000000000000a
	[  254.516702]  </TASK>
	[  254.517021] irq event stamp: 1438412
	[  254.517510] hardirqs last enabled at (1438411): _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:26 ./arch/x86/include/asm/irqflags.h:87 ./arch/x86/include/asm/irqflags.h:147 ./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
	[  254.518554] hardirqs last disabled at (1438412): native_machine_shutdown (arch/x86/kernel/reboot.c:?)
	[  254.519466] softirqs last enabled at (1438324): handle_softirqs (./arch/x86/include/asm/preempt.h:26 kernel/softirq.c:408 kernel/softirq.c:589)
	[  254.520398] softirqs last disabled at (1438303): __irq_exit_rcu (./arch/x86/include/asm/jump_label.h:36 kernel/softirq.c:664)
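
For anyone decoding the register dump: on x86, irqs_disabled() comes down to testing the interrupt-enable flag (IF, bit 9, mask 0x200) in the saved flags, which appears to be what the `test $0x200,%ebx` just before the ud2 is doing. Here is a minimal userspace sketch of that check. The names EFLAGS_IF and flags_irqs_disabled() are made up for illustration and are not kernel identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 9 of x86 EFLAGS/RFLAGS is the interrupt-enable flag (IF). */
#define EFLAGS_IF 0x200u

/*
 * Illustrative helper (hypothetical name): IRQs count as disabled
 * when the IF bit is clear in the given flags value.
 */
static bool flags_irqs_disabled(uint64_t flags)
{
	return (flags & EFLAGS_IF) == 0;
}
```

Applied to the dump above, the kernel-side EFLAGS value 0x00010046 has IF clear, so IRQs are off at the WARN_ON(). The user-side EFLAGS value 0x00000217 has IF set.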


> Note first thing synchronize_group_exit() is going to do is cycle
> through an irq-protected lock, so by the time it unlocks irqs are
> enabled again.

When pid=1 is being killed, synchronize_group_exit() is called with
IRQs enabled (as shown by the warning above), and
synchronize_group_exit()->spin_unlock_irq() re-enables the interrupts
that the paired spin_lock_irq() disabled.

On the other hand, if IRQs are disabled when synchronize_group_exit() is
called, then synchronize_group_exit()->spin_unlock_irq() will not enable
the interrupts, right?

Am I following your comment properly?
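
To frame the question, here is a small userspace model of the two lock flavors. It is only a sketch, not kernel code. It assumes the usual semantics: spin_unlock_irq() enables interrupts unconditionally, while spin_unlock_irqrestore() puts back whatever state spin_lock_irqsave() recorded. All model_* names and the irqs_on variable are hypothetical stand-ins.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's interrupt-enable state. */
static bool irqs_on = true;

/* _irq flavor: no state is saved, so unlock always enables IRQs. */
static void model_spin_lock_irq(void)   { irqs_on = false; }
static void model_spin_unlock_irq(void) { irqs_on = true; }

/* _irqsave flavor: remember the prior state and restore exactly that. */
static bool model_spin_lock_irqsave(void)
{
	bool was_on = irqs_on;

	irqs_on = false;
	return was_on;
}

static void model_spin_unlock_irqrestore(bool was_on)
{
	irqs_on = was_on;
}

/* IRQ state after a lock_irq/unlock_irq pair, given the starting state. */
static bool irqs_after_irq_pair(bool start_on)
{
	irqs_on = start_on;
	model_spin_lock_irq();
	model_spin_unlock_irq();
	return irqs_on;
}

/* IRQ state after an irqsave/irqrestore pair, given the starting state. */
static bool irqs_after_irqsave_pair(bool start_on)
{
	bool saved;

	irqs_on = start_on;
	saved = model_spin_lock_irqsave();
	model_spin_unlock_irqrestore(saved);
	return irqs_on;
}
```

In this model, entering the _irq pair with IRQs off still leaves them on afterwards. Only the irqsave/irqrestore pair preserves the disabled state.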

> Preferably whatever the code path which ends up here would sort it out.
> 
> If that's not feasible, I think this warrants a comment above the warn.

Sorry, I am not sure what you meant by this. Would you mind rephrasing?

Thanks for the review,
--breno

Thread overview: 10+ messages
2025-04-03 18:01 [PATCH] exit: skip IRQ disabled warning during power off Breno Leitao
2025-04-04  5:40 ` Mateusz Guzik
2025-04-04 12:51   ` Breno Leitao [this message]
2025-04-04 14:16     ` Oleg Nesterov
2025-04-04 15:14       ` Breno Leitao
2025-04-04 15:31         ` Oleg Nesterov
2025-05-11  4:43           ` Andrew Morton
2025-04-04 14:20     ` Mateusz Guzik
2025-04-04 15:08       ` Breno Leitao
2025-04-04 15:24       ` Oleg Nesterov
