From: David Woodhouse <dwmw2@infradead.org>
To: Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
Waiman Long <longman@redhat.com>,
Boqun Feng <boqun.feng@gmail.com>,
linux-kernel <linux-kernel@vger.kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Subject: [PATCH v2] lockdep: add lockdep_cleanup_dead_cpu()
Date: Sat, 28 Oct 2023 20:24:45 +0100 [thread overview]
Message-ID: <635fa006e8f3816b4a36b964d6281f0d8efa789b.camel@infradead.org> (raw)
In-Reply-To: <e5ba02138c31da60daf91ce505ac3860d022332b.camel@infradead.org>
[-- Attachment #1: Type: text/plain, Size: 4253 bytes --]
From: David Woodhouse <dwmw@amazon.co.uk>
Add a function to check that an offline CPU left the tracing infrastructure
in a sane state. The acpi_idle_play_dead() function was recently observed¹
calling safe_halt() instead of raw_safe_halt(), which had the side-effect
of setting the hardirqs_enabled flag for the offline CPU. On x86 this
triggered lockdep warnings when the CPU came back online, but too early
for the exception to be handled correctly, leading to a triple-fault.
Add lockdep_cleanup_dead_cpu() to check for this kind of failure mode,
print the events leading up to it, and correct it so that the CPU can
come online again correctly.
[ 61.556652] smpboot: CPU 1 is now offline
[ 61.556769] CPU 1 left hardirqs enabled!
[ 61.556915] irq event stamp: 128149
[ 61.556965] hardirqs last enabled at (128149): [<ffffffff81720a36>] acpi_idle_play_dead+0x46/0x70
[ 61.557055] hardirqs last disabled at (128148): [<ffffffff81124d50>] do_idle+0x90/0xe0
[ 61.557117] softirqs last enabled at (128078): [<ffffffff81cec74c>] __do_softirq+0x31c/0x423
[ 61.557199] softirqs last disabled at (128065): [<ffffffff810baae1>] __irq_exit_rcu+0x91/0x100
¹ https://lore.kernel.org/lkml/a079bba5a0e47d6534b307553fc3772d26ce911b.camel@infradead.org/
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
---
v2: Fix spelling. 'Offlone' wasn't quite what I meant to type.
Add reference to ACPI patch.
Fix kerneldoc args for lockdep_cleanup_dead_cpu() (thanks lkp)
Closes: https://lore.kernel.org/oe-kbuild-all/202310290041.L5ndwcQ9-lkp@intel.com/
include/linux/irqflags.h | 4 ++++
kernel/cpu.c | 1 +
kernel/locking/lockdep.c | 24 ++++++++++++++++++++++++
3 files changed, 29 insertions(+)
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 2b665c32f5fe..547ed55fc7ff 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -24,12 +24,16 @@
extern void lockdep_hardirqs_on_prepare(void);
extern void lockdep_hardirqs_on(unsigned long ip);
extern void lockdep_hardirqs_off(unsigned long ip);
+ extern void lockdep_cleanup_dead_cpu(unsigned int cpu,
+ struct task_struct *idle);
#else
static inline void lockdep_softirqs_on(unsigned long ip) { }
static inline void lockdep_softirqs_off(unsigned long ip) { }
static inline void lockdep_hardirqs_on_prepare(void) { }
static inline void lockdep_hardirqs_on(unsigned long ip) { }
static inline void lockdep_hardirqs_off(unsigned long ip) { }
+ static inline void lockdep_cleanup_dead_cpu(unsigned int cpu,
+ struct task_struct *idle) {}
#endif
#ifdef CONFIG_TRACE_IRQFLAGS
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 6de7c6bb74ee..225f5bc3708f 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1371,6 +1371,7 @@ static int takedown_cpu(unsigned int cpu)
cpuhp_bp_sync_dead(cpu);
+ lockdep_cleanup_dead_cpu(cpu, idle_thread_get(cpu));
tick_cleanup_dead_cpu(cpu);
rcutree_migrate_callbacks(cpu);
return 0;
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index e85b5ad3e206..62bfda8991b8 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4538,6 +4538,30 @@ void lockdep_softirqs_off(unsigned long ip)
debug_atomic_inc(redundant_softirqs_off);
}
+/**
+ * lockdep_cleanup_dead_cpu - Ensure CPU lockdep state is cleanly stopped
+ *
+ * @cpu: index of offlined CPU
+ * @idle: task pointer for offlined CPU's idle thread
+ *
+ * Invoked after the CPU is dead. Ensures that the tracing infrastructure
+ * is left in a suitable state for the CPU to be subsequently brought
+ * online again.
+ */
+void lockdep_cleanup_dead_cpu(unsigned int cpu, struct task_struct *idle)
+{
+ if (unlikely(!debug_locks))
+ return;
+
+ if (unlikely(per_cpu(hardirqs_enabled, cpu))) {
+ pr_warn("CPU %u left hardirqs enabled!", cpu);
+ if (idle)
+ print_irqtrace_events(idle);
+ /* Clean it up for when the CPU comes online again. */
+ per_cpu(hardirqs_enabled, cpu) = 0;
+ }
+}
+
static int
mark_usage(struct task_struct *curr, struct held_lock *hlock, int check)
{
--
2.41.0
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]
next prev parent reply other threads:[~2023-10-28 19:25 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-28 11:14 [PATCH] lockdep: add lockdep_cleanup_dead_cpu() David Woodhouse
2023-10-28 14:13 ` Thomas Gleixner
2023-10-28 17:14 ` kernel test robot
2023-10-28 19:24 ` David Woodhouse [this message]
2023-10-29 10:05 ` [PATCH v2] " kernel test robot
2023-10-29 17:33 ` Thomas Gleixner
2023-10-29 17:47 ` David Woodhouse
2023-10-30 8:45 ` [PATCH v3] " David Woodhouse
2024-09-24 14:20 ` David Woodhouse
2024-09-26 12:09 ` Boqun Feng
2024-09-26 12:16 ` David Woodhouse
2024-09-26 12:34 ` Boqun Feng
2024-09-26 14:34 ` David Woodhouse
2024-09-26 15:13 ` Boqun Feng
2024-09-26 15:38 ` David Woodhouse
2023-10-30 16:37 ` [PATCH v2] " kernel test robot
2023-10-30 11:17 ` [PATCH] " Peter Zijlstra
-- strict thread matches above, loose matches on Subject: below --
2024-09-26 15:17 [PATCH v2] " David Woodhouse
2024-09-26 16:09 ` Boqun Feng
2024-09-26 16:37 ` David Woodhouse
2024-09-27 0:45 ` Boqun Feng
2024-09-27 8:45 ` David Woodhouse
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=635fa006e8f3816b4a36b964d6281f0d8efa789b.camel@infradead.org \
--to=dwmw2@infradead.org \
--cc=boqun.feng@gmail.com \
--cc=jgross@suse.com \
--cc=linux-kernel@vger.kernel.org \
--cc=longman@redhat.com \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).