linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] lockdep: add lockdep_cleanup_dead_cpu()
@ 2023-10-28 11:14 David Woodhouse
  2023-10-28 14:13 ` Thomas Gleixner
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: David Woodhouse @ 2023-10-28 11:14 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Will Deacon,
	Waiman Long, Boqun Feng, linux-kernel
  Cc: Juergen Gross

[-- Attachment #1: Type: text/plain, Size: 3815 bytes --]

From: David Woodhouse <dwmw@amazon.co.uk>

Add a function to check that an offlone CPU left the tracing infrastructure
in a sane state. The acpi_idle_play_dead() function was recently observed
calling safe_halt() instead of raw_safe_halt(), which had the side-effect
of setting the hardirqs_enabled flag for the offline CPU. On x86 this
triggered lockdep warnings when the CPU came back online, but too early
for the exception to be handled correctly, leading to a triple-fault.

Add lockdep_cleanup_dead_cpu() to check for this kind of failure mode,
print the events leading up to it, and correct it so that the CPU can
come online again correctly.

[   61.556652] smpboot: CPU 1 is now offline
[   61.556769] CPU 1 left hardirqs enabled!
[   61.556915] irq event stamp: 128149
[   61.556965] hardirqs last  enabled at (128149): [<ffffffff81720a36>] acpi_idle_play_dead+0x46/0x70
[   61.557055] hardirqs last disabled at (128148): [<ffffffff81124d50>] do_idle+0x90/0xe0
[   61.557117] softirqs last  enabled at (128078): [<ffffffff81cec74c>] __do_softirq+0x31c/0x423
[   61.557199] softirqs last disabled at (128065): [<ffffffff810baae1>] __irq_exit_rcu+0x91/0x100

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---

This would have saved us a lot of debugging in the last week.

 include/linux/irqflags.h |  4 ++++
 kernel/cpu.c             |  1 +
 kernel/locking/lockdep.c | 21 +++++++++++++++++++++
 3 files changed, 26 insertions(+)

diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 2b665c32f5fe..547ed55fc7ff 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -24,12 +24,16 @@
   extern void lockdep_hardirqs_on_prepare(void);
   extern void lockdep_hardirqs_on(unsigned long ip);
   extern void lockdep_hardirqs_off(unsigned long ip);
+  extern void lockdep_cleanup_dead_cpu(unsigned int cpu,
+				       struct task_struct *idle);
 #else
   static inline void lockdep_softirqs_on(unsigned long ip) { }
   static inline void lockdep_softirqs_off(unsigned long ip) { }
   static inline void lockdep_hardirqs_on_prepare(void) { }
   static inline void lockdep_hardirqs_on(unsigned long ip) { }
   static inline void lockdep_hardirqs_off(unsigned long ip) { }
+  static inline void lockdep_cleanup_dead_cpu(unsigned int cpu,
+					      struct task_struct *idle) {}
 #endif
 
 #ifdef CONFIG_TRACE_IRQFLAGS
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 6de7c6bb74ee..225f5bc3708f 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1371,6 +1371,7 @@ static int takedown_cpu(unsigned int cpu)
 
 	cpuhp_bp_sync_dead(cpu);
 
+	lockdep_cleanup_dead_cpu(cpu, idle_thread_get(cpu));
 	tick_cleanup_dead_cpu(cpu);
 	rcutree_migrate_callbacks(cpu);
 	return 0;
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index e85b5ad3e206..a643b29da34d 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4538,6 +4538,27 @@ void lockdep_softirqs_off(unsigned long ip)
 		debug_atomic_inc(redundant_softirqs_off);
 }
 
+/**
+ * lockdep_cleanup_dead_cpu - Ensure CPU lockdep state is cleanly stopped
+ *
+ * Invoked after the CPU is dead. Ensures that the tracing infrastructure
+ * is left in a suitable state for the CPU to be subsequently brought
+ * online again.
+ */
+void lockdep_cleanup_dead_cpu(unsigned int cpu, struct task_struct *idle)
+{
+	if (unlikely(!debug_locks))
+		return;
+
+	if (unlikely(per_cpu(hardirqs_enabled, cpu))) {
+		pr_warn("CPU %u left hardirqs enabled!", cpu);
+		if (idle)
+			print_irqtrace_events(idle);
+		/* Clean it up for when the CPU comes online again. */
+		per_cpu(hardirqs_enabled, cpu) = 0;
+	}
+}
+
 static int
 mark_usage(struct task_struct *curr, struct held_lock *hlock, int check)
 {
-- 
2.41.0



[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-09-26 15:38 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-28 11:14 [PATCH] lockdep: add lockdep_cleanup_dead_cpu() David Woodhouse
2023-10-28 14:13 ` Thomas Gleixner
2023-10-28 17:14 ` kernel test robot
2023-10-28 19:24 ` [PATCH v2] " David Woodhouse
2023-10-29 10:05   ` kernel test robot
2023-10-29 17:33   ` Thomas Gleixner
2023-10-29 17:47     ` David Woodhouse
2023-10-30  8:45   ` [PATCH v3] " David Woodhouse
2024-09-24 14:20     ` David Woodhouse
2024-09-26 12:09       ` Boqun Feng
2024-09-26 12:16         ` David Woodhouse
2024-09-26 12:34           ` Boqun Feng
2024-09-26 14:34             ` David Woodhouse
2024-09-26 15:13               ` Boqun Feng
2024-09-26 15:38                 ` David Woodhouse
2023-10-30 16:37   ` [PATCH v2] " kernel test robot
2023-10-30 11:17 ` [PATCH] " Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).