linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Woodhouse <dwmw2@infradead.org>
To: Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
	Waiman Long <longman@redhat.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	linux-kernel <linux-kernel@vger.kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Subject: [PATCH v2] lockdep: add lockdep_cleanup_dead_cpu()
Date: Sat, 28 Oct 2023 20:24:45 +0100	[thread overview]
Message-ID: <635fa006e8f3816b4a36b964d6281f0d8efa789b.camel@infradead.org> (raw)
In-Reply-To: <e5ba02138c31da60daf91ce505ac3860d022332b.camel@infradead.org>

[-- Attachment #1: Type: text/plain, Size: 4253 bytes --]

From: David Woodhouse <dwmw@amazon.co.uk>

Add a function to check that an offline CPU left the tracing infrastructure
in a sane state. The acpi_idle_play_dead() function was recently observed¹
calling safe_halt() instead of raw_safe_halt(), which had the side-effect
of setting the hardirqs_enabled flag for the offline CPU. On x86 this
triggered lockdep warnings when the CPU came back online, but too early
for the exception to be handled correctly, leading to a triple-fault.

Add lockdep_cleanup_dead_cpu() to check for this kind of failure mode,
print the events leading up to it, and correct it so that the CPU can
come online again correctly.

[   61.556652] smpboot: CPU 1 is now offline
[   61.556769] CPU 1 left hardirqs enabled!
[   61.556915] irq event stamp: 128149
[   61.556965] hardirqs last  enabled at (128149): [<ffffffff81720a36>] acpi_idle_play_dead+0x46/0x70
[   61.557055] hardirqs last disabled at (128148): [<ffffffff81124d50>] do_idle+0x90/0xe0
[   61.557117] softirqs last  enabled at (128078): [<ffffffff81cec74c>] __do_softirq+0x31c/0x423
[   61.557199] softirqs last disabled at (128065): [<ffffffff810baae1>] __irq_exit_rcu+0x91/0x100

¹ https://lore.kernel.org/lkml/a079bba5a0e47d6534b307553fc3772d26ce911b.camel@infradead.org/

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
---
v2: Fix spelling. 'Offlone' wasn't quite what I meant to type.
    Add reference to ACPI patch.
    Fix kerneldoc args for lockdep_cleanup_dead_cpu() (thanks lkp)
    Closes: https://lore.kernel.org/oe-kbuild-all/202310290041.L5ndwcQ9-lkp@intel.com/


 include/linux/irqflags.h |  4 ++++
 kernel/cpu.c             |  1 +
 kernel/locking/lockdep.c | 24 ++++++++++++++++++++++++
 3 files changed, 29 insertions(+)

diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 2b665c32f5fe..547ed55fc7ff 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -24,12 +24,16 @@
   extern void lockdep_hardirqs_on_prepare(void);
   extern void lockdep_hardirqs_on(unsigned long ip);
   extern void lockdep_hardirqs_off(unsigned long ip);
+  extern void lockdep_cleanup_dead_cpu(unsigned int cpu,
+				       struct task_struct *idle);
 #else
   static inline void lockdep_softirqs_on(unsigned long ip) { }
   static inline void lockdep_softirqs_off(unsigned long ip) { }
   static inline void lockdep_hardirqs_on_prepare(void) { }
   static inline void lockdep_hardirqs_on(unsigned long ip) { }
   static inline void lockdep_hardirqs_off(unsigned long ip) { }
+  static inline void lockdep_cleanup_dead_cpu(unsigned int cpu,
+					      struct task_struct *idle) {}
 #endif
 
 #ifdef CONFIG_TRACE_IRQFLAGS
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 6de7c6bb74ee..225f5bc3708f 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1371,6 +1371,7 @@ static int takedown_cpu(unsigned int cpu)
 
 	cpuhp_bp_sync_dead(cpu);
 
+	lockdep_cleanup_dead_cpu(cpu, idle_thread_get(cpu));
 	tick_cleanup_dead_cpu(cpu);
 	rcutree_migrate_callbacks(cpu);
 	return 0;
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index e85b5ad3e206..62bfda8991b8 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4538,6 +4538,30 @@ void lockdep_softirqs_off(unsigned long ip)
 		debug_atomic_inc(redundant_softirqs_off);
 }
 
+/**
+ * lockdep_cleanup_dead_cpu - Ensure CPU lockdep state is cleanly stopped
+ *
+ * @cpu: index of offlined CPU
+ * @idle: task pointer for offlined CPU's idle thread
+ *
+ * Invoked after the CPU is dead. Ensures that the tracing infrastructure
+ * is left in a suitable state for the CPU to be subsequently brought
+ * online again.
+ */
+void lockdep_cleanup_dead_cpu(unsigned int cpu, struct task_struct *idle)
+{
+	if (unlikely(!debug_locks))
+		return;
+
+	if (unlikely(per_cpu(hardirqs_enabled, cpu))) {
+		pr_warn("CPU %u left hardirqs enabled!", cpu);
+		if (idle)
+			print_irqtrace_events(idle);
+		/* Clean it up for when the CPU comes online again. */
+		per_cpu(hardirqs_enabled, cpu) = 0;
+	}
+}
+
 static int
 mark_usage(struct task_struct *curr, struct held_lock *hlock, int check)
 {
-- 
2.41.0



[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

  parent reply	other threads:[~2023-10-28 19:25 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-28 11:14 [PATCH] lockdep: add lockdep_cleanup_dead_cpu() David Woodhouse
2023-10-28 14:13 ` Thomas Gleixner
2023-10-28 17:14 ` kernel test robot
2023-10-28 19:24 ` David Woodhouse [this message]
2023-10-29 10:05   ` [PATCH v2] " kernel test robot
2023-10-29 17:33   ` Thomas Gleixner
2023-10-29 17:47     ` David Woodhouse
2023-10-30  8:45   ` [PATCH v3] " David Woodhouse
2024-09-24 14:20     ` David Woodhouse
2024-09-26 12:09       ` Boqun Feng
2024-09-26 12:16         ` David Woodhouse
2024-09-26 12:34           ` Boqun Feng
2024-09-26 14:34             ` David Woodhouse
2024-09-26 15:13               ` Boqun Feng
2024-09-26 15:38                 ` David Woodhouse
2023-10-30 16:37   ` [PATCH v2] " kernel test robot
2023-10-30 11:17 ` [PATCH] " Peter Zijlstra
  -- strict thread matches above, loose matches on Subject: below --
2024-09-26 15:17 [PATCH v2] " David Woodhouse
2024-09-26 16:09 ` Boqun Feng
2024-09-26 16:37   ` David Woodhouse
2024-09-27  0:45     ` Boqun Feng
2024-09-27  8:45       ` David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=635fa006e8f3816b4a36b964d6281f0d8efa789b.camel@infradead.org \
    --to=dwmw2@infradead.org \
    --cc=boqun.feng@gmail.com \
    --cc=jgross@suse.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).