* [PATCH] x86/virt: Fix RCU lockdep splat in emergency virt callback path
@ 2026-05-03 17:45 Mikhail Gavrilov
2026-05-04 17:48 ` Sean Christopherson
2026-05-04 23:54 ` [PATCH v2] x86/virt: Silence " Mikhail Gavrilov
0 siblings, 2 replies; 6+ messages in thread
From: Mikhail Gavrilov @ 2026-05-03 17:45 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H . Peter Anvin, Dan Williams, Chao Gao, x86, kvm, linux-kernel,
Mikhail Gavrilov
x86_virt_invoke_kvm_emergency_callback() reaches rcu_dereference()
through machine_crash_shutdown() with IRQs disabled but with RCU not
necessarily watching, which triggers a suspicious RCU usage splat on
debug kernels (CONFIG_PROVE_RCU=y) during panic/kdump:
WARNING: suspicious RCU usage
arch/x86/virt/hw.c:52 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by tee/11119:
#0: ffff8881fa32c440 (sb_writers#3){.+.+}-{0:0}, at: ksys_write
Call Trace:
<TASK>
dump_stack_lvl+0x84/0xd0
lockdep_rcu_suspicious.cold+0x37/0x8f
x86_virt_invoke_kvm_emergency_callback+0x5f/0x70
x86_svm_emergency_disable_virtualization_cpu+0x2a/0x30
x86_virt_emergency_disable_virtualization_cpu+0x6b/0x90
native_machine_crash_shutdown+0x72/0x170
__crash_kexec+0x137/0x280
panic+0xce/0xd0
sysrq_handle_crash+0x1f/0x20
__handle_sysrq.cold+0x192/0x335
write_sysrq_trigger+0x8c/0xc0
proc_reg_write+0x1c3/0x3c0
vfs_write+0x1d0/0xf80
ksys_write+0x116/0x250
do_syscall_64+0x11c/0x1480
entry_SYSCALL_64_after_hwframe+0x76/0x7e
</TASK>
The RCU usage is correct: writers
(x86_virt_{register,unregister}_emergency_callback()) serialize via
rcu_assign_pointer() + synchronize_rcu(), while the reader on the
emergency path runs with IRQs disabled (the only caller is
x86_virt_emergency_disable_virtualization_cpu(), which has
lockdep_assert_irqs_disabled()), which is a valid classic-RCU read-side
critical section.
Use rcu_dereference_check() with irqs_disabled() to silence the splat
without weakening the protection.
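To make the difference between the accessors concrete, here is a minimal userspace sketch of the lockdep condition only. It assumes the check behind rcu_dereference_check(p, c) is "c, or an RCU read lock is held", and that rcu_dereference_raw() skips the check. Every model_* name is hypothetical and exists only for this illustration; none of this is kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*emergency_cb_t)(void);

/* Hypothetical stand-ins for kernel state; none of these are kernel APIs. */
static _Atomic(emergency_cb_t) model_callback;
static bool model_read_lock_held;	/* models rcu_read_lock_held() */
static bool model_irqs_off;		/* models irqs_disabled() */
static int model_splat_count;		/* number of simulated lockdep splats */

/*
 * Models rcu_dereference_check(p, c): count a splat unless the caller's
 * condition or the read lock justifies the access. The acquire load is a
 * stronger stand-in for the kernel's dependency-ordered load.
 */
static emergency_cb_t model_deref_check(bool cond)
{
	if (!(cond || model_read_lock_held))
		model_splat_count++;
	return atomic_load_explicit(&model_callback, memory_order_acquire);
}

/* Models rcu_dereference(p): the same check with no extra condition. */
static emergency_cb_t model_deref(void)
{
	return model_deref_check(false);
}

/* Models rcu_dereference_raw(p): same load, no lockdep check at all. */
static emergency_cb_t model_deref_raw(void)
{
	return atomic_load_explicit(&model_callback, memory_order_acquire);
}
```

In this model, with model_irqs_off set and no read lock held, model_deref() counts a splat while model_deref_check(model_irqs_off) and model_deref_raw() do not. The sketch says nothing about whether the access is actually safe; it only shows which accessor complains.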
Reproducible on a debug kernel (CONFIG_PROVE_LOCKING=y, CONFIG_PROVE_RCU=y)
with kvm_amd or kvm_intel loaded by triggering kdump:
echo c > /proc/sysrq-trigger
Fixes: 428afac5a8ea ("KVM: x86: Move bulk of emergency virtualizaton logic to virt subsystem")
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
---
arch/x86/virt/hw.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/x86/virt/hw.c b/arch/x86/virt/hw.c
index f647557d38ac..57eebc99299d 100644
--- a/arch/x86/virt/hw.c
+++ b/arch/x86/virt/hw.c
@@ -49,7 +49,13 @@ static void x86_virt_invoke_kvm_emergency_callback(void)
 {
 	cpu_emergency_virt_cb *kvm_callback;
 
-	kvm_callback = rcu_dereference(kvm_emergency_callback);
+	/*
+	 * Callers invoke this with IRQs disabled (see
+	 * x86_virt_emergency_disable_virtualization_cpu()), which is a valid
+	 * RCU read-side critical section. Tell lockdep so it doesn't complain
+	 * during panic/reboot paths.
+	 */
+	kvm_callback = rcu_dereference_check(kvm_emergency_callback, irqs_disabled());
 	if (kvm_callback)
 		kvm_callback();
 }
--
2.54.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/virt: Fix RCU lockdep splat in emergency virt callback path
2026-05-03 17:45 [PATCH] x86/virt: Fix RCU lockdep splat in emergency virt callback path Mikhail Gavrilov
@ 2026-05-04 17:48 ` Sean Christopherson
2026-05-04 18:50 ` Mikhail Gavrilov
2026-05-04 23:54 ` [PATCH v2] x86/virt: Silence " Mikhail Gavrilov
1 sibling, 1 reply; 6+ messages in thread
From: Sean Christopherson @ 2026-05-04 17:48 UTC (permalink / raw)
To: Mikhail Gavrilov
Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, Dan Williams, Chao Gao, x86, kvm,
linux-kernel
On Sun, May 03, 2026, Mikhail Gavrilov wrote:
> x86_virt_invoke_kvm_emergency_callback() reaches rcu_dereference()
> through machine_crash_shutdown() with IRQs disabled but with RCU not
> necessarily watching, which triggers a suspicious RCU usage splat on
> debug kernels (CONFIG_PROVE_RCU=y) during panic/kdump:
>
> WARNING: suspicious RCU usage
> arch/x86/virt/hw.c:52 suspicious rcu_dereference_check() usage!
>
> other info that might help us debug this:
>
> rcu_scheduler_active = 2, debug_locks = 1
> 1 lock held by tee/11119:
> #0: ffff8881fa32c440 (sb_writers#3){.+.+}-{0:0}, at: ksys_write
>
> Call Trace:
> <TASK>
> dump_stack_lvl+0x84/0xd0
> lockdep_rcu_suspicious.cold+0x37/0x8f
> x86_virt_invoke_kvm_emergency_callback+0x5f/0x70
> x86_svm_emergency_disable_virtualization_cpu+0x2a/0x30
> x86_virt_emergency_disable_virtualization_cpu+0x6b/0x90
> native_machine_crash_shutdown+0x72/0x170
> __crash_kexec+0x137/0x280
> panic+0xce/0xd0
> sysrq_handle_crash+0x1f/0x20
> __handle_sysrq.cold+0x192/0x335
> write_sysrq_trigger+0x8c/0xc0
> proc_reg_write+0x1c3/0x3c0
> vfs_write+0x1d0/0xf80
> ksys_write+0x116/0x250
> do_syscall_64+0x11c/0x1480
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> </TASK>
>
> The RCU usage is correct: writers
> (x86_virt_{register,unregister}_emergency_callback()) serialize via
> rcu_assign_pointer() + synchronize_rcu(), while the reader on the
> emergency path runs with IRQs disabled (the only caller is
> x86_virt_emergency_disable_virtualization_cpu(), which has
> lockdep_assert_irqs_disabled()), which is a valid classic-RCU read-side
> critical section.
>
> Use rcu_dereference_check() with irqs_disabled() to silence the splat
> without weakening the protection.
>
> Reproducible on a debug kernel (CONFIG_PROVE_LOCKING=y, CONFIG_PROVE_RCU=y)
> with kvm_amd or kvm_intel loaded by triggering kdump:
>
> echo c > /proc/sysrq-trigger
>
> Fixes: 428afac5a8ea ("KVM: x86: Move bulk of emergency virtualizaton logic to virt subsystem")
> Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
> ---
> arch/x86/virt/hw.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/virt/hw.c b/arch/x86/virt/hw.c
> index f647557d38ac..57eebc99299d 100644
> --- a/arch/x86/virt/hw.c
> +++ b/arch/x86/virt/hw.c
> @@ -49,7 +49,13 @@ static void x86_virt_invoke_kvm_emergency_callback(void)
> {
> cpu_emergency_virt_cb *kvm_callback;
>
> - kvm_callback = rcu_dereference(kvm_emergency_callback);
> + /*
> + * Callers invoke this with IRQs disabled (see
> + * x86_virt_emergency_disable_virtualization_cpu()), which is a valid
> + * RCU read-side critical section. Tell lockdep so it doesn't complain
> + * during panic/reboot paths.
> + */
> + kvm_callback = rcu_dereference_check(kvm_emergency_callback, irqs_disabled());
This feels wrong. If RCU truly isn't watching this CPU, then isn't RCU allowed
to ignore this CPU when synchronizing?
> if (kvm_callback)
> kvm_callback();
> }
> --
> 2.54.0
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/virt: Fix RCU lockdep splat in emergency virt callback path
2026-05-04 17:48 ` Sean Christopherson
@ 2026-05-04 18:50 ` Mikhail Gavrilov
2026-05-04 21:40 ` Mikhail Gavrilov
0 siblings, 1 reply; 6+ messages in thread
From: Mikhail Gavrilov @ 2026-05-04 18:50 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, Dan Williams, Chao Gao, x86, kvm,
linux-kernel
On Mon, May 4, 2026 at 10:48 PM Sean Christopherson <seanjc@google.com> wrote:
>
> This feels wrong. If RCU truly isn't watching this CPU, then isn't RCU allowed
> to ignore this CPU when synchronizing?
>
You're correct that irqs_disabled() doesn't imply RCU is watching, and
in the general case that would be a real concern. However, on the
emergency virt callback path the practical situation is narrower:
1. The reader (x86_virt_invoke_kvm_emergency_callback) only runs from
panic/kexec/reboot via x86_virt_emergency_disable_virtualization_cpu()
and machine_crash_shutdown().
2. The writer (x86_virt_unregister_emergency_callback) calls
synchronize_rcu(), which would observe an RCU read-side critical
section started by rcu_read_lock(). But on the panic path we don't
have rcu_read_lock() — we just have IRQs disabled. So even with my
patch, a concurrent unregister could in principle free the callback
out from under us.
3. In practice, the writer can only run from KVM module unload. By
the time we're in panic context, all CPUs except the crashing one
have been NMI'd into x86_svm_emergency_disable_virtualization_cpu
too — a kvm_amd unload happening concurrently with panic seems
extraordinarily unlikely, and the system is going down regardless.
So the splat is technically a real issue, but the underlying race is
already so vanishingly small that I'm not sure what the right fix
shape is. Some options:
a) Treat this as "panic context can't be RCU-correct anyway" and
use rcu_dereference_raw() with a comment.
b) Convert kvm_emergency_callback away from RCU (it's only set/cleared
once per KVM module lifetime; a regular pointer with smp_store_release/
smp_load_acquire would suffice).
c) Keep my patch but document that it's a minor lockdep silencer for
a path where the use-after-free window is closed by other means
(panic-time module unload being unrealistic).
What direction would you prefer? I'm happy to spin v2 as needed.
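For reference, option (b) can be pictured with a minimal userspace sketch that uses C11 release/acquire atomics in place of smp_store_release()/smp_load_acquire(). All names below are hypothetical and the code is only a model, not a proposed kernel change. It shows the ordering half of the problem and deliberately has no grace period, so it does nothing to keep the callback's code alive while a call is in flight.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef void (*emergency_cb_t)(void);

/* Hypothetical model of a callback slot published without RCU. */
static _Atomic(emergency_cb_t) cb_slot;
static int demo_calls;

/* Stand-in for the KVM callback; it just counts invocations. */
static void demo_callback(void)
{
	demo_calls++;
}

/* Publish the callback; release orders earlier setup before the store. */
static void model_register(emergency_cb_t cb)
{
	atomic_store_explicit(&cb_slot, cb, memory_order_release);
}

/*
 * Clear the callback. Nothing here waits for readers that already loaded
 * the old pointer, which is the gap this model cannot close.
 */
static void model_unregister(void)
{
	atomic_store_explicit(&cb_slot, (emergency_cb_t)0, memory_order_release);
}

/* Reader side: acquire-load the pointer and call it if one is set. */
static void model_invoke(void)
{
	emergency_cb_t cb = atomic_load_explicit(&cb_slot, memory_order_acquire);

	if (cb)
		cb();
}
```

Calling model_invoke() before model_register() or after model_unregister() is a no-op; between the two it runs demo_callback() once per call.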
--
Best Regards,
Mike Gavrilov.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/virt: Fix RCU lockdep splat in emergency virt callback path
2026-05-04 18:50 ` Mikhail Gavrilov
@ 2026-05-04 21:40 ` Mikhail Gavrilov
2026-05-04 23:03 ` Sean Christopherson
0 siblings, 1 reply; 6+ messages in thread
From: Mikhail Gavrilov @ 2026-05-04 21:40 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, Dan Williams, Chao Gao, x86, kvm,
linux-kernel
On Mon, May 4, 2026 at 11:50 PM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> What direction would you prefer? I'm happy to spin v2 as needed.
>
I looked at how other places in the kernel handle this: kernel/notifier.c,
kernel/cgroup/cgroup.c, kernel/fork.c, and kernel/sched/fair.c all use
rcu_dereference_raw() when the caller has context-specific knowledge that
makes lockdep checks inappropriate.
I'll send v2 using rcu_dereference_raw() with a comment explaining the
panic-context reasoning. The diff would look like:
/*
* The crashing CPU may be outside RCU's watching set in panic context.
* Use rcu_dereference_raw() to avoid lockdep complaints — the writers
* (KVM module load/unload) cannot run during emergency virt callback
* invocation, so the pointer is effectively stable here.
*/
kvm_callback = rcu_dereference_raw(kvm_emergency_callback);
Let me know if you'd prefer a different approach (option (b) from my
previous mail — converting away from RCU entirely — is a bigger change
but I can do that instead).
--
Best Regards,
Mike Gavrilov.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/virt: Fix RCU lockdep splat in emergency virt callback path
2026-05-04 21:40 ` Mikhail Gavrilov
@ 2026-05-04 23:03 ` Sean Christopherson
0 siblings, 0 replies; 6+ messages in thread
From: Sean Christopherson @ 2026-05-04 23:03 UTC (permalink / raw)
To: Mikhail Gavrilov
Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, Dan Williams, Chao Gao, x86, kvm,
linux-kernel
On Tue, May 05, 2026, Mikhail Gavrilov wrote:
> On Mon, May 4, 2026 at 11:50 PM Mikhail Gavrilov
> <mikhail.v.gavrilov@gmail.com> wrote:
> >
> > What direction would you prefer? I'm happy to spin v2 as needed.
> >
>
> I looked at how other places in the kernel handle this: kernel/notifier.c,
> kernel/cgroup/cgroup.c, kernel/fork.c, and kernel/sched/fair.c all use
> rcu_dereference_raw() when the caller has context-specific knowledge that
> makes lockdep checks inappropriate.
>
> I'll send v2 using rcu_dereference_raw() with a comment explaining the
> panic-context reasoning. The diff would look like:
>
> /*
> * The crashing CPU may be outside RCU's watching set in panic context.
> * Use rcu_dereference_raw() to avoid lockdep complaints — the writers
> * (KVM module load/unload) cannot run during emergency virt callback
> * invocation, so the pointer is effectively stable here.
AFAIK, nothing actually prevents module unload when the kernel is panicking and/or
rebooting. E.g. see commit 2baa33a8ddd6 ("KVM: x86: Leave user-return notifier
registered on reboot/shutdown").
> */
> kvm_callback = rcu_dereference_raw(kvm_emergency_callback);
>
> Let me know if you'd prefer a different approach (option (b) from my
> previous mail — converting away from RCU entirely — is a bigger change
> but I can do that instead).
For "normal" usage, if there really is even such a thing for this case,
smp_store_release() / smp_load_acquire() won't suffice, because the kernel needs
to ensure the module text isn't freed while the callback is in-flight.
But as you noted before, if the kernel is panicking, (a) the window for anything
to go wrong is comically small, and (b) at some point the kernel _can't_ guarantee
that everything will be "fine". So I'd probably be ok with just sweeping this
under the rug? Assuming we can't come up with an easy-ish solution that doesn't
require taking locks (which to me, would have a higher probability of causing
problems).
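The lifetime requirement described above can be pictured with a small userspace model in which the unregister side waits for in-flight readers before the callback's code may be released. This is only an illustration of the requirement, not something proposed in this thread: every name is hypothetical, the in-flight counter is a swapped-in stand-in for a grace period, and the wait in guarded_unregister() is itself a blocking step of the kind the panic path would rather avoid.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*emergency_cb_t)(void);

/* Hypothetical model state; sequentially consistent atomics throughout. */
static _Atomic(emergency_cb_t) guarded_slot;
static atomic_int readers_in_flight;
static bool module_text_released;	/* models "module text has been freed" */
static int guarded_calls;

/* Stand-in for the KVM callback; it just counts invocations. */
static void guarded_demo_callback(void)
{
	guarded_calls++;
}

/* Publish a callback whose code is (again) considered live. */
static void guarded_register(emergency_cb_t cb)
{
	module_text_released = false;
	atomic_store(&guarded_slot, cb);
}

/* Reader: announce itself before loading the pointer, leave when done. */
static void guarded_invoke(void)
{
	emergency_cb_t cb;

	atomic_fetch_add(&readers_in_flight, 1);
	cb = atomic_load(&guarded_slot);
	if (cb)
		cb();
	atomic_fetch_sub(&readers_in_flight, 1);
}

/*
 * Writer: hide the pointer first, then wait until no reader that could
 * have seen the old value is still running, and only then release.
 */
static void guarded_unregister(void)
{
	atomic_store(&guarded_slot, (emergency_cb_t)0);
	while (atomic_load(&readers_in_flight) != 0)
		;	/* busy-wait for in-flight readers */
	module_text_released = true;
}
```

With release/acquire alone, the equivalent of module_text_released could become true while a reader is still inside the callback; the counter is what closes that window in this model.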
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2] x86/virt: Silence RCU lockdep splat in emergency virt callback path
2026-05-03 17:45 [PATCH] x86/virt: Fix RCU lockdep splat in emergency virt callback path Mikhail Gavrilov
2026-05-04 17:48 ` Sean Christopherson
@ 2026-05-04 23:54 ` Mikhail Gavrilov
1 sibling, 0 replies; 6+ messages in thread
From: Mikhail Gavrilov @ 2026-05-04 23:54 UTC (permalink / raw)
To: seanjc, pbonzini
Cc: tglx, mingo, bp, dave.hansen, hpa, djbw, chao.gao, x86, kvm,
linux-kernel, Mikhail Gavrilov
x86_virt_invoke_kvm_emergency_callback() reaches rcu_dereference()
through machine_crash_shutdown() with IRQs disabled but with RCU not
necessarily watching the crashing CPU, which triggers a suspicious
RCU usage splat on debug kernels (CONFIG_PROVE_RCU=y) during
panic/kdump:
WARNING: suspicious RCU usage
arch/x86/virt/hw.c:52 suspicious rcu_dereference_check() usage!
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by tee/11119:
#0: ffff8881fa32c440 (sb_writers#3){.+.+}-{0:0}, at: ksys_write
Call Trace:
<TASK>
dump_stack_lvl+0x84/0xd0
lockdep_rcu_suspicious.cold+0x37/0x8f
x86_virt_invoke_kvm_emergency_callback+0x5f/0x70
x86_svm_emergency_disable_virtualization_cpu+0x2a/0x30
x86_virt_emergency_disable_virtualization_cpu+0x6b/0x90
native_machine_crash_shutdown+0x72/0x170
__crash_kexec+0x137/0x280
panic+0xce/0xd0
sysrq_handle_crash+0x1f/0x20
__handle_sysrq.cold+0x192/0x335
write_sysrq_trigger+0x8c/0xc0
proc_reg_write+0x1c3/0x3c0
vfs_write+0x1d0/0xf80
ksys_write+0x116/0x250
do_syscall_64+0x11c/0x1480
entry_SYSCALL_64_after_hwframe+0x76/0x7e
</TASK>
A truly correct fix is non-trivial: the RCU usage genuinely is wrong in
panic context (RCU may ignore the crashing CPU during synchronization),
and a concurrent KVM module unload could in principle race with the
callback read; see commit 2baa33a8ddd6 ("KVM: x86: Leave user-return
notifier registered on reboot/shutdown") which notes that nothing
prevents module unload during panic/reboot.
However, the alternatives are worse:
- smp_store_release()/smp_load_acquire() handles ordering but not
liveness; the kernel still needs to keep the module text alive
while the callback is in flight.
- Taking a lock in the panic path is risky — any lock could be held
by a CPU that has already been NMI'd to a halt.
Use rcu_dereference_raw() to silence the splat and accept the
vanishingly small remaining race. Panic context inherently cannot
guarantee complete correctness; the goal here is to keep debug builds
quiet on the kdump path so the splat doesn't obscure the actual
kernel state being captured.
Reproducible on a debug kernel (CONFIG_PROVE_LOCKING=y, CONFIG_PROVE_RCU=y)
with kvm_amd or kvm_intel loaded by triggering kdump:
echo c > /proc/sysrq-trigger
Suggested-by: Sean Christopherson <seanjc@google.com>
Fixes: 428afac5a8ea ("KVM: x86: Move bulk of emergency virtualizaton logic to virt subsystem")
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
---
arch/x86/virt/hw.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/x86/virt/hw.c b/arch/x86/virt/hw.c
index f647557d38ac..7e9091c640be 100644
--- a/arch/x86/virt/hw.c
+++ b/arch/x86/virt/hw.c
@@ -49,7 +49,20 @@ static void x86_virt_invoke_kvm_emergency_callback(void)
 {
 	cpu_emergency_virt_cb *kvm_callback;
 
-	kvm_callback = rcu_dereference(kvm_emergency_callback);
+	/*
+	 * RCU may not be watching the crashing CPU here, so rcu_dereference()
+	 * triggers a suspicious-RCU-usage splat. In principle, a concurrent
+	 * KVM module unload could race with this read; see commit 2baa33a8ddd6
+	 * ("KVM: x86: Leave user-return notifier registered on reboot/shutdown")
+	 * which notes that nothing prevents module unload during panic/reboot.
+	 *
+	 * However, taking a lock here would be riskier than the current race:
+	 * the system is going down via NMI shootdown, and any lock could be
+	 * held by an already-stopped CPU. Use rcu_dereference_raw() to silence
+	 * the lockdep splat and accept the comically small remaining race;
+	 * panic context inherently cannot guarantee complete correctness.
+	 */
+	kvm_callback = rcu_dereference_raw(kvm_emergency_callback);
 	if (kvm_callback)
 		kvm_callback();
 }
--
2.54.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-05-04 23:55 UTC | newest]
Thread overview: 6+ messages
2026-05-03 17:45 [PATCH] x86/virt: Fix RCU lockdep splat in emergency virt callback path Mikhail Gavrilov
2026-05-04 17:48 ` Sean Christopherson
2026-05-04 18:50 ` Mikhail Gavrilov
2026-05-04 21:40 ` Mikhail Gavrilov
2026-05-04 23:03 ` Sean Christopherson
2026-05-04 23:54 ` [PATCH v2] x86/virt: Silence " Mikhail Gavrilov