* Re: [PATCH] x86/virt: Fix RCU lockdep splat in emergency virt callback path
       [not found] <20260503174534.45699-1-mikhail.v.gavrilov@gmail.com>
@ 2026-05-04 17:48 ` Sean Christopherson
  2026-05-04 18:50   ` Mikhail Gavrilov
  2026-05-04 23:54 ` [PATCH v2] x86/virt: Silence " Mikhail Gavrilov
  1 sibling, 1 reply; 7+ messages in thread
From: Sean Christopherson @ 2026-05-04 17:48 UTC
To: Mikhail Gavrilov
Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    Dave Hansen, H . Peter Anvin, Dan Williams, Chao Gao, x86, kvm,
    linux-kernel

On Sun, May 03, 2026, Mikhail Gavrilov wrote:
> x86_virt_invoke_kvm_emergency_callback() reaches rcu_dereference()
> through machine_crash_shutdown() with IRQs disabled but with RCU not
> necessarily watching, which triggers a suspicious RCU usage splat on
> debug kernels (CONFIG_PROVE_RCU=y) during panic/kdump:
>
>   WARNING: suspicious RCU usage
>   arch/x86/virt/hw.c:52 suspicious rcu_dereference_check() usage!
>
>   other info that might help us debug this:
>
>   rcu_scheduler_active = 2, debug_locks = 1
>   1 lock held by tee/11119:
>    #0: ffff8881fa32c440 (sb_writers#3){.+.+}-{0:0}, at: ksys_write
>
>   Call Trace:
>    <TASK>
>    dump_stack_lvl+0x84/0xd0
>    lockdep_rcu_suspicious.cold+0x37/0x8f
>    x86_virt_invoke_kvm_emergency_callback+0x5f/0x70
>    x86_svm_emergency_disable_virtualization_cpu+0x2a/0x30
>    x86_virt_emergency_disable_virtualization_cpu+0x6b/0x90
>    native_machine_crash_shutdown+0x72/0x170
>    __crash_kexec+0x137/0x280
>    panic+0xce/0xd0
>    sysrq_handle_crash+0x1f/0x20
>    __handle_sysrq.cold+0x192/0x335
>    write_sysrq_trigger+0x8c/0xc0
>    proc_reg_write+0x1c3/0x3c0
>    vfs_write+0x1d0/0xf80
>    ksys_write+0x116/0x250
>    do_syscall_64+0x11c/0x1480
>    entry_SYSCALL_64_after_hwframe+0x76/0x7e
>    </TASK>
>
> The RCU usage is correct: writers
> (x86_virt_{register,unregister}_emergency_callback()) serialize via
> rcu_assign_pointer() + synchronize_rcu(), while the reader on the
> emergency path runs with IRQs disabled (the only caller is
> x86_virt_emergency_disable_virtualization_cpu(), which has
> lockdep_assert_irqs_disabled()), which is a valid classic-RCU read-side
> critical section.
>
> Use rcu_dereference_check() with irqs_disabled() to silence the splat
> without weakening the protection.
>
> Reproducible on a debug kernel (CONFIG_PROVE_LOCKING=y, CONFIG_PROVE_RCU=y)
> with kvm_amd or kvm_intel loaded by triggering kdump:
>
>   echo c > /proc/sysrq-trigger
>
> Fixes: 428afac5a8ea ("KVM: x86: Move bulk of emergency virtualizaton logic to virt subsystem")
> Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
> ---
>  arch/x86/virt/hw.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/virt/hw.c b/arch/x86/virt/hw.c
> index f647557d38ac..57eebc99299d 100644
> --- a/arch/x86/virt/hw.c
> +++ b/arch/x86/virt/hw.c
> @@ -49,7 +49,13 @@ static void x86_virt_invoke_kvm_emergency_callback(void)
>  {
>  	cpu_emergency_virt_cb *kvm_callback;
>
> -	kvm_callback = rcu_dereference(kvm_emergency_callback);
> +	/*
> +	 * Callers invoke this with IRQs disabled (see
> +	 * x86_virt_emergency_disable_virtualization_cpu()), which is a valid
> +	 * RCU read-side critical section. Tell lockdep so it doesn't complain
> +	 * during panic/reboot paths.
> +	 */
> +	kvm_callback = rcu_dereference_check(kvm_emergency_callback, irqs_disabled());

This feels wrong.  If RCU truly isn't watching this CPU, then isn't RCU allowed
to ignore this CPU when synchronizing?

>  	if (kvm_callback)
>  		kvm_callback();
>  }
> --
> 2.54.0
>
* Re: [PATCH] x86/virt: Fix RCU lockdep splat in emergency virt callback path
  2026-05-04 17:48 ` [PATCH] x86/virt: Fix RCU lockdep splat in emergency virt callback path Sean Christopherson
@ 2026-05-04 18:50   ` Mikhail Gavrilov
  2026-05-04 21:40     ` Mikhail Gavrilov
  0 siblings, 1 reply; 7+ messages in thread
From: Mikhail Gavrilov @ 2026-05-04 18:50 UTC
To: Sean Christopherson
Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    Dave Hansen, H . Peter Anvin, Dan Williams, Chao Gao, x86, kvm,
    linux-kernel

On Mon, May 4, 2026 at 10:48 PM Sean Christopherson <seanjc@google.com> wrote:
>
> This feels wrong.  If RCU truly isn't watching this CPU, then isn't RCU allowed
> to ignore this CPU when synchronizing?
>

You're correct that irqs_disabled() doesn't imply RCU is watching, and
in the general case that would be a real concern. However, on the
emergency virt callback path the practical situation is narrower:

1. The reader (x86_virt_invoke_kvm_emergency_callback) only runs from
   panic/kexec/reboot via
   x86_virt_emergency_disable_virtualization_cpu() and
   machine_crash_shutdown().

2. The writer (x86_virt_unregister_emergency_callback) calls
   synchronize_rcu(), which would observe an RCU read-side critical
   section started by rcu_read_lock(). But on the panic path we don't
   have rcu_read_lock() -- we just have IRQs disabled. So even with my
   patch, a concurrent unregister could in principle free the callback
   out from under us.

3. In practice, the writer can only run from KVM module unload. By the
   time we're in panic context, all CPUs except the crashing one have
   been NMI'd into x86_svm_emergency_disable_virtualization_cpu too --
   a kvm_amd unload happening concurrently with panic seems
   extraordinarily unlikely, and the system is going down regardless.

So the splat is technically a real issue, but the underlying race is
already so vanishingly small that I'm not sure what the right fix shape
is. Some options:

a) Treat this as "panic context can't be RCU-correct anyway" and use
   rcu_dereference_raw() with a comment.

b) Convert kvm_emergency_callback away from RCU (it's only set/cleared
   once per KVM module lifetime; a regular pointer with
   smp_store_release/smp_load_acquire would suffice).

c) Keep my patch but document that it's a minor lockdep silencer for a
   path where the use-after-free window is closed by other means
   (panic-time module unload being unrealistic).

What direction would you prefer? I'm happy to spin v2 as needed.

--
Best Regards,
Mike Gavrilov.
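[Editorial sketch, not part of the original message.] Option (b) above can be pictured with a small userspace analogue. The code below is only a model under stated assumptions: C11 release/acquire atomics stand in for the kernel's smp_store_release()/smp_load_acquire(), and every name in it (emergency_cb_t, register_cb, unregister_cb, invoke_cb, count_call) is hypothetical rather than taken from arch/x86/virt/hw.c. It shows that release/acquire publication orders the pointer hand-off, while nothing in it makes unregister_cb() wait for a reader that has already loaded the old pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a callback type; not kernel code. */
typedef void (*emergency_cb_t)(void);

/* Published callback slot; a null pointer means "nothing registered". */
static _Atomic(emergency_cb_t) emergency_cb;

/* Example callback used for illustration: just counts invocations. */
static int calls;
static void count_call(void)
{
	calls++;
}

/* Writer: release store, so prior writes are visible to an acquiring reader. */
static void register_cb(emergency_cb_t cb)
{
	assert(cb != NULL);
	atomic_store_explicit(&emergency_cb, cb, memory_order_release);
}

/*
 * Writer: clears the slot. Note what is missing: this does not wait for a
 * reader that loaded the old pointer a moment ago and is still running it.
 */
static void unregister_cb(void)
{
	atomic_store_explicit(&emergency_cb, (emergency_cb_t)0,
			      memory_order_release);
}

/* Reader: acquire load, then call. Returns 1 if a callback ran, else 0. */
static int invoke_cb(void)
{
	emergency_cb_t cb = atomic_load_explicit(&emergency_cb,
						 memory_order_acquire);

	if (!cb)
		return 0;
	cb();
	return 1;
}
```

In this model, calling invoke_cb() before register_cb(count_call) returns 0, after it returns 1, and after unregister_cb() returns 0 again. What the model cannot express is the lifetime of the code the pointer refers to, which is a separate question from the ordering shown here.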
* Re: [PATCH] x86/virt: Fix RCU lockdep splat in emergency virt callback path
  2026-05-04 18:50   ` Mikhail Gavrilov
@ 2026-05-04 21:40     ` Mikhail Gavrilov
  2026-05-04 23:03       ` Sean Christopherson
  0 siblings, 1 reply; 7+ messages in thread
From: Mikhail Gavrilov @ 2026-05-04 21:40 UTC
To: Sean Christopherson
Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    Dave Hansen, H . Peter Anvin, Dan Williams, Chao Gao, x86, kvm,
    linux-kernel

On Mon, May 4, 2026 at 11:50 PM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> What direction would you prefer? I'm happy to spin v2 as needed.
>

After looking at how other places in the kernel handle this --
kernel/notifier.c, kernel/cgroup/cgroup.c, kernel/fork.c,
kernel/sched/fair.c all use rcu_dereference_raw() when the caller has
context-specific knowledge that makes lockdep checks inappropriate.

I'll send v2 using rcu_dereference_raw() with a comment explaining the
panic-context reasoning. The diff would look like:

	/*
	 * The crashing CPU may be outside RCU's watching set in panic context.
	 * Use rcu_dereference_raw() to avoid lockdep complaints -- the writers
	 * (KVM module load/unload) cannot run during emergency virt callback
	 * invocation, so the pointer is effectively stable here.
	 */
	kvm_callback = rcu_dereference_raw(kvm_emergency_callback);

Let me know if you'd prefer a different approach (option (b) from my
previous mail -- converting away from RCU entirely -- is a bigger change
but I can do that instead).

--
Best Regards,
Mike Gavrilov.
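[Editorial sketch, not part of the original message.] The difference between the checked and raw accessors discussed so far can be modelled in userspace. This is a simplified picture under stated assumptions, not the kernel macros themselves: deref_check(), deref_raw(), read_lock_depth and splat_count are hypothetical names, and an acquire load stands in for the dependency-ordered load the kernel performs. The point it illustrates is that the extra condition, or the raw variant, only changes whether a debug warning fires; the load itself and the protection it has are the same in all cases.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these names are kernel APIs. */
typedef void (*emergency_cb_t)(void);

static _Atomic(emergency_cb_t) emergency_cb;
static int read_lock_depth;	/* models "a read-side lock is held" */
static int splat_count;		/* models a debug-only warning being printed */

/*
 * Checked load: warns unless a read-side lock is held OR the caller's
 * extra condition holds. The warning is bookkeeping only; the load and
 * whatever protects the pointed-to code are unchanged by it.
 */
static emergency_cb_t deref_check(bool extra_cond)
{
	if (!extra_cond && read_lock_depth == 0)
		splat_count++;
	return atomic_load_explicit(&emergency_cb, memory_order_acquire);
}

/* Raw load: same load, no debug check at all. */
static emergency_cb_t deref_raw(void)
{
	return atomic_load_explicit(&emergency_cb, memory_order_acquire);
}

/* Walks through the cases: unchecked context, condition, lock, raw. */
static void demo(void)
{
	(void)deref_check(false);	/* no lock, no condition: warns */
	assert(splat_count == 1);

	(void)deref_check(true);	/* caller-supplied condition: silent */
	read_lock_depth = 1;
	(void)deref_check(false);	/* lock held: silent */
	read_lock_depth = 0;
	(void)deref_raw();		/* never checks: silent */
	assert(splat_count == 1);
}
```

Calling demo() once leaves splat_count at 1: only the first, unprotected checked load warns. Both silent variants return exactly the same pointer value as the warning one, which is why choosing between them is a question of what the warning should assert rather than of added safety.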
* Re: [PATCH] x86/virt: Fix RCU lockdep splat in emergency virt callback path
  2026-05-04 21:40     ` Mikhail Gavrilov
@ 2026-05-04 23:03       ` Sean Christopherson
  0 siblings, 0 replies; 7+ messages in thread
From: Sean Christopherson @ 2026-05-04 23:03 UTC
To: Mikhail Gavrilov
Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    Dave Hansen, H . Peter Anvin, Dan Williams, Chao Gao, x86, kvm,
    linux-kernel

On Tue, May 05, 2026, Mikhail Gavrilov wrote:
> On Mon, May 4, 2026 at 11:50 PM Mikhail Gavrilov
> <mikhail.v.gavrilov@gmail.com> wrote:
> >
> > What direction would you prefer? I'm happy to spin v2 as needed.
> >
>
> After looking at how other places in the kernel handle this -- kernel/notifier.c,
> kernel/cgroup/cgroup.c, kernel/fork.c, kernel/sched/fair.c all use
> rcu_dereference_raw() when the caller has context-specific knowledge that
> makes lockdep checks inappropriate.
>
> I'll send v2 using rcu_dereference_raw() with a comment explaining the
> panic-context reasoning. The diff would look like:
>
> 	/*
> 	 * The crashing CPU may be outside RCU's watching set in panic context.
> 	 * Use rcu_dereference_raw() to avoid lockdep complaints -- the writers
> 	 * (KVM module load/unload) cannot run during emergency virt callback
> 	 * invocation, so the pointer is effectively stable here.

AFAIK, nothing actually prevents module unload when the kernel is panicking
and/or rebooting.  E.g. see commit 2baa33a8ddd6 ("KVM: x86: Leave user-return
notifier registered on reboot/shutdown").

> 	 */
> 	kvm_callback = rcu_dereference_raw(kvm_emergency_callback);
>
> Let me know if you'd prefer a different approach (option (b) from my
> previous mail -- converting away from RCU entirely -- is a bigger change
> but I can do that instead).

For "normal" usage, if there really is even such a thing for this case,
smp_store_release() / smp_load_acquire() won't suffice, because the kernel needs
to ensure the module text isn't freed while the callback is in-flight.

But as you noted before, if the kernel is panicking, (a) the window for anything
to go wrong is comically small, and (b) at some point the kernel _can't_
guarantee that everything will be "fine".

So I'd probably be ok with just sweeping this under the rug?  Assuming we can't
come up with an easy-ish solution that doesn't require taking locks (which to me,
would have a higher probability of causing problems).
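[Editorial sketch, not part of the original message.] The liveness requirement described above (the callee must stay alive while a call is in flight) can be illustrated with a userspace model in which the writer waits for in-flight readers to drain before returning. This is an assumption-laden analogue, not how RCU is implemented: RCU avoids a shared counter like this one, and all names here (readers_in_flight, register_cb, unregister_cb, invoke_cb, count_call) are hypothetical. Sequentially consistent atomics are used so that a reader either is counted before the writer samples the counter, or loads the pointer after it was cleared.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
typedef void (*emergency_cb_t)(void);

static _Atomic(emergency_cb_t) emergency_cb;
static atomic_int readers_in_flight;

/* Example callback used for illustration: just counts invocations. */
static int calls;
static void count_call(void)
{
	calls++;
}

static void register_cb(emergency_cb_t cb)
{
	assert(cb != NULL);
	atomic_store(&emergency_cb, cb);
}

/* Reader: announce itself first, then load and call, then retire. */
static int invoke_cb(void)
{
	emergency_cb_t cb;
	int invoked = 0;

	atomic_fetch_add(&readers_in_flight, 1);
	cb = atomic_load(&emergency_cb);
	if (cb) {
		cb();
		invoked = 1;
	}
	atomic_fetch_sub(&readers_in_flight, 1);
	return invoked;
}

/*
 * Writer: clear the slot, then wait until no reader that might still hold
 * the old pointer is running. Only after this returns would it be safe to
 * free whatever the old callback pointed at.
 */
static void unregister_cb(void)
{
	atomic_store(&emergency_cb, (emergency_cb_t)0);
	while (atomic_load(&readers_in_flight) != 0)
		;	/* spin until in-flight readers have retired */
}
```

The wait loop is what separates this from plain release/acquire publication, and it also hints at the trade-off raised in the reply: if a reader were stopped forever in the middle of its callback, a writer waiting like this would never make progress. That is an editorial observation about the model, not a claim about any specific kernel code path.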
* [PATCH v2] x86/virt: Silence RCU lockdep splat in emergency virt callback path
       [not found] <20260503174534.45699-1-mikhail.v.gavrilov@gmail.com>
  2026-05-04 17:48 ` [PATCH] x86/virt: Fix RCU lockdep splat in emergency virt callback path Sean Christopherson
@ 2026-05-04 23:54 ` Mikhail Gavrilov
  2026-05-07 21:59   ` Sean Christopherson
  1 sibling, 1 reply; 7+ messages in thread
From: Mikhail Gavrilov @ 2026-05-04 23:54 UTC
To: seanjc, pbonzini
Cc: tglx, mingo, bp, dave.hansen, hpa, djbw, chao.gao, x86, kvm,
    linux-kernel, Mikhail Gavrilov

x86_virt_invoke_kvm_emergency_callback() reaches rcu_dereference()
through machine_crash_shutdown() with IRQs disabled but with RCU not
necessarily watching the crashing CPU, which triggers a suspicious
RCU usage splat on debug kernels (CONFIG_PROVE_RCU=y) during
panic/kdump:

  WARNING: suspicious RCU usage
  arch/x86/virt/hw.c:52 suspicious rcu_dereference_check() usage!

  rcu_scheduler_active = 2, debug_locks = 1
  1 lock held by tee/11119:
   #0: ffff8881fa32c440 (sb_writers#3){.+.+}-{0:0}, at: ksys_write

  Call Trace:
   <TASK>
   dump_stack_lvl+0x84/0xd0
   lockdep_rcu_suspicious.cold+0x37/0x8f
   x86_virt_invoke_kvm_emergency_callback+0x5f/0x70
   x86_svm_emergency_disable_virtualization_cpu+0x2a/0x30
   x86_virt_emergency_disable_virtualization_cpu+0x6b/0x90
   native_machine_crash_shutdown+0x72/0x170
   __crash_kexec+0x137/0x280
   panic+0xce/0xd0
   sysrq_handle_crash+0x1f/0x20
   __handle_sysrq.cold+0x192/0x335
   write_sysrq_trigger+0x8c/0xc0
   proc_reg_write+0x1c3/0x3c0
   vfs_write+0x1d0/0xf80
   ksys_write+0x116/0x250
   do_syscall_64+0x11c/0x1480
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
   </TASK>

A truly correct fix is non-trivial: the RCU usage genuinely is wrong in
panic context (RCU may ignore the crashing CPU during synchronization),
and a concurrent KVM module unload could in principle race with the
callback read; see commit 2baa33a8ddd6 ("KVM: x86: Leave user-return
notifier registered on reboot/shutdown") which notes that nothing
prevents module unload during panic/reboot.

However, the alternatives are worse:

 - smp_store_release()/smp_load_acquire() handles ordering but not
   liveness; the kernel still needs to keep the module text alive
   while the callback is in flight.
 - Taking a lock in the panic path is risky -- any lock could be held
   by a CPU that has already been NMI'd to a halt.

Use rcu_dereference_raw() to silence the splat and accept the
vanishingly small remaining race. Panic context inherently cannot
guarantee complete correctness; the goal here is to keep debug builds
quiet on the kdump path so the splat doesn't obscure the actual
kernel state being captured.

Reproducible on a debug kernel (CONFIG_PROVE_LOCKING=y, CONFIG_PROVE_RCU=y)
with kvm_amd or kvm_intel loaded by triggering kdump:

  echo c > /proc/sysrq-trigger

Suggested-by: Sean Christopherson <seanjc@google.com>
Fixes: 428afac5a8ea ("KVM: x86: Move bulk of emergency virtualizaton logic to virt subsystem")
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
---
 arch/x86/virt/hw.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/virt/hw.c b/arch/x86/virt/hw.c
index f647557d38ac..7e9091c640be 100644
--- a/arch/x86/virt/hw.c
+++ b/arch/x86/virt/hw.c
@@ -49,7 +49,20 @@ static void x86_virt_invoke_kvm_emergency_callback(void)
 {
 	cpu_emergency_virt_cb *kvm_callback;

-	kvm_callback = rcu_dereference(kvm_emergency_callback);
+	/*
+	 * RCU may not be watching the crashing CPU here, so rcu_dereference()
+	 * triggers a suspicious-RCU-usage splat. In principle, a concurrent
+	 * KVM module unload could race with this read; see commit 2baa33a8ddd6
+	 * ("KVM: x86: Leave user-return notifier registered on reboot/shutdown")
+	 * which notes that nothing prevents module unload during panic/reboot.
+	 *
+	 * However, taking a lock here would be riskier than the current race:
+	 * the system is going down via NMI shootdown, and any lock could be
+	 * held by an already-stopped CPU. Use rcu_dereference_raw() to silence
+	 * the lockdep splat and accept the comically small remaining race;
+	 * panic context inherently cannot guarantee complete correctness.
+	 */
+	kvm_callback = rcu_dereference_raw(kvm_emergency_callback);
 	if (kvm_callback)
 		kvm_callback();
 }
--
2.54.0
* Re: [PATCH v2] x86/virt: Silence RCU lockdep splat in emergency virt callback path
  2026-05-04 23:54 ` [PATCH v2] x86/virt: Silence " Mikhail Gavrilov
@ 2026-05-07 21:59   ` Sean Christopherson
  2026-05-08 19:21     ` Mikhail Gavrilov
  0 siblings, 1 reply; 7+ messages in thread
From: Sean Christopherson @ 2026-05-07 21:59 UTC
To: Mikhail Gavrilov
Cc: pbonzini, tglx, mingo, bp, dave.hansen, hpa, djbw, chao.gao, x86,
    kvm, linux-kernel

On Tue, May 05, 2026, Mikhail Gavrilov wrote:
> x86_virt_invoke_kvm_emergency_callback() reaches rcu_dereference()
> through machine_crash_shutdown() with IRQs disabled but with RCU not
> necessarily watching the crashing CPU, which triggers a suspicious
> RCU usage splat on debug kernels (CONFIG_PROVE_RCU=y) during
> panic/kdump:
>
>   WARNING: suspicious RCU usage
>   arch/x86/virt/hw.c:52 suspicious rcu_dereference_check() usage!
>
>   rcu_scheduler_active = 2, debug_locks = 1
>   1 lock held by tee/11119:
>    #0: ffff8881fa32c440 (sb_writers#3){.+.+}-{0:0}, at: ksys_write
>
>   Call Trace:
>    <TASK>
>    dump_stack_lvl+0x84/0xd0
>    lockdep_rcu_suspicious.cold+0x37/0x8f
>    x86_virt_invoke_kvm_emergency_callback+0x5f/0x70
>    x86_svm_emergency_disable_virtualization_cpu+0x2a/0x30
>    x86_virt_emergency_disable_virtualization_cpu+0x6b/0x90
>    native_machine_crash_shutdown+0x72/0x170
>    __crash_kexec+0x137/0x280
>    panic+0xce/0xd0
>    sysrq_handle_crash+0x1f/0x20
>    __handle_sysrq.cold+0x192/0x335
>    write_sysrq_trigger+0x8c/0xc0
>    proc_reg_write+0x1c3/0x3c0
>    vfs_write+0x1d0/0xf80
>    ksys_write+0x116/0x250
>    do_syscall_64+0x11c/0x1480
>    entry_SYSCALL_64_after_hwframe+0x76/0x7e
>    </TASK>
>
> A truly correct fix is non-trivial: the RCU usage genuinely is wrong in
> panic context (RCU may ignore the crashing CPU during synchronization),
> and a concurrent KVM module unload could in principle race with the
> callback read; see commit 2baa33a8ddd6 ("KVM: x86: Leave user-return
> notifier registered on reboot/shutdown") which notes that nothing
> prevents module unload during panic/reboot.
>
> However, the alternatives are worse:
>
>  - smp_store_release()/smp_load_acquire() handles ordering but not
>    liveness; the kernel still needs to keep the module text alive
>    while the callback is in flight.
>  - Taking a lock in the panic path is risky -- any lock could be held
>    by a CPU that has already been NMI'd to a halt.
>
> Use rcu_dereference_raw() to silence the splat and accept the
> vanishingly small remaining race. Panic context inherently cannot
> guarantee complete correctness; the goal here is to keep debug builds
> quiet on the kdump path so the splat doesn't obscure the actual
> kernel state being captured.
>
> Reproducible on a debug kernel (CONFIG_PROVE_LOCKING=y, CONFIG_PROVE_RCU=y)
> with kvm_amd or kvm_intel loaded by triggering kdump:
>
>   echo c > /proc/sysrq-trigger
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Fixes: 428afac5a8ea ("KVM: x86: Move bulk of emergency virtualizaton logic to virt subsystem")
> Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
> ---

Acked-by: Sean Christopherson <seanjc@google.com>

(I can also take this through kvm-x86; I have no preference whatsoever)
* Re: [PATCH v2] x86/virt: Silence RCU lockdep splat in emergency virt callback path
  2026-05-07 21:59   ` Sean Christopherson
@ 2026-05-08 19:21     ` Mikhail Gavrilov
  0 siblings, 0 replies; 7+ messages in thread
From: Mikhail Gavrilov @ 2026-05-08 19:21 UTC
To: Sean Christopherson
Cc: pbonzini, tglx, mingo, bp, dave.hansen, hpa, djbw, chao.gao, x86,
    kvm, linux-kernel

On Fri, May 8, 2026 at 2:59 AM Sean Christopherson <seanjc@google.com> wrote:
>
> Acked-by: Sean Christopherson <seanjc@google.com>
>
> (I can also take this through kvm-x86; I have no preference whatsoever)

Thanks Sean! Whichever path is most convenient works for me.

--
Best Regards,
Mike Gavrilov.
Thread overview: 7+ messages
[not found] <20260503174534.45699-1-mikhail.v.gavrilov@gmail.com>
2026-05-04 17:48 ` [PATCH] x86/virt: Fix RCU lockdep splat in emergency virt callback path Sean Christopherson
2026-05-04 18:50 ` Mikhail Gavrilov
2026-05-04 21:40 ` Mikhail Gavrilov
2026-05-04 23:03 ` Sean Christopherson
2026-05-04 23:54 ` [PATCH v2] x86/virt: Silence " Mikhail Gavrilov
2026-05-07 21:59 ` Sean Christopherson
2026-05-08 19:21 ` Mikhail Gavrilov