From: Sean Christopherson <seanjc@google.com>
To: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Michael Larabel <Michael@michaellarabel.com>,
Borislav Petkov <bp@alien8.de>
Subject: Re: [PATCH v2] KVM: SVM: Set/clear SRSO's BP_SPEC_REDUCE on 0 <=> 1 VM count transitions
Date: Tue, 6 May 2025 08:57:36 -0700 [thread overview]
Message-ID: <aBoxcOPWRWyFIgVE@google.com> (raw)
In-Reply-To: <aBoc0MhlvO4hR03u@google.com>
On Tue, May 06, 2025, Yosry Ahmed wrote:
> On Tue, May 06, 2025 at 07:16:06AM -0700, Sean Christopherson wrote:
> > On Tue, May 06, 2025, Yosry Ahmed wrote:
> > > On Mon, May 05, 2025 at 11:03:00AM -0700, Sean Christopherson wrote:
> > > > +static void svm_srso_vm_destroy(void)
> > > > +{
> > > > + if (!cpu_feature_enabled(X86_FEATURE_SRSO_BP_SPEC_REDUCE))
> > > > + return;
> > > > +
> > > > + if (atomic_dec_return(&srso_nr_vms))
> > > > + return;
> > > > +
> > > > + guard(spinlock)(&srso_lock);
> > > > +
> > > > + /*
> > > > + * Verify a new VM didn't come along, acquire the lock, and increment
> > > > + * the count before this task acquired the lock.
> > > > + */
> > > > + if (atomic_read(&srso_nr_vms))
> > > > + return;
> > > > +
> > > > + on_each_cpu(svm_srso_clear_bp_spec_reduce, NULL, 1);
> > >
> > > Just a passing-by comment. I get worried about sending IPIs while
> > > holding a spinlock because if someone ever tries to hold that spinlock
> > > with IRQs disabled, it may cause a deadlock.
> > >
> > > This is not the case for this lock, but it's not obvious (at least to
> > > me) that holding it in a different code path that doesn't send IPIs with
> > > IRQs disabled could cause a problem.
> > >
> > > You could add a comment, convert it to a mutex to make this scenario
> > > impossible,
> >
> > Using a mutex doesn't make deadlock impossible, it's still perfectly legal to
> > disable IRQs while holding a mutex.
>
> Right, but it's illegal to hold a mutex while disabling IRQs.
Nit on the wording: it's illegal to take a mutex while IRQs are disabled. Disabling
IRQs while already holding a mutex is fine.
And it's also illegal to take a spinlock while IRQs are disabled, becauase spinlocks
become sleepable mutexes with PREEMPT_RT=y. While PREEMPT_RT=y isn't super common,
people do run KVM with PREEMPT_RT=y, and I'm guessing bots/CI would trip any such
violation quite quickly.
E.g. with IRQs disabled around the guard(spinlock)(&srso_lock):
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 2799, name: qemu
preempt_count: 0, expected: 0
RCU nest depth: 0, expected: 0
1 lock held by qemu/2799:
#0: ffffffff8263f898 (srso_lock){....}-{3:3}, at: svm_vm_destroy+0x47/0xa0
irq event stamp: 9090
hardirqs last enabled at (9089): [<ffffffff81414087>] vprintk_store+0x467/0x4d0
hardirqs last disabled at (9090): [<ffffffff812fd1ce>] svm_vm_destroy+0x5e/0xa0
softirqs last enabled at (0): [<ffffffff8137585c>] copy_process+0xa1c/0x29f0
softirqs last disabled at (0): [<0000000000000000>] 0x0
Call Trace:
<TASK>
dump_stack_lvl+0x57/0x80
__might_resched.cold+0xcc/0xde
rt_spin_lock+0x5b/0x170
svm_vm_destroy+0x47/0xa0
kvm_destroy_vm+0x180/0x310
kvm_vm_release+0x1d/0x30
__fput+0x10d/0x2f0
task_work_run+0x58/0x90
do_exit+0x325/0xa80
do_group_exit+0x32/0xa0
get_signal+0xb5b/0xbb0
arch_do_signal_or_restart+0x29/0x230
syscall_exit_to_user_mode+0xea/0x180
do_syscall_64+0x7a/0x220
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fb50ae7fc4e
</TASK>
> In this case, if the other CPU is already holding the lock then there's no
> risk of deadlock, right?
Not on srso_lock, but there's still deadlock potential on the locks used to protect
the call_function_data structure.
> > Similarly, I don't want to add a comment, because there is absolutely nothing
> > special/unique about this situation/lock. E.g. KVM has tens of calls to
> > smp_call_function_many_cond() while holding a spinlock equivalent, in the form
> > of kvm_make_all_cpus_request() while holding mmu_lock.
>
> Agreed that it's not a unique situation at all. Ideally we'd have some
> debugging (lockdep?) magic that identifies that an IPI is being sent
> while a lock is held, and that this specific lock is never spinned on
> with IRQs disabled.
Sleepable spinlocks aside, the lockdep_assert_irqs_enabled() in
smp_call_function_many_cond() already provides sufficient of coverage for that
case. And if code is using some other form of IPI communication *and* taking raw
spinlocks, then I think it goes without saying that developers would need to be
very, very careful.
> > smp_call_function_many_cond() already asserts that IRQs are disabled, so I have
> > zero concerns about this flow breaking in the future.
>
> That doesn't really help tho, the problem is if another CPU spins on the
> lock with IRQs disabled, regardless of whether or not it. Basically if
> CPU 1 acquires the lock and sends an IPI while CPU 2 disables IRQs and
> spins on the lock.
Given that svm_srso_vm_destroy() is guaranteed to call on_each_cpu() with the
lock held at some point, I'm completely comfortable relying on its lockdep
assertion.
> > > or dismiss my comment as being too paranoid/ridiculous :)
> >
> > I wouldn't say your thought process is too paranoid; when writing the code, I had
> > to pause and think to remember whether or not using on_each_cpu() while holding a
> > spinlock is allowed. But I do think the conclusion is wrong :-)
>
> That's fair. I think protection against this should be done more generically
> as I mentioned earlier, but it felt like it would be easy-ish to side-step it
> in this case.
Eh, modifying this code in such a way that it could deadlock without lockdep
noticing would likely send up a comincal number of red flags during code review.
next prev parent reply other threads:[~2025-05-06 15:57 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-05 18:03 [PATCH v2] KVM: SVM: Set/clear SRSO's BP_SPEC_REDUCE on 0 <=> 1 VM count transitions Sean Christopherson
2025-05-06 9:48 ` Yosry Ahmed
2025-05-06 14:16 ` Sean Christopherson
2025-05-06 14:29 ` Yosry Ahmed
2025-05-06 15:57 ` Sean Christopherson [this message]
2025-05-07 7:05 ` Yosry Ahmed
2025-05-07 13:19 ` Sean Christopherson
2025-05-06 14:22 ` Borislav Petkov
2025-05-08 23:04 ` Sean Christopherson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aBoxcOPWRWyFIgVE@google.com \
--to=seanjc@google.com \
--cc=Michael@michaellarabel.com \
--cc=bp@alien8.de \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=yosry.ahmed@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.