* [PATCH 1/6] KVM: Explicitly verify target vCPU is online in kvm_get_vcpu()
2024-10-09 15:04 [PATCH 0/6] KVM: Fix bugs in vCPUs xarray usage Sean Christopherson
@ 2024-10-09 15:04 ` Sean Christopherson
2024-10-10 5:31 ` Gupta, Pankaj
2024-10-09 15:04 ` [PATCH 2/6] KVM: Verify there's at least one online vCPU when iterating over all vCPUs Sean Christopherson
` (6 subsequent siblings)
7 siblings, 1 reply; 18+ messages in thread
From: Sean Christopherson @ 2024-10-09 15:04 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm, linux-kernel, Will Deacon, Michal Luczaj,
Sean Christopherson, Alexander Potapenko, Marc Zyngier,
Oliver Upton
Explicitly verify the target vCPU is fully online _prior_ to clamping the
index in kvm_get_vcpu(). If the index is "bad", the nospec clamping will
generate '0', i.e. KVM will return vCPU0 instead of NULL.
In practice, the bug is unlikely to cause problems, as it will only come
into play if userspace or the guest is buggy or misbehaving, e.g. KVM may
send interrupts to vCPU0 instead of dropping them on the floor.
However, returning vCPU0 when it shouldn't exist per online_vcpus is
problematic now that KVM uses an xarray for the vCPUs array, as KVM needs
to insert into the xarray before publishing the vCPU to userspace (see
commit c5b077549136 ("KVM: Convert the kvm->vcpus array to a xarray")),
i.e. before vCPU creation is guaranteed to succeed.
As a result, incorrectly providing access to vCPU0 will trigger a
use-after-free if vCPU0 is dereferenced and kvm_vm_ioctl_create_vcpu()
bails out of vCPU creation due to an error and frees vCPU0. Commit
afb2acb2e3a3 ("KVM: Fix vcpu_array[0] races") papered over that issue, but
in doing so introduced an unsolvable teardown conundrum. Preventing
accesses to vCPU0 before it's fully online will allow reverting commit
afb2acb2e3a3, without re-introducing the vcpu_array[0] UAF race.
Fixes: 1d487e9bf8ba ("KVM: fix spectrev1 gadgets")
Cc: stable@vger.kernel.org
Cc: Will Deacon <will@kernel.org>
Cc: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/kvm_host.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index db567d26f7b9..450dd0444a92 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -969,6 +969,15 @@ static inline struct kvm_io_bus *kvm_get_bus(struct kvm *kvm, enum kvm_bus idx)
static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
{
int num_vcpus = atomic_read(&kvm->online_vcpus);
+
+ /*
+ * Explicitly verify the target vCPU is online, as the anti-speculation
+ * logic only limits the CPU's ability to speculate, e.g. given a "bad"
+ * index, clamping the index to 0 would return vCPU0, not NULL.
+ */
+ if (i >= num_vcpus)
+ return NULL;
+
i = array_index_nospec(i, num_vcpus);
/* Pairs with smp_wmb() in kvm_vm_ioctl_create_vcpu. */
--
2.47.0.rc0.187.ge670bccf7e-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread

* Re: [PATCH 1/6] KVM: Explicitly verify target vCPU is online in kvm_get_vcpu()
2024-10-09 15:04 ` [PATCH 1/6] KVM: Explicitly verify target vCPU is online in kvm_get_vcpu() Sean Christopherson
@ 2024-10-10 5:31 ` Gupta, Pankaj
2024-10-10 15:54 ` Sean Christopherson
0 siblings, 1 reply; 18+ messages in thread
From: Gupta, Pankaj @ 2024-10-10 5:31 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Will Deacon, Michal Luczaj,
Alexander Potapenko, Marc Zyngier, Oliver Upton
On 10/9/2024 5:04 PM, Sean Christopherson wrote:
> Explicitly verify the target vCPU is fully online _prior_ to clamping the
> index in kvm_get_vcpu(). If the index is "bad", the nospec clamping will
> generate '0', i.e. KVM will return vCPU0 instead of NULL.
>
> In practice, the bug is unlikely to cause problems, as it will only come
> into play if userspace or the guest is buggy or misbehaving, e.g. KVM may
> send interrupts to vCPU0 instead of dropping them on the floor.
>
> However, returning vCPU0 when it shouldn't exist per online_vcpus is
> problematic now that KVM uses an xarray for the vCPUs array, as KVM needs
> to insert into the xarray before publishing the vCPU to userspace (see
> commit c5b077549136 ("KVM: Convert the kvm->vcpus array to a xarray")),
> i.e. before vCPU creation is guaranteed to succeed.
>
> As a result, incorrectly providing access to vCPU0 will trigger a
> use-after-free if vCPU0 is dereferenced and kvm_vm_ioctl_create_vcpu()
> bails out of vCPU creation due to an error and frees vCPU0. Commit
> afb2acb2e3a3 ("KVM: Fix vcpu_array[0] races") papered over that issue, but
> in doing so introduced an unsolvable teardown conundrum. Preventing
> accesses to vCPU0 before it's fully online will allow reverting commit
> afb2acb2e3a3, without re-introducing the vcpu_array[0] UAF race.
I think I have observed this (the cause, not the effect on teardown)
accidentally when creating a vCPU for an overflowing vcpu_id.
>
> Fixes: 1d487e9bf8ba ("KVM: fix spectrev1 gadgets")
> Cc: stable@vger.kernel.org
> Cc: Will Deacon <will@kernel.org>
> Cc: Michal Luczaj <mhal@rbox.co>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> include/linux/kvm_host.h | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index db567d26f7b9..450dd0444a92 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -969,6 +969,15 @@ static inline struct kvm_io_bus *kvm_get_bus(struct kvm *kvm, enum kvm_bus idx)
> static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
> {
> int num_vcpus = atomic_read(&kvm->online_vcpus);
> +
> + /*
> + * Explicitly verify the target vCPU is online, as the anti-speculation
> + * logic only limits the CPU's ability to speculate, e.g. given a "bad"
> + * index, clamping the index to 0 would return vCPU0, not NULL.
> + */
> + if (i >= num_vcpus)
> + return NULL;
Would sev.c need a NULL check for this?
sev_migrate_from()
...
src_vcpu = kvm_get_vcpu(src_kvm, i);
src_svm = to_svm(src_vcpu);
...
Apart from the above comment, this looks good to me:
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
> +
> i = array_index_nospec(i, num_vcpus);
>
> /* Pairs with smp_wmb() in kvm_vm_ioctl_create_vcpu. */
* Re: [PATCH 1/6] KVM: Explicitly verify target vCPU is online in kvm_get_vcpu()
2024-10-10 5:31 ` Gupta, Pankaj
@ 2024-10-10 15:54 ` Sean Christopherson
2024-10-10 16:33 ` Gupta, Pankaj
0 siblings, 1 reply; 18+ messages in thread
From: Sean Christopherson @ 2024-10-10 15:54 UTC (permalink / raw)
To: Pankaj Gupta
Cc: Paolo Bonzini, kvm, linux-kernel, Will Deacon, Michal Luczaj,
Alexander Potapenko, Marc Zyngier, Oliver Upton
On Thu, Oct 10, 2024, Pankaj Gupta wrote:
> On 10/9/2024 5:04 PM, Sean Christopherson wrote:
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index db567d26f7b9..450dd0444a92 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -969,6 +969,15 @@ static inline struct kvm_io_bus *kvm_get_bus(struct kvm *kvm, enum kvm_bus idx)
> > static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
> > {
> > int num_vcpus = atomic_read(&kvm->online_vcpus);
> > +
> > + /*
> > + * Explicitly verify the target vCPU is online, as the anti-speculation
> > + * logic only limits the CPU's ability to speculate, e.g. given a "bad"
> > + * index, clamping the index to 0 would return vCPU0, not NULL.
> > + */
> > + if (i >= num_vcpus)
> > + return NULL;
>
> Would sev.c need a NULL check for this?
>
> sev_migrate_from()
> ...
> src_vcpu = kvm_get_vcpu(src_kvm, i);
> src_svm = to_svm(src_vcpu);
> ...
Nope, sev_check_source_vcpus() verifies the source and destination have the same
number of online vCPUs before calling sev_migrate_from(), and it's all done with
both VMs locked.
static int sev_check_source_vcpus(struct kvm *dst, struct kvm *src)
{
struct kvm_vcpu *src_vcpu;
unsigned long i;
if (!sev_es_guest(src))
return 0;
if (atomic_read(&src->online_vcpus) != atomic_read(&dst->online_vcpus))
return -EINVAL;
kvm_for_each_vcpu(i, src_vcpu, src) {
if (!src_vcpu->arch.guest_state_protected)
return -EINVAL;
}
return 0;
}
* Re: [PATCH 1/6] KVM: Explicitly verify target vCPU is online in kvm_get_vcpu()
2024-10-10 15:54 ` Sean Christopherson
@ 2024-10-10 16:33 ` Gupta, Pankaj
0 siblings, 0 replies; 18+ messages in thread
From: Gupta, Pankaj @ 2024-10-10 16:33 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Will Deacon, Michal Luczaj,
Alexander Potapenko, Marc Zyngier, Oliver Upton
On 10/10/2024 5:54 PM, Sean Christopherson wrote:
> On Thu, Oct 10, 2024, Pankaj Gupta wrote:
>> On 10/9/2024 5:04 PM, Sean Christopherson wrote:
>>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>>> index db567d26f7b9..450dd0444a92 100644
>>> --- a/include/linux/kvm_host.h
>>> +++ b/include/linux/kvm_host.h
>>> @@ -969,6 +969,15 @@ static inline struct kvm_io_bus *kvm_get_bus(struct kvm *kvm, enum kvm_bus idx)
>>> static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
>>> {
>>> int num_vcpus = atomic_read(&kvm->online_vcpus);
>>> +
>>> + /*
>>> + * Explicitly verify the target vCPU is online, as the anti-speculation
>>> + * logic only limits the CPU's ability to speculate, e.g. given a "bad"
>>> + * index, clamping the index to 0 would return vCPU0, not NULL.
>>> + */
>>> + if (i >= num_vcpus)
>>> + return NULL;
>>
>> Would sev.c need a NULL check for this?
>>
>> sev_migrate_from()
>> ...
>> src_vcpu = kvm_get_vcpu(src_kvm, i);
>> src_svm = to_svm(src_vcpu);
>> ...
>
> Nope, sev_check_source_vcpus() verifies the source and destination have the same
> number of online vCPUs before calling sev_migrate_from(), and it's all done with
> both VMs locked.
>
> static int sev_check_source_vcpus(struct kvm *dst, struct kvm *src)
> {
> struct kvm_vcpu *src_vcpu;
> unsigned long i;
>
> if (!sev_es_guest(src))
> return 0;
>
> if (atomic_read(&src->online_vcpus) != atomic_read(&dst->online_vcpus))
> return -EINVAL;
>
> kvm_for_each_vcpu(i, src_vcpu, src) {
> if (!src_vcpu->arch.guest_state_protected)
> return -EINVAL;
> }
>
> return 0;
> }
Yes, right!
Thanks,
Pankaj
* [PATCH 2/6] KVM: Verify there's at least one online vCPU when iterating over all vCPUs
2024-10-09 15:04 [PATCH 0/6] KVM: Fix bugs in vCPUs xarray usage Sean Christopherson
2024-10-09 15:04 ` [PATCH 1/6] KVM: Explicitly verify target vCPU is online in kvm_get_vcpu() Sean Christopherson
@ 2024-10-09 15:04 ` Sean Christopherson
2024-10-09 15:04 ` [PATCH 3/6] KVM: Grab vcpu->mutex across installing the vCPU's fd and bumping online_vcpus Sean Christopherson
` (5 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2024-10-09 15:04 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm, linux-kernel, Will Deacon, Michal Luczaj,
Sean Christopherson, Alexander Potapenko, Marc Zyngier,
Oliver Upton
Explicitly check that there is at least one online vCPU before iterating over
all vCPUs. Because the max index is an unsigned long, passing "0 - 1" in
the online_vcpus==0 case results in xa_for_each_range() using an unlimited
max, i.e. allows it to access vCPU0 when it shouldn't. This will allow
KVM to safely _erase_ from vcpu_array if the last stages of vCPU creation
fail, i.e. without generating a use-after-free if a different task happens
to be concurrently iterating over all vCPUs.
Note, because xa_for_each_range() is a macro, kvm_for_each_vcpu() subtly
reloads online_vcpus after each iteration, i.e. adding an extra load
doesn't meaningfully impact the total cost of iterating over all vCPUs.
And because online_vcpus is never decremented, there is no risk of a
reload triggering a walk of the entire xarray.
Cc: Will Deacon <will@kernel.org>
Cc: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/kvm_host.h | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 450dd0444a92..5fe3b0c28fb3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -985,9 +985,10 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
return xa_load(&kvm->vcpu_array, i);
}
-#define kvm_for_each_vcpu(idx, vcpup, kvm) \
- xa_for_each_range(&kvm->vcpu_array, idx, vcpup, 0, \
- (atomic_read(&kvm->online_vcpus) - 1))
+#define kvm_for_each_vcpu(idx, vcpup, kvm) \
+ if (atomic_read(&kvm->online_vcpus)) \
+ xa_for_each_range(&kvm->vcpu_array, idx, vcpup, 0, \
+ (atomic_read(&kvm->online_vcpus) - 1))
static inline struct kvm_vcpu *kvm_get_vcpu_by_id(struct kvm *kvm, int id)
{
--
2.47.0.rc0.187.ge670bccf7e-goog
* [PATCH 3/6] KVM: Grab vcpu->mutex across installing the vCPU's fd and bumping online_vcpus
2024-10-09 15:04 [PATCH 0/6] KVM: Fix bugs in vCPUs xarray usage Sean Christopherson
2024-10-09 15:04 ` [PATCH 1/6] KVM: Explicitly verify target vCPU is online in kvm_get_vcpu() Sean Christopherson
2024-10-09 15:04 ` [PATCH 2/6] KVM: Verify there's at least one online vCPU when iterating over all vCPUs Sean Christopherson
@ 2024-10-09 15:04 ` Sean Christopherson
2024-10-09 15:04 ` [PATCH 4/6] Revert "KVM: Fix vcpu_array[0] races" Sean Christopherson
` (4 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2024-10-09 15:04 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm, linux-kernel, Will Deacon, Michal Luczaj,
Sean Christopherson, Alexander Potapenko, Marc Zyngier,
Oliver Upton
During vCPU creation, acquire vcpu->mutex prior to exposing the vCPU to
userspace, and hold the mutex until online_vcpus is bumped, i.e. until the
vCPU is fully online from KVM's perspective.
To ensure asynchronous vCPU ioctls also wait for the vCPU to come online,
explicitly check online_vcpus at the start of kvm_vcpu_ioctl(), and take
the vCPU's mutex to wait if necessary (having to wait for any ioctl should
be exceedingly rare, i.e. not worth optimizing).
Reported-by: Will Deacon <will@kernel.org>
Reported-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/all/20240730155646.1687-1-will@kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 47 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 46 insertions(+), 1 deletion(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 05cbb2548d99..fca9f74e9544 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4287,7 +4287,14 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
if (r)
goto unlock_vcpu_destroy;
- /* Now it's all set up, let userspace reach it */
+ /*
+ * Now it's all set up, let userspace reach it. Grab the vCPU's mutex
+ * so that userspace can't invoke vCPU ioctl()s until the vCPU is fully
+ * visible (per online_vcpus), e.g. so that KVM doesn't get tricked
+ * into a NULL-pointer dereference because KVM thinks the _current_
+ * vCPU doesn't exist.
+ */
+ mutex_lock(&vcpu->mutex);
kvm_get_kvm(kvm);
r = create_vcpu_fd(vcpu);
if (r < 0)
@@ -4304,6 +4311,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
*/
smp_wmb();
atomic_inc(&kvm->online_vcpus);
+ mutex_unlock(&vcpu->mutex);
mutex_unlock(&kvm->lock);
kvm_arch_vcpu_postcreate(vcpu);
@@ -4311,6 +4319,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
return r;
kvm_put_xa_release:
+ mutex_unlock(&vcpu->mutex);
kvm_put_kvm_no_destroy(kvm);
xa_release(&kvm->vcpu_array, vcpu->vcpu_idx);
unlock_vcpu_destroy:
@@ -4437,6 +4446,33 @@ static int kvm_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
}
#endif
+static int kvm_wait_for_vcpu_online(struct kvm_vcpu *vcpu)
+{
+ struct kvm *kvm = vcpu->kvm;
+
+ /*
+ * In practice, this happy path will always be taken, as a well-behaved
+ * VMM will never invoke a vCPU ioctl() before KVM_CREATE_VCPU returns.
+ */
+ if (likely(vcpu->vcpu_idx < atomic_read(&kvm->online_vcpus)))
+ return 0;
+
+ /*
+ * Acquire and release the vCPU's mutex to wait for vCPU creation to
+ * complete (kvm_vm_ioctl_create_vcpu() holds the mutex until the vCPU
+ * is fully online).
+ */
+ if (mutex_lock_killable(&vcpu->mutex))
+ return -EINTR;
+
+ mutex_unlock(&vcpu->mutex);
+
+ if (WARN_ON_ONCE(!kvm_get_vcpu(kvm, vcpu->vcpu_idx)))
+ return -EIO;
+
+ return 0;
+}
+
static long kvm_vcpu_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
@@ -4452,6 +4488,15 @@ static long kvm_vcpu_ioctl(struct file *filp,
if (unlikely(_IOC_TYPE(ioctl) != KVMIO))
return -EINVAL;
+ /*
+ * Wait for the vCPU to be online before handling the ioctl(), as KVM
+ * assumes the vCPU is reachable via vcpu_array, i.e. may dereference
+ * a NULL pointer if userspace invokes an ioctl() before KVM is ready.
+ */
+ r = kvm_wait_for_vcpu_online(vcpu);
+ if (r)
+ return r;
+
/*
* Some architectures have vcpu ioctls that are asynchronous to vcpu
* execution; mutex_lock() would break them.
--
2.47.0.rc0.187.ge670bccf7e-goog
* [PATCH 4/6] Revert "KVM: Fix vcpu_array[0] races"
2024-10-09 15:04 [PATCH 0/6] KVM: Fix bugs in vCPUs xarray usage Sean Christopherson
` (2 preceding siblings ...)
2024-10-09 15:04 ` [PATCH 3/6] KVM: Grab vcpu->mutex across installing the vCPU's fd and bumping online_vcpus Sean Christopherson
@ 2024-10-09 15:04 ` Sean Christopherson
2024-10-10 12:46 ` Paolo Bonzini
2024-10-09 15:04 ` [PATCH 5/6] KVM: Don't BUG() the kernel if xa_insert() fails with -EBUSY Sean Christopherson
` (3 subsequent siblings)
7 siblings, 1 reply; 18+ messages in thread
From: Sean Christopherson @ 2024-10-09 15:04 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm, linux-kernel, Will Deacon, Michal Luczaj,
Sean Christopherson, Alexander Potapenko, Marc Zyngier,
Oliver Upton
Now that KVM loads from vcpu_array if and only if the target index is
valid with respect to online_vcpus, i.e. now that it is safe to erase a
not-fully-onlined vCPU entry, revert to storing into vcpu_array before
success is guaranteed.
If xa_store() fails, which _should_ be impossible, then putting the vCPU's
reference to 'struct kvm' results in a refcounting bug as the vCPU fd has
been installed and owns the vCPU's reference.
This was found by inspection, but forcing the xa_store() to fail
confirms the problem:
| Unable to handle kernel paging request at virtual address ffff800080ecd960
| Call trace:
| _raw_spin_lock_irq+0x2c/0x70
| kvm_irqfd_release+0x24/0xa0
| kvm_vm_release+0x1c/0x38
| __fput+0x88/0x2ec
| ____fput+0x10/0x1c
| task_work_run+0xb0/0xd4
| do_exit+0x210/0x854
| do_group_exit+0x70/0x98
| get_signal+0x6b0/0x73c
| do_signal+0xa4/0x11e8
| do_notify_resume+0x60/0x12c
| el0_svc+0x64/0x68
| el0t_64_sync_handler+0x84/0xfc
| el0t_64_sync+0x190/0x194
| Code: b9000909 d503201f 2a1f03e1 52800028 (88e17c08)
Practically speaking, this is a non-issue as xa_store() can't fail, absent
a nasty kernel bug. But the code is visually jarring and technically
broken.
This reverts commit afb2acb2e3a32e4d56f7fbd819769b98ed1b7520.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Michal Luczaj <mhal@rbox.co>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fca9f74e9544..f081839521ef 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4283,7 +4283,8 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
}
vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
- r = xa_reserve(&kvm->vcpu_array, vcpu->vcpu_idx, GFP_KERNEL_ACCOUNT);
+ r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT);
+ BUG_ON(r == -EBUSY);
if (r)
goto unlock_vcpu_destroy;
@@ -4298,12 +4299,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
kvm_get_kvm(kvm);
r = create_vcpu_fd(vcpu);
if (r < 0)
- goto kvm_put_xa_release;
-
- if (KVM_BUG_ON(xa_store(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, 0), kvm)) {
- r = -EINVAL;
- goto kvm_put_xa_release;
- }
+ goto kvm_put_xa_erase;
/*
* Pairs with smp_rmb() in kvm_get_vcpu. Store the vcpu
@@ -4318,10 +4314,10 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
kvm_create_vcpu_debugfs(vcpu);
return r;
-kvm_put_xa_release:
+kvm_put_xa_erase:
mutex_unlock(&vcpu->mutex);
kvm_put_kvm_no_destroy(kvm);
- xa_release(&kvm->vcpu_array, vcpu->vcpu_idx);
+ xa_erase(&kvm->vcpu_array, vcpu->vcpu_idx);
unlock_vcpu_destroy:
mutex_unlock(&kvm->lock);
kvm_dirty_ring_free(&vcpu->dirty_ring);
--
2.47.0.rc0.187.ge670bccf7e-goog
* Re: [PATCH 4/6] Revert "KVM: Fix vcpu_array[0] races"
2024-10-09 15:04 ` [PATCH 4/6] Revert "KVM: Fix vcpu_array[0] races" Sean Christopherson
@ 2024-10-10 12:46 ` Paolo Bonzini
2024-10-10 17:48 ` Sean Christopherson
0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2024-10-10 12:46 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, linux-kernel, Will Deacon, Michal Luczaj,
Alexander Potapenko, Marc Zyngier, Oliver Upton
On 10/9/24 17:04, Sean Christopherson wrote:
> Now that KVM loads from vcpu_array if and only if the target index is
> valid with respect to online_vcpus, i.e. now that it is safe to erase a
> not-fully-onlined vCPU entry, revert to storing into vcpu_array before
> success is guaranteed.
>
> If xa_store() fails, which _should_ be impossible, then putting the vCPU's
> reference to 'struct kvm' results in a refcounting bug as the vCPU fd has
> been installed and owns the vCPU's reference.
>
> This was found by inspection, but forcing the xa_store() to fail
> confirms the problem:
>
> | Unable to handle kernel paging request at virtual address ffff800080ecd960
> | Call trace:
> | _raw_spin_lock_irq+0x2c/0x70
> | kvm_irqfd_release+0x24/0xa0
> | kvm_vm_release+0x1c/0x38
> | __fput+0x88/0x2ec
> | ____fput+0x10/0x1c
> | task_work_run+0xb0/0xd4
> | do_exit+0x210/0x854
> | do_group_exit+0x70/0x98
> | get_signal+0x6b0/0x73c
> | do_signal+0xa4/0x11e8
> | do_notify_resume+0x60/0x12c
> | el0_svc+0x64/0x68
> | el0t_64_sync_handler+0x84/0xfc
> | el0t_64_sync+0x190/0x194
> | Code: b9000909 d503201f 2a1f03e1 52800028 (88e17c08)
>
> Practically speaking, this is a non-issue as xa_store() can't fail, absent
> a nasty kernel bug. But the code is visually jarring and technically
> broken.
>
> This reverts commit afb2acb2e3a32e4d56f7fbd819769b98ed1b7520.
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Michal Luczaj <mhal@rbox.co>
> Cc: Alexander Potapenko <glider@google.com>
> Cc: Marc Zyngier <maz@kernel.org>
> Reported-by: Will Deacon <will@kernel.org>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> virt/kvm/kvm_main.c | 14 +++++---------
> 1 file changed, 5 insertions(+), 9 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index fca9f74e9544..f081839521ef 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4283,7 +4283,8 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
> }
>
> vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
> - r = xa_reserve(&kvm->vcpu_array, vcpu->vcpu_idx, GFP_KERNEL_ACCOUNT);
> + r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT);
> + BUG_ON(r == -EBUSY);
> if (r)
> goto unlock_vcpu_destroy;
>
> @@ -4298,12 +4299,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
> kvm_get_kvm(kvm);
> r = create_vcpu_fd(vcpu);
> if (r < 0)
> - goto kvm_put_xa_release;
> -
> - if (KVM_BUG_ON(xa_store(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, 0), kvm)) {
> - r = -EINVAL;
> - goto kvm_put_xa_release;
> - }
> + goto kvm_put_xa_erase;
I also find it a bit jarring though that we have to undo the insertion.
This is a chicken-and-egg situation where you pick one operation B
that will have to undo operation A if it fails. But what xa_store is
doing is breaking this deadlock.
The code is a bit longer, sure, but I don't see the point in
complicating the vcpu_array invariants and letting an entry disappear.
The rest of the series is still good, of course.
Paolo
> /*
> * Pairs with smp_rmb() in kvm_get_vcpu. Store the vcpu
> @@ -4318,10 +4314,10 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
> kvm_create_vcpu_debugfs(vcpu);
> return r;
>
> -kvm_put_xa_release:
> +kvm_put_xa_erase:
> mutex_unlock(&vcpu->mutex);
> kvm_put_kvm_no_destroy(kvm);
> - xa_release(&kvm->vcpu_array, vcpu->vcpu_idx);
> + xa_erase(&kvm->vcpu_array, vcpu->vcpu_idx);
> unlock_vcpu_destroy:
> mutex_unlock(&kvm->lock);
> kvm_dirty_ring_free(&vcpu->dirty_ring);
* Re: [PATCH 4/6] Revert "KVM: Fix vcpu_array[0] races"
2024-10-10 12:46 ` Paolo Bonzini
@ 2024-10-10 17:48 ` Sean Christopherson
2024-10-20 11:23 ` Paolo Bonzini
0 siblings, 1 reply; 18+ messages in thread
From: Sean Christopherson @ 2024-10-10 17:48 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm, linux-kernel, Will Deacon, Michal Luczaj,
Alexander Potapenko, Marc Zyngier, Oliver Upton
On Thu, Oct 10, 2024, Paolo Bonzini wrote:
> On 10/9/24 17:04, Sean Christopherson wrote:
> > Now that KVM loads from vcpu_array if and only if the target index is
> > valid with respect to online_vcpus, i.e. now that it is safe to erase a
> > not-fully-onlined vCPU entry, revert to storing into vcpu_array before
> > success is guaranteed.
> >
> > If xa_store() fails, which _should_ be impossible, then putting the vCPU's
> > reference to 'struct kvm' results in a refcounting bug as the vCPU fd has
> > been installed and owns the vCPU's reference.
> >
> > This was found by inspection, but forcing the xa_store() to fail
> > confirms the problem:
> >
> > | Unable to handle kernel paging request at virtual address ffff800080ecd960
> > | Call trace:
> > | _raw_spin_lock_irq+0x2c/0x70
> > | kvm_irqfd_release+0x24/0xa0
> > | kvm_vm_release+0x1c/0x38
> > | __fput+0x88/0x2ec
> > | ____fput+0x10/0x1c
> > | task_work_run+0xb0/0xd4
> > | do_exit+0x210/0x854
> > | do_group_exit+0x70/0x98
> > | get_signal+0x6b0/0x73c
> > | do_signal+0xa4/0x11e8
> > | do_notify_resume+0x60/0x12c
> > | el0_svc+0x64/0x68
> > | el0t_64_sync_handler+0x84/0xfc
> > | el0t_64_sync+0x190/0x194
> > | Code: b9000909 d503201f 2a1f03e1 52800028 (88e17c08)
> >
> > Practically speaking, this is a non-issue as xa_store() can't fail, absent
> > a nasty kernel bug. But the code is visually jarring and technically
> > broken.
> >
> > This reverts commit afb2acb2e3a32e4d56f7fbd819769b98ed1b7520.
> >
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Michal Luczaj <mhal@rbox.co>
> > Cc: Alexander Potapenko <glider@google.com>
> > Cc: Marc Zyngier <maz@kernel.org>
> > Reported-by: Will Deacon <will@kernel.org>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > virt/kvm/kvm_main.c | 14 +++++---------
> > 1 file changed, 5 insertions(+), 9 deletions(-)
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index fca9f74e9544..f081839521ef 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -4283,7 +4283,8 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
> > }
> > vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
> > - r = xa_reserve(&kvm->vcpu_array, vcpu->vcpu_idx, GFP_KERNEL_ACCOUNT);
> > + r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT);
> > + BUG_ON(r == -EBUSY);
> > if (r)
> > goto unlock_vcpu_destroy;
> > @@ -4298,12 +4299,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
> > kvm_get_kvm(kvm);
> > r = create_vcpu_fd(vcpu);
> > if (r < 0)
> > - goto kvm_put_xa_release;
> > -
> > - if (KVM_BUG_ON(xa_store(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, 0), kvm)) {
> > - r = -EINVAL;
> > - goto kvm_put_xa_release;
> > - }
> > + goto kvm_put_xa_erase;
>
> I also find it a bit jarring though that we have to undo the insertion. This
> is a chicken-and-egg situation where you pick one operation B that will
> have to undo operation A if it fails. But what xa_store is doing is
> breaking this deadlock.
>
> The code is a bit longer, sure, but I don't see the point in complicating
> the vcpu_array invariants and letting an entry disappear.
But we only need one rule: vcpu_array[x] is valid if and only if 'x' is less than
online_vcpus. And that rule is necessary regardless of whether or not vcpu_array[x]
is filled before success is guaranteed.
I'm not concerned about the code length, it's that we need to do _something_ if
xa_store() fails. Yeah, it should never happen, but knowingly doing nothing feels
all kinds of wrong. I don't like BUG(), because it's obviously very doable to
gracefully handle failure. And a WARN() is rather pointless, because continuing
on with an invalid entry is all but guaranteed to crash, i.e. is little more than a
deferred BUG() in this case.
* Re: [PATCH 4/6] Revert "KVM: Fix vcpu_array[0] races"
2024-10-10 17:48 ` Sean Christopherson
@ 2024-10-20 11:23 ` Paolo Bonzini
2024-12-16 23:05 ` Sean Christopherson
0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2024-10-20 11:23 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, linux-kernel, Will Deacon, Michal Luczaj,
Alexander Potapenko, Marc Zyngier, Oliver Upton
On 10/10/24 19:48, Sean Christopherson wrote:
> On Thu, Oct 10, 2024, Paolo Bonzini wrote:
>> On 10/9/24 17:04, Sean Christopherson wrote:
>>> Now that KVM loads from vcpu_array if and only if the target index is
>>> valid with respect to online_vcpus, i.e. now that it is safe to erase a
>>> not-fully-onlined vCPU entry, revert to storing into vcpu_array before
>>> success is guaranteed.
>>>
>>> If xa_store() fails, which _should_ be impossible, then putting the vCPU's
>>> reference to 'struct kvm' results in a refcounting bug as the vCPU fd has
>>> been installed and owns the vCPU's reference.
>>>
>>> This was found by inspection, but forcing the xa_store() to fail
>>> confirms the problem:
>>>
>>> | Unable to handle kernel paging request at virtual address ffff800080ecd960
>>> | Call trace:
>>> | _raw_spin_lock_irq+0x2c/0x70
>>> | kvm_irqfd_release+0x24/0xa0
>>> | kvm_vm_release+0x1c/0x38
>>> | __fput+0x88/0x2ec
>>> | ____fput+0x10/0x1c
>>> | task_work_run+0xb0/0xd4
>>> | do_exit+0x210/0x854
>>> | do_group_exit+0x70/0x98
>>> | get_signal+0x6b0/0x73c
>>> | do_signal+0xa4/0x11e8
>>> | do_notify_resume+0x60/0x12c
>>> | el0_svc+0x64/0x68
>>> | el0t_64_sync_handler+0x84/0xfc
>>> | el0t_64_sync+0x190/0x194
>>> | Code: b9000909 d503201f 2a1f03e1 52800028 (88e17c08)
>>>
>>> Practically speaking, this is a non-issue as xa_store() can't fail, absent
>>> a nasty kernel bug. But the code is visually jarring and technically
>>> broken.
>>>
>>> This reverts commit afb2acb2e3a32e4d56f7fbd819769b98ed1b7520.
>>>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Cc: Michal Luczaj <mhal@rbox.co>
>>> Cc: Alexander Potapenko <glider@google.com>
>>> Cc: Marc Zyngier <maz@kernel.org>
>>> Reported-by: Will Deacon <will@kernel.org>
>>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>>> ---
>>> virt/kvm/kvm_main.c | 14 +++++---------
>>> 1 file changed, 5 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>> index fca9f74e9544..f081839521ef 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -4283,7 +4283,8 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
>>> }
>>> vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
>>> - r = xa_reserve(&kvm->vcpu_array, vcpu->vcpu_idx, GFP_KERNEL_ACCOUNT);
>>> + r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT);
>>> + BUG_ON(r == -EBUSY);
>>> if (r)
>>> goto unlock_vcpu_destroy;
>>> @@ -4298,12 +4299,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
>>> kvm_get_kvm(kvm);
>>> r = create_vcpu_fd(vcpu);
>>> if (r < 0)
>>> - goto kvm_put_xa_release;
>>> -
>>> - if (KVM_BUG_ON(xa_store(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, 0), kvm)) {
>>> - r = -EINVAL;
>>> - goto kvm_put_xa_release;
>>> - }
>>> + goto kvm_put_xa_erase;
>>
>> I also find it a bit jarring though that we have to undo the insertion. This
>> is a chicken-and-egg situation where you pick one operation B that will
>> have to undo operation A if it fails. But what xa_store is doing is
>> breaking this deadlock.
>>
>> The code is a bit longer, sure, but I don't see the point in complicating
>> the vcpu_array invariants and letting an entry disappear.
>
> But we only need one rule: vcpu_array[x] is valid if and only if 'x' is less than
> online_vcpus. And that rule is necessary regardless of whether or not vcpu_array[x]
> is filled before success is guaranteed.
Even if the invariant is explainable I still find xa_erase to be uglier
than xa_release, but maybe it's just me.
The reason I'm not fully convinced by the explanation is that...
> I'm not concerned about the code length, it's that we need to do _something_ if
> xa_store() fails. Yeah, it should never happen, but knowingly doing nothing feels
> all kinds of wrong.
... it seems to me that this is not just an issue in KVM code; it should
apply to other uses of xa_reserve()/xa_store() as well. If xa_store()
fails after xa_reserve(), you're pretty much using the xarray API
incorrectly... and then, just make it a BUG()? I know that BUG() is
frowned upon, but if the API causes invalid memory accesses when used
incorrectly, one might as well fail as early as possible and before the
invalid memory access becomes exploitable.
> I don't like BUG(), because it's obviously very doable to
> gracefully handle failure.
Yes, you can by using a different API. But the point is that in the
reserve/store case the insert failure becomes a reserve failure, never a
store failure.
Maybe there should be an xa_store_reserved() that BUGs on failure, I
don't know.
Paolo
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 4/6] Revert "KVM: Fix vcpu_array[0] races"
2024-10-20 11:23 ` Paolo Bonzini
@ 2024-12-16 23:05 ` Sean Christopherson
0 siblings, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2024-12-16 23:05 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm, linux-kernel, Will Deacon, Michal Luczaj,
Alexander Potapenko, Marc Zyngier, Oliver Upton
On Sun, Oct 20, 2024, Paolo Bonzini wrote:
> On 10/10/24 19:48, Sean Christopherson wrote:
> > On Thu, Oct 10, 2024, Paolo Bonzini wrote:
...
> > > > @@ -4298,12 +4299,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
> > > > kvm_get_kvm(kvm);
> > > > r = create_vcpu_fd(vcpu);
> > > > if (r < 0)
> > > > - goto kvm_put_xa_release;
> > > > -
> > > > - if (KVM_BUG_ON(xa_store(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, 0), kvm)) {
> > > > - r = -EINVAL;
> > > > - goto kvm_put_xa_release;
> > > > - }
> > > > + goto kvm_put_xa_erase;
> > >
> > > I also find it a bit jarring though that we have to undo the insertion. This
> > > is a chicken-and-egg situation where you pick one operation B that will
> > > have to undo operation A if it fails. But what xa_store is doing is
> > > breaking this deadlock.
> > >
> > > The code is a bit longer, sure, but I don't see the point in complicating
> > > the vcpu_array invariants and letting an entry disappear.
> >
> > But we only need one rule: vcpu_array[x] is valid if and only if 'x' is less than
> > online_vcpus. And that rule is necessary regardless of whether or not vcpu_array[x]
> > is filled before success is guaranteed.
>
> Even if the invariant is explainable I still find xa_erase to be uglier than
> xa_release, but maybe it's just me.
It's uglier, but has the distinct advantage of not being broken :-D
And while uglier, IMO there's value in having a way for fuzzers to test KVM's
online_vcpus logic. As evidenced by patches 1-3, accessing vcpu_array[] without
first checking online_vcpus is dangerous regardless of how vcpu_array[] is populated.
Allowing fuzzers to trigger removal from vcpu_array[] in KASAN kernels provides meaningful
coverage for that code (see Michal's original bug report). While well-intentioned,
Michal's change in afb2acb2e3a3 simply blamed the wrong code. Denying ourselves that
coverage and carrying flawed code just because the correct code isn't the prettiest
doesn't seem like a good tradeoff.
> The reason I'm not fully convinced by the explanation is that...
>
> > I'm not concerned about the code length, it's that we need to do _something_ if
> > xa_store() fails. Yeah, it should never happen, but knowingly doing nothing feels
> > all kinds of wrong.
>
> ... it seems to me that this is not just an issue in KVM code; it should
> apply to other uses of xa_reserve()/xa_store() as well. If xa_store() fails
> after xa_reserve(), you're pretty much using the xarray API incorrectly...
> and then, just make it a BUG()? I know that BUG() is frowned upon, but if
> the API causes invalid memory accesses when used incorrectly, one might as
> well fail as early as possible and before the invalid memory access becomes
> exploitable.
>
> > I don't like BUG(), because it's obviously very doable to
> > gracefully handle failure.
>
> Yes, you can by using a different API. But the point is that in the
> reserve/store case the insert failure becomes a reserve failure, never a
> store failure.
>
> Maybe there should be an xa_store_reserved() that BUGs on failure, I don't
> know.
I agree a version of xa_store() that guarantees success would be nice to have,
but I'm not exactly eager to potentially start a fight with Willy *and* Linus :-)
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 5/6] KVM: Don't BUG() the kernel if xa_insert() fails with -EBUSY
2024-10-09 15:04 [PATCH 0/6] KVM: Fix bugs in vCPUs xarray usage Sean Christopherson
` (3 preceding siblings ...)
2024-10-09 15:04 ` [PATCH 4/6] Revert "KVM: Fix vcpu_array[0] races" Sean Christopherson
@ 2024-10-09 15:04 ` Sean Christopherson
2024-10-10 5:33 ` Gupta, Pankaj
2024-10-09 15:04 ` [PATCH 6/6] KVM: Drop hack that "manually" informs lockdep of kvm->lock vs. vcpu->mutex Sean Christopherson
` (2 subsequent siblings)
7 siblings, 1 reply; 18+ messages in thread
From: Sean Christopherson @ 2024-10-09 15:04 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm, linux-kernel, Will Deacon, Michal Luczaj,
Sean Christopherson, Alexander Potapenko, Marc Zyngier,
Oliver Upton
WARN once instead of triggering a BUG if xa_insert() fails because it
encountered an existing entry. While KVM guarantees there should be no
existing entry, there's no reason to BUG the kernel, as KVM needs to
gracefully handle failure anyways.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f081839521ef..ae216256ee9d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4284,7 +4284,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT);
- BUG_ON(r == -EBUSY);
+ WARN_ON_ONCE(r == -EBUSY);
if (r)
goto unlock_vcpu_destroy;
--
2.47.0.rc0.187.ge670bccf7e-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 5/6] KVM: Don't BUG() the kernel if xa_insert() fails with -EBUSY
2024-10-09 15:04 ` [PATCH 5/6] KVM: Don't BUG() the kernel if xa_insert() fails with -EBUSY Sean Christopherson
@ 2024-10-10 5:33 ` Gupta, Pankaj
0 siblings, 0 replies; 18+ messages in thread
From: Gupta, Pankaj @ 2024-10-10 5:33 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Will Deacon, Michal Luczaj,
Alexander Potapenko, Marc Zyngier, Oliver Upton
> WARN once instead of triggering a BUG if xa_insert() fails because it
> encountered an existing entry. While KVM guarantees there should be no
> existing entry, there's no reason to BUG the kernel, as KVM needs to
> gracefully handle failure anyways.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
> ---
> virt/kvm/kvm_main.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index f081839521ef..ae216256ee9d 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4284,7 +4284,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
>
> vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
> r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT);
> - BUG_ON(r == -EBUSY);
> + WARN_ON_ONCE(r == -EBUSY);
> if (r)
> goto unlock_vcpu_destroy;
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 6/6] KVM: Drop hack that "manually" informs lockdep of kvm->lock vs. vcpu->mutex
2024-10-09 15:04 [PATCH 0/6] KVM: Fix bugs in vCPUs xarray usage Sean Christopherson
` (4 preceding siblings ...)
2024-10-09 15:04 ` [PATCH 5/6] KVM: Don't BUG() the kernel if xa_insert() fails with -EBUSY Sean Christopherson
@ 2024-10-09 15:04 ` Sean Christopherson
2024-10-29 14:18 ` [PATCH 0/6] KVM: Fix bugs in vCPUs xarray usage Will Deacon
2024-12-19 2:40 ` Sean Christopherson
7 siblings, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2024-10-09 15:04 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm, linux-kernel, Will Deacon, Michal Luczaj,
Sean Christopherson, Alexander Potapenko, Marc Zyngier,
Oliver Upton
Now that KVM takes vcpu->mutex inside kvm->lock when creating a vCPU, drop
the hack to manually inform lockdep of the kvm->lock => vcpu->mutex
ordering.
This effectively reverts commit 42a90008f890 ("KVM: Ensure lockdep knows
about kvm->lock vs. vcpu->mutex ordering rule").
Cc: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ae216256ee9d..2dd3ff8764da 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4271,12 +4271,6 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
mutex_lock(&kvm->lock);
-#ifdef CONFIG_LOCKDEP
- /* Ensure that lockdep knows vcpu->mutex is taken *inside* kvm->lock */
- mutex_lock(&vcpu->mutex);
- mutex_unlock(&vcpu->mutex);
-#endif
-
if (kvm_get_vcpu_by_id(kvm, id)) {
r = -EEXIST;
goto unlock_vcpu_destroy;
@@ -4293,7 +4287,8 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
* so that userspace can't invoke vCPU ioctl()s until the vCPU is fully
* visible (per online_vcpus), e.g. so that KVM doesn't get tricked
* into a NULL-pointer dereference because KVM thinks the _current_
- * vCPU doesn't exist.
+ * vCPU doesn't exist. As a bonus, taking vcpu->mutex ensures lockdep
+ * knows it's taken *inside* kvm->lock.
*/
mutex_lock(&vcpu->mutex);
kvm_get_kvm(kvm);
--
2.47.0.rc0.187.ge670bccf7e-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 0/6] KVM: Fix bugs in vCPUs xarray usage
2024-10-09 15:04 [PATCH 0/6] KVM: Fix bugs in vCPUs xarray usage Sean Christopherson
` (5 preceding siblings ...)
2024-10-09 15:04 ` [PATCH 6/6] KVM: Drop hack that "manually" informs lockdep of kvm->lock vs. vcpu->mutex Sean Christopherson
@ 2024-10-29 14:18 ` Will Deacon
2024-12-19 2:40 ` Sean Christopherson
7 siblings, 0 replies; 18+ messages in thread
From: Will Deacon @ 2024-10-29 14:18 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Michal Luczaj,
Alexander Potapenko, Marc Zyngier, Oliver Upton
On Wed, Oct 09, 2024 at 08:04:49AM -0700, Sean Christopherson wrote:
> This series stems from Will's observation[*] that kvm_vm_ioctl_create_vcpu()'s
> handling of xa_store() failure when inserting into vcpu_array is technically
> broken, although in practice it's impossible for xa_store() to fail.
>
> After much back and forth and staring, I realized that commit afb2acb2e3a3
> ("KVM: Fix vcpu_array[0] races") papered over underlying bugs in
> kvm_get_vcpu() and kvm_for_each_vcpu(). The core problem is that KVM
> allowed other tasks to see vCPU0 while online_vcpus==0, and thus trying
> to gracefully error out of vCPU creation led to use-after-free failures.
>
> So, rather than trying to solve the unsolvable problem for an error path
> that should be impossible to hit, fix the underlying issue and ensure that
> vcpu_array[0] is accessed if and only if online_vcpus is non-zero.
>
> Patch 3 fixes a race Michal identified when we were trying to figure out
> how to handle the xa_store() mess.
>
> Patch 4 reverts afb2acb2e3a3.
>
> Patches 5 and 6 are tangentially related cleanups.
Thanks, Sean. For the series:
Acked-by: Will Deacon <will@kernel.org>
I sympathise a little with Paolo on patch 4, but at the end of the day
it's a revert and I think that the code is better for it, even if the
whole scenario is messy.
Will
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/6] KVM: Fix bugs in vCPUs xarray usage
2024-10-09 15:04 [PATCH 0/6] KVM: Fix bugs in vCPUs xarray usage Sean Christopherson
` (6 preceding siblings ...)
2024-10-29 14:18 ` [PATCH 0/6] KVM: Fix bugs in vCPUs xarray usage Will Deacon
@ 2024-12-19 2:40 ` Sean Christopherson
2024-12-19 14:18 ` Paolo Bonzini
7 siblings, 1 reply; 18+ messages in thread
From: Sean Christopherson @ 2024-12-19 2:40 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Will Deacon, Michal Luczaj,
Alexander Potapenko, Marc Zyngier, Oliver Upton
On Wed, 09 Oct 2024 08:04:49 -0700, Sean Christopherson wrote:
> This series stems from Will's observation[*] that kvm_vm_ioctl_create_vcpu()'s
> handling of xa_store() failure when inserting into vcpu_array is technically
> broken, although in practice it's impossible for xa_store() to fail.
>
> After much back and forth and staring, I realized that commit afb2acb2e3a3
> ("KVM: Fix vcpu_array[0] races") papered over underlying bugs in
> kvm_get_vcpu() and kvm_for_each_vcpu(). The core problem is that KVM
> allowed other tasks to see vCPU0 while online_vcpus==0, and thus trying
> to gracefully error out of vCPU creation led to use-after-free failures.
>
> [...]
Applied to kvm-x86 vcpu_array to get coverage in -next, and to force Paolo's
hand :-).
Paolo, I put this in a dedicated branch so that it's easy to toss if you want to
go a different direction for the xarray insertion mess.
[1/6] KVM: Explicitly verify target vCPU is online in kvm_get_vcpu()
https://github.com/kvm-x86/linux/commit/1e7381f3617d
[2/6] KVM: Verify there's at least one online vCPU when iterating over all vCPUs
https://github.com/kvm-x86/linux/commit/0664dc74e9d0
[3/6] KVM: Grab vcpu->mutex across installing the vCPU's fd and bumping online_vcpus
https://github.com/kvm-x86/linux/commit/6e2b2358b3ef
[4/6] Revert "KVM: Fix vcpu_array[0] races"
https://github.com/kvm-x86/linux/commit/d0831edcd87e
[5/6] KVM: Don't BUG() the kernel if xa_insert() fails with -EBUSY
https://github.com/kvm-x86/linux/commit/e53dc37f5a06
[6/6] KVM: Drop hack that "manually" informs lockdep of kvm->lock vs. vcpu->mutex
https://github.com/kvm-x86/linux/commit/01528db67f28
--
https://github.com/kvm-x86/linux/tree/next
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/6] KVM: Fix bugs in vCPUs xarray usage
2024-12-19 2:40 ` Sean Christopherson
@ 2024-12-19 14:18 ` Paolo Bonzini
0 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2024-12-19 14:18 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, linux-kernel, Will Deacon, Michal Luczaj,
Alexander Potapenko, Marc Zyngier, Oliver Upton
On Thu, Dec 19, 2024 at 3:44 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, 09 Oct 2024 08:04:49 -0700, Sean Christopherson wrote:
> > This series stems from Will's observation[*] that kvm_vm_ioctl_create_vcpu()'s
> > handling of xa_store() failure when inserting into vcpu_array is technically
> > broken, although in practice it's impossible for xa_store() to fail.
> >
> > After much back and forth and staring, I realized that commit afb2acb2e3a3
> > ("KVM: Fix vcpu_array[0] races") papered over underlying bugs in
> > kvm_get_vcpu() and kvm_for_each_vcpu(). The core problem is that KVM
> > allowed other tasks to see vCPU0 while online_vcpus==0, and thus trying
> > to gracefully error out of vCPU creation led to use-after-free failures.
> >
> > [...]
>
> Applied to kvm-x86 vcpu_array to get coverage in -next, and to force Paolo's
> hand :-).
>
> Paolo, I put this in a dedicated branch so that it's easy to toss if you want to
> go a different direction for the xarray insertion mess.
Go ahead; we already do more or less the same in
kvm_vm_set_mem_attributes(), so I guess that's just an unavoidable
weirdness of xa_reserve().
Paolo
>
> [1/6] KVM: Explicitly verify target vCPU is online in kvm_get_vcpu()
> https://github.com/kvm-x86/linux/commit/1e7381f3617d
> [2/6] KVM: Verify there's at least one online vCPU when iterating over all vCPUs
> https://github.com/kvm-x86/linux/commit/0664dc74e9d0
> [3/6] KVM: Grab vcpu->mutex across installing the vCPU's fd and bumping online_vcpus
> https://github.com/kvm-x86/linux/commit/6e2b2358b3ef
> [4/6] Revert "KVM: Fix vcpu_array[0] races"
> https://github.com/kvm-x86/linux/commit/d0831edcd87e
> [5/6] KVM: Don't BUG() the kernel if xa_insert() fails with -EBUSY
> https://github.com/kvm-x86/linux/commit/e53dc37f5a06
> [6/6] KVM: Drop hack that "manually" informs lockdep of kvm->lock vs. vcpu->mutex
> https://github.com/kvm-x86/linux/commit/01528db67f28
>
> --
> https://github.com/kvm-x86/linux/tree/next
>
^ permalink raw reply [flat|nested] 18+ messages in thread