From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Nitesh Narayan Lal <nitesh@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] KVM: Optimize kvm_make_vcpus_request_mask() a bit
Date: Mon, 23 Aug 2021 10:03:46 +0200
Message-ID: <87y28sk2ot.fsf@vitty.brq.redhat.com>
In-Reply-To: <YR/yTDZR29AhKw6M@google.com>

Sean Christopherson <seanjc@google.com> writes:
> On Fri, Aug 20, 2021, Vitaly Kuznetsov wrote:
>> Iterating over set bits in 'vcpu_bitmap' should be faster than going
>> through all vCPUs, especially when just a few bits are set.
>>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>> virt/kvm/kvm_main.c | 49 +++++++++++++++++++++++++++++----------------
>> 1 file changed, 32 insertions(+), 17 deletions(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 3e67c93ca403..0f873c5ed538 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -257,34 +257,49 @@ static inline bool kvm_kick_many_cpus(const struct cpumask *cpus, bool wait)
>>  	return true;
>>  }
>>
>> +static void kvm_make_vcpu_request(struct kvm *kvm, struct kvm_vcpu *vcpu,
>> +				  unsigned int req, cpumask_var_t tmp)
>> +{
>> +	int cpu = vcpu->cpu;
>
> This reminds me, syzbot found a data race a while back[*] in kvm_vcpu_kick()
> related to reading vcpu->cpu. That race is benign, but legitimate. I believe
> this code has a similar race, and I'm not as confident that it's benign.
>
> If the target vCPU changes vcpu->cpu after it's read by this code, then the IPI
> can be sent to the wrong pCPU, e.g. this pCPU gets waylaid by an IRQ and the
> target vCPU is migrated to a new pCPU.
>
> The TL;DR is that the race is benign because the target vCPU is still guaranteed
> to see the request before entering the guest, even if the IPI goes to the wrong
> pCPU. I believe the same holds true for KVM_REQUEST_WAIT, e.g. if the lockless
> shadow PTE walk gets migrated to a new pCPU just before setting vcpu->mode to
> READING_SHADOW_PAGE_TABLES, this code can use a stale "cpu" for __cpumask_set_cpu().
> The race is benign because the vCPU would have to enter READING_SHADOW_PAGE_TABLES
> _after_ the SPTE modifications were made, as vcpu->cpu can't change while the vCPU
> is reading SPTEs. The same logic holds true for the case where the vCPU is migrated
> after the call to __cpumask_set_cpu(); the goal is to wait for the vCPU to return to
> OUTSIDE_GUEST_MODE, which is guaranteed if the vCPU is migrated even if this path
> doesn't wait for an ack from the _new_ pCPU.
>
> I'll send patches to fix the races later today, maybe they can be folded into
> v2? Even though the races are benign, I think they're worth fixing, if only to
> provide an opportunity to document why it's ok to send IPIs to the
> wrong pCPU.

You're blazingly fast as usual :-) I'll do v2 on top of your patches.
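
To check my understanding of the vcpu->cpu concern, here is a minimal
userspace sketch, not kernel code. 'struct model_vcpu' and
model_ipi_target() are made-up names. The idea is to take a single
snapshot of the racy field so that the "-1" check, the "is it me" check
and the value that ends up in the mask all agree, even if the field
changes underneath us. In the kernel I would expect READ_ONCE() to play
the role of the relaxed atomic load, but that is my assumption and not
taken from your patches.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for struct kvm_vcpu (one field only). */
struct model_vcpu {
	_Atomic int cpu;	/* pCPU the vCPU last ran on, -1 if none */
};

/*
 * Read the field exactly once and base every decision on that snapshot.
 * Returns the pCPU that would get the IPI, or -1 if no IPI is needed.
 * The snapshot may be stale by the time it is used; the point is only
 * that all checks see the same value.
 */
static int model_ipi_target(struct model_vcpu *vcpu, int me)
{
	int cpu = atomic_load_explicit(&vcpu->cpu, memory_order_relaxed);

	if (cpu == -1 || cpu == me)
		return -1;
	return cpu;
}
```

With the vCPU on pCPU 3 and the caller on pCPU 0 this picks 3; with the
caller itself on pCPU 3, or the vCPU not loaded (-1), no IPI target is
chosen.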
>
> [*] On an upstream kernel, but I don't think the bug report was posted to LKML.
>
>> +
>> +
>> +	kvm_make_request(req, vcpu);
>> +
>> +	if (!(req & KVM_REQUEST_NO_WAKEUP) && kvm_vcpu_wake_up(vcpu))
>> +		return;
>> +
>> +	if (tmp != NULL && cpu != -1 && cpu != raw_smp_processor_id() &&
>
> For large VMs, might be worth keeping get_cpu() in the caller and passing in @me?

The only reason against it was that I tried to keep the newly introduced
kvm_make_vcpu_request()'s interface nicer, in case it gets reused some
day. I'll go back to get_cpu() in v2.
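
Roughly the shape I have in mind, as a userspace model only: the caller
looks up the current CPU once and hands it to the per-vCPU helper as
'me'. All names here (sketch_vcpu, sketch_make_vcpu_request(),
sketch_make_all_requests()) are hypothetical, the IPI mask is a plain
64-bit word instead of a cpumask, and pCPU numbers are assumed to be
below 64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kvm_vcpu. */
struct sketch_vcpu {
	int cpu;	/* pCPU the vCPU last ran on, -1 if none */
};

/* Per-vCPU helper: 'me' comes from the caller, so it is computed once. */
static void sketch_make_vcpu_request(const struct sketch_vcpu *vcpu,
				     uint64_t *ipi_mask, int me)
{
	int cpu = vcpu->cpu;

	/* Skip vCPUs that are not loaded or that run on the calling CPU. */
	if (cpu != -1 && cpu != me)
		*ipi_mask |= UINT64_C(1) << cpu;
}

/* Caller: determines 'me' once (passed in here) and loops over vCPUs. */
static uint64_t sketch_make_all_requests(const struct sketch_vcpu *vcpus,
					 int nr_vcpus, int me)
{
	uint64_t ipi_mask = 0;
	int i;

	for (i = 0; i < nr_vcpus; i++)
		sketch_make_vcpu_request(&vcpus[i], &ipi_mask, me);

	return ipi_mask;
}
```

The helper's interface gets one more parameter, but the loop no longer
repeats the current-CPU lookup for every vCPU.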
>
>> +	    kvm_request_needs_ipi(vcpu, req))
>> +		__cpumask_set_cpu(cpu, tmp);
>> +}
>> +
>>  bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req,
>>  				 struct kvm_vcpu *except,
>>  				 unsigned long *vcpu_bitmap, cpumask_var_t tmp)
>>  {
>> -	int i, cpu, me;
>> +	int i;
>>  	struct kvm_vcpu *vcpu;
>>  	bool called;
>>
>> -	me = get_cpu();
>> -
>> -	kvm_for_each_vcpu(i, vcpu, kvm) {
>> -		if ((vcpu_bitmap && !test_bit(i, vcpu_bitmap)) ||
>> -		    vcpu == except)
>> -			continue;
>> -
>> -		kvm_make_request(req, vcpu);
>> -		cpu = vcpu->cpu;
>> -
>> -		if (!(req & KVM_REQUEST_NO_WAKEUP) && kvm_vcpu_wake_up(vcpu))
>> -			continue;
>> +	preempt_disable();
>>
>> -		if (tmp != NULL && cpu != -1 && cpu != me &&
>> -		    kvm_request_needs_ipi(vcpu, req))
>> -			__cpumask_set_cpu(cpu, tmp);
>> +	if (likely(vcpu_bitmap)) {
>
> I don't think this is actually "likely". kvm_make_all_cpus_request() is by far
> the most common caller and does not pass in a vcpu_bitmap. Practically speaking
> I highly doubt the code organization will matter, but from a documentation
> perspective it's wrong.

Right, I was thinking more about the two other users, IOAPIC and Hyper-V,
which call kvm_make_vcpus_request_mask() directly, but I agree that
kvm_make_all_cpus_request() is probably much more common.
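
For those bitmap callers, the gain I'm after is that the loop cost
follows the number of set bits rather than the number of vCPUs. A
userspace illustration of that iteration pattern on a single word
follows. It is not the kernel's for_each_set_bit() implementation;
model_visit_set_bits() is a made-up name and __builtin_ctzl() is a
GCC/Clang builtin.

```c
#include <assert.h>

/*
 * Visit only the set bits of 'bitmap', lowest first, recording each
 * index in 'visited'. Returns the number of indices recorded (at most
 * 'max'). The loop body runs once per set bit, not once per bit
 * position.
 */
static int model_visit_set_bits(unsigned long bitmap, int *visited, int max)
{
	int n = 0;

	while (bitmap && n < max) {
		int i = __builtin_ctzl(bitmap);	/* index of lowest set bit */

		visited[n++] = i;
		bitmap &= bitmap - 1;		/* clear that bit */
	}

	return n;
}
```

For a bitmap of 0x29 (bits 0, 3 and 5 set) the body runs three times,
regardless of how wide the bitmap is.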
>
>> +		for_each_set_bit(i, vcpu_bitmap, KVM_MAX_VCPUS) {
>> +			vcpu = kvm_get_vcpu(kvm, i);
>> +			if (!vcpu || vcpu == except)
>> +				continue;
>> +			kvm_make_vcpu_request(kvm, vcpu, req, tmp);
>> +		}
>> +	} else {
>> +		kvm_for_each_vcpu(i, vcpu, kvm) {
>> +			if (vcpu == except)
>> +				continue;
>> +			kvm_make_vcpu_request(kvm, vcpu, req, tmp);
>> +		}
>>  	}
>>
>>  	called = kvm_kick_many_cpus(tmp, !!(req & KVM_REQUEST_WAIT));
>> -	put_cpu();
>> +
>> +	preempt_enable();
>>
>>  	return called;
>>  }
>> --
>> 2.31.1
>>
>
--
Vitaly