From: Sean Christopherson <seanjc@google.com>
To: Maxim Levitsky <mlevitsk@redhat.com>
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
x86@kernel.org, Borislav Petkov <bp@alien8.de>,
linux-kernel@vger.kernel.org,
Dave Hansen <dave.hansen@linux.intel.com>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH v2 2/2] KVM: VMX: disable preemption when touching segment fields
Date: Tue, 16 Jul 2024 15:36:11 -0700 [thread overview]
Message-ID: <Zpb127FsRoLdlaBb@google.com> (raw)
In-Reply-To: <20240716022014.240960-3-mlevitsk@redhat.com>
On Mon, Jul 15, 2024, Maxim Levitsky wrote:
> VMX code uses segment cache to avoid reading guest segment fields.
>
> The cache is reset each time a segment's field (e.g base/access rights/etc)
> is written, and then a new value of this field is written.
>
> However if the vCPU is preempted between these two events, and this
> segment field is read (e.g kvm reads SS's access rights to check
> if the vCPU is in kernel mode), then old field value will get
> cached and never updated.
It'be super helpful to include the gory details about how kvm_arch_vcpu_put()
reads stale data. Without that information, it's very hard to figure out how
getting preempted is problematic.
vmx_vcpu_reset resets the segment cache bitmask and then initializes
the segments in the vmcs, however if the vcpus is preempted in the
middle of this code, the kvm_arch_vcpu_put is called which
reads SS's AR bytes to determine if the vCPU is in the kernel mode,
which caches the old value.
> Usually a lock is required to avoid such race but since vCPU segments
> are only accessed by its vCPU thread, we can avoid a lock and
> only disable preemption, in places where the segment cache
> is invalidated and segment fields are updated.
This doesn't fully fix the problem. It's not just kvm_sched_out() => kvm_arch_vcpu_put()
that's problematic, it's any path that executes KVM code in interrupt context.
And it's not just limited to segment registers, any register that is conditionally
cached via arch.regs_avail is susceptible to races.
Specifically, kvm_guest_state() and kvm_guest_get_ip() will read SS.AR_bytes and
RIP in NMI and/or IRQ context when handling a PMI.
A few possible ideas.
1. Force reads from IRQ/NMI context to skip the cache and go to the VMCS.
2. Same thing as #1, but focus it specifically on kvm_arch_vcpu_in_kernel()
and kvm_arch_vcpu_get_ip(), and WARN if kvm_register_is_available() or
vmx_segment_cache_test_set() is invoked from IRQ or NMI context.
3. Force caching of SS.AR_bytes, CS.AR_bytes, and RIP prior to kvm_after_interrupt(),
rename preempted_in_kernel to something like "exited_in_kernel" and snapshot
it before kvm_after_interrupt(), and add the same hardening as #2.
This is doable because kvm_guest_state() should never read guest state for
PMIs that occur between VM-Exit and kvm_after_interrupt(), nor should KVM
write guest state in that window. And the intent of the "preempted in kernel"
check is to query vCPU state at the time of exit.
5. Do a combination of #3 and patch 02 (#3 fixes PMIs, patch 02 fixes preemption).
My vote is probably for #2 or #4. I definitely think we need WARNs in the caching
code, and in general kvm_arch_vcpu_put() shouldn't be reading cacheable state, i.e.
I am fairly confident we can restrict it to checking CPL.
I don't hate this patch by any means, but I don't love disabling preemption in a
bunch of flows just so that the preempted_in_kernel logic works.
next prev parent reply other threads:[~2024-07-16 22:36 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-16 2:20 [PATCH v2 0/2] Fix for a very old KVM bug in the segment cache Maxim Levitsky
2024-07-16 2:20 ` [PATCH v2 1/2] KVM: nVMX: use vmx_segment_cache_clear Maxim Levitsky
2024-07-16 21:07 ` Sean Christopherson
2024-07-24 18:18 ` Maxim Levitsky
2024-07-16 2:20 ` [PATCH v2 2/2] KVM: VMX: disable preemption when touching segment fields Maxim Levitsky
2024-07-16 22:36 ` Sean Christopherson [this message]
2024-07-25 12:59 ` Maxim Levitsky
2024-07-25 17:37 ` Maxim Levitsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zpb127FsRoLdlaBb@google.com \
--to=seanjc@google.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=hpa@zytor.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=mlevitsk@redhat.com \
--cc=pbonzini@redhat.com \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.