public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Michal Luczaj <mhal@rbox.co>
Cc: pbonzini@redhat.com, shuah@kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH 1/3] KVM: x86: Fix out-of-bounds access in kvm_recalculate_phys_map()
Date: Fri, 26 May 2023 11:11:36 -0700	[thread overview]
Message-ID: <ZHD2WFeYmfkpuzJQ@google.com> (raw)
In-Reply-To: <f4c2108d-b5a0-bdee-f354-28ed7e5d4bd5@rbox.co>

On Fri, May 26, 2023, Michal Luczaj wrote:
> On 5/26/23 18:17, Sean Christopherson wrote:
> > On Fri, May 26, 2023, Michal Luczaj wrote:
> >> On 5/26/23 02:07, Sean Christopherson wrote:
> >>> On Thu, May 25, 2023, Michal Luczaj wrote:
> >>>> @@ -265,10 +265,14 @@ static int kvm_recalculate_phys_map(struct kvm_apic_map *new,
> >>>>  		 * mapped, i.e. is aliased to multiple vCPUs.  The optimized
> >>>>  		 * map requires a strict 1:1 mapping between IDs and vCPUs.
> >>>>  		 */
> >>>> -		if (apic_x2apic_mode(apic))
> >>>> +		if (apic_x2apic_mode(apic)) {
> >>>> +			if (x2apic_id > new->max_apic_id)
> >>>> +				return -EINVAL;
> >>>
> >>> Hmm, disabling the optimized map just because userspace created a new vCPU is
> >>> unfortunate and unnecessary.  Rather than return -EINVAL and only perform the
> >>> check when x2APIC is enabled, what if we instead do the check immediately and
> >>> return -E2BIG?  Then the caller can retry with a bigger array size.  Preemption
> >>> is enabled and retries are bounded by the number of possible vCPUs, so I don't
> >>> see any obvious issues with retrying.
> >>
> >> Right, that makes perfect sense.
> >>
> >> Just a note, it changes the logic a bit:
> >>
> >> - x2apic_format: an overflowing x2apic_id won't be silently ignored.
> > 
> > Nit, I wouldn't describe the current behavior as silently ignored.  KVM doesn't
> > ignore the case, KVM instead disables the optimized map.
> 
> I may be misusing "silently ignored",

You're not.

> but currently if (x2apic_format && apic_x2apic_mode && x2apic_id >
> new->max_apic_id) new->phys_map[x2apic_id] remains unchanged, then
> kvm_recalculate_phys_map() returns 0 (not -EINVAL).  I.e. this does not
> result in rcu_assign_pointer(kvm->arch.apic_map, NULL).

My bad.  I was thinking of the -EINVAL from your patch.

Hmm, thinking more about the "silently ignored" case, it's only temporarily
ignored.  If a x2APIC ID is out-of-bounds, then there had to have been a write
to a vCPU's x2APIC ID between the two instances of kvm_for_each_vcpu(), and that
means that whatever changed the ID must also mark apic_map_dirty DIRTY and call
kvm_recalculate_apic_map().  I.e. another recalc will soon occur and cleanup the
mess.

There's still value in retrying, but it's not as much as an optimization as it
first appears.  That means the bug fix can be a separate patch from the retry.
Oh, and the retry can also redo the DIRTY=>IN_PROGRESS so that whatever modified
a vCPU's x2APIC ID doesn't perform an unnecessary recalculation.

E.g. this (plus comments) for stable@

---
 arch/x86/kvm/lapic.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e542cf285b51..7a83df7f45c5 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -228,6 +228,12 @@ static int kvm_recalculate_phys_map(struct kvm_apic_map *new,
 	u32 xapic_id = kvm_xapic_id(apic);
 	u32 physical_id;
 
+	if (WARN_ON_ONCE(xapic_id >= new->max_apic_id))
+		return -EINVAL;
+
+	if (x2apic_id >= new->max_apic_id)
+		return -E2BIG;
+
 	/*
 	 * Deliberately truncate the vCPU ID when detecting a mismatched APIC
 	 * ID to avoid false positives if the vCPU ID, i.e. x2APIC ID, is a
@@ -253,8 +259,7 @@ static int kvm_recalculate_phys_map(struct kvm_apic_map *new,
 	 */
 	if (vcpu->kvm->arch.x2apic_format) {
 		/* See also kvm_apic_match_physical_addr(). */
-		if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
-			x2apic_id <= new->max_apic_id)
+		if (apic_x2apic_mode(apic) || x2apic_id > 0xff)
 			new->phys_map[x2apic_id] = apic;
 
 		if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])

base-commit: 39428f6ea9eace95011681628717062ff7f5eb5f
-- 


and then as a pure optimization

---
 arch/x86/kvm/lapic.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 7a83df7f45c5..09e13ded34ef 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -369,8 +369,9 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 	struct kvm_apic_map *new, *old = NULL;
 	struct kvm_vcpu *vcpu;
 	unsigned long i;
-	u32 max_id = 255; /* enough space for any xAPIC ID */
-	bool xapic_id_mismatch = false;
+	u32 max_id;
+	bool xapic_id_mismatch;
+	int r;
 
 	/* Read kvm->arch.apic_map_dirty before kvm->arch.apic_map.  */
 	if (atomic_read_acquire(&kvm->arch.apic_map_dirty) == CLEAN)
@@ -380,9 +381,14 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 		  "Dirty APIC map without an in-kernel local APIC");
 
 	mutex_lock(&kvm->arch.apic_map_lock);
+
+retry:
 	/*
-	 * Read kvm->arch.apic_map_dirty before kvm->arch.apic_map
-	 * (if clean) or the APIC registers (if dirty).
+	 * Read kvm->arch.apic_map_dirty before kvm->arch.apic_map (if clean)
+	 * or the APIC registers (if dirty).  Note, on retry the map may have
+	 * not yet been marked dirty by whatever task changed a vCPU's x2APIC
+	 * ID, i.e. the map may still show up as in-progress.  In that case
+	 * this task still needs to retry and copmlete its calculation.
 	 */
 	if (atomic_cmpxchg_acquire(&kvm->arch.apic_map_dirty,
 				   DIRTY, UPDATE_IN_PROGRESS) == CLEAN) {
@@ -391,6 +397,9 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 		return;
 	}
 
+	max_id = 255; /* enough space for any xAPIC ID */
+	xapic_id_mismatch = false;
+
 	kvm_for_each_vcpu(i, vcpu, kvm)
 		if (kvm_apic_present(vcpu))
 			max_id = max(max_id, kvm_x2apic_id(vcpu->arch.apic));
@@ -409,9 +418,13 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 		if (!kvm_apic_present(vcpu))
 			continue;
 
-		if (kvm_recalculate_phys_map(new, vcpu, &xapic_id_mismatch)) {
+		r = kvm_recalculate_phys_map(new, vcpu, &xapic_id_mismatch);
+		if (r) {
 			kvfree(new);
 			new = NULL;
+			if (r == -E2BIG)
+				goto retry;
+
 			goto out;
 		}
 

base-commit: 15cc2fa07b5867ae5a6f17656a6f272cfb393810
-- 

  reply	other threads:[~2023-05-26 18:11 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-25 18:33 [PATCH 0/3] Out-of-bounds access in kvm_recalculate_phys_map() Michal Luczaj
2023-05-25 18:33 ` [PATCH 1/3] KVM: x86: Fix out-of-bounds " Michal Luczaj
2023-05-26  0:07   ` Sean Christopherson
2023-05-26 10:52     ` Michal Luczaj
2023-05-26 16:17       ` Sean Christopherson
2023-05-26 17:17         ` Michal Luczaj
2023-05-26 18:11           ` Sean Christopherson [this message]
2023-05-26 22:27           ` Sean Christopherson
2023-05-25 18:33 ` [PATCH 2/3] KVM: x86: Simplify APIC ID selection " Michal Luczaj
2023-05-25 18:33 ` [PATCH 3/3] KVM: selftests: Add test for race in kvm_recalculate_apic_map() Michal Luczaj
2023-05-26 23:40   ` Sean Christopherson
2023-05-28 17:26     ` Michal Luczaj
2023-06-02  0:26       ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZHD2WFeYmfkpuzJQ@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=mhal@rbox.co \
    --cc=pbonzini@redhat.com \
    --cc=shuah@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox