Re: [RFC PATCH 13/13] KVM: nSVM: Stop bombing the TLB on nested transitions

From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 13/13] KVM: nSVM: Stop bombing the TLB on nested transitions
Date: Wed, 5 Mar 2025 06:45:14 +0000
Message-ID: <Z8fy-saRNCC031jw@google.com>
In-Reply-To: <36d8ffbda9e69c5245ded717e7491f6fcd5ca72e.camel@redhat.com>

On Tue, Mar 04, 2025 at 10:14:40PM -0500, Maxim Levitsky wrote:
> On Mon, 2025-03-03 at 22:21 +0000, Yosry Ahmed wrote:
> > On Fri, Feb 28, 2025 at 09:21:54PM -0500, Maxim Levitsky wrote:
> > > On Wed, 2025-02-05 at 18:24 +0000, Yosry Ahmed wrote:
> > > > Now that nested TLB flushes are properly tracked with a well-maintained
> > > > separate ASID for L2 and proper handling of L1's TLB flush requests,
> > > > drop the unconditional flushes and syncs on nested transitions.
> > > > 
> > > > On a Milan machine, an L1 and L2 guests were booted, both with a single
> > > > vCPU, and pinned to a single physical CPU to maximize TLB collisions. In
> > > > this setup, the cpuid_rate microbenchmark [1] showed the following
> > > > changes with this patch:
> > > > 
> > > > +--------+--------+-------------------+----------------------+
> > > > | L0     | L1     | cpuid_rate (base) | cpuid_rate (patched) |
> > > > +========+========+===================+======================+
> > > > | NPT    | NPT    | 256621            | 301113 (+17.3%)      |
> > > > | NPT    | Shadow | 180017            | 203347 (+12.96%)     |
> > > > | Shadow | Shadow | 177006            | 189150 (+6.86%)      |
> > > > +--------+--------+-------------------+----------------------+
> > > > 
> > > > [1]https://lore.kernel.org/kvm/20231109180646.2963718-1-khorenko@virtuozzo.com/
> > > > 
> > > > Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> > > > ---
> > > >  arch/x86/kvm/svm/nested.c | 7 -------
> > > >  1 file changed, 7 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > > > index 8e40ff21f7353..45a187d4c23d1 100644
> > > > --- a/arch/x86/kvm/svm/nested.c
> > > > +++ b/arch/x86/kvm/svm/nested.c
> > > > @@ -512,9 +512,6 @@ static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
> > > >  		svm->nested.last_asid = svm->nested.ctl.asid;
> > > >  		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> > > >  	}
> > > > -	/* TODO: optimize unconditional TLB flush/MMU sync */
> > > > -	kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
> > > > -	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> > > >  }
> > > >  
> > > >  static void nested_svm_exit_tlb_flush(struct kvm_vcpu *vcpu)
> > > > @@ -530,10 +527,6 @@ static void nested_svm_exit_tlb_flush(struct kvm_vcpu *vcpu)
> > > >  	 */
> > > >  	if (svm->nested.ctl.tlb_ctl == TLB_CONTROL_FLUSH_ALL_ASID)
> > > >  		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> > > > -
> > > > -	/* TODO: optimize unconditional TLB flush/MMU sync */
> > > > -	kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
> > > > -	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> > > >  }
> > > >  
> > > >  /*
> > > 
> > > Assuming that all previous patches are correct this one should work as well.
> > > 
> > > However only a very heavy stress testing, including hyperv, windows guests
> > > of various types, etc can give me confidence that there is no some ugly bug lurking
> > > somewhere.
> > 
> > I tried booting an L2 and running some workloads like netperf in there.
> > I also tried booting an L3.
> > 
> > I am planning to try and run some testing with a windows L2 guest. I am
> > assuming this exercises the hyper-V emulation in L1, which could be
> > interesting.
> > 
> > I am not sure if I will be able to test more scenarios though,
> > especially Windows as an L1 (and something else as an L2).
> > 
> > Let me know if you have something specific in mind.
> 
> 
> KVM can run itself 'under' HyperV (although in this case when it runs a guest
> the guest will be L3 overall, so not really something supported but still something that might
> reveal bugs).
> In this case KVM/L1 can take advantage of L0's TLB flush interface.

I don't think I will be able to test on Hyper-V.

> 
> Stress testing L3s also can be nice, although in this case from L0 POV, it doesn't see L3 at all.
> Instead it sees that L1 runs two different L2s back to back, so the current code will
> likely flush everything all the time.

I did run an L3 in an attempt to shake out any bugs.

> 
> 
> The direct TLB flush that hyperv does, especially from L2 to L0 should also be tested,
> it's a relatively new feature, so we need to check that L2 actually uses it.

Is this when KVM is emulating Hyper-V for nested guests, or when KVM is
running on top of Hyper-V? If the latter, as I said earlier I am not
sure if I will be able to test that.

> 
> KVM also has its own way of TLB flushing paravirtualization, which can in theory interfere with this.
> 
> 
> It's also nice to run a hyperv enabled Windows as KVM guest, and run a guest in it (can be Windows or Linux or anything else)
> Such guest will run two L2 VMs, Windows itself and the VM you run inside.

Yeah, that's something I intend to do. Sean mentioned that recent
Windows versions run the OS in L1 on top of the hypervisor in L0, so I
think if I run a Windows VM I automatically get both L1 and L2. So just
running a Windows VM should exercise the TLB flushes. I will also try to
run WSL to have multiple L2 VMs. I believe that's what you are talking
about here.

> 
> 
> You can also try other L1s, like VirtualBox, VMware, running in Windows or Linux L1,
> and themselves can run a windows or Linux L2. 
> 
> You can also test other OSes like BSD* and such as L1, they might have a different TLB access pattern and
> might reveal something, who knows. These can also run L2s using their own hypervisors.
> 
> Running a very old (say Windows XP, or some very old Linux) as L2 might also reveal something.

Honestly, I don't think I have the time or resources to test other
operating systems or L1s. Currently, my plan is to exercise more
scenarios in a Linux L2 guest, and to run a Windows guest as I
mentioned earlier.

> 
> (But don't try to run win95/98 - this OS is known to not flush TLB properly (it doesn't use INVLPG when it should),
> so it doesn't work well on AMD at all because of this).

Good to know :)

> 
> Finally, it might be worth it to develop a TLB stress test if one doesn't exist yet.

I also thought about this, but I think it would be very tricky to cover
all the cases, and we'd need the test to create an L1 that is
sophisticated enough to exercise different TLB flushing scenarios. I
think running an actual OS as L1 is probably exercising the TLB code
more than any test.

That being said, Sean did mention the 'access' tests in KUT, and I plan
to check how relevant they are and if they can be easily extended to add
some coverage for this.
