All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Maxim Levitsky <mlevitsk@redhat.com>
Cc: kvm@vger.kernel.org, Sandipan Das <sandipan.das@amd.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Jim Mattson <jmattson@google.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Borislav Petkov <bp@alien8.de>,
	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	Josh Poimboeuf <jpoimboe@kernel.org>,
	Daniel Sneddon <daniel.sneddon@linux.intel.com>,
	Jiaxi Chen <jiaxi.chen@linux.intel.com>,
	Babu Moger <babu.moger@amd.com>,
	linux-kernel@vger.kernel.org, Jing Liu <jing2.liu@intel.com>,
	Wyes Karny <wyes.karny@amd.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [PATCH v2 02/11] KVM: nSVM: clean up the copying of V_INTR bits from vmcb02 to vmcb12
Date: Tue, 31 Jan 2023 01:44:12 +0000	[thread overview]
Message-ID: <Y9hybI65So5X2LFg@google.com> (raw)
In-Reply-To: <Y9RuQz8dAT7DZGYk@google.com>

On Sat, Jan 28, 2023, Sean Christopherson wrote:
> On Tue, Nov 29, 2022, Maxim Levitsky wrote:
> > the V_IRQ and v_TPR bits don't exist when virtual interrupt
> > masking is not enabled, therefore the KVM should not copy these
> > bits regardless of V_IRQ intercept.
> 
> Hmm, the APM disagrees:
> 
>  The APIC's TPR always controls the task priority for physical interrupts, and the
>  V_TPR always controls virtual interrupts.
> 
>    While running a guest with V_INTR_MASKING cleared to 0:
>      • Writes to CR8 affect both the APIC's TPR and the V_TPR register.
> 
> 
>  ...
> 
>  The three VMCB fields V_IRQ, V_INTR_PRIO, and V_INTR_VECTOR indicate whether there
>  is a virtual interrupt pending, and, if so, what its vector number and priority are.
> 
> IIUC, V_INTR_MASKING_MASK is mostly about EFLAGS.IF, with a small side effect on
> TPR.  E.g. a VMM could pend a V_IRQ but clear V_INTR_MASKING and expect the guest
> to take the V_IRQ.  At least, that's my reading of things.
>
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/svm/nested.c | 23 ++++++++---------------
> >  1 file changed, 8 insertions(+), 15 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index 37af0338da7c32..aad3145b2f62fe 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -412,24 +412,17 @@ void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
> >   */
> >  void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
> >  {
> > -	u32 mask;
> > +	u32 mask = 0;
> >  	svm->nested.ctl.event_inj      = svm->vmcb->control.event_inj;
> >  	svm->nested.ctl.event_inj_err  = svm->vmcb->control.event_inj_err;
> >  
> > -	/* Only a few fields of int_ctl are written by the processor.  */
> > -	mask = V_IRQ_MASK | V_TPR_MASK;
> > -	if (!(svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK) &&
> > -	    svm_is_intercept(svm, INTERCEPT_VINTR)) {
> > -		/*
> > -		 * In order to request an interrupt window, L0 is usurping
> > -		 * svm->vmcb->control.int_ctl and possibly setting V_IRQ
> > -		 * even if it was clear in L1's VMCB.  Restoring it would be
> > -		 * wrong.  However, in this case V_IRQ will remain true until
> > -		 * interrupt_window_interception calls svm_clear_vintr and
> > -		 * restores int_ctl.  We can just leave it aside.
> > -		 */
> > -		mask &= ~V_IRQ_MASK;

Argh! *shakes fist at KVM and SVM*

This is ridiculously convoluted, and I'm pretty sure there are existing bugs.  If
L1 runs L2 with V_IRQ=1 and V_INTR_MASKING=1, and KVM requests an interrupt window,
then KVM will overwrite vmcb02's int_vector and int_ctl, i.e. clobber L1's V_IRQ,
but then silently clear INTERCEPT_VINTR in recalc_intercepts() and thus prevent
svm_clear_vintr() from being reached, i.e. prevent restoring L1's V_IRQ.

Bug #1 is that KVM shouldn't clobber the V_IRQ fields if KVM ultimately decides
not to open an interrupt window.  Bug #2 is that KVM needs to open an interrupt
window if save.RFLAGS.IF=1, as interrupts may become unblocked in that case,
e.g. if L2 is in an interrupt shadow.

So I think this over two patches?

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 05d38944a6c0..ad1e70ac8669 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -139,13 +139,18 @@ void recalc_intercepts(struct vcpu_svm *svm)
 
        if (g->int_ctl & V_INTR_MASKING_MASK) {
                /*
-                * Once running L2 with HF_VINTR_MASK, EFLAGS.IF and CR8
-                * does not affect any interrupt we may want to inject;
-                * therefore, writes to CR8 are irrelevant to L0, as are
-                * interrupt window vmexits.
+                * If L2 is active and V_INTR_MASKING is enabled in vmcb12,
+                * disable intercept of CR8 writes as L2's CR8 does not affect
+                * any interrupt KVM may want to inject.
+                *
+                * Similarly, disable intercept of virtual interrupts (used to
+                * detect interrupt windows) if the saved RFLAGS.IF is '0', as
+                * the effective RFLAGS.IF for L1 interrupts will never be set
+                * while L2 is running (L2's RFLAGS.IF doesn't affect L1 IRQs).
                 */
                vmcb_clr_intercept(c, INTERCEPT_CR8_WRITE);
-               vmcb_clr_intercept(c, INTERCEPT_VINTR);
+               if (!(svm->vmcb01.ptr->save.rflags & X86_EFLAGS_IF))
+                       vmcb_clr_intercept(c, INTERCEPT_VINTR);
        }
 
        /*
@@ -416,18 +421,18 @@ void nested_sync_control_from_vmcb02(struct vcpu_svm *svm)
 
        /* Only a few fields of int_ctl are written by the processor.  */
        mask = V_IRQ_MASK | V_TPR_MASK;
-       if (!(svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK) &&
-           svm_is_intercept(svm, INTERCEPT_VINTR)) {
-               /*
-                * In order to request an interrupt window, L0 is usurping
-                * svm->vmcb->control.int_ctl and possibly setting V_IRQ
-                * even if it was clear in L1's VMCB.  Restoring it would be
-                * wrong.  However, in this case V_IRQ will remain true until
-                * interrupt_window_interception calls svm_clear_vintr and
-                * restores int_ctl.  We can just leave it aside.
-                */
+
+       /*
+        * Don't sync vmcb02 V_IRQ back to vmcb12 if KVM (L0) is intercepting
+        * virtual interrupts in order to request an interrupt window, as KVM
+        * has usurped vmcb02's int_ctl.  If an interrupt window opens before
+        * the next VM-Exit, svm_clear_vintr() will restore vmcb12's int_ctl.
+        * If no window opens, V_IRQ will be correctly preserved in vmcb12's
+        * int_ctl (because it was never recognized while L2 was running).
+        */
+       if (svm_is_intercept(svm, INTERCEPT_VINTR) &&
+           !test_bit(INTERCEPT_VINTR, (unsigned long *)svm->nested.ctl.intercepts))
                mask &= ~V_IRQ_MASK;
-       }
 
        if (nested_vgif_enabled(svm))
                mask |= V_GIF_MASK;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b103fe7cbc82..59d2891662ef 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1580,6 +1580,16 @@ static void svm_set_vintr(struct vcpu_svm *svm)
 
        svm_set_intercept(svm, INTERCEPT_VINTR);
 
+       /*
+        * Recalculating intercepts may have clear the VINTR intercept.  If
+        * V_INTR_MASKING is enabled in vmcb12, then the effective RFLAGS.IF
+        * for L1 physical interrupts is L1's RFLAGS.IF at the time of VMRUN.
+        * Requesting an interrupt window if save.RFLAGS.IF=0 is pointless as
+        * interrupts will never be unblocked while L2 is running.
+        */
+       if (!svm_is_intercept(svm, INTERCEPT_VINTR))
+               return;
+
        /*
         * This is just a dummy VINTR to actually cause a vmexit to happen.
         * Actual injection of virtual interrupts happens through EVENTINJ.

  reply	other threads:[~2023-01-31  1:44 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-29 19:37 [PATCH v2 00/11] SVM: vNMI (with my fixes) Maxim Levitsky
2022-11-29 19:37 ` [PATCH v2 01/11] KVM: nSVM: don't sync back tlb_ctl on nested VM exit Maxim Levitsky
2022-12-05 14:05   ` Santosh Shukla
2022-12-06 12:13     ` Maxim Levitsky
2022-11-29 19:37 ` [PATCH v2 02/11] KVM: nSVM: clean up the copying of V_INTR bits from vmcb02 to vmcb12 Maxim Levitsky
2023-01-28  0:37   ` Sean Christopherson
2023-01-31  1:44     ` Sean Christopherson [this message]
2023-02-24 14:38       ` Maxim Levitsky
2023-02-24 16:48         ` Sean Christopherson
2022-11-29 19:37 ` [PATCH v2 03/11] KVM: nSVM: explicitly raise KVM_REQ_EVENT on nested VM exit if L1 doesn't intercept interrupts Maxim Levitsky
2023-01-28  0:56   ` Sean Christopherson
2023-01-30 18:41     ` Sean Christopherson
2022-11-29 19:37 ` [PATCH v2 04/11] KVM: SVM: drop the SVM specific H_FLAGS Maxim Levitsky
2022-12-05 15:31   ` Santosh Shukla
2023-01-28  0:56   ` Sean Christopherson
2022-11-29 19:37 ` [PATCH v2 05/11] KVM: x86: emulator: stop using raw host flags Maxim Levitsky
2023-01-28  0:58   ` Sean Christopherson
2023-02-24 14:38     ` Maxim Levitsky
2022-11-29 19:37 ` [PATCH v2 06/11] KVM: SVM: add wrappers to enable/disable IRET interception Maxim Levitsky
2022-12-05 15:41   ` Santosh Shukla
2022-12-06 12:14     ` Maxim Levitsky
2022-12-08 12:09       ` Santosh Shukla
2022-12-08 13:44         ` Maxim Levitsky
2023-01-31 21:07           ` Sean Christopherson
2023-02-13 14:50             ` Santosh Shukla
2022-11-29 19:37 ` [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface Maxim Levitsky
2023-01-28  1:09   ` Sean Christopherson
2023-01-31 21:12     ` Sean Christopherson
2023-02-08  9:35       ` Santosh Shukla
2023-02-08  9:32     ` Santosh Shukla
2023-02-24 14:39     ` Maxim Levitsky
2023-01-31 22:28   ` Sean Christopherson
2023-02-01  0:06     ` Sean Christopherson
2023-02-08  9:51       ` Santosh Shukla
2023-02-08 16:09         ` Sean Christopherson
2023-02-08  9:43     ` Santosh Shukla
2023-02-08 16:06       ` Sean Christopherson
2023-02-14 10:22         ` Santosh Shukla
2023-02-15 22:43           ` Sean Christopherson
2023-02-16  0:22             ` Sean Christopherson
2023-02-17  7:56               ` Santosh Shukla
2022-11-29 19:37 ` [PATCH v2 08/11] x86/cpu: Add CPUID feature bit for VNMI Maxim Levitsky
2022-11-29 19:37 ` [PATCH v2 09/11] KVM: SVM: Add VNMI bit definition Maxim Levitsky
2023-01-31 22:42   ` Sean Christopherson
2023-02-02  9:42     ` Santosh Shukla
2022-11-29 19:37 ` [PATCH v2 10/11] KVM: SVM: implement support for vNMI Maxim Levitsky
2022-12-04 17:18   ` Maxim Levitsky
2022-12-05 17:07   ` Santosh Shukla
2023-01-28  1:10   ` Sean Christopherson
2023-02-10 12:02     ` Santosh Shukla
2023-02-01  0:22   ` Sean Christopherson
2023-02-01  0:39     ` Sean Christopherson
2023-02-10 12:24     ` Santosh Shukla
2023-02-10 16:44       ` Sean Christopherson
2022-11-29 19:37 ` [PATCH v2 11/11] KVM: nSVM: implement support for nested VNMI Maxim Levitsky
2022-12-05 17:14   ` Santosh Shukla
2022-12-06 12:19     ` Maxim Levitsky
2022-12-08 12:11       ` Santosh Shukla
2023-02-01  0:44   ` Sean Christopherson
2022-12-06  9:58 ` [PATCH v2 00/11] SVM: vNMI (with my fixes) Santosh Shukla
2023-02-01  0:24   ` Sean Christopherson
2022-12-20 10:27 ` Maxim Levitsky
2022-12-21 18:44   ` Sean Christopherson
2023-01-15  9:05 ` Maxim Levitsky
2023-01-28  1:13 ` Sean Christopherson
2023-02-01 19:13 ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y9hybI65So5X2LFg@google.com \
    --to=seanjc@google.com \
    --cc=babu.moger@amd.com \
    --cc=bp@alien8.de \
    --cc=daniel.sneddon@linux.intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=jiaxi.chen@linux.intel.com \
    --cc=jing2.liu@intel.com \
    --cc=jmattson@google.com \
    --cc=jpoimboe@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=mlevitsk@redhat.com \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=sandipan.das@amd.com \
    --cc=tglx@linutronix.de \
    --cc=wyes.karny@amd.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.