From: Tao Su <tao1.su@linux.intel.com>
To: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, chao.gao@intel.com,
guang.zeng@intel.com, yi1.lai@intel.com
Subject: Re: [PATCH 2/2] KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit
Date: Thu, 7 Sep 2023 17:56:31 +0800 [thread overview]
Message-ID: <ZPmeKHNJQpgoIZmW@linux.bj.intel.com> (raw)
In-Reply-To: <ZPj6iF0Q7iynn62p@google.com>
On Wed, Sep 06, 2023 at 03:17:44PM -0700, Sean Christopherson wrote:
> On Wed, Sep 06, 2023, Tao Su wrote:
> > On Tue, Sep 05, 2023 at 04:03:36PM -0700, Sean Christopherson wrote:
> > > +Suravee
> > >
> > > On Mon, Sep 04, 2023, Tao Su wrote:
> > > > When IPI virtualization is enabled, a WARN is triggered if bit12 of ICR
> > > > MSR is set after APIC-write VM-exit. The reason is kvm_apic_send_ipi()
> > > > thinks the APIC_ICR_BUSY bit should be cleared because KVM has no delay,
> > > > but kvm_apic_write_nodecode() doesn't clear the APIC_ICR_BUSY bit.
> > > >
> > > > Since bit12 of ICR is no longer BUSY bit but UNUSED bit in x2APIC mode,
> > > > and SDM has no detail about how hardware will handle the UNUSED bit12
> > > > set, we tested on Intel CPU (SRF/GNR) with IPI virtualization and found
> > > > the UNUSED bit12 was also cleared by hardware without #GP. Therefore,
> > > > the clearing of bit12 should be still kept being consistent with the
> > > > hardware behavior.
> > >
> > > I'm confused. If hardware clears the bit, then why is it set in the vAPIC page
> > > after a trap-like APIC-write VM-Exit? In other words, how is this not a ucode
> > > or hardware bug?
> >
> > Sorry, I didn't describe it clearly.
> >
> > On bare-metal, bit12 of ICR MSR will be cleared after setting this bit.
> >
> > If bit12 is set in guest, the bit is not cleared in the vAPIC page after APIC-write
> > VM-Exit. So whether to clear bit12 in vAPIC page needs to be considered.
>
> I got that, the behavior just seems odd to me. And I'm grumpy that Intel punted
> the problem to software. But the SDM specifically calls out that this is the
> correct behavior :-/
>
> Specifically, in the context of IPI virtualization:
>
> If ECX contains 830H, a general-protection fault occurs if any of bits 31:20,
> 17:16, or 13 of EAX is non-zero.
>
> and
>
> If ECX contains 830H, the processor then checks the value of VICR to determine
> whether the following are all true:
>
> Bits 19:18 (destination shorthand) are 00B (no shorthand).
> Bit 15 (trigger mode) is 0 (edge).
> Bit 12 (unused) is 0.
> Bit 11 (destination mode) is 0 (physical).
> Bits 10:8 (delivery mode) are 000B (fixed).
>
> If all of the items above are true, the processor performs IPI virtualization
> using the 8-bit vector in byte 0 of VICR and the 32-bit APIC ID in VICR[63:32]
> (see Section 30.1.6). Otherwise, the logical processor causes an APIC-write VM
> exit (see Section 30.4.3.3). If ECX contains 830H, the processor then checks
> the value of VICR to determine whether the following are all true:
>
> I.e. the "unused" busy bit must be zero. The part that makes me grumpy is that
> hardware does check that other reserved bits are actually zero:
>
> If special processing applies, no general-protection exception is produced due
> to the fact that the local APIC is in xAPIC mode. However, WRMSR does perform
> the normal reserved-bit checking:
> - If ECX contains 808H or 83FH, a general-protection fault occurs if either EDX or EAX[31:8] is non-zero.
> - If ECX contains 80BH, a general-protection fault occurs if either EDX or EAX is non-zero.
> - If ECX contains 830H, a general-protection fault occurs if any of bits 31:20, 17:16, or 13 of EAX is non-zero.
>
> Which implies that the hardware *does* enforce all the other reserved bits, but
> punted bit 12 to the hypervisor :-(
>
> That said, I think we have an "out". Under the x2APIC section, regarding ICR,
> the SDM also says:
>
> It remains readable only to aid in debugging; however, software should not
> assume the value returned by reading the ICR is the last written value.
>
> I.e. KVM basically has free reign to do whatever it wants, so long as it doesn't
> confuse userspace or break KVM's ABI. As much as I want to say "do nothing",
> clearing bit 12 so that it reads back as '0' is the safer approach. Just please
> don't add a new #define for, it's far easier to understand what's going on if we
> just use APIC_ICR_BUSY, especially given that I highly doubt the bit will actually
> be repurposed for something new.
Thanks for such a detailed analysis. I will submit a patch to clear bit12 since it is
a safer approach, and also drop the unnecessary alias.
>
> FWIW, I also suspect that hardware isn't clearing the busy bit per se, I suspect
> that hardware simply reads the bit as zero.
>
> Side topic, unless I'm blind, KVM is missing the reserved bits #GP checks for ICR
> bits bits 31:20, 17:16, and 13.
Yes, it is needed to check the reserved bits of ICR and inject #GP when IPI virtualization
disabled (otherwise it is injected by hardware). The related kselftest is also necessary
to verify the #GP is correctly emulated.
Thanks,
Tao
next prev parent reply other threads:[~2023-09-07 16:41 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-04 1:35 [PATCH 0/2] KVM: x86: Fix a WARN in kvm_apic_send_ipi() Tao Su
2023-09-04 1:35 ` [PATCH 1/2] x86/apic: Introduce X2APIC_ICR_UNUSED_12 for x2APIC mode Tao Su
2023-09-04 2:58 ` Chao Gao
2023-09-04 3:03 ` Tao Su
2023-09-04 1:35 ` [PATCH 2/2] KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit Tao Su
2023-09-04 2:46 ` Chao Gao
2023-09-04 3:00 ` Tao Su
2023-09-04 4:16 ` kernel test robot
2023-09-04 5:02 ` Tao Su
2023-09-05 23:03 ` Sean Christopherson
2023-09-06 5:07 ` Tao Su
2023-09-06 22:17 ` Sean Christopherson
2023-09-07 9:56 ` Tao Su [this message]
2023-09-24 13:58 ` kernel test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZPmeKHNJQpgoIZmW@linux.bj.intel.com \
--to=tao1.su@linux.intel.com \
--cc=chao.gao@intel.com \
--cc=guang.zeng@intel.com \
--cc=kvm@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=seanjc@google.com \
--cc=yi1.lai@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.