From: "Huang, Kai" <kai.huang@intel.com>
To: "Christopherson, Sean J" <sean.j.christopherson@intel.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"guangrong.xiao@gmail.com" <guangrong.xiao@gmail.com>,
"hpa@zytor.com" <hpa@zytor.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"bp@alien8.de" <bp@alien8.de>,
"thomas.lendacky@amd.com" <thomas.lendacky@amd.com>,
"rkrcmar@redhat.com" <rkrcmar@redhat.com>,
"brijesh.singh@amd.com" <brijesh.singh@amd.com>,
"junaids@google.com" <junaids@google.com>
Subject: Re: [PATCH] kvm: x86: Fix several SPTE mask calculation errors caused by MKTME
Date: Tue, 23 Apr 2019 23:38:17 +0000 [thread overview]
Message-ID: <1556062690.2776.95.camel@intel.com> (raw)
In-Reply-To: <20190423150808.GB10720@linux.intel.com>
On Tue, 2019-04-23 at 08:08 -0700, Sean Christopherson wrote:
> On Mon, Apr 22, 2019 at 06:57:01PM -0700, Huang, Kai wrote:
> > On Mon, 2019-04-22 at 09:39 -0700, Sean Christopherson wrote:
> > > On Tue, Apr 16, 2019 at 09:19:48PM +1200, Kai Huang wrote:
> > > > With both Intel MKTME and AMD SME/SEV, physical address bits are reduced
> > > > because several high bits of the physical address are repurposed for
> > > > memory encryption. To honor this behavior, those repurposed bits are
> > > > subtracted from cpuinfo_x86->x86_phys_bits for both Intel MKTME and AMD
> > > > SME/SEV, so boot_cpu_data.x86_phys_bits no longer holds the physical
> > > > address bits reported by CPUID.
> > >
> > > This neglects to mention the most relevant tidbit of information in terms
> > > of justification for this patch: the number of bits stolen for MKTME is
> > > programmed by BIOS, i.e. bits may be repurposed for MKTME regardless of
> > > kernel support.
> >
> > I can add the BIOS part. But the key issue is that the kernel adjusts
> > boot_cpu_data.x86_phys_bits, isn't it?
> >
> > If the kernel didn't adjust boot_cpu_data.x86_phys_bits, then this patch
> > theoretically wouldn't be needed?
>
> True, but the context matters, e.g. readers might wonder why this code
> doesn't simply check a feature flag to see if MKTME is enabled. Knowing
> that PA bits can be repurposed regardless of (full) kernel support is just
> as important as knowing that the kernel adjusts boot_cpu_data.x86_phys_bits.
Fine.
[...]
> > >
> > > > +
> > > >
> > > > static void mmu_spte_set(u64 *sptep, u64 spte);
> > > > static union kvm_mmu_page_role
> > > > @@ -303,6 +319,34 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask, u64 mmio_value)
> > > > }
> > > > EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
> > > >
> > > > +void kvm_set_mmio_spte_mask(void)
> > >
> > > Moving this to mmu.c makes sense, but calling it from kvm_arch_init() is
> > > silly. The current call site is immediately after kvm_mmu_module_init(),
> > > I don't see any reason not to just move the call there.
> >
> > I don't know whether calling kvm_set_mmio_spte_mask() from kvm_arch_init() is
> > silly, but maybe there's history -- KVM calls kvm_mmu_set_mask_ptes() from
> > kvm_arch_init() too, which may also be silly according to your judgement. I
> > have no problem calling kvm_set_mmio_spte_mask() from kvm_mmu_module_init(),
> > but IMHO this logic is irrelevant to this patch, and it's better to have a
> > separate patch for this purpose if necessary?
>
> A separate patch would be fine, but I would do it as a prereq, i.e. move
> the function first and modify it second. That would help review the
> functional changes.
I think I am still a little bit confused. Assuming you want to call kvm_set_mmio_spte_mask() from
kvm_mmu_module_init(), right after kvm_mmu_reset_all_pte_masks() is called, does that change
actually fix anything related to this issue? Or what exactly are you suggesting, if I am
misunderstanding?

Btw, I think calling kvm_set_mmio_spte_mask() from kvm_arch_init() has a benefit: we can call it
after kvm_x86_ops has been set up, and then invoke kvm_x86_ops->xxx() inside if needed.

And we also have kvm_mmu_set_mask_ptes(), which does a similar thing to kvm_set_mmio_spte_mask().
I don't see why we should call kvm_set_mmio_spte_mask() in kvm_mmu_module_init() but leave
kvm_mmu_set_mask_ptes() outside.
[...]
> > >
> >
> > > {MK}TME provides a CPUID feature bit and MSRs to query support, and this
> > > code only runs at module init so the overhead of a RDMSR is negligible.
> >
> > Yes, we can check against the CPUID feature bit, but I am not sure why it is
> > better than checking the CPU vendor.
>
> It helps readers understand the code flow, i.e. it's a mental cue that the
> bit stealing only applies to MKTME. Checking for "Intel" could easily be
> misinterpreted as "x86_phys_bits isn't accurate for Intel CPUs."
OK. Fine by me.
>
> > > > + shadow_first_rsvd_bit = boot_cpu_data.x86_phys_bits;
> > > > + else
> > > > + shadow_first_rsvd_bit = kvm_get_cpuid_phys_bits();
> > >
> > > If you rename the helper to be less specific and actually check for MKTME
> > > support then the MKTME comment doesn't need to be as verbose, e.g.:
> > >
> > > static u8 kvm_get_shadow_phys_bits(void)
> > > {
> > > if (!<has MKTME> ||
> > > WARN_ON_ONCE(boot_cpu_data.extended_cpuid_level < 0x80000008))
> > > return boot_cpu_data.x86_phys_bits;
> >
> > Why do we need WARN_ON_ONCE here?
>
> Because KVM would essentially be consuming known bad data since we've
> already established that 'x86_phys_bits' is wrong when MKTME is enabled.
> I.e. we should never encounter a platform with MKTME but not CPUID leaf
> 0x80000008.
Can we safely assume the compiler won't evaluate the WARN_ON_ONCE() part before checking
!<has_mktme>?
>
> > I don't have problem using as you suggested, but I don't get why checking
> > against CPU vendor is last resort?
>
> A few reasons off the top of my head:
>
> - CPU vendor is less precise, e.g. by checking for MKTME support KVM can
> WARN if it's consuming known bad data.
>
> - Checking for features makes the code self-documenting to some extent.
>
> - Multiple vendors may support a feature, now or in the future. E.g. the
> Hygon case is a great example.
OK.
>
> > >
> > > /*
> > > * MKTME steals physical address bits for key IDs, but the key ID bits
> > > * are not treated as reserved. x86_phys_bits is adjusted to account
> > > * for the stolen bits, use CPUID.MAX_PA_WIDTH directly which reports
> > > * the number of software-available bits irrespective of MKTME.
> > > */
> > > return cpuid_eax(0x80000008) & 0xff;
> > > }
> > >
> > > > +
> > > > + /*
> > > > + * Only Intel is impacted by L1TF, therefore for AMD and other x86
> > > > + * vendors L1TF mitigation is not needed.
> > > > + *
> > > > + * For Intel CPU, if it has 46 or less physical address bits, then set
> > > > + * an appropriate mask to guard against L1TF attacks. Otherwise, it is
> > > > * assumed that the CPU is not vulnerable to L1TF.
> > > > */
> > > > + if ((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) &&
> > >
> > > Again, checking vendors is bad. Not employing a mitigation technique due
> > > to an assumption about what processors are affected by a vulnerability is
> > > even worse.
> >
> > Why, isn't it a fact that L1TF only impacts Intel CPUs? What's the benefit of
> > employing a mitigation on CPUs that don't have the L1TF issue? Such a
> > mitigation only makes performance worse, even if not noticeably. For example,
> > why not employ the mitigation regardless of physical address bits at all?
>
> This particular mitigation has minimal effect on performance, e.g. adds
> a few SHL/SHR/AND/OR operations, and that minimal overhead is incurred
> regardless of whether or not reserved bits are set. I.e. KVM is paying
> the penalty no matter what, so being extra paranoid is "free".
>
> > > The existing code makes the assumption regarding processors
> > > with >46 bits of address space because a) no such processor existed before
> > > the discovery of L1TF, and it's reasonable to assume hardware vendors
> > > won't ship future processors with such an obvious vulnerability, and b)
> > > hardcoding the number of reserved bits to set simplifies the code.
> >
> > Yes, but we cannot simply use 'shadow_phys_bits' to check against 46 anymore,
> > right?
> >
> > For example, if an AMD CPU has 52 physical bits but reduces 5 or more of
> > them, then the current KVM code would employ the L1TF mitigation, but
> > actually it really shouldn't?
>
> What do you mean by "shouldn't"? Sure, it's not absolutely necessary, but
> again setting the bits is free since adjusting the GPA is hardcoded into
> mark_mmio_spte() and get_mmio_spte_gfn().
OK. I am not sure whether the CPU behaves any differently for x << 5 vs x << 0, but this is really
trivial, so I will do what you suggested.
Also, I found there is a (software) CPU bug flag, X86_BUG_L1TF. Do you think we should check
boot_cpu_has(X86_BUG_L1TF), and not set up shadow_nonpresent_or_rsvd_mask if it isn't set?
>
> > > > + (shadow_first_rsvd_bit <
> > > > + 52 - shadow_nonpresent_or_rsvd_mask_len))
> > > > + need_l1tf = true;
> > > > + else
> > > > + need_l1tf = false;
> > > > +
> > > > low_phys_bits = boot_cpu_data.x86_phys_bits;
> > > > - if (boot_cpu_data.x86_phys_bits <
> > > > - 52 - shadow_nonpresent_or_rsvd_mask_len) {
> > > > + shadow_nonpresent_or_rsvd_mask = 0;
> > > > + if (need_l1tf) {
> > > > shadow_nonpresent_or_rsvd_mask =
> > > > rsvd_bits(boot_cpu_data.x86_phys_bits -
> > > > shadow_nonpresent_or_rsvd_mask_len,
> > >
> > > This is broken, the reserved bits mask is being calculated with the wrong
> > > number of physical bits. I think fixing this would eliminate the need for
> > > the high_gpa_offset shenanigans.
> >
> > You are right, it should use 'shadow_phys_bits' instead. Thanks. Let me
> > think about whether high_gpa_offset can be avoided.
> >
> > >
> > > > @@ -4326,7 +4422,7 @@ static inline bool is_last_gpte(struct kvm_mmu *mmu,
> > > > static void
> > > > __reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
> > > > struct rsvd_bits_validate *rsvd_check,
> > > > - int maxphyaddr, int level, bool nx, bool gbpages,
> > > > + int first_rsvd_bit, int level, bool nx, bool gbpages,
> > > > bool pse, bool amd)
> > >
> > > Similar to the earlier comment regarding 'first', it's probably less
> > > confusing overall to just leave this as 'maxphyaddr'.
> >
> > Both work for me. But maybe 'non_rsvd_maxphyaddr' is better?
>
> Maybe? My personal preference would be to stay with maxphyaddr.
Fine, maxphyaddr it is. Btw, which one do you think is better: shadow_phys_bits, or, more
explicitly, shadow_non_rsvd_phys_bits?
Thanks,
-Kai