From: Sean Christopherson <seanjc@google.com>
To: Binbin Wu <binbin.wu@linux.intel.com>
Cc: Vishal Annapurve <vannapurve@google.com>,
	Nikolay Borisov <nik.borisov@suse.com>,
	 Jianxiong Gao <jxgao@google.com>,
	"Borislav Petkov (AMD)" <bp@alien8.de>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	Dionna Glaze <dionnaglaze@google.com>,
	 "H. Peter Anvin" <hpa@zytor.com>,
	jgross@suse.com,
	 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	kvm@vger.kernel.org,  linux-kernel@vger.kernel.org,
	Ingo Molnar <mingo@redhat.com>,
	pbonzini@redhat.com,  Peter Gonda <pgonda@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	 Tom Lendacky <thomas.lendacky@amd.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	x86@kernel.org,  Rick Edgecombe <rick.p.edgecombe@intel.com>,
	jiewen.yao@intel.com
Subject: Re: [PATCH 0/2] x86/kvm: Force legacy PCI hole as WB under SNP/TDX
Date: Wed, 20 Aug 2025 10:56:16 -0700
Message-ID: <aKYMQP5AEC2RkOvi@google.com>
In-Reply-To: <697aa804-b321-4dba-9060-7ac17e0a489f@linux.intel.com>

On Wed, Aug 20, 2025, Binbin Wu wrote:
> On 8/20/2025 6:03 PM, Binbin Wu wrote:
> > > > > Presumably this is an EDK2 bug?  If it's not an EDK2 bug, then how is the kernel's
> > > > > ACPI driver supposed to know that some ranges of SystemMemory must be mapped UC?
> > 
> > Checked with Jiewen offline.
> > 
> > He didn't think there was an existing interface to tell the OS to map an
> > OperationRegion of SystemMemory as UC via the ACPI table. He thought the
> > OS/ACPI driver still needed to rely on MTRRs for the hint before there was an
> > alternative way.
> > 
> > > > According to the ACPI spec 6.6, an operation region of SystemMemory has no
> > > > interface to specify the cacheable attribute.
> > > > 
> > > > One solution could be using MTRRs to communicate the memory attribute of legacy
> > > > PCI hole to the kernel.

So IIUC, there are no bugs anywhere, just a gap in specs that has been hidden
until now :-(

> > > > But during the PUCK meeting last week, Sean mentioned
> > > > that "long-term, firmware should not be using MTRRs to communicate anything to
> > > > the kernel." So this solution is not preferred.
> > > > 
> > > > If not MTRRs, there should be an alternative way to do the job.
> > > > 1. ACPI table
> > > >      According to the ACPI spec, neither operation region nor 32-Bit Fixed Memory
> > > >      Range Descriptor can specify the cacheable attribute.
> > > >      "Address Space Resource Descriptors" could be used to describe a memory range
> > > >      and they can specify the cacheable attribute via "Type Specific Flags".
> > > >      One of the Address Space Resource Descriptors could be added to the ACPI
> > > >      table as a hint when the kernel does the mapping for the operation region.
> > > >      (There is also the "System Physical Address (SPA) Range Structure", which
> > > >      can specify the cacheable attribute, but it should be used for NVDIMMs.)
> > > > 2. EFI memory map descriptor
> > > >      An EFI memory descriptor can specify the cacheable attribute. Firmware can
> > > >      add an EFI memory descriptor for the TPM TIS device as a hint when the
> > > >      kernel does the mapping for the operation region.
> > > > 
> > > > An operation region of SystemMemory is still needed if an ACPI "Control Method"
> > > > needs to access a field, e.g., the method _STA. Checking another descriptor for
> > > > the cacheable attribute, either an "Address Space Resource Descriptor" or an
> > > > "EFI memory map descriptor", while the ACPI code does the mapping for the
> > > > operation region makes the code complicated.
> > > > 
> > > > Another thing: if, long-term, firmware should not be using MTRRs to
> > > > communicate anything to the kernel, it seems safer to use ioremap() instead
> > > > of ioremap_cache() for MMIO resources when the kernel does the mapping for
> > > > the operation region access?
> > > > 
> > > Would it work if instead of doubling down on declaring the low memory
> > > above TOLUD as WB, guest kernel reserves the range as uncacheable by
> > > default, i.e. effectively simulating an ioremap before ACPI tries to map
> > > the memory as WB?
> > 
> > It seems as hacky as this patch set?
> > 
> > 
> Hi Sean,
> 
> Since guest_force_mtrr_state() also supports forcing variable MTRR ranges,
> I am wondering if we could use guest_force_mtrr_state() to set the legacy PCI
> hole range as UC?
> 
> Is it less hacky?

Oh!  That's a way better idea than my hack.  I missed that the kernel would still
consult MTRRs.

Compile tested only, but something like this?

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 8ae750cde0c6..45c8871cdda1 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -933,6 +933,13 @@ static void kvm_sev_hc_page_enc_status(unsigned long pfn, int npages, bool enc)
 
 static void __init kvm_init_platform(void)
 {
+       u64 tolud = e820__end_of_low_ram_pfn() << PAGE_SHIFT;
+       struct mtrr_var_range pci_hole = {
+               .base_lo = tolud | X86_MEMTYPE_UC,
+               .mask_lo = (u32)(~(SZ_4G - tolud - 1)) | BIT(11),
+               .mask_hi = (BIT_ULL(boot_cpu_data.x86_phys_bits) - 1) >> 32,
+       };
+
        if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) &&
            kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL)) {
                unsigned long nr_pages;
@@ -982,8 +989,12 @@ static void __init kvm_init_platform(void)
        kvmclock_init();
        x86_platform.apic_post_init = kvm_apic_init;
 
-       /* Set WB as the default cache mode for SEV-SNP and TDX */
-       guest_force_mtrr_state(NULL, 0, MTRR_TYPE_WRBACK);
+       /*
+        * Set WB as the default cache mode for SEV-SNP and TDX, with a single
+        * UC range for the legacy PCI hole, e.g. so that devices that expect
+        * to get UC/WC mappings don't get surprised with WB.
+        */
+       guest_force_mtrr_state(&pci_hole, 1, MTRR_TYPE_WRBACK);
 }
 
 #if defined(CONFIG_AMD_MEM_ENCRYPT)
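
For anyone who wants to sanity-check the base/mask arithmetic in the hunk above
outside the kernel, here is a minimal userspace sketch. It is not kernel code:
struct var_range, pci_hole_range() and range_matches() are hypothetical names
made up for illustration, and the local macros stand in for SZ_4G,
X86_MEMTYPE_UC and BIT(11). It assumes the architectural variable-range match
rule, (addr & mask) == (base & mask), which describes exactly [tolud, 4G) only
when the hole is power-of-two sized and naturally aligned (e.g. tolud = 2G or
3G).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins (assumptions) for the kernel's SZ_4G, X86_MEMTYPE_UC, BIT(11). */
#define SKETCH_SZ_4G      0x100000000ULL
#define SKETCH_MEMTYPE_UC 0x0U
#define SKETCH_MASK_VALID (1U << 11)

/* Hypothetical mirror of the four 32-bit halves used by the patch. */
struct var_range {
	uint32_t base_lo;
	uint32_t base_hi;
	uint32_t mask_lo;
	uint32_t mask_hi;
};

/* Same arithmetic as the pci_hole initializer, for a given TOLUD and width. */
struct var_range pci_hole_range(uint64_t tolud, unsigned int phys_bits)
{
	struct var_range r;

	/* Sketch-only preconditions: hole is below 4G, sane address width. */
	assert(tolud < SKETCH_SZ_4G);
	assert(phys_bits >= 32 && phys_bits < 64);

	r.base_lo = (uint32_t)tolud | SKETCH_MEMTYPE_UC;
	r.base_hi = 0;
	r.mask_lo = (uint32_t)(~(SKETCH_SZ_4G - tolud - 1)) | SKETCH_MASK_VALID;
	r.mask_hi = (uint32_t)(((1ULL << phys_bits) - 1) >> 32);
	return r;
}

/* Architectural match rule: valid bit set and (addr & mask) == (base & mask). */
bool range_matches(const struct var_range *r, uint64_t addr)
{
	/* Bits 11:0 hold the type / valid bit, not address bits. */
	uint64_t base = ((uint64_t)r->base_hi << 32) | (r->base_lo & ~0xfffU);
	uint64_t mask = ((uint64_t)r->mask_hi << 32) | (r->mask_lo & ~0xfffU);

	if (!(r->mask_lo & SKETCH_MASK_VALID))
		return false;

	return (addr & mask) == (base & mask);
}
```

With tolud = 2G and 46 physical address bits, the range matches 0x80000000 and
0xFFFFF000 but not 0x7FFFF000 or 0x100000000.  With a non-power-of-two hole such
as tolud = 0xB0000000, the strict rule no longer covers every address up to 4G
(0xC0000000 does not match).  Whether that matters for a forced, software-only
MTRR state is outside the scope of this sketch.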


Thread overview: 30+ messages
2025-07-09 16:54 [PATCH 0/2] x86/kvm: Force legacy PCI hole as WB under SNP/TDX Jianxiong Gao
2025-07-14  9:06 ` Binbin Wu
2025-07-14 11:24   ` Nikolay Borisov
2025-07-15  2:53     ` Binbin Wu
2025-07-16  9:51       ` Binbin Wu
2025-07-23 14:34     ` Sean Christopherson
2025-07-24  3:16       ` Binbin Wu
2025-07-28 15:33         ` Sean Christopherson
2025-07-30  7:34           ` Binbin Wu
2025-08-15 23:55             ` Korakit Seemakhupt
2025-08-18 11:07               ` Binbin Wu
2025-08-20  3:07             ` Vishal Annapurve
2025-08-20 10:03               ` Binbin Wu
2025-08-20 11:13                 ` Binbin Wu
2025-08-20 17:56                   ` Sean Christopherson [this message]
2025-08-21  3:30                     ` Binbin Wu
2025-08-21  5:23                       ` Binbin Wu
2025-08-21  6:02                         ` Jürgen Groß
2025-08-21 15:27                         ` Sean Christopherson
2025-08-28  0:07                           ` Sean Christopherson
  -- strict thread matches above, loose matches on Subject: below --
2025-02-01  0:50 Sean Christopherson
2025-02-01 14:25 ` Dionna Amalie Glaze
2025-02-03 18:14 ` Edgecombe, Rick P
2025-02-03 20:33   ` Sean Christopherson
2025-02-03 23:01     ` Edgecombe, Rick P
2025-02-04  0:27       ` Sean Christopherson
2025-02-05  3:51         ` Edgecombe, Rick P
2025-02-05  7:49           ` Xu, Min M
2025-02-10 15:29         ` Binbin Wu
2025-07-08 14:24 ` Nikolay Borisov
