From: Sean Christopherson <seanjc@google.com>
To: Binbin Wu <binbin.wu@linux.intel.com>
Cc: Vishal Annapurve <vannapurve@google.com>,
Nikolay Borisov <nik.borisov@suse.com>,
Jianxiong Gao <jxgao@google.com>,
"Borislav Petkov (AMD)" <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
Dionna Glaze <dionnaglaze@google.com>,
"H. Peter Anvin" <hpa@zytor.com>,
jgross@suse.com,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Ingo Molnar <mingo@redhat.com>,
pbonzini@redhat.com, Peter Gonda <pgonda@google.com>,
Thomas Gleixner <tglx@linutronix.de>,
Tom Lendacky <thomas.lendacky@amd.com>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
x86@kernel.org, Rick Edgecombe <rick.p.edgecombe@intel.com>,
jiewen.yao@intel.com
Subject: Re: [PATCH 0/2] x86/kvm: Force legacy PCI hole as WB under SNP/TDX
Date: Wed, 20 Aug 2025 10:56:16 -0700
Message-ID: <aKYMQP5AEC2RkOvi@google.com>
In-Reply-To: <697aa804-b321-4dba-9060-7ac17e0a489f@linux.intel.com>
On Wed, Aug 20, 2025, Binbin Wu wrote:
> On 8/20/2025 6:03 PM, Binbin Wu wrote:
> > > > > Presumably this is an EDK2 bug? If it's not an EDK2 bug, then how is the kernel's
> > > > > ACPI driver supposed to know that some ranges of SystemMemory must be mapped UC?
> >
> > Checked with Jiewen offline.
> >
> > He didn't think there was an existing interface to tell the OS to map an
> > OperationRegion of SystemMemory as UC via the ACPI table. He thought the
> > OS/ACPI driver still needed to rely on MTRRs for the hint until there was an
> > alternative way.
> >
> > > > According to the ACPI spec 6.6, an operation region of SystemMemory has no
> > > > interface to specify the cacheable attribute.
> > > >
> > > > One solution could be using MTRRs to communicate the memory attribute of legacy
> > > > PCI hole to the kernel.

So IIUC, there are no bugs anywhere, just a gap in specs that has been hidden
until now :-(

> > > > But during the PUCK meeting last week, Sean mentioned
> > > > that "long-term, firmware should not be using MTRRs to communicate anything to
> > > > the kernel." So this solution is not preferred.
> > > >
> > > > If not MTRRs, there should be an alternative way to do the job.
> > > > 1. ACPI table
> > > > According to the ACPI spec, neither operation region nor 32-Bit Fixed Memory
> > > > Range Descriptor can specify the cacheable attribute.
> > > > "Address Space Resource Descriptors" could be used to describe a memory range
> > > > and they can specify the cacheable attribute via "Type Specific Flags".
> > > > One of the Address Space Resource Descriptors could be added to the ACPI
> > > > table as a hint when the kernel does the mapping for the operation region.
> > > > (There is "System Physical Address (SPA) Range Structure", which also can
> > > > specify the cacheable attribute. But it is intended for NVDIMMs.)
> > > > 2. EFI memory map descriptor
> > > > An EFI memory descriptor can specify the cacheable attribute. Firmware can add
> > > > an EFI memory descriptor for the TPM TIS device as a hint when the kernel does
> > > > the mapping for the operation region.
> > > >
> > > > An operation region of SystemMemory is still needed if a "Control Method" of ACPI
> > > > needs to access a field, e.g., the method _STA. Checking another descriptor for
> > > > the cacheable attribute, either "Address Space Resource Descriptor" or "EFI memory
> > > > map descriptor", while the ACPI code does the mapping for the operation region
> > > > makes the code complicated.
> > > >
> > > > Another thing is, if long-term firmware should not be using MTRRs to
> > > > communicate anything to the kernel, it seems safer to use ioremap() instead
> > > > of ioremap_cache() for MMIO resources when the kernel does the mapping for
> > > > operation region access?
> > > >
> > > Would it work if, instead of doubling down on declaring the low memory
> > > above TOLUD as WB, the guest kernel reserves the range as uncacheable by
> > > default, i.e. effectively simulating an ioremap before ACPI tries to map
> > > the memory as WB?
> >
> > It seems as hacky as this patch set?
> >
> >
> Hi Sean,
>
> Since guest_force_mtrr_state() also supports forcing variable MTRR ranges,
> I am wondering if we could use guest_force_mtrr_state() to set the legacy PCI
> hole range as UC?
>
> Is it less hacky?

Oh!  That's a way better idea than my hack.  I missed that the kernel would still
consult MTRRs.

Compile tested only, but something like this?

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 8ae750cde0c6..45c8871cdda1 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -933,6 +933,13 @@ static void kvm_sev_hc_page_enc_status(unsigned long pfn, int npages, bool enc)
 
 static void __init kvm_init_platform(void)
 {
+	u64 tolud = e820__end_of_low_ram_pfn() << PAGE_SHIFT;
+	struct mtrr_var_range pci_hole = {
+		.base_lo = tolud | X86_MEMTYPE_UC,
+		.mask_lo = (u32)(~(SZ_4G - tolud - 1)) | BIT(11),
+		.mask_hi = (BIT_ULL(boot_cpu_data.x86_phys_bits) - 1) >> 32,
+	};
+
 	if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) &&
 	    kvm_para_has_feature(KVM_FEATURE_MIGRATION_CONTROL)) {
 		unsigned long nr_pages;
@@ -982,8 +989,12 @@ static void __init kvm_init_platform(void)
 	kvmclock_init();
 	x86_platform.apic_post_init = kvm_apic_init;
 
-	/* Set WB as the default cache mode for SEV-SNP and TDX */
-	guest_force_mtrr_state(NULL, 0, MTRR_TYPE_WRBACK);
+	/*
+	 * Set WB as the default cache mode for SEV-SNP and TDX, with a single
+	 * UC range for the legacy PCI hole, e.g. so that devices that expect
+	 * to get UC/WC mappings don't get surprised with WB.
+	 */
+	guest_force_mtrr_state(&pci_hole, 1, MTRR_TYPE_WRBACK);
 }
 
 #if defined(CONFIG_AMD_MEM_ENCRYPT)