From: Binbin Wu <binbin.wu@linux.intel.com>
To: Sean Christopherson <seanjc@google.com>,
Vishal Annapurve <vannapurve@google.com>
Cc: Nikolay Borisov <nik.borisov@suse.com>,
Jianxiong Gao <jxgao@google.com>,
"Borislav Petkov (AMD)" <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
Dionna Glaze <dionnaglaze@google.com>,
"H. Peter Anvin" <hpa@zytor.com>,
jgross@suse.com,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
Ingo Molnar <mingo@redhat.com>,
pbonzini@redhat.com, Peter Gonda <pgonda@google.com>,
Thomas Gleixner <tglx@linutronix.de>,
Tom Lendacky <thomas.lendacky@amd.com>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
x86@kernel.org, Rick Edgecombe <rick.p.edgecombe@intel.com>,
jiewen.yao@intel.com
Subject: Re: [PATCH 0/2] x86/kvm: Force legacy PCI hole as WB under SNP/TDX
Date: Wed, 20 Aug 2025 19:13:33 +0800
Message-ID: <697aa804-b321-4dba-9060-7ac17e0a489f@linux.intel.com>
In-Reply-To: <20641696-242d-4fb6-a3c1-1a8e7cf83b18@linux.intel.com>
On 8/20/2025 6:03 PM, Binbin Wu wrote:
>
>
> On 8/20/2025 11:07 AM, Vishal Annapurve wrote:
>> On Wed, Jul 30, 2025 at 12:34 AM Binbin Wu <binbin.wu@linux.intel.com> wrote:
>>>
>>>
>>> On 7/28/2025 11:33 PM, Sean Christopherson wrote:
>>>> +Jiewen
>>> Jiewen is out of the office until August 4th.
>> Hi Jiewen, can we get some help in deciding the next steps here?
>
> Please see below.
>
>>
>>>> Summary, with the questions at the end.
>>>>
>>>> Recent upstream kernels running in GCE SNP/TDX VMs fail to probe the TPM due to
>>>> the TPM driver's ioremap (with UC) failing because the kernel has already mapped
>>>> the range using a cacheable mapping (WB).
>>>>
>>>> ioremap error for 0xfed40000-0xfed45000, requested 0x2, got 0x0
>>>> tpm_tis MSFT0101:00: probe with driver tpm_tis failed with error -12
>>>>
>>>> The "guilty" commit is 8e690b817e38 ("x86/kvm: Override default caching mode for
>>>> SEV-SNP and TDX"), which as the subject suggests, forces the kernel's MTRR memtype
>>>> to WB. With SNP and TDX, the virtual MTRR state is (a) controlled by the VMM and
>>>> thus is untrusted, and (b) _should_ be irrelevant because no known hypervisor
>>>> actually honors the memtypes programmed into the virtual MTRRs.
>>>>
>>>> It turns out that the kernel has been relying on the MTRRs to force the TPM TIS
>>>> region (and potentially other regions) to be UC, so that the kernel ACPI driver's
>>>> attempts to map SystemMemory entries as cacheable get forced to UC. With MTRRs
>>>> forced WB, x86_acpi_os_ioremap() succeeds in creating a WB mapping, which in turn
>>>> causes the ioremap infrastructure to reject the TPM driver's UC mapping.
>>>>
>>>> IIUC, the TPM entry(s) in the ACPI tables for GCE VMs are derived (built?) from
>>>> EDK2's TPM ASL. And (again, IIUC), this code in SecurityPkg/Tcg/Tcg2Acpi/Tpm.asl[1]
>>>>
>>>> //
>>>> // Operational region for TPM access
>>>> //
>>>> OperationRegion (TPMR, SystemMemory, 0xfed40000, 0x5000)
>>>>
>>>> generates the problematic SystemMemory entry that triggers the ACPI driver's
>>>> auto-mapping logic.
>>>>
>>>> QEMU-based VMs don't suffer the same fate, as QEMU intentionally[2] doesn't use
>>>> EDK2's AML for the TPM, and QEMU doesn't define a SystemMemory entry, just a
>>>> Memory32Fixed entry.
>>>>
>>>> Presumably this is an EDK2 bug? If it's not an EDK2 bug, then how is the kernel's
>>>> ACPI driver supposed to know that some ranges of SystemMemory must be mapped UC?
>
> Checked with Jiewen offline.
>
> He didn't think there was an existing interface to tell the OS, via the ACPI
> tables, to map an OperationRegion of SystemMemory as UC. He thought the
> OS/ACPI driver would still need to rely on MTRRs for the hint until an
> alternative exists.
>
>
>>> According to the ACPI spec 6.6, an operation region of SystemMemory has no
>>> interface to specify the cacheable attribute.
>>>
>>> One solution could be using MTRRs to communicate the memory attribute of legacy
>>> PCI hole to the kernel. But during the PUCK meeting last week, Sean mentioned
>>> that "long-term, firmware should not be using MTRRs to communicate anything to
>>> the kernel." So this solution is not preferred.
>>>
>>> If not MTRRs, there should be an alternative way to do the job.
>>> 1. ACPI table
>>> According to the ACPI spec, neither operation region nor 32-Bit Fixed Memory
>>> Range Descriptor can specify the cacheable attribute.
>>> "Address Space Resource Descriptors" could be used to describe a memory range,
>>> and they can specify the cacheable attribute via "Type Specific Flags".
>>> One of the Address Space Resource Descriptors could be added to the ACPI
>>> table as a hint for when the kernel maps the operation region.
>>> (There is also the "System Physical Address (SPA) Range Structure", which can
>>> specify the cacheable attribute, but it is meant to be used for NVDIMMs.)
>>> 2. EFI memory map descriptor
>>> An EFI memory descriptor can specify the cacheable attribute. Firmware can add
>>> an EFI memory descriptor for the TPM TIS device as a hint for when the kernel
>>> maps the operation region.
>>>
>>> An operation region of SystemMemory is still needed if an ACPI "Control
>>> Method" needs to access a field, e.g., the method _STA. Checking another
>>> descriptor for the cacheable attribute, either an "Address Space Resource
>>> Descriptor" or an "EFI memory map descriptor", while the ACPI code maps the
>>> operation region makes the code complicated.
>>>
>>> Another question is whether, long-term, firmware should stop using MTRRs to
>>> communicate anything to the kernel. If so, it seems safer for the kernel to use
>>> ioremap() instead of ioremap_cache() for MMIO resources when it maps an
>>> operation region for access?
>>>
>> Would it work if instead of doubling down on declaring the low memory
>> above TOLUD as WB, guest kernel reserves the range as uncacheable by
>> default i.e. effectively simulating a ioremap before ACPI tries to map
>> the memory as WB?
>
> It seems as hacky as this patch set?
>
>
Hi Sean,
Since guest_force_mtrr_state() also supports forcing variable-range MTRRs,
I am wondering whether we could use guest_force_mtrr_state() to set the legacy
PCI hole range as UC?
Would that be less hacky?
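
To make the idea concrete, below is a minimal userspace sketch (not kernel
code, and not a proposed patch) of how a single variable-range MTRR covering
the hole would be encoded as UC. It only uses the architectural
PHYSBASE/PHYSMASK layout (memory type in PHYSBASE[7:0], valid bit in
PHYSMASK[11]). The helper names encode_var_mtrr() and var_mtrr_matches() are
hypothetical, and the hole bounds (0xC0000000, 1 GiB) and the 46-bit physical
address width used in the usage note are assumptions for illustration only;
the real bounds would have to come from the platform.

```c
#include <stdint.h>

/* Architectural MTRR memory type encodings and the PHYSMASK valid bit. */
#define MTRR_TYPE_UNCACHABLE 0x00u
#define MTRR_TYPE_WRBACK     0x06u
#define MTRR_PHYSMASK_VALID  (1ull << 11)

struct var_mtrr {
	uint64_t base;	/* value for IA32_MTRR_PHYSBASEn */
	uint64_t mask;	/* value for IA32_MTRR_PHYSMASKn */
};

/*
 * Hypothetical helper: encode one variable-range MTRR.
 * 'size' must be a power of two (>= 4 KiB) and 'start' must be aligned
 * to 'size'. 'phys_bits' is the physical address width.
 * Returns 0 on success, -1 on invalid input.
 */
static int encode_var_mtrr(uint64_t start, uint64_t size, unsigned int type,
			   unsigned int phys_bits, struct var_mtrr *out)
{
	uint64_t addr_mask;

	if (size < 4096 || (size & (size - 1)) || (start & (size - 1)))
		return -1;
	if (phys_bits < 32 || phys_bits > 52)
		return -1;

	/* Address bits [phys_bits-1:12] are significant in both registers. */
	addr_mask = ((1ull << phys_bits) - 1) & ~0xfffull;

	out->base = (start & addr_mask) | (type & 0xff);
	out->mask = (~(size - 1) & addr_mask) | MTRR_PHYSMASK_VALID;
	return 0;
}

/*
 * Hypothetical helper: does 'addr' fall in the range described by 'm'?
 * Uses the architectural rule (addr & mask) == (base & mask).
 */
static int var_mtrr_matches(const struct var_mtrr *m, uint64_t addr)
{
	uint64_t mask = m->mask & ~0xfffull;

	if (!(m->mask & MTRR_PHYSMASK_VALID))
		return 0;
	return (addr & mask) == (m->base & mask);
}
```

With the assumed values, encode_var_mtrr(0xC0000000, 0x40000000,
MTRR_TYPE_UNCACHABLE, 46, &m) yields a range that the TPM TIS base 0xfed40000
matches and that 0x80000000 does not. A range like this, together with a WB
default type, is the kind of state I have in mind for
guest_force_mtrr_state().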