public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Huang, Kai" <kai.huang@intel.com>
To: "Hansen, Dave" <dave.hansen@intel.com>,
	"seanjc@google.com" <seanjc@google.com>,
	"bp@alien8.de" <bp@alien8.de>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"kirill.shutemov@linux.intel.com"
	<kirill.shutemov@linux.intel.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"Williams, Dan J" <dan.j.williams@intel.com>
Cc: "nik.borisov@suse.com" <nik.borisov@suse.com>,
	"Hunter, Adrian" <adrian.hunter@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"Yamahata, Isaku" <isaku.yamahata@intel.com>
Subject: Re: [PATCH v8 8/9] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes
Date: Wed, 4 Dec 2024 14:22:27 +0000	[thread overview]
Message-ID: <5caabdaf1d7c92cc6cc5f4dc895615fccfa366cb.camel@intel.com> (raw)
In-Reply-To: <23bb421e9bf5443a823e163fb2d899760d9f14a3.1731498635.git.kai.huang@intel.com>

On Thu, 2024-11-14 at 00:57 +1300, Kai Huang wrote:
> A TDX module initialization failure was reported on a Emerald Rapids
> platform [*]:
> 
>   virt/tdx: initialization failed: TDMR [0x0, 0x80000000): reserved areas exhausted.
>   virt/tdx: module initialization failed (-28)
> 
> As part of initializing the TDX module, the kernel informs the TDX
> module of all "TDX-usable memory regions" using an array of TDX defined
> structure "TD Memory Region" (TDMR).  Each TDMR must be in 1GB aligned
> and in 1GB granularity, and all "non-TDX-usable memory holes" within a
> given TDMR are marked as "reserved areas".  The TDX module reports a
> maximum number of reserved areas that can be supported per TDMR (16).
> 
> The kernel builds the "TDX-usable memory regions" based on memblocks
> (which reflects e820), and uses this list to find all "reserved areas"
> for each TDMR.
> 
> It turns out that the kernel's view of memory holes is too fine grained
> and sometimes exceeds the number of holes that the TDX module can track
> per TDMR [1], resulting in the above failure.
> 
> Thankfully the module also lists memory that is potentially convertible
> in a list of "Convertible Memory Regions" (CMRs).  That coarser grained
> CMR list tends to track usable memory in the memory map even if it might
> be reserved for host usage like 'ACPI data' [2].
> 
> Use that list to relax what the kernel considers unusable memory.  If it
> falls in a CMR no need to instantiate a hole, and rely on the fact that
> kernel will keep what it considers 'reserved' out of the page allocator.
> 

Hi Dave,

I realized there's one issue if we change to use CMRs to fill up reserved areas
for TDMRs after some internal discussion with Dan:

Currently we requires all memory regions in memblocks.memory to be TDX
convertible memory at the time of initializing the TDX module to make module
initialization successful.  We build a list of those memory regions as a list of
"TDX memory blocks", and use them to construct the TDMRs to configure the
module.  But we don't explicitly check those memory regions against CMRs to make
sure they are truly TDX convertible, instead we depend on TDH.SYS.CONFIG to
catch any non-CMR memory regions that end up to the TDX memory blocks.

This works fine because currently we use those TDX memory blocks to fill up the
reserved areas of TDMRs, i.e., any non-CMR regions in TDX memory blocks will end
up to "non-reserved areas" in TDMR(s), and this will cause TDH.SYS.CONFIG to
fail.

After we change to using CMRs to fill up reserved areas of TDMRs, then all non-
CMR regions will be marked as "reserved areas", regardless whether there is any
non-CMR memory region in the TDX memory blocks.  This will result in TDX module
initialization being successful while there are non-CMR pages going into page
allocator.

To avoid this, we need to explicitly check all the TDX memory blocks against
CMRs to make sure they are actually TDX convertible, before using CMRs to fill
up reserved areas.

I ended up with below incremental diff with some additional text for the
changelog.

Do you have any comments?

------

    Currently, the kernel does not explicitly check all TDX-usable memory
    regions (that come from memblock) to be truly TDX convertible (not
    feasible w/o CMRs anyway) but depends on the TDH.SYS.CONFIG to fail if
    there's any non-CMR memory region ended up as TDX-usable memory.
    
    After changing to using CMRs to fill up reserved areas, unfortunately
    the TDH.SYS.CONFIG will no longer be able to catch such non-CMR regions.
    This is because any non-CMR region will always end up with reserved
    area even if it is included in TDX-usable memory regions.
    
    To still make sure of that, explicitly check all TDX-usable memory
    regions are truly TDX convertible against CMRs when stashing them from
    the memblocks.
    
    Also print an error message if any region is non-CMR so the users can
    easily know the reason of the failure.  This is nice to have anyway
    because a clear message is better than having to decipher the error code
    of the TDH.SYS.CONFIG.
 
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index e4a7e0e17812..d25bdf8ca136 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -176,6 +176,23 @@ int tdx_cpu_enable(void)
 }
 EXPORT_SYMBOL_GPL(tdx_cpu_enable);
 
+/* Check whether a given memory region is a sub-region of any CMR. */
+static bool is_cmr_sub_region(unsigned long start_pfn, unsigned long end_pfn,
+                             struct tdx_sys_info_cmr *sysinfo_cmr)
+{
+       int i;
+
+       for (i = 0; i < sysinfo_cmr->num_cmrs; i++) {
+               u64 cmr_base_pfn = sysinfo_cmr->cmr_base[i] >> PAGE_SHIFT;
+               u64 cmr_npages = sysinfo_cmr->cmr_size[i] >> PAGE_SHIFT;
+
+               if (start_pfn >= cmr_base_pfn &&
+                               end_pfn <= (cmr_base_pfn + cmr_npages))
+                       return true;
+       }
+
+       return false;
+}
 /*
  * Add a memory region as a TDX memory block.  The caller must make sure
  * all memory regions are added in address ascending order and don't
@@ -218,7 +235,8 @@ static void free_tdx_memlist(struct list_head *tmb_list)
  * ranges off in a secondary structure because memblock is modified
  * in memory hotplug while TDX memory regions are fixed.
  */
-static int build_tdx_memlist(struct list_head *tmb_list)
+static int build_tdx_memlist(struct list_head *tmb_list,
+                            struct tdx_sys_info_cmr *sysinfo_cmr)
 {
        unsigned long start_pfn, end_pfn;
        int i, nid, ret;
@@ -234,6 +252,27 @@ static int build_tdx_memlist(struct list_head *tmb_list)
                if (start_pfn >= end_pfn)
                        continue;
 
+               /*
+                * Make sure the to-be-added memory region is truly TDX
+                * convertible memory.
+                *
+                * Note:
+                *
+                * The to-be-added memory region here doesn't cross NUMA
+                * nodes.  The check here assumes the memory region does
+                * not cross any adjacent CMRs which are contiguous
+                * (i.e., the end of the first CMR is the start of the
+                * next) _AND_ are in the same NUMA node.  A sane BIOS
+                * should never report memory regions and CMRs in such
+                * way.
+                */
+               if (!is_cmr_sub_region(start_pfn, end_pfn, sysinfo_cmr)) {
+                       pr_err("memory region [0x%lx, 0x%lx) is not TDX
convertible memory.\n",
+                                       PHYS_PFN(start_pfn), PHYS_PFN(end_pfn));
+                       ret = -EINVAL;
+                       goto err;
+               }
+
                /*
                 * Add the memory regions as TDX memory.  The regions in
                 * memblock has already guaranteed they are in address
@@ -940,6 +979,15 @@ static int construct_tdmrs(struct list_head *tmb_list,
        if (ret)
                return ret;
 
+       /*
+        * Use CMRs instead of the TDX memory blocks to populate the
+        * reserved areas to reduce consumption of reserved areas for
+        * each TDMR.  On some large systems (e.g., a machine with 4 or
+        * more sockets), the BIOS may report many usable memory regions
+        * in the first 1GB in e820.  This may result in reserved areas
+        * of the first TDMR being exhausted if TDX memory blocks are
+        * used to fill up reserved areas.
+        */
        ret = tdmrs_populate_rsvd_areas_all(tdmr_list, sysinfo_cmr,
                        sysinfo_tdmr->max_reserved_per_tdmr);
        if (ret)
@@ -1124,7 +1172,7 @@ static int init_tdx_module(void)
         */
        get_online_mems();
 
-       ret = build_tdx_memlist(&tdx_memlist);
+       ret = build_tdx_memlist(&tdx_memlist, &sysinfo.cmr);
        if (ret)
                goto out_put_tdxmem;


 

  reply	other threads:[~2024-12-04 14:22 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-13 11:57 [PATCH v8 0/9] TDX host: metadata reading tweaks and bug fixes Kai Huang
2024-11-13 11:57 ` [PATCH v8 1/9] x86/virt/tdx: Rename 'struct tdx_tdmr_sysinfo' to reflect the spec better Kai Huang
2024-11-13 11:57 ` [PATCH v8 2/9] x86/virt/tdx: Start to track all global metadata in one structure Kai Huang
2024-11-13 11:57 ` [PATCH v8 3/9] x86/virt/tdx: Use auto-generated code to read global metadata Kai Huang
2024-12-13 11:17   ` Huang, Kai
2024-11-13 11:57 ` [PATCH v8 4/9] x86/virt/tdx: Use dedicated struct members for PAMT entry sizes Kai Huang
2024-11-13 11:57 ` [PATCH v8 5/9] x86/virt/tdx: Add missing header file inclusion to local tdx.h Kai Huang
2024-11-13 11:57 ` [PATCH v8 6/9] x86/virt/tdx: Switch to use auto-generated global metadata reading code Kai Huang
2024-11-13 11:57 ` [PATCH v8 7/9] x86/virt/tdx: Trim away tail null CMRs Kai Huang
2024-11-13 11:57 ` [PATCH v8 8/9] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes Kai Huang
2024-12-04 14:22   ` Huang, Kai [this message]
2024-12-05 12:45     ` Huang, Kai
2024-12-05 12:40   ` [PATCH v8 8.1/9] " Kai Huang
2024-12-05 18:10     ` Dave Hansen
2024-12-06  2:45       ` Huang, Kai
2024-12-09  6:57       ` Huang, Kai
2024-12-09  6:50   ` [PATCH v8 8.2/9] " Kai Huang
2024-12-09 22:54     ` Dave Hansen
2024-12-10  2:26       ` Huang, Kai
2024-12-10  2:46         ` Dan Williams
2024-12-10  4:24           ` Huang, Kai
2024-12-10 16:58             ` Dave Hansen
2024-12-11  4:34               ` Huang, Kai
2024-11-13 11:57 ` [PATCH v8 9/9] x86/virt/tdx: Require the module to assert it has the NO_RBP_MOD mitigation Kai Huang
2024-11-13 22:25 ` [PATCH v8 0/9] TDX host: metadata reading tweaks and bug fixes Edgecombe, Rick P
2024-11-13 22:40   ` Huang, Kai
2024-11-13 22:53     ` Edgecombe, Rick P
2024-11-13 23:35       ` Huang, Kai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5caabdaf1d7c92cc6cc5f4dc895615fccfa366cb.camel@intel.com \
    --to=kai.huang@intel.com \
    --cc=adrian.hunter@intel.com \
    --cc=bp@alien8.de \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=hpa@zytor.com \
    --cc=isaku.yamahata@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=nik.borisov@suse.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rick.p.edgecombe@intel.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox