public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/10] TDX host: metadata reading tweaks, bug fix and info dump
@ 2024-07-17  3:40 Kai Huang
  2024-07-17  3:40 ` [PATCH v2 01/10] x86/virt/tdx: Rename _offset to _member for TD_SYSINFO_MAP() macro Kai Huang
                   ` (11 more replies)
  0 siblings, 12 replies; 44+ messages in thread
From: Kai Huang @ 2024-07-17  3:40 UTC (permalink / raw)
  To: dave.hansen, kirill.shutemov, bp, tglx, peterz, mingo, hpa,
	seanjc, pbonzini, dan.j.williams
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu, kai.huang

TL;DR:

This series does necessary tweaks to TDX host "global metadata" reading
code to fix some immediate issues in the TDX module initialization code,
with intention to also provide a flexible code base to support sharing
global metadata to KVM (and other kernel components) for future needs.

This series, and additional patches to initialize TDX when loading KVM
module and read essential metadata fields for KVM TDX can be found at:

https://github.com/intel/tdx/commits/kvm-tdxinit/

Dear maintainers,

This series targets x86 tip.  I also added Dan, KVM maintainers and KVM
list so people can review and comment.  Thanks for your time.

v1 -> v2:
  - Fix comments from Chao and Nikolay.
  - A new patch to refine an out-dated comment by Nikolay.
  - Collect tags from Nikolay (thanks!).

v1: https://lore.kernel.org/linux-kernel/cover.1718538552.git.kai.huang@intel.com/T/

=== More info ===

TDX module provides a set of "global metadata fields" for software to
query.  They report things like TDX module version, supported features
fields required for creating TDX guests and so on.

Today the TDX host code already reads "TD Memory Region" (TDMR) related
metadata fields for module initialization.  There are immediate needs
that require TDX host code to read more metadata fields:

 - Dump basic TDX module info [1];
 - Reject module with no NO_RBP_MOD feature support [2];
 - Read CMR info to fix a module initialization failure bug [3].

Also, the upstreaming-on-going KVM TDX support [4] requires to read more
global metadata fields.  In the longer term, the TDX Connect [5] (which
supports assigning trusted IO devices to TDX guest) may also require
other kernel components (e.g., pci/vt-d) to access more metadata.

To meet all of those, the idea is the TDX host core-kernel to provide a
centralized, canonical, and read-only structure to contain all global
metadata that comes out of TDX module for all kernel components to use.

There is an "alternative option to manage global metadata" (see below)
but it is not as straightforward as this.

This series starts to track all global metadata fields into a single
'struct tdx_sysinfo', and reads more metadata fields to that structure
to address the immediate needs as mentioned above.

More fields will be added in the near future to support KVM TDX, and the
actual sharing/export the "read-only" global metadata for KVM will also
be sent out in the near future when that becomes immediate (also see
"Share global metadata to KVM" below).

Note, the first couple of patches in this series were from the old
patchset "TDX host: Provide TDX module metadata reading APIs" [6].

=== Further read ===

1) Altertive option to manage global metadata

The TDX host core-kernel could also expose/export APIs for reading
metadata out of TDX module directly, and all in-kernel TDX users use
these APIs and manage their own metadata fields.

However this isn't as straightforward as exposing/exporting structure,
because the API to read multi fields to a structure requires the caller
to build a "mapping table" between field ID to structure member:

	struct kvm_used_metadata {
		u64 member1;
		...
	};

	#define TD_SYSINFO_KVM_MAP(_field_id, _member)	\
		TD_SYSINFO_MAP(_field_id, struct kvm_used_metadata, \
				_member)

	struct tdx_metadata_field_mapping fields[] = {
		TD_SYSINFO_KVM_MAP(FIELD_ID1, member1),
		...
	};

	ret = tdx_sysmd_read_multi(fields, ARRAY_SIZE(fields), buf);

Another problem is some metadata field may be accessed by multiple
kernel components, e.g., the one reports TDX module features, in which
case there will be duplicated code comparing to exposing structure
directly.

2) Share global metadata to KVM

To achieve "read-only" centralized global metadata structure, the idea
way is to use __ro_after_init.  However currently all global metadata
are read by tdx_enable(), which is supposed to be called at any time at
runtime thus isn't annotated with __init.

The __ro_after_init can be done eventually, but it can only be done
after moving VMXON out of KVM to the core-kernel: after that we can
read all metadata during kernel boot (thus __ro_after_init), but
doesn't necessarily have to do it in tdx_enable().

However moving VMXON out of KVM is NOT considered as dependency for the
initial KVM TDX support [7].  Thus for the initial support, the idea is
TDX host to export a function which returns a "const struct pointer" so
KVM won't be able to modify any global metadata.

3) TDH.SYS.RD vs TDH.SYS.RDALL

The kernel can use two SEAMCALLs to read global metadata: TDH.SYS.RD and
TDH.SYS.RDALL.  The former simply reads one metadata field to a 'u64'.
The latter tries to read all fields to a 4KB buffer.

Currently the kernel only uses the former to read metadata, and this
series doesn't choose to use TDH.SYS.RDALL.

The main reason is the "layout of all fields in the 4KB buffer" that
returned by TDH.SYS.RDALL isn't architectural consistent among different
TDX module versions.

E.g., some metadata fields may not be supported by the old module, thus
they may or may not be in the 4KB buffer depending on module version.
And it is impractical to know whether those fields are in the buffer or
not.

TDH.SYS.RDALL may be useful to read one small set of metadata fields,
e.g., fields in one "Class" (TDX categories all global metadata fields
in different "Class"es).  But this is only an optimization even if
TDH.SYS.RDALL can be used, so leave this to future consideration.


[1] https://lore.kernel.org/lkml/e2d844ad-182a-4fc0-a06a-d609c9cbef74@suse.com/T/#m352829aedf6680d4628c7e40dc40b332eda93355
[2] https://lore.kernel.org/lkml/e2d844ad-182a-4fc0-a06a-d609c9cbef74@suse.com/T/#mef98469c51e2382ead2c537ea189752360bd2bef
[3] https://github.com/canonical/tdx/issues/135#issuecomment-2151570238
[4] https://lore.kernel.org/kvm/cover.1708933498.git.isaku.yamahata@intel.com/T/
[5] https://cdrdv2.intel.com/v1/dl/getContent/773614
[6] https://lore.kernel.org/lkml/cover.1709288433.git.kai.huang@intel.com/T/
[7] https://lore.kernel.org/kvm/cover.1708933498.git.isaku.yamahata@intel.com/T/#me0c081438074341ad70d3525f6cf3472d7d81b0e



Kai Huang (10):
  x86/virt/tdx: Rename _offset to _member for TD_SYSINFO_MAP() macro
  x86/virt/tdx: Unbind global metadata read with 'struct
    tdx_tdmr_sysinfo'
  x86/virt/tdx: Support global metadata read for all element sizes
  x86/virt/tdx: Abstract reading multiple global metadata fields as a
    helper
  x86/virt/tdx: Move field mapping table of getting TDMR info to
    function local
  x86/virt/tdx: Refine a comment to reflect the latest TDX spec
  x86/virt/tdx: Start to track all global metadata in one structure
  x86/virt/tdx: Print TDX module basic information
  x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find
    memory holes
  x86/virt/tdx: Don't initialize module that doesn't support NO_RBP_MOD
    feature

 arch/x86/virt/vmx/tdx/tdx.c | 279 +++++++++++++++++++++++++++++-------
 arch/x86/virt/vmx/tdx/tdx.h |  82 +++++++++--
 2 files changed, 302 insertions(+), 59 deletions(-)


base-commit: 0c3836482481200ead7b416ca80c68a29cfdaabd
-- 
2.45.2


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 01/10] x86/virt/tdx: Rename _offset to _member for TD_SYSINFO_MAP() macro
  2024-07-17  3:40 [PATCH v2 00/10] TDX host: metadata reading tweaks, bug fix and info dump Kai Huang
@ 2024-07-17  3:40 ` Kai Huang
  2024-08-05 22:37   ` Dan Williams
  2024-07-17  3:40 ` [PATCH v2 02/10] x86/virt/tdx: Unbind global metadata read with 'struct tdx_tdmr_sysinfo' Kai Huang
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 44+ messages in thread
From: Kai Huang @ 2024-07-17  3:40 UTC (permalink / raw)
  To: dave.hansen, kirill.shutemov, bp, tglx, peterz, mingo, hpa,
	seanjc, pbonzini, dan.j.williams
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu, kai.huang

TD_SYSINFO_MAP() macro actually takes the member of the 'struct
tdx_tdmr_sysinfo' as the second argument and uses the offsetof() to
calculate the offset for that member.

Rename the macro argument _offset to _member to reflect this.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/virt/vmx/tdx/tdx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 49a1c6890b55..d8fa9325bf5e 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -296,9 +296,9 @@ struct field_mapping {
 	int offset;
 };
 
-#define TD_SYSINFO_MAP(_field_id, _offset) \
+#define TD_SYSINFO_MAP(_field_id, _member) \
 	{ .field_id = MD_FIELD_ID_##_field_id,	   \
-	  .offset   = offsetof(struct tdx_tdmr_sysinfo, _offset) }
+	  .offset   = offsetof(struct tdx_tdmr_sysinfo, _member) }
 
 /* Map TD_SYSINFO fields into 'struct tdx_tdmr_sysinfo': */
 static const struct field_mapping fields[] = {
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 02/10] x86/virt/tdx: Unbind global metadata read with 'struct tdx_tdmr_sysinfo'
  2024-07-17  3:40 [PATCH v2 00/10] TDX host: metadata reading tweaks, bug fix and info dump Kai Huang
  2024-07-17  3:40 ` [PATCH v2 01/10] x86/virt/tdx: Rename _offset to _member for TD_SYSINFO_MAP() macro Kai Huang
@ 2024-07-17  3:40 ` Kai Huang
  2024-08-05 23:32   ` Dan Williams
  2024-07-17  3:40 ` [PATCH v2 03/10] x86/virt/tdx: Support global metadata read for all element sizes Kai Huang
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 44+ messages in thread
From: Kai Huang @ 2024-07-17  3:40 UTC (permalink / raw)
  To: dave.hansen, kirill.shutemov, bp, tglx, peterz, mingo, hpa,
	seanjc, pbonzini, dan.j.williams
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu, kai.huang

The TDX module provides a set of "global metadata fields".  They report
things like TDX module version, supported features, and fields related
to create/run TDX guests and so on.

For now the kernel only reads "TD Memory Region" (TDMR) related global
metadata fields to a 'struct tdx_tdmr_sysinfo' for initializing the TDX
module, and the metadata reading code can only work with that structure.

Future changes will need to read other metadata fields that don't make
sense to populate to the "struct tdx_tdmr_sysinfo".  It's essential to
provide a generic metadata read infrastructure which is not bound to any
specific structure.

To start providing such infrastructure, unbind the metadata reading code
with the 'struct tdx_tdmr_sysinfo'.

Note the kernel has a helper macro, TD_SYSINFO_MAP(), for marshaling the
metadata into the 'struct tdx_tdmr_sysinfo', and currently the macro
hardcodes the structure name.  As part of unbinding the metadata reading
code with 'struct tdx_tdmr_sysinfo', it is extended to accept different
structures.

Unfortunately, this will result in the usage of TD_SYSINFO_MAP() for
populating 'struct tdx_tdmr_sysinfo' to be changed to use the structure
name explicitly for each structure member and make the code longer.  Add
a wrapper macro which hides the 'struct tdx_tdmr_sysinfo' internally to
make the code shorter thus better readability.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
---

v1 -> v2:
 - 'st_member' -> 'member'. (Nikolay)

---
 arch/x86/virt/vmx/tdx/tdx.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index d8fa9325bf5e..2ce03c3ea017 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -272,9 +272,9 @@ static int read_sys_metadata_field(u64 field_id, u64 *data)
 
 static int read_sys_metadata_field16(u64 field_id,
 				     int offset,
-				     struct tdx_tdmr_sysinfo *ts)
+				     void *stbuf)
 {
-	u16 *ts_member = ((void *)ts) + offset;
+	u16 *member = stbuf + offset;
 	u64 tmp;
 	int ret;
 
@@ -286,7 +286,7 @@ static int read_sys_metadata_field16(u64 field_id,
 	if (ret)
 		return ret;
 
-	*ts_member = tmp;
+	*member = tmp;
 
 	return 0;
 }
@@ -296,17 +296,20 @@ struct field_mapping {
 	int offset;
 };
 
-#define TD_SYSINFO_MAP(_field_id, _member) \
-	{ .field_id = MD_FIELD_ID_##_field_id,	   \
-	  .offset   = offsetof(struct tdx_tdmr_sysinfo, _member) }
+#define TD_SYSINFO_MAP(_field_id, _struct, _member)	\
+	{ .field_id = MD_FIELD_ID_##_field_id,		\
+	  .offset   = offsetof(_struct, _member) }
+
+#define TD_SYSINFO_MAP_TDMR_INFO(_field_id, _member)	\
+	TD_SYSINFO_MAP(_field_id, struct tdx_tdmr_sysinfo, _member)
 
 /* Map TD_SYSINFO fields into 'struct tdx_tdmr_sysinfo': */
 static const struct field_mapping fields[] = {
-	TD_SYSINFO_MAP(MAX_TDMRS,	      max_tdmrs),
-	TD_SYSINFO_MAP(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr),
-	TD_SYSINFO_MAP(PAMT_4K_ENTRY_SIZE,    pamt_entry_size[TDX_PS_4K]),
-	TD_SYSINFO_MAP(PAMT_2M_ENTRY_SIZE,    pamt_entry_size[TDX_PS_2M]),
-	TD_SYSINFO_MAP(PAMT_1G_ENTRY_SIZE,    pamt_entry_size[TDX_PS_1G]),
+	TD_SYSINFO_MAP_TDMR_INFO(MAX_TDMRS,		max_tdmrs),
+	TD_SYSINFO_MAP_TDMR_INFO(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr),
+	TD_SYSINFO_MAP_TDMR_INFO(PAMT_4K_ENTRY_SIZE,    pamt_entry_size[TDX_PS_4K]),
+	TD_SYSINFO_MAP_TDMR_INFO(PAMT_2M_ENTRY_SIZE,    pamt_entry_size[TDX_PS_2M]),
+	TD_SYSINFO_MAP_TDMR_INFO(PAMT_1G_ENTRY_SIZE,    pamt_entry_size[TDX_PS_1G]),
 };
 
 static int get_tdx_tdmr_sysinfo(struct tdx_tdmr_sysinfo *tdmr_sysinfo)
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 03/10] x86/virt/tdx: Support global metadata read for all element sizes
  2024-07-17  3:40 [PATCH v2 00/10] TDX host: metadata reading tweaks, bug fix and info dump Kai Huang
  2024-07-17  3:40 ` [PATCH v2 01/10] x86/virt/tdx: Rename _offset to _member for TD_SYSINFO_MAP() macro Kai Huang
  2024-07-17  3:40 ` [PATCH v2 02/10] x86/virt/tdx: Unbind global metadata read with 'struct tdx_tdmr_sysinfo' Kai Huang
@ 2024-07-17  3:40 ` Kai Huang
  2024-08-05 23:45   ` Dan Williams
  2024-07-17  3:40 ` [PATCH v2 04/10] x86/virt/tdx: Abstract reading multiple global metadata fields as a helper Kai Huang
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 44+ messages in thread
From: Kai Huang @ 2024-07-17  3:40 UTC (permalink / raw)
  To: dave.hansen, kirill.shutemov, bp, tglx, peterz, mingo, hpa,
	seanjc, pbonzini, dan.j.williams
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu, kai.huang

The TDX module provides "global metadata fields" for software to query.
Each metadata field is accessible by a unique "metadata field ID".  TDX
supports 8/16/32/64 bits metadata element sizes.  The size of each
metadata field is encoded in its associated metadata field ID.

For now the kernel only reads "TD Memory Region" (TDMR) related global
metadata fields for module initialization.  All these metadata fields
are 16-bit, and the kernel only supports reading 16-bit fields.

Future changes will need to read more metadata fields with other element
sizes.  To resolve this once for all, extend the existing metadata
reading code to support reading all element sizes.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---

v1 -> v2 (Nikolay):
 - MD_FIELD_BYTES() -> MD_FIELD_ELE_SIZE().
 - 'bytes' -> 'size' in stbuf_read_sysmd_field().

---
 arch/x86/virt/vmx/tdx/tdx.c | 29 ++++++++++++++++-------------
 arch/x86/virt/vmx/tdx/tdx.h |  3 ++-
 2 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 2ce03c3ea017..4644b324ff86 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -270,23 +270,25 @@ static int read_sys_metadata_field(u64 field_id, u64 *data)
 	return 0;
 }
 
-static int read_sys_metadata_field16(u64 field_id,
-				     int offset,
-				     void *stbuf)
+/*
+ * Read one global metadata field and store the data to a location of a
+ * given buffer specified by the offset and size (in bytes).
+ */
+static int stbuf_read_sysmd_field(u64 field_id, void *stbuf, int offset,
+				  int size)
 {
-	u16 *member = stbuf + offset;
+	void *member = stbuf + offset;
 	u64 tmp;
 	int ret;
 
-	if (WARN_ON_ONCE(MD_FIELD_ID_ELE_SIZE_CODE(field_id) !=
-			MD_FIELD_ID_ELE_SIZE_16BIT))
+	if (WARN_ON_ONCE(MD_FIELD_ELE_SIZE(field_id) != size))
 		return -EINVAL;
 
 	ret = read_sys_metadata_field(field_id, &tmp);
 	if (ret)
 		return ret;
 
-	*member = tmp;
+	memcpy(member, &tmp, size);
 
 	return 0;
 }
@@ -294,11 +296,13 @@ static int read_sys_metadata_field16(u64 field_id,
 struct field_mapping {
 	u64 field_id;
 	int offset;
+	int size;
 };
 
-#define TD_SYSINFO_MAP(_field_id, _struct, _member)	\
-	{ .field_id = MD_FIELD_ID_##_field_id,		\
-	  .offset   = offsetof(_struct, _member) }
+#define TD_SYSINFO_MAP(_field_id, _struct, _member)		\
+	{ .field_id = MD_FIELD_ID_##_field_id,			\
+	  .offset   = offsetof(_struct, _member),		\
+	  .size	    = sizeof(typeof(((_struct *)0)->_member)) }
 
 #define TD_SYSINFO_MAP_TDMR_INFO(_field_id, _member)	\
 	TD_SYSINFO_MAP(_field_id, struct tdx_tdmr_sysinfo, _member)
@@ -319,9 +323,8 @@ static int get_tdx_tdmr_sysinfo(struct tdx_tdmr_sysinfo *tdmr_sysinfo)
 
 	/* Populate 'tdmr_sysinfo' fields using the mapping structure above: */
 	for (i = 0; i < ARRAY_SIZE(fields); i++) {
-		ret = read_sys_metadata_field16(fields[i].field_id,
-						fields[i].offset,
-						tdmr_sysinfo);
+		ret = stbuf_read_sysmd_field(fields[i].field_id, tdmr_sysinfo,
+				fields[i].offset, fields[i].size);
 		if (ret)
 			return ret;
 	}
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index b701f69485d3..fdb879ef6c45 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -53,7 +53,8 @@
 #define MD_FIELD_ID_ELE_SIZE_CODE(_field_id)	\
 		(((_field_id) & GENMASK_ULL(33, 32)) >> 32)
 
-#define MD_FIELD_ID_ELE_SIZE_16BIT	1
+#define MD_FIELD_ELE_SIZE(_field_id)	\
+		(1 << MD_FIELD_ID_ELE_SIZE_CODE(_field_id))
 
 struct tdmr_reserved_area {
 	u64 offset;
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 04/10] x86/virt/tdx: Abstract reading multiple global metadata fields as a helper
  2024-07-17  3:40 [PATCH v2 00/10] TDX host: metadata reading tweaks, bug fix and info dump Kai Huang
                   ` (2 preceding siblings ...)
  2024-07-17  3:40 ` [PATCH v2 03/10] x86/virt/tdx: Support global metadata read for all element sizes Kai Huang
@ 2024-07-17  3:40 ` Kai Huang
  2024-07-17  3:40 ` [PATCH v2 05/10] x86/virt/tdx: Move field mapping table of getting TDMR info to function local Kai Huang
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 44+ messages in thread
From: Kai Huang @ 2024-07-17  3:40 UTC (permalink / raw)
  To: dave.hansen, kirill.shutemov, bp, tglx, peterz, mingo, hpa,
	seanjc, pbonzini, dan.j.williams
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu, kai.huang

For now the kernel only reads "TD Memory Region" (TDMR) related global
metadata fields to a 'struct tdx_tdmr_sysinfo' for initializing the TDX
module.  Future changes will need to read other metadata fields that
don't make sense to populate to the "struct tdx_tdmr_sysinfo".

Now the code in get_tdx_tdmr_sysinfo() to read multiple global metadata
fields is not bound to the 'struct tdx_tdmr_sysinfo', and can support
reading all metadata element sizes.  Abstract this code as a helper for
future use.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/virt/vmx/tdx/tdx.c | 27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 4644b324ff86..50d49c539e63 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -304,6 +304,21 @@ struct field_mapping {
 	  .offset   = offsetof(_struct, _member),		\
 	  .size	    = sizeof(typeof(((_struct *)0)->_member)) }
 
+static int stbuf_read_sysmd_multi(const struct field_mapping *fields,
+				  int nr_fields, void *stbuf)
+{
+	int i, ret;
+
+	for (i = 0; i < nr_fields; i++) {
+		ret = stbuf_read_sysmd_field(fields[i].field_id, stbuf,
+				      fields[i].offset, fields[i].size);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 #define TD_SYSINFO_MAP_TDMR_INFO(_field_id, _member)	\
 	TD_SYSINFO_MAP(_field_id, struct tdx_tdmr_sysinfo, _member)
 
@@ -318,18 +333,8 @@ static const struct field_mapping fields[] = {
 
 static int get_tdx_tdmr_sysinfo(struct tdx_tdmr_sysinfo *tdmr_sysinfo)
 {
-	int ret;
-	int i;
-
 	/* Populate 'tdmr_sysinfo' fields using the mapping structure above: */
-	for (i = 0; i < ARRAY_SIZE(fields); i++) {
-		ret = stbuf_read_sysmd_field(fields[i].field_id, tdmr_sysinfo,
-				fields[i].offset, fields[i].size);
-		if (ret)
-			return ret;
-	}
-
-	return 0;
+	return stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields), tdmr_sysinfo);
 }
 
 /* Calculate the actual TDMR size */
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 05/10] x86/virt/tdx: Move field mapping table of getting TDMR info to function local
  2024-07-17  3:40 [PATCH v2 00/10] TDX host: metadata reading tweaks, bug fix and info dump Kai Huang
                   ` (3 preceding siblings ...)
  2024-07-17  3:40 ` [PATCH v2 04/10] x86/virt/tdx: Abstract reading multiple global metadata fields as a helper Kai Huang
@ 2024-07-17  3:40 ` Kai Huang
  2024-08-05 23:48   ` Dan Williams
  2024-07-17  3:40 ` [PATCH v2 06/10] x86/virt/tdx: Refine a comment to reflect the latest TDX spec Kai Huang
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 44+ messages in thread
From: Kai Huang @ 2024-07-17  3:40 UTC (permalink / raw)
  To: dave.hansen, kirill.shutemov, bp, tglx, peterz, mingo, hpa,
	seanjc, pbonzini, dan.j.williams
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu, kai.huang

For now the kernel only reads "TD Memory Region" (TDMR) related global
metadata fields to a 'struct tdx_tdmr_sysinfo' for initializing the TDX
module.  The kernel populates the relevant metadata fields into the
structure using a "field mapping table" of metadata field IDs and the
structure members.

Currently the scope of this "field mapping table" is the entire C file.
Future changes will need to read more global metadata fields that will
be organized in other structures and use this kind of field mapping
tables for other structures too.

Move the field mapping table to the function local to limit its scope so
that the same name can also be used by other functions.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
---

v1 -> v2:
 - Added Nikolay's tag.

---
 arch/x86/virt/vmx/tdx/tdx.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 50d49c539e63..86c47db64e42 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -322,17 +322,17 @@ static int stbuf_read_sysmd_multi(const struct field_mapping *fields,
 #define TD_SYSINFO_MAP_TDMR_INFO(_field_id, _member)	\
 	TD_SYSINFO_MAP(_field_id, struct tdx_tdmr_sysinfo, _member)
 
-/* Map TD_SYSINFO fields into 'struct tdx_tdmr_sysinfo': */
-static const struct field_mapping fields[] = {
-	TD_SYSINFO_MAP_TDMR_INFO(MAX_TDMRS,		max_tdmrs),
-	TD_SYSINFO_MAP_TDMR_INFO(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr),
-	TD_SYSINFO_MAP_TDMR_INFO(PAMT_4K_ENTRY_SIZE,    pamt_entry_size[TDX_PS_4K]),
-	TD_SYSINFO_MAP_TDMR_INFO(PAMT_2M_ENTRY_SIZE,    pamt_entry_size[TDX_PS_2M]),
-	TD_SYSINFO_MAP_TDMR_INFO(PAMT_1G_ENTRY_SIZE,    pamt_entry_size[TDX_PS_1G]),
-};
-
 static int get_tdx_tdmr_sysinfo(struct tdx_tdmr_sysinfo *tdmr_sysinfo)
 {
+	/* Map TD_SYSINFO fields into 'struct tdx_tdmr_sysinfo': */
+	static const struct field_mapping fields[] = {
+		TD_SYSINFO_MAP_TDMR_INFO(MAX_TDMRS,		max_tdmrs),
+		TD_SYSINFO_MAP_TDMR_INFO(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr),
+		TD_SYSINFO_MAP_TDMR_INFO(PAMT_4K_ENTRY_SIZE,    pamt_entry_size[TDX_PS_4K]),
+		TD_SYSINFO_MAP_TDMR_INFO(PAMT_2M_ENTRY_SIZE,    pamt_entry_size[TDX_PS_2M]),
+		TD_SYSINFO_MAP_TDMR_INFO(PAMT_1G_ENTRY_SIZE,    pamt_entry_size[TDX_PS_1G]),
+	};
+
 	/* Populate 'tdmr_sysinfo' fields using the mapping structure above: */
 	return stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields), tdmr_sysinfo);
 }
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 06/10] x86/virt/tdx: Refine a comment to reflect the latest TDX spec
  2024-07-17  3:40 [PATCH v2 00/10] TDX host: metadata reading tweaks, bug fix and info dump Kai Huang
                   ` (4 preceding siblings ...)
  2024-07-17  3:40 ` [PATCH v2 05/10] x86/virt/tdx: Move field mapping table of getting TDMR info to function local Kai Huang
@ 2024-07-17  3:40 ` Kai Huang
  2024-08-06  3:43   ` Dan Williams
  2024-07-17  3:40 ` [PATCH v2 07/10] x86/virt/tdx: Start to track all global metadata in one structure Kai Huang
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 44+ messages in thread
From: Kai Huang @ 2024-07-17  3:40 UTC (permalink / raw)
  To: dave.hansen, kirill.shutemov, bp, tglx, peterz, mingo, hpa,
	seanjc, pbonzini, dan.j.williams
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu, kai.huang

The old versions of "Intel TDX Module v1.5 ABI Specification" contain
the definitions of all global metadata field IDs directly in a table.

However, the latest spec moves those definitions to a dedicated
'global_metadata.json' file as part of a new (separate) "Intel TDX
Module v1.5 ABI definitions" [1].

Update the comment to reflect this.

[1]: https://cdrdv2.intel.com/v1/dl/getContent/795381

Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v1 -> v2:
 - New patch to fix a comment spotted by Nikolay.

---
 arch/x86/virt/vmx/tdx/tdx.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index fdb879ef6c45..4e43cec19917 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -29,7 +29,7 @@
 /*
  * Global scope metadata field ID.
  *
- * See Table "Global Scope Metadata", TDX module 1.5 ABI spec.
+ * See the "global_metadata.json" in the "TDX 1.5 ABI definitions".
  */
 #define MD_FIELD_ID_MAX_TDMRS			0x9100000100000008ULL
 #define MD_FIELD_ID_MAX_RESERVED_PER_TDMR	0x9100000100000009ULL
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 07/10] x86/virt/tdx: Start to track all global metadata in one structure
  2024-07-17  3:40 [PATCH v2 00/10] TDX host: metadata reading tweaks, bug fix and info dump Kai Huang
                   ` (5 preceding siblings ...)
  2024-07-17  3:40 ` [PATCH v2 06/10] x86/virt/tdx: Refine a comment to reflect the latest TDX spec Kai Huang
@ 2024-07-17  3:40 ` Kai Huang
  2024-08-06  3:51   ` Dan Williams
  2024-07-17  3:40 ` [PATCH v2 08/10] x86/virt/tdx: Print TDX module basic information Kai Huang
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 44+ messages in thread
From: Kai Huang @ 2024-07-17  3:40 UTC (permalink / raw)
  To: dave.hansen, kirill.shutemov, bp, tglx, peterz, mingo, hpa,
	seanjc, pbonzini, dan.j.williams
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu, kai.huang

The TDX module provides a set of "global metadata fields".  They report
things like TDX module version, supported features, and fields related
to create/run TDX guests and so on.

Currently the kernel only reads "TD Memory Region" (TDMR) related fields
for module initialization.  There are immediate needs which require the
TDX module initialization to read more global metadata including module
version, supported features and "Convertible Memory Regions" (CMRs).

Also, KVM will need to read more metadata fields to support baseline TDX
guests.  In the longer term, other TDX features like TDX Connect (which
supports assigning trusted IO devices to TDX guest) may also require
other kernel components such as pci/vt-d to access global metadata.

To meet all those requirements, the idea is the TDX host core-kernel to
to provide a centralized, canonical, and read-only structure for the
global metadata that comes out from the TDX module for all kernel
components to use.

As the first step, introduce a new 'struct tdx_sysinfo' to track all
global metadata fields.

TDX categories global metadata fields into different "Class"es.  E.g.,
the current TDMR related fields are under class "TDMR Info".  Instead of
making 'struct tdx_sysinfo' a plain structure to contain all metadata
fields, organize them in smaller structures based on the "Class".

This allows those metadata fields to be used in finer granularity thus
makes the code more clear.  E.g., the current construct_tdmr() can just
take the structure which contains "TDMR Info" metadata fields.

Start with moving 'struct tdx_tdmr_sysinfo' to 'struct tdx_sysinfo', and
rename 'struct tdx_tdmr_sysinfo' to 'struct tdx_sysinfo_tdmr_info' to
make it consistent with the "class name".

Add a new function get_tdx_sysinfo() as the place to read all metadata
fields, and call it at the beginning of init_tdx_module().  Move the
existing get_tdx_tdmr_sysinfo() to get_tdx_sysinfo().

Note there is a functional change: get_tdx_tdmr_sysinfo() is moved from
after build_tdx_memlist() to before it, but it is fine to do so.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/virt/vmx/tdx/tdx.c | 29 +++++++++++++++++------------
 arch/x86/virt/vmx/tdx/tdx.h | 32 +++++++++++++++++++++++++-------
 2 files changed, 42 insertions(+), 19 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 86c47db64e42..3253cdfa5207 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -320,11 +320,11 @@ static int stbuf_read_sysmd_multi(const struct field_mapping *fields,
 }
 
 #define TD_SYSINFO_MAP_TDMR_INFO(_field_id, _member)	\
-	TD_SYSINFO_MAP(_field_id, struct tdx_tdmr_sysinfo, _member)
+	TD_SYSINFO_MAP(_field_id, struct tdx_sysinfo_tdmr_info, _member)
 
-static int get_tdx_tdmr_sysinfo(struct tdx_tdmr_sysinfo *tdmr_sysinfo)
+static int get_tdx_tdmr_sysinfo(struct tdx_sysinfo_tdmr_info *tdmr_sysinfo)
 {
-	/* Map TD_SYSINFO fields into 'struct tdx_tdmr_sysinfo': */
+	/* Map TD_SYSINFO fields into 'struct tdx_sysinfo_tdmr_info': */
 	static const struct field_mapping fields[] = {
 		TD_SYSINFO_MAP_TDMR_INFO(MAX_TDMRS,		max_tdmrs),
 		TD_SYSINFO_MAP_TDMR_INFO(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr),
@@ -337,6 +337,11 @@ static int get_tdx_tdmr_sysinfo(struct tdx_tdmr_sysinfo *tdmr_sysinfo)
 	return stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields), tdmr_sysinfo);
 }
 
+static int get_tdx_sysinfo(struct tdx_sysinfo *sysinfo)
+{
+	return get_tdx_tdmr_sysinfo(&sysinfo->tdmr_info);
+}
+
 /* Calculate the actual TDMR size */
 static int tdmr_size_single(u16 max_reserved_per_tdmr)
 {
@@ -353,7 +358,7 @@ static int tdmr_size_single(u16 max_reserved_per_tdmr)
 }
 
 static int alloc_tdmr_list(struct tdmr_info_list *tdmr_list,
-			   struct tdx_tdmr_sysinfo *tdmr_sysinfo)
+			   struct tdx_sysinfo_tdmr_info *tdmr_sysinfo)
 {
 	size_t tdmr_sz, tdmr_array_sz;
 	void *tdmr_array;
@@ -936,7 +941,7 @@ static int tdmrs_populate_rsvd_areas_all(struct tdmr_info_list *tdmr_list,
  */
 static int construct_tdmrs(struct list_head *tmb_list,
 			   struct tdmr_info_list *tdmr_list,
-			   struct tdx_tdmr_sysinfo *tdmr_sysinfo)
+			   struct tdx_sysinfo_tdmr_info *tdmr_sysinfo)
 {
 	int ret;
 
@@ -1109,9 +1114,13 @@ static int init_tdmrs(struct tdmr_info_list *tdmr_list)
 
 static int init_tdx_module(void)
 {
-	struct tdx_tdmr_sysinfo tdmr_sysinfo;
+	struct tdx_sysinfo sysinfo;
 	int ret;
 
+	ret = get_tdx_sysinfo(&sysinfo);
+	if (ret)
+		return ret;
+
 	/*
 	 * To keep things simple, assume that all TDX-protected memory
 	 * will come from the page allocator.  Make sure all pages in the
@@ -1128,17 +1137,13 @@ static int init_tdx_module(void)
 	if (ret)
 		goto out_put_tdxmem;
 
-	ret = get_tdx_tdmr_sysinfo(&tdmr_sysinfo);
-	if (ret)
-		goto err_free_tdxmem;
-
 	/* Allocate enough space for constructing TDMRs */
-	ret = alloc_tdmr_list(&tdx_tdmr_list, &tdmr_sysinfo);
+	ret = alloc_tdmr_list(&tdx_tdmr_list, &sysinfo.tdmr_info);
 	if (ret)
 		goto err_free_tdxmem;
 
 	/* Cover all TDX-usable memory regions in TDMRs */
-	ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &tdmr_sysinfo);
+	ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &sysinfo.tdmr_info);
 	if (ret)
 		goto err_free_tdmrs;
 
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 4e43cec19917..b5eb7c35f1dc 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -100,13 +100,6 @@ struct tdx_memblock {
 	int nid;
 };
 
-/* "TDMR info" part of "Global Scope Metadata" for constructing TDMRs */
-struct tdx_tdmr_sysinfo {
-	u16 max_tdmrs;
-	u16 max_reserved_per_tdmr;
-	u16 pamt_entry_size[TDX_PS_NR];
-};
-
 /* Warn if kernel has less than TDMR_NR_WARN TDMRs after allocation */
 #define TDMR_NR_WARN 4
 
@@ -119,4 +112,29 @@ struct tdmr_info_list {
 	int max_tdmrs;	/* How many 'tdmr_info's are allocated */
 };
 
+/*
+ * Kernel-defined structures to contain "Global Scope Metadata".
+ *
+ * TDX global metadata fields are categorized by "Class".  See the
+ * "global_metadata.json" in the "TDX 1.5 ABI Definitions".
+ *
+ * 'struct tdx_sysinfo' is the main structure to contain all metadata
+ * used by the kernel.  It contains sub-structures with each reflecting
+ * the "Class" in the 'global_metadata.json'.
+ *
+ * Note not all metadata fields in each class are defined, only those
+ * used by the kernel are.
+ */
+
+/* Class "TDMR Info" */
+struct tdx_sysinfo_tdmr_info {
+	u16 max_tdmrs;
+	u16 max_reserved_per_tdmr;
+	u16 pamt_entry_size[TDX_PS_NR];
+};
+
+struct tdx_sysinfo {
+	struct tdx_sysinfo_tdmr_info tdmr_info;
+};
+
 #endif
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 08/10] x86/virt/tdx: Print TDX module basic information
  2024-07-17  3:40 [PATCH v2 00/10] TDX host: metadata reading tweaks, bug fix and info dump Kai Huang
                   ` (6 preceding siblings ...)
  2024-07-17  3:40 ` [PATCH v2 07/10] x86/virt/tdx: Start to track all global metadata in one structure Kai Huang
@ 2024-07-17  3:40 ` Kai Huang
  2024-08-06  4:19   ` Dan Williams
  2024-08-08 10:31   ` Chenyi Qiang
  2024-07-17  3:40 ` [PATCH v2 09/10] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes Kai Huang
                   ` (3 subsequent siblings)
  11 siblings, 2 replies; 44+ messages in thread
From: Kai Huang @ 2024-07-17  3:40 UTC (permalink / raw)
  To: dave.hansen, kirill.shutemov, bp, tglx, peterz, mingo, hpa,
	seanjc, pbonzini, dan.j.williams
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu, kai.huang

Currently the kernel doesn't print any information regarding the TDX
module itself, e.g. module version.  In practice such information is
useful, especially to the developers.

For instance, there are a couple of use cases for dumping module basic
information:

1) When something goes wrong around using TDX, the information like TDX
   module version, supported features etc could be helpful [1][2].

2) For Linux, when the user wants to update the TDX module, one needs to
   replace the old module in a specific location in the EFI partition
   with the new one so that after reboot the BIOS can load it.  However,
   after kernel boots, currently the user has no way to verify it is
   indeed the new module that gets loaded and initialized (e.g., error
   could happen when replacing the old module).  With the module version
   dumped the user can verify this easily.

So dump the basic TDX module information:

 - TDX module version, and the build date.
 - TDX module type: Debug or Production.
 - TDX_FEATURES0: Supported TDX features.

And dump the information right after reading global metadata, so that
this information is printed no matter whether module initialization
fails or not.

The actual dmesg will look like:

  virt/tdx: Initializing TDX module: 1.5.00.00.0481 (build_date 20230323, Production module), TDX_FEATURES0 0xfbf

Link: https://lore.kernel.org/lkml/e2d844ad-182a-4fc0-a06a-d609c9cbef74@suse.com/T/#m352829aedf6680d4628c7e40dc40b332eda93355 [1]
Link: https://lore.kernel.org/lkml/e2d844ad-182a-4fc0-a06a-d609c9cbef74@suse.com/T/#m351ebcbc006d2e5bc3e7650206a087cb2708d451 [2]
Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v1 -> v2 (Nikolay):
 - Change the format to dump TDX basic info.
 - Slightly improve changelog.

---
 arch/x86/virt/vmx/tdx/tdx.c | 64 +++++++++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h | 33 ++++++++++++++++++-
 2 files changed, 96 insertions(+), 1 deletion(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 3253cdfa5207..5ac0c411f4f7 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -319,6 +319,58 @@ static int stbuf_read_sysmd_multi(const struct field_mapping *fields,
 	return 0;
 }
 
+#define TD_SYSINFO_MAP_MOD_INFO(_field_id, _member)	\
+	TD_SYSINFO_MAP(_field_id, struct tdx_sysinfo_module_info, _member)
+
+static int get_tdx_module_info(struct tdx_sysinfo_module_info *modinfo)
+{
+	static const struct field_mapping fields[] = {
+		TD_SYSINFO_MAP_MOD_INFO(SYS_ATTRIBUTES, sys_attributes),
+		TD_SYSINFO_MAP_MOD_INFO(TDX_FEATURES0,  tdx_features0),
+	};
+
+	return stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields), modinfo);
+}
+
+#define TD_SYSINFO_MAP_MOD_VERSION(_field_id, _member)	\
+	TD_SYSINFO_MAP(_field_id, struct tdx_sysinfo_module_version, _member)
+
+static int get_tdx_module_version(struct tdx_sysinfo_module_version *modver)
+{
+	static const struct field_mapping fields[] = {
+		TD_SYSINFO_MAP_MOD_VERSION(MAJOR_VERSION,    major),
+		TD_SYSINFO_MAP_MOD_VERSION(MINOR_VERSION,    minor),
+		TD_SYSINFO_MAP_MOD_VERSION(UPDATE_VERSION,   update),
+		TD_SYSINFO_MAP_MOD_VERSION(INTERNAL_VERSION, internal),
+		TD_SYSINFO_MAP_MOD_VERSION(BUILD_NUM,	     build_num),
+		TD_SYSINFO_MAP_MOD_VERSION(BUILD_DATE,	     build_date),
+	};
+
+	return stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields), modver);
+}
+
+static void print_basic_sysinfo(struct tdx_sysinfo *sysinfo)
+{
+	struct tdx_sysinfo_module_version *modver = &sysinfo->module_version;
+	struct tdx_sysinfo_module_info *modinfo = &sysinfo->module_info;
+	bool debug = modinfo->sys_attributes & TDX_SYS_ATTR_DEBUG_MODULE;
+
+	/*
+	 * TDX module version encoding:
+	 *
+	 *   <major>.<minor>.<update>.<internal>.<build_num>
+	 *
+	 * When printed as text, <major> and <minor> are 1-digit,
+	 * <update> and <internal> are 2-digits and <build_num>
+	 * is 4-digits.
+	 */
+	pr_info("Initializing TDX module: %u.%u.%02u.%02u.%04u (build_date %u, %s module), TDX_FEATURES0 0x%llx\n",
+			modver->major, modver->minor, modver->update,
+			modver->internal, modver->build_num,
+			modver->build_date, debug ? "Debug" : "Production",
+			modinfo->tdx_features0);
+}
+
 #define TD_SYSINFO_MAP_TDMR_INFO(_field_id, _member)	\
 	TD_SYSINFO_MAP(_field_id, struct tdx_sysinfo_tdmr_info, _member)
 
@@ -339,6 +391,16 @@ static int get_tdx_tdmr_sysinfo(struct tdx_sysinfo_tdmr_info *tdmr_sysinfo)
 
 static int get_tdx_sysinfo(struct tdx_sysinfo *sysinfo)
 {
+	int ret;
+
+	ret = get_tdx_module_info(&sysinfo->module_info);
+	if (ret)
+		return ret;
+
+	ret = get_tdx_module_version(&sysinfo->module_version);
+	if (ret)
+		return ret;
+
 	return get_tdx_tdmr_sysinfo(&sysinfo->tdmr_info);
 }
 
@@ -1121,6 +1183,8 @@ static int init_tdx_module(void)
 	if (ret)
 		return ret;
 
+	print_basic_sysinfo(&sysinfo);
+
 	/*
 	 * To keep things simple, assume that all TDX-protected memory
 	 * will come from the page allocator.  Make sure all pages in the
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index b5eb7c35f1dc..861ddf2c2e88 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -31,6 +31,15 @@
  *
  * See the "global_metadata.json" in the "TDX 1.5 ABI definitions".
  */
+#define MD_FIELD_ID_SYS_ATTRIBUTES		0x0A00000200000000ULL
+#define MD_FIELD_ID_TDX_FEATURES0		0x0A00000300000008ULL
+#define MD_FIELD_ID_BUILD_DATE			0x8800000200000001ULL
+#define MD_FIELD_ID_BUILD_NUM			0x8800000100000002ULL
+#define MD_FIELD_ID_MINOR_VERSION		0x0800000100000003ULL
+#define MD_FIELD_ID_MAJOR_VERSION		0x0800000100000004ULL
+#define MD_FIELD_ID_UPDATE_VERSION		0x0800000100000005ULL
+#define MD_FIELD_ID_INTERNAL_VERSION		0x0800000100000006ULL
+
 #define MD_FIELD_ID_MAX_TDMRS			0x9100000100000008ULL
 #define MD_FIELD_ID_MAX_RESERVED_PER_TDMR	0x9100000100000009ULL
 #define MD_FIELD_ID_PAMT_4K_ENTRY_SIZE		0x9100000100000010ULL
@@ -124,8 +133,28 @@ struct tdmr_info_list {
  *
  * Note not all metadata fields in each class are defined, only those
  * used by the kernel are.
+ *
+ * Also note the "bit definitions" are architectural.
  */
 
+/* Class "TDX Module Info" */
+struct tdx_sysinfo_module_info {
+	u32 sys_attributes;
+	u64 tdx_features0;
+};
+
+#define TDX_SYS_ATTR_DEBUG_MODULE	0x1
+
+/* Class "TDX Module Version" */
+struct tdx_sysinfo_module_version {
+	u16 major;
+	u16 minor;
+	u16 update;
+	u16 internal;
+	u16 build_num;
+	u32 build_date;
+};
+
 /* Class "TDMR Info" */
 struct tdx_sysinfo_tdmr_info {
 	u16 max_tdmrs;
@@ -134,7 +163,9 @@ struct tdx_sysinfo_tdmr_info {
 };
 
 struct tdx_sysinfo {
-	struct tdx_sysinfo_tdmr_info tdmr_info;
+	struct tdx_sysinfo_module_info		module_info;
+	struct tdx_sysinfo_module_version	module_version;
+	struct tdx_sysinfo_tdmr_info		tdmr_info;
 };
 
 #endif
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 09/10] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes
  2024-07-17  3:40 [PATCH v2 00/10] TDX host: metadata reading tweaks, bug fix and info dump Kai Huang
                   ` (7 preceding siblings ...)
  2024-07-17  3:40 ` [PATCH v2 08/10] x86/virt/tdx: Print TDX module basic information Kai Huang
@ 2024-07-17  3:40 ` Kai Huang
  2024-08-06  4:47   ` Dan Williams
  2024-08-20 18:38   ` Adrian Hunter
  2024-07-17  3:40 ` [PATCH v2 10/10] x86/virt/tdx: Don't initialize module that doesn't support NO_RBP_MOD feature Kai Huang
                   ` (2 subsequent siblings)
  11 siblings, 2 replies; 44+ messages in thread
From: Kai Huang @ 2024-07-17  3:40 UTC (permalink / raw)
  To: dave.hansen, kirill.shutemov, bp, tglx, peterz, mingo, hpa,
	seanjc, pbonzini, dan.j.williams
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu, kai.huang

A TDX module initialization failure was reported on a Emerald Rapids
platform:

  virt/tdx: initialization failed: TDMR [0x0, 0x80000000): reserved areas exhausted.
  virt/tdx: module initialization failed (-28)

As part of initializing the TDX module, the kernel informs the TDX
module of all "TDX-usable memory regions" using an array of TDX defined
structure "TD Memory Region" (TDMR).  Each TDMR must be in 1GB aligned
and in 1GB granularity, and all "non-TDX-usable memory holes" within a
given TDMR must be marked as "reserved areas".  The TDX module reports a
maximum number of reserved areas that can be supported per TDMR.

Currently, the kernel finds those "non-TDX-usable memory holes" within a
given TDMR by walking over a list of "TDX-usable memory regions", which
essentially reflects the "usable" regions in the e820 table (w/o memory
hotplug operations precisely, but this is not relevant here).

As shown above, the root cause of this failure is when the kernel tries
to construct a TDMR to cover address range [0x0, 0x80000000), there
are too many memory holes within that range and the number of memory
holes exceeds the maximum number of reserved areas.

The E820 table of that platform (see [1] below) reflects this: the
number of memory holes among e820 "usable" entries exceeds 16, which is
the maximum number of reserved areas TDX module supports in practice.

=== Fix ===

There are two options to fix this: 1) reduce the number of memory holes
when constructing a TDMR to save "reserved areas"; 2) reduce the TDMR's
size to cover fewer memory regions, thus fewer memory holes.

Option 1) is possible, and in fact is easier and preferable:

TDX actually has a concept of "Convertible Memory Regions" (CMRs).  TDX
reports a list of CMRs that meet TDX's security requirements on memory.
TDX requires all the "TDX-usable memory regions" that the kernel passes
to the module via TDMRs, a.k.a, all the "non-reserved regions in TDMRs",
must be convertible memory.

In other words, if a memory hole is indeed CMR, then it's not mandatory
for the kernel to add it to the reserved areas.  By doing so, the number
of consumed reserved areas can be reduced w/o having any functional
impact.  The kernel still allocates TDX memory from the page allocator.
There's no harm if the kernel tells the TDX module some memory regions
are "TDX-usable" but they will never be allocated by the kernel as TDX
memory.

Note this doesn't have any security impact either because the kernel is
out of TDX's TCB anyway.

This is feasible because in practice the CMRs just reflect the nature of
whether the RAM can indeed be used by TDX, thus each CMR tends to be a
large, uninterrupted range of memory, i.e., unlike the e820 table which
contains numerous "ACPI *" entries in the first 2G range.  Refer to [2]
for CMRs reported on the problematic platform using off-tree TDX code.

So for this particular module initialization failure, the memory holes
that are within [0x0, 0x80000000) are mostly indeed CMR.  By not adding
them to the reserved areas, the number of consumed reserved areas for
the TDMR [0x0, 0x80000000) can be dramatically reduced.

Option 2) is also theoretically feasible, but it is not desired:

It requires more complicated logic to handle splitting TDMR into smaller
ones, which isn't trivial.  There are limitations to splitting TDMR too,
thus it may not always work: 1) The smallest TDMR is 1GB, and it cannot
be split any further; 2) This also increases the total number of TDMRs,
which also has a maximum value limited by the TDX module.

So, fix this issue by using option 1):

1) reading out the CMRs from the TDX module global metadata, and
2) changing to find memory holes for a given TDMR based on CMRs, but not
   based on the list of "TDX-usable memory regions".

Also dump the CMRs in dmesg.  They are helpful when something goes wrong
around "constructing the TDMRs and configuring the TDX module with
them".  Note there are no existing userspace tools that the user can get
CMRs since they can only be read via SEAMCALL (no CPUID, MSR etc).

[1] BIOS-E820 table of the problematic platform:

  BIOS-e820: [mem 0x0000000000000000-0x000000000009efff] usable
  BIOS-e820: [mem 0x000000000009f000-0x00000000000fffff] reserved
  BIOS-e820: [mem 0x0000000000100000-0x000000005d168fff] usable
  BIOS-e820: [mem 0x000000005d169000-0x000000005d22afff] ACPI data
  BIOS-e820: [mem 0x000000005d22b000-0x000000005d3cefff] usable
  BIOS-e820: [mem 0x000000005d3cf000-0x000000005d469fff] reserved
  BIOS-e820: [mem 0x000000005d46a000-0x000000005e5b2fff] usable
  BIOS-e820: [mem 0x000000005e5b3000-0x000000005e5c2fff] reserved
  BIOS-e820: [mem 0x000000005e5c3000-0x000000005e5d2fff] usable
  BIOS-e820: [mem 0x000000005e5d3000-0x000000005e5e4fff] reserved
  BIOS-e820: [mem 0x000000005e5e5000-0x000000005eb57fff] usable
  BIOS-e820: [mem 0x000000005eb58000-0x0000000061357fff] ACPI NVS
  BIOS-e820: [mem 0x0000000061358000-0x000000006172afff] usable
  BIOS-e820: [mem 0x000000006172b000-0x0000000061794fff] ACPI data
  BIOS-e820: [mem 0x0000000061795000-0x00000000617fefff] usable
  BIOS-e820: [mem 0x00000000617ff000-0x0000000061912fff] ACPI data
  BIOS-e820: [mem 0x0000000061913000-0x0000000061998fff] usable
  BIOS-e820: [mem 0x0000000061999000-0x00000000619dffff] ACPI data
  BIOS-e820: [mem 0x00000000619e0000-0x00000000619e1fff] usable
  BIOS-e820: [mem 0x00000000619e2000-0x00000000619e9fff] reserved
  BIOS-e820: [mem 0x00000000619ea000-0x0000000061a26fff] usable
  BIOS-e820: [mem 0x0000000061a27000-0x0000000061baefff] ACPI data
  BIOS-e820: [mem 0x0000000061baf000-0x00000000623c2fff] usable
  BIOS-e820: [mem 0x00000000623c3000-0x0000000062471fff] reserved
  BIOS-e820: [mem 0x0000000062472000-0x0000000062823fff] usable
  BIOS-e820: [mem 0x0000000062824000-0x0000000063a24fff] reserved
  BIOS-e820: [mem 0x0000000063a25000-0x0000000063d57fff] usable
  BIOS-e820: [mem 0x0000000063d58000-0x0000000064157fff] reserved
  BIOS-e820: [mem 0x0000000064158000-0x0000000064158fff] usable
  BIOS-e820: [mem 0x0000000064159000-0x0000000064194fff] reserved
  BIOS-e820: [mem 0x0000000064195000-0x000000006e9cefff] usable
  BIOS-e820: [mem 0x000000006e9cf000-0x000000006eccefff] reserved
  BIOS-e820: [mem 0x000000006eccf000-0x000000006f6fefff] ACPI NVS
  BIOS-e820: [mem 0x000000006f6ff000-0x000000006f7fefff] ACPI data
  BIOS-e820: [mem 0x000000006f7ff000-0x000000006f7fffff] usable
  BIOS-e820: [mem 0x000000006f800000-0x000000008fffffff] reserved
  ......

[2] Convertible Memory Regions of the problematic platform:

  virt/tdx: CMR: [0x100000, 0x6f800000)
  virt/tdx: CMR: [0x100000000, 0x107a000000)
  virt/tdx: CMR: [0x1080000000, 0x207c000000)
  virt/tdx: CMR: [0x2080000000, 0x307c000000)
  virt/tdx: CMR: [0x3080000000, 0x407c000000)

Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v1 -> v2:
 - Change to walk over CMRs directly to find out memory holes, instead
   of walking over TDX memory blocks and explicitly check whether a hole
   is subregion of CMR.  (Chao)
 - Mention any constant macro definitions in global metadata structures
   are TDX architectural. (Binbin)
 - Slightly improve the changelog.

---
 arch/x86/virt/vmx/tdx/tdx.c | 116 ++++++++++++++++++++++++++++++------
 arch/x86/virt/vmx/tdx/tdx.h |  15 ++++-
 2 files changed, 113 insertions(+), 18 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 5ac0c411f4f7..3c19295f1f8f 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -293,6 +293,10 @@ static int stbuf_read_sysmd_field(u64 field_id, void *stbuf, int offset,
 	return 0;
 }
 
+/* Wrapper to read one metadata field to u8/u16/u32/u64 */
+#define stbuf_read_sysmd_single(_field_id, _pdata)	\
+	stbuf_read_sysmd_field(_field_id, _pdata, 0, sizeof(typeof(*(_pdata))))
+
 struct field_mapping {
 	u64 field_id;
 	int offset;
@@ -349,6 +353,76 @@ static int get_tdx_module_version(struct tdx_sysinfo_module_version *modver)
 	return stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields), modver);
 }
 
+/* Update the @cmr_info->num_cmrs to trim tail empty CMRs */
+static void trim_empty_tail_cmrs(struct tdx_sysinfo_cmr_info *cmr_info)
+{
+	int i;
+
+	for (i = 0; i < cmr_info->num_cmrs; i++) {
+		u64 cmr_base = cmr_info->cmr_base[i];
+		u64 cmr_size = cmr_info->cmr_size[i];
+
+		if (!cmr_size) {
+			WARN_ON_ONCE(cmr_base);
+			break;
+		}
+
+		/* TDX architecture: CMR must be 4KB aligned */
+		WARN_ON_ONCE(!PAGE_ALIGNED(cmr_base) ||
+				!PAGE_ALIGNED(cmr_size));
+	}
+
+	cmr_info->num_cmrs = i;
+}
+
+#define TD_SYSINFO_MAP_CMR_INFO(_field_id, _member)	\
+	TD_SYSINFO_MAP(_field_id, struct tdx_sysinfo_cmr_info, _member)
+
+static int get_tdx_cmr_info(struct tdx_sysinfo_cmr_info *cmr_info)
+{
+	int i, ret;
+
+	ret = stbuf_read_sysmd_single(MD_FIELD_ID_NUM_CMRS,
+			&cmr_info->num_cmrs);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < cmr_info->num_cmrs; i++) {
+		const struct field_mapping fields[] = {
+			TD_SYSINFO_MAP_CMR_INFO(CMR_BASE0 + i, cmr_base[i]),
+			TD_SYSINFO_MAP_CMR_INFO(CMR_SIZE0 + i, cmr_size[i]),
+		};
+
+		ret = stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields),
+				cmr_info);
+		if (ret)
+			return ret;
+	}
+
+	/*
+	 * The TDX module may just report the maximum number of CMRs that
+	 * TDX architecturally supports as the actual number of CMRs,
+	 * despite the latter is smaller.  In this case all the tail
+	 * CMRs will be empty.  Trim them away.
+	 */
+	trim_empty_tail_cmrs(cmr_info);
+
+	return 0;
+}
+
+static void print_cmr_info(struct tdx_sysinfo_cmr_info *cmr_info)
+{
+	int i;
+
+	for (i = 0; i < cmr_info->num_cmrs; i++) {
+		u64 cmr_base = cmr_info->cmr_base[i];
+		u64 cmr_size = cmr_info->cmr_size[i];
+
+		pr_info("CMR[%d]: [0x%llx, 0x%llx)\n", i, cmr_base,
+				cmr_base + cmr_size);
+	}
+}
+
 static void print_basic_sysinfo(struct tdx_sysinfo *sysinfo)
 {
 	struct tdx_sysinfo_module_version *modver = &sysinfo->module_version;
@@ -369,6 +443,8 @@ static void print_basic_sysinfo(struct tdx_sysinfo *sysinfo)
 			modver->internal, modver->build_num,
 			modver->build_date, debug ? "Debug" : "Production",
 			modinfo->tdx_features0);
+
+	print_cmr_info(&sysinfo->cmr_info);
 }
 
 #define TD_SYSINFO_MAP_TDMR_INFO(_field_id, _member)	\
@@ -401,6 +477,10 @@ static int get_tdx_sysinfo(struct tdx_sysinfo *sysinfo)
 	if (ret)
 		return ret;
 
+	ret = get_tdx_cmr_info(&sysinfo->cmr_info);
+	if (ret)
+		return ret;
+
 	return get_tdx_tdmr_sysinfo(&sysinfo->tdmr_info);
 }
 
@@ -825,29 +905,28 @@ static int tdmr_add_rsvd_area(struct tdmr_info *tdmr, int *p_idx, u64 addr,
 }
 
 /*
- * Go through @tmb_list to find holes between memory areas.  If any of
+ * Go through all CMRs in @cmr_info to find memory holes.  If any of
  * those holes fall within @tdmr, set up a TDMR reserved area to cover
  * the hole.
  */
-static int tdmr_populate_rsvd_holes(struct list_head *tmb_list,
+static int tdmr_populate_rsvd_holes(struct tdx_sysinfo_cmr_info *cmr_info,
 				    struct tdmr_info *tdmr,
 				    int *rsvd_idx,
 				    u16 max_reserved_per_tdmr)
 {
-	struct tdx_memblock *tmb;
 	u64 prev_end;
-	int ret;
+	int i, ret;
 
 	/*
 	 * Start looking for reserved blocks at the
 	 * beginning of the TDMR.
 	 */
 	prev_end = tdmr->base;
-	list_for_each_entry(tmb, tmb_list, list) {
+	for (i = 0; i < cmr_info->num_cmrs; i++) {
 		u64 start, end;
 
-		start = PFN_PHYS(tmb->start_pfn);
-		end   = PFN_PHYS(tmb->end_pfn);
+		start = cmr_info->cmr_base[i];
+		end   = start + cmr_info->cmr_size[i];
 
 		/* Break if this region is after the TDMR */
 		if (start >= tdmr_end(tdmr))
@@ -948,16 +1027,16 @@ static int rsvd_area_cmp_func(const void *a, const void *b)
 
 /*
  * Populate reserved areas for the given @tdmr, including memory holes
- * (via @tmb_list) and PAMTs (via @tdmr_list).
+ * (via @cmr_info) and PAMTs (via @tdmr_list).
  */
 static int tdmr_populate_rsvd_areas(struct tdmr_info *tdmr,
-				    struct list_head *tmb_list,
+				    struct tdx_sysinfo_cmr_info *cmr_info,
 				    struct tdmr_info_list *tdmr_list,
 				    u16 max_reserved_per_tdmr)
 {
 	int ret, rsvd_idx = 0;
 
-	ret = tdmr_populate_rsvd_holes(tmb_list, tdmr, &rsvd_idx,
+	ret = tdmr_populate_rsvd_holes(cmr_info, tdmr, &rsvd_idx,
 			max_reserved_per_tdmr);
 	if (ret)
 		return ret;
@@ -976,10 +1055,10 @@ static int tdmr_populate_rsvd_areas(struct tdmr_info *tdmr,
 
 /*
  * Populate reserved areas for all TDMRs in @tdmr_list, including memory
- * holes (via @tmb_list) and PAMTs.
+ * holes (via @cmr_info) and PAMTs.
  */
 static int tdmrs_populate_rsvd_areas_all(struct tdmr_info_list *tdmr_list,
-					 struct list_head *tmb_list,
+					 struct tdx_sysinfo_cmr_info *cmr_info,
 					 u16 max_reserved_per_tdmr)
 {
 	int i;
@@ -988,7 +1067,7 @@ static int tdmrs_populate_rsvd_areas_all(struct tdmr_info_list *tdmr_list,
 		int ret;
 
 		ret = tdmr_populate_rsvd_areas(tdmr_entry(tdmr_list, i),
-				tmb_list, tdmr_list, max_reserved_per_tdmr);
+				cmr_info, tdmr_list, max_reserved_per_tdmr);
 		if (ret)
 			return ret;
 	}
@@ -999,11 +1078,13 @@ static int tdmrs_populate_rsvd_areas_all(struct tdmr_info_list *tdmr_list,
 /*
  * Construct a list of TDMRs on the preallocated space in @tdmr_list
  * to cover all TDX memory regions in @tmb_list based on the TDX module
- * TDMR global information in @tdmr_sysinfo.
+ * TDMR global information in @tdmr_sysinfo and CMR information in
+ * @cmr_info.
  */
 static int construct_tdmrs(struct list_head *tmb_list,
 			   struct tdmr_info_list *tdmr_list,
-			   struct tdx_sysinfo_tdmr_info *tdmr_sysinfo)
+			   struct tdx_sysinfo_tdmr_info *tdmr_sysinfo,
+			   struct tdx_sysinfo_cmr_info *cmr_info)
 {
 	int ret;
 
@@ -1016,7 +1097,7 @@ static int construct_tdmrs(struct list_head *tmb_list,
 	if (ret)
 		return ret;
 
-	ret = tdmrs_populate_rsvd_areas_all(tdmr_list, tmb_list,
+	ret = tdmrs_populate_rsvd_areas_all(tdmr_list, cmr_info,
 			tdmr_sysinfo->max_reserved_per_tdmr);
 	if (ret)
 		tdmrs_free_pamt_all(tdmr_list);
@@ -1207,7 +1288,8 @@ static int init_tdx_module(void)
 		goto err_free_tdxmem;
 
 	/* Cover all TDX-usable memory regions in TDMRs */
-	ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &sysinfo.tdmr_info);
+	ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &sysinfo.tdmr_info,
+			&sysinfo.cmr_info);
 	if (ret)
 		goto err_free_tdmrs;
 
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 861ddf2c2e88..4b43eb774ffa 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -40,6 +40,10 @@
 #define MD_FIELD_ID_UPDATE_VERSION		0x0800000100000005ULL
 #define MD_FIELD_ID_INTERNAL_VERSION		0x0800000100000006ULL
 
+#define MD_FIELD_ID_NUM_CMRS			0x9000000100000000ULL
+#define MD_FIELD_ID_CMR_BASE0			0x9000000300000080ULL
+#define MD_FIELD_ID_CMR_SIZE0			0x9000000300000100ULL
+
 #define MD_FIELD_ID_MAX_TDMRS			0x9100000100000008ULL
 #define MD_FIELD_ID_MAX_RESERVED_PER_TDMR	0x9100000100000009ULL
 #define MD_FIELD_ID_PAMT_4K_ENTRY_SIZE		0x9100000100000010ULL
@@ -134,7 +138,7 @@ struct tdmr_info_list {
  * Note not all metadata fields in each class are defined, only those
  * used by the kernel are.
  *
- * Also note the "bit definitions" are architectural.
+ * Also note the "bit/constant definitions" are architectural.
  */
 
 /* Class "TDX Module Info" */
@@ -155,6 +159,14 @@ struct tdx_sysinfo_module_version {
 	u32 build_date;
 };
 
+/* Class "CMR Info" */
+#define TDX_MAX_CMRS	32
+struct tdx_sysinfo_cmr_info {
+	u16 num_cmrs;
+	u64 cmr_base[TDX_MAX_CMRS];
+	u64 cmr_size[TDX_MAX_CMRS];
+};
+
 /* Class "TDMR Info" */
 struct tdx_sysinfo_tdmr_info {
 	u16 max_tdmrs;
@@ -165,6 +177,7 @@ struct tdx_sysinfo_tdmr_info {
 struct tdx_sysinfo {
 	struct tdx_sysinfo_module_info		module_info;
 	struct tdx_sysinfo_module_version	module_version;
+	struct tdx_sysinfo_cmr_info		cmr_info;
 	struct tdx_sysinfo_tdmr_info		tdmr_info;
 };
 
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v2 10/10] x86/virt/tdx: Don't initialize module that doesn't support NO_RBP_MOD feature
  2024-07-17  3:40 [PATCH v2 00/10] TDX host: metadata reading tweaks, bug fix and info dump Kai Huang
                   ` (8 preceding siblings ...)
  2024-07-17  3:40 ` [PATCH v2 09/10] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes Kai Huang
@ 2024-07-17  3:40 ` Kai Huang
  2024-08-06  4:55   ` Dan Williams
  2024-08-05 12:03 ` [PATCH v2 00/10] TDX host: metadata reading tweaks, bug fix and info dump Huang, Kai
  2024-08-05 22:36 ` Dan Williams
  11 siblings, 1 reply; 44+ messages in thread
From: Kai Huang @ 2024-07-17  3:40 UTC (permalink / raw)
  To: dave.hansen, kirill.shutemov, bp, tglx, peterz, mingo, hpa,
	seanjc, pbonzini, dan.j.williams
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu, kai.huang

Old TDX modules can clobber RBP in the TDH.VP.ENTER SEAMCALL.  However
RBP is used as frame pointer in the x86_64 calling convention, and
clobbering RBP could result in bad things like being unable to unwind
the stack if any non-maskable exceptions (NMI, #MC etc) happens in that
gap.

A new "NO_RBP_MOD" feature was introduced to more recent TDX modules to
not clobber RBP.  This feature is reported in the TDX_FEATURES0 global
metadata field via bit 18.

Don't initialize the TDX module if this feature is not supported [1].

Link: https://lore.kernel.org/all/c0067319-2653-4cbd-8fee-1ccf21b1e646@suse.com/T/#mef98469c51e2382ead2c537ea189752360bd2bef [1]
Signed-off-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
---

v1 -> v2:
 - Add tag from Nikolay.

---
 arch/x86/virt/vmx/tdx/tdx.c | 17 +++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h |  1 +
 2 files changed, 18 insertions(+)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 3c19295f1f8f..ec6156728423 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -484,6 +484,18 @@ static int get_tdx_sysinfo(struct tdx_sysinfo *sysinfo)
 	return get_tdx_tdmr_sysinfo(&sysinfo->tdmr_info);
 }
 
+static int check_module_compatibility(struct tdx_sysinfo *sysinfo)
+{
+	u64 tdx_features0 = sysinfo->module_info.tdx_features0;
+
+	if (!(tdx_features0 & TDX_FEATURES0_NO_RBP_MOD)) {
+		pr_err("NO_RBP_MOD feature is not supported\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /* Calculate the actual TDMR size */
 static int tdmr_size_single(u16 max_reserved_per_tdmr)
 {
@@ -1266,6 +1278,11 @@ static int init_tdx_module(void)
 
 	print_basic_sysinfo(&sysinfo);
 
+	/* Check whether the kernel can support this module */
+	ret = check_module_compatibility(&sysinfo);
+	if (ret)
+		return ret;
+
 	/*
 	 * To keep things simple, assume that all TDX-protected memory
 	 * will come from the page allocator.  Make sure all pages in the
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 4b43eb774ffa..20fd7cb5937a 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -148,6 +148,7 @@ struct tdx_sysinfo_module_info {
 };
 
 #define TDX_SYS_ATTR_DEBUG_MODULE	0x1
+#define TDX_FEATURES0_NO_RBP_MOD	_BITULL(18)
 
 /* Class "TDX Module Version" */
 struct tdx_sysinfo_module_version {
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 00/10] TDX host: metadata reading tweaks, bug fix and info dump
  2024-07-17  3:40 [PATCH v2 00/10] TDX host: metadata reading tweaks, bug fix and info dump Kai Huang
                   ` (9 preceding siblings ...)
  2024-07-17  3:40 ` [PATCH v2 10/10] x86/virt/tdx: Don't initialize module that doesn't support NO_RBP_MOD feature Kai Huang
@ 2024-08-05 12:03 ` Huang, Kai
  2024-08-05 22:36 ` Dan Williams
  11 siblings, 0 replies; 44+ messages in thread
From: Huang, Kai @ 2024-08-05 12:03 UTC (permalink / raw)
  To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
	peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
	kirill.shutemov@linux.intel.com, tglx@linutronix.de,
	pbonzini@redhat.com, Williams, Dan J
  Cc: Gao, Chao, kvm@vger.kernel.org, binbin.wu@linux.intel.com,
	linux-kernel@vger.kernel.org, Edgecombe, Rick P, x86@kernel.org,
	Yamahata, Isaku

On Wed, 2024-07-17 at 15:40 +1200, Kai Huang wrote:
> TL;DR:
> 
> This series does necessary tweaks to TDX host "global metadata" reading
> code to fix some immediate issues in the TDX module initialization code,
> with intention to also provide a flexible code base to support sharing
> global metadata to KVM (and other kernel components) for future needs.
> 
> This series, and additional patches to initialize TDX when loading KVM
> module and read essential metadata fields for KVM TDX can be found at:
> 
> https://github.com/intel/tdx/commits/kvm-tdxinit/
> 
> Dear maintainers,
> 
> This series targets x86 tip.  I also added Dan, KVM maintainers and KVM
> list so people can review and comment.  Thanks for your time.
> 

Kindly ping.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 00/10] TDX host: metadata reading tweaks, bug fix and info dump
  2024-07-17  3:40 [PATCH v2 00/10] TDX host: metadata reading tweaks, bug fix and info dump Kai Huang
                   ` (10 preceding siblings ...)
  2024-08-05 12:03 ` [PATCH v2 00/10] TDX host: metadata reading tweaks, bug fix and info dump Huang, Kai
@ 2024-08-05 22:36 ` Dan Williams
  2024-08-08  0:19   ` Huang, Kai
  11 siblings, 1 reply; 44+ messages in thread
From: Dan Williams @ 2024-08-05 22:36 UTC (permalink / raw)
  To: Kai Huang, dave.hansen, kirill.shutemov, bp, tglx, peterz, mingo,
	hpa, seanjc, pbonzini, dan.j.williams
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu, kai.huang

Kai Huang wrote:
> TL;DR:
> 
> This series does necessary tweaks to TDX host "global metadata" reading
> code to fix some immediate issues in the TDX module initialization code,
> with intention to also provide a flexible code base to support sharing
> global metadata to KVM (and other kernel components) for future needs.
> 
> This series, and additional patches to initialize TDX when loading KVM
> module and read essential metadata fields for KVM TDX can be found at:
> 
> https://github.com/intel/tdx/commits/kvm-tdxinit/
> 
> Dear maintainers,
> 
> This series targets x86 tip.  I also added Dan, KVM maintainers and KVM
> list so people can review and comment.  Thanks for your time.
> 
> v1 -> v2:
>   - Fix comments from Chao and Nikolay.
>   - A new patch to refine an out-dated comment by Nikolay.
>   - Collect tags from Nikolay (thanks!).
> 
> v1: https://lore.kernel.org/linux-kernel/cover.1718538552.git.kai.huang@intel.com/T/
> 
> === More info ===
> 
> TDX module provides a set of "global metadata fields" for software to
> query.  They report things like TDX module version, supported features
> fields required for creating TDX guests and so on.
> 
> Today the TDX host code already reads "TD Memory Region" (TDMR) related
> metadata fields for module initialization.  There are immediate needs
> that require TDX host code to read more metadata fields:
> 
>  - Dump basic TDX module info [1];
>  - Reject module with no NO_RBP_MOD feature support [2];
>  - Read CMR info to fix a module initialization failure bug [3].
> 
> Also, the upstreaming-on-going KVM TDX support [4] requires to read more
> global metadata fields.  In the longer term, the TDX Connect [5] (which
> supports assigning trusted IO devices to TDX guest) may also require
> other kernel components (e.g., pci/vt-d) to access more metadata.
> 
> To meet all of those, the idea is the TDX host core-kernel to provide a
> centralized, canonical, and read-only structure to contain all global
> metadata that comes out of TDX module for all kernel components to use.
> 
> There is an "alternative option to manage global metadata" (see below)
> but it is not as straightforward as this.
> 
> This series starts to track all global metadata fields into a single
> 'struct tdx_sysinfo', and reads more metadata fields to that structure
> to address the immediate needs as mentioned above.
> 
> More fields will be added in the near future to support KVM TDX, and the
> actual sharing/export the "read-only" global metadata for KVM will also
> be sent out in the near future when that becomes immediate (also see
> "Share global metadata to KVM" below).

I think it is important to share why this unified data structure
proposal reached escape velocity from internal review. The idea that x86
gets to review growth to this structure over time is an asset for
maintainability and oversight of what is happening in the downstream
consumers like KVM and TSM (for TDX Connect).

A dynamic retrieval API removes that natural auditing of data structure
patches from tip.git.

Yes, it requires more touches than letting use cases consume new
metadata fields at will, but that's net positive for maintainence of the
kernel and the feedback loop to the TDX module.

> Note, the first couple of patches in this series were from the old
> patchset "TDX host: Provide TDX module metadata reading APIs" [6].
> 
> === Further read ===
> 
> 1) Altertive option to manage global metadata
> 
> The TDX host core-kernel could also expose/export APIs for reading
> metadata out of TDX module directly, and all in-kernel TDX users use
> these APIs and manage their own metadata fields.
> 
> However this isn't as straightforward as exposing/exporting structure,
> because the API to read multi fields to a structure requires the caller
> to build a "mapping table" between field ID to structure member:
> 
> 	struct kvm_used_metadata {
> 		u64 member1;
> 		...
> 	};
> 
> 	#define TD_SYSINFO_KVM_MAP(_field_id, _member)	\
> 		TD_SYSINFO_MAP(_field_id, struct kvm_used_metadata, \
> 				_member)
> 
> 	struct tdx_metadata_field_mapping fields[] = {
> 		TD_SYSINFO_KVM_MAP(FIELD_ID1, member1),
> 		...
> 	};
> 
> 	ret = tdx_sysmd_read_multi(fields, ARRAY_SIZE(fields), buf);
> 
> Another problem is some metadata field may be accessed by multiple
> kernel components, e.g., the one reports TDX module features, in which
> case there will be duplicated code comparing to exposing structure
> directly.

A full explanation of what this patch is not doing is a bit overkill.

> 2) Share global metadata to KVM
> 
> To achieve "read-only" centralized global metadata structure, the idea
> way is to use __ro_after_init.  However currently all global metadata
> are read by tdx_enable(), which is supposed to be called at any time at
> runtime thus isn't annotated with __init.
> 
> The __ro_after_init can be done eventually, but it can only be done
> after moving VMXON out of KVM to the core-kernel: after that we can
> read all metadata during kernel boot (thus __ro_after_init), but
> doesn't necessarily have to do it in tdx_enable().
> 
> However moving VMXON out of KVM is NOT considered as dependency for the
> initial KVM TDX support [7].  Thus for the initial support, the idea is
> TDX host to export a function which returns a "const struct pointer" so
> KVM won't be able to modify any global metadata.

For now I think it is sufficient to say that metadata just gets
populated to a central data structure. Follow on work to protect that
data structure against post-init updates can come later.

> 3) TDH.SYS.RD vs TDH.SYS.RDALL
> 
> The kernel can use two SEAMCALLs to read global metadata: TDH.SYS.RD and
> TDH.SYS.RDALL.  The former simply reads one metadata field to a 'u64'.
> The latter tries to read all fields to a 4KB buffer.
> 
> Currently the kernel only uses the former to read metadata, and this
> series doesn't choose to use TDH.SYS.RDALL.
> 
> The main reason is the "layout of all fields in the 4KB buffer" that
> returned by TDH.SYS.RDALL isn't architectural consistent among different
> TDX module versions.
> 
> E.g., some metadata fields may not be supported by the old module, thus
> they may or may not be in the 4KB buffer depending on module version.
> And it is impractical to know whether those fields are in the buffer or
> not.
> 
> TDH.SYS.RDALL may be useful to read one small set of metadata fields,
> e.g., fields in one "Class" (TDX categories all global metadata fields
> in different "Class"es).  But this is only an optimization even if
> TDH.SYS.RDALL can be used, so leave this to future consideration.

I appreciate the effort to include some of the discussions had while
boiling this patchset down to its simplest near term form, but this
much text makes the simple patches look much more controversial than
they are. This TDH.SYS.RDALL consideration is not relevant to the
current proposal.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 01/10] x86/virt/tdx: Rename _offset to _member for TD_SYSINFO_MAP() macro
  2024-07-17  3:40 ` [PATCH v2 01/10] x86/virt/tdx: Rename _offset to _member for TD_SYSINFO_MAP() macro Kai Huang
@ 2024-08-05 22:37   ` Dan Williams
  0 siblings, 0 replies; 44+ messages in thread
From: Dan Williams @ 2024-08-05 22:37 UTC (permalink / raw)
  To: Kai Huang, dave.hansen, kirill.shutemov, bp, tglx, peterz, mingo,
	hpa, seanjc, pbonzini, dan.j.williams
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu, kai.huang

Kai Huang wrote:
> TD_SYSINFO_MAP() macro actually takes the member of the 'struct
> tdx_tdmr_sysinfo' as the second argument and uses the offsetof() to
> calculate the offset for that member.
> 
> Rename the macro argument _offset to _member to reflect this.
> 
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

Makes sense

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 02/10] x86/virt/tdx: Unbind global metadata read with 'struct tdx_tdmr_sysinfo'
  2024-07-17  3:40 ` [PATCH v2 02/10] x86/virt/tdx: Unbind global metadata read with 'struct tdx_tdmr_sysinfo' Kai Huang
@ 2024-08-05 23:32   ` Dan Williams
  2024-08-06  0:09     ` Huang, Kai
  0 siblings, 1 reply; 44+ messages in thread
From: Dan Williams @ 2024-08-05 23:32 UTC (permalink / raw)
  To: Kai Huang, dave.hansen, kirill.shutemov, bp, tglx, peterz, mingo,
	hpa, seanjc, pbonzini, dan.j.williams
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu, kai.huang

Kai Huang wrote:
> The TDX module provides a set of "global metadata fields".  They report
> things like TDX module version, supported features, and fields related
> to create/run TDX guests and so on.
> 
> For now the kernel only reads "TD Memory Region" (TDMR) related global
> metadata fields to a 'struct tdx_tdmr_sysinfo' for initializing the TDX
> module, and the metadata reading code can only work with that structure.
> 
> Future changes will need to read other metadata fields that don't make
> sense to populate to the "struct tdx_tdmr_sysinfo".  It's essential to
> provide a generic metadata read infrastructure which is not bound to any
> specific structure.
> 
> To start providing such infrastructure, unbind the metadata reading code
> with the 'struct tdx_tdmr_sysinfo'.
> 
> Note the kernel has a helper macro, TD_SYSINFO_MAP(), for marshaling the
> metadata into the 'struct tdx_tdmr_sysinfo', and currently the macro
> hardcodes the structure name.  As part of unbinding the metadata reading
> code with 'struct tdx_tdmr_sysinfo', it is extended to accept different
> structures.
> 
> Unfortunately, this will result in the usage of TD_SYSINFO_MAP() for
> populating 'struct tdx_tdmr_sysinfo' to be changed to use the structure
> name explicitly for each structure member and make the code longer.  Add
> a wrapper macro which hides the 'struct tdx_tdmr_sysinfo' internally to
> make the code shorter thus better readability.
> 
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> 
> v1 -> v2:
>  - 'st_member' -> 'member'. (Nikolay)
> 
> ---
>  arch/x86/virt/vmx/tdx/tdx.c | 25 ++++++++++++++-----------
>  1 file changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index d8fa9325bf5e..2ce03c3ea017 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -272,9 +272,9 @@ static int read_sys_metadata_field(u64 field_id, u64 *data)
>  
>  static int read_sys_metadata_field16(u64 field_id,
>  				     int offset,
> -				     struct tdx_tdmr_sysinfo *ts)
> +				     void *stbuf)

The loss of all type-safety sticks out, and points to the fact that
@offset was awkward to pass in from the beginning. I would have expected
a calling convention like:

static int read_sys_metadata_field16(u64 field_id, u16 *val)

...and make the caller calculate the buffer in a type-safe way.

The problem with the current code is that it feels like it is planning
ahead for a dynamic metdata reading future, that is not coming, Instead
it could be leaning further into initializing all metadata once.

In other words what is the point of defining:

static const struct field_mapping fields[]

...only to throw away all type-safety and run it in a loop. Why not
unroll the loop, skip the array, and the runtime warning with something
like?

read_sys_metadata_field16(MD_FIELD_ID_MAX_TDMRS, &ts->max_tdmrs);
read_sys_metadata_field16(MD_FIELD_ID_MAX_RESERVED_PER_TDMR, &ts->max_reserved_per_tdmr);
...etc

The unrolled loop is the same amount of work as maintaining @fields.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 03/10] x86/virt/tdx: Support global metadata read for all element sizes
  2024-07-17  3:40 ` [PATCH v2 03/10] x86/virt/tdx: Support global metadata read for all element sizes Kai Huang
@ 2024-08-05 23:45   ` Dan Williams
  0 siblings, 0 replies; 44+ messages in thread
From: Dan Williams @ 2024-08-05 23:45 UTC (permalink / raw)
  To: Kai Huang, dave.hansen, kirill.shutemov, bp, tglx, peterz, mingo,
	hpa, seanjc, pbonzini, dan.j.williams
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu, kai.huang

Kai Huang wrote:
> The TDX module provides "global metadata fields" for software to query.
> Each metadata field is accessible by a unique "metadata field ID".  TDX
> supports 8/16/32/64 bits metadata element sizes.  The size of each
> metadata field is encoded in its associated metadata field ID.
> 
> For now the kernel only reads "TD Memory Region" (TDMR) related global
> metadata fields for module initialization.  All these metadata fields
> are 16-bit, and the kernel only supports reading 16-bit fields.
> 
> Future changes will need to read more metadata fields with other element
> sizes.  To resolve this once for all, extend the existing metadata
> reading code to support reading all element sizes.
> 
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
> 
> v1 -> v2 (Nikolay):
>  - MD_FIELD_BYTES() -> MD_FIELD_ELE_SIZE().
>  - 'bytes' -> 'size' in stbuf_read_sysmd_field().
> 
> ---
>  arch/x86/virt/vmx/tdx/tdx.c | 29 ++++++++++++++++-------------
>  arch/x86/virt/vmx/tdx/tdx.h |  3 ++-
>  2 files changed, 18 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 2ce03c3ea017..4644b324ff86 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -270,23 +270,25 @@ static int read_sys_metadata_field(u64 field_id, u64 *data)
>  	return 0;
>  }
>  
> -static int read_sys_metadata_field16(u64 field_id,
> -				     int offset,
> -				     void *stbuf)
> +/*
> + * Read one global metadata field and store the data to a location of a
> + * given buffer specified by the offset and size (in bytes).
> + */
> +static int stbuf_read_sysmd_field(u64 field_id, void *stbuf, int offset,
> +				  int size)
>  {
> -	u16 *member = stbuf + offset;
> +	void *member = stbuf + offset;
>  	u64 tmp;
>  	int ret;
>  
> -	if (WARN_ON_ONCE(MD_FIELD_ID_ELE_SIZE_CODE(field_id) !=
> -			MD_FIELD_ID_ELE_SIZE_16BIT))
> +	if (WARN_ON_ONCE(MD_FIELD_ELE_SIZE(field_id) != size))
>  		return -EINVAL;

Per the last patch, re: unrolling @fields, it's unfortunate to have a
runtime warning for something that could have been verified at compile
time.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 05/10] x86/virt/tdx: Move field mapping table of getting TDMR info to function local
  2024-07-17  3:40 ` [PATCH v2 05/10] x86/virt/tdx: Move field mapping table of getting TDMR info to function local Kai Huang
@ 2024-08-05 23:48   ` Dan Williams
  0 siblings, 0 replies; 44+ messages in thread
From: Dan Williams @ 2024-08-05 23:48 UTC (permalink / raw)
  To: Kai Huang, dave.hansen, kirill.shutemov, bp, tglx, peterz, mingo,
	hpa, seanjc, pbonzini, dan.j.williams
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu, kai.huang

Kai Huang wrote:
> For now the kernel only reads "TD Memory Region" (TDMR) related global
> metadata fields to a 'struct tdx_tdmr_sysinfo' for initializing the TDX
> module.  The kernel populates the relevant metadata fields into the
> structure using a "field mapping table" of metadata field IDs and the
> structure members.
> 
> Currently the scope of this "field mapping table" is the entire C file.
> Future changes will need to read more global metadata fields that will
> be organized in other structures and use this kind of field mapping
> tables for other structures too.
> 
> Move the field mapping table to the function local to limit its scope so
> that the same name can also be used by other functions.
> 
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
> ---
> 
> v1 -> v2:
>  - Added Nikolay's tag.
> 
> ---
>  arch/x86/virt/vmx/tdx/tdx.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 50d49c539e63..86c47db64e42 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -322,17 +322,17 @@ static int stbuf_read_sysmd_multi(const struct field_mapping *fields,
>  #define TD_SYSINFO_MAP_TDMR_INFO(_field_id, _member)	\
>  	TD_SYSINFO_MAP(_field_id, struct tdx_tdmr_sysinfo, _member)
>  
> -/* Map TD_SYSINFO fields into 'struct tdx_tdmr_sysinfo': */
> -static const struct field_mapping fields[] = {
> -	TD_SYSINFO_MAP_TDMR_INFO(MAX_TDMRS,		max_tdmrs),
> -	TD_SYSINFO_MAP_TDMR_INFO(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr),
> -	TD_SYSINFO_MAP_TDMR_INFO(PAMT_4K_ENTRY_SIZE,    pamt_entry_size[TDX_PS_4K]),
> -	TD_SYSINFO_MAP_TDMR_INFO(PAMT_2M_ENTRY_SIZE,    pamt_entry_size[TDX_PS_2M]),
> -	TD_SYSINFO_MAP_TDMR_INFO(PAMT_1G_ENTRY_SIZE,    pamt_entry_size[TDX_PS_1G]),
> -};
> -
>  static int get_tdx_tdmr_sysinfo(struct tdx_tdmr_sysinfo *tdmr_sysinfo)
>  {
> +	/* Map TD_SYSINFO fields into 'struct tdx_tdmr_sysinfo': */
> +	static const struct field_mapping fields[] = {
> +		TD_SYSINFO_MAP_TDMR_INFO(MAX_TDMRS,		max_tdmrs),
> +		TD_SYSINFO_MAP_TDMR_INFO(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr),
> +		TD_SYSINFO_MAP_TDMR_INFO(PAMT_4K_ENTRY_SIZE,    pamt_entry_size[TDX_PS_4K]),
> +		TD_SYSINFO_MAP_TDMR_INFO(PAMT_2M_ENTRY_SIZE,    pamt_entry_size[TDX_PS_2M]),
> +		TD_SYSINFO_MAP_TDMR_INFO(PAMT_1G_ENTRY_SIZE,    pamt_entry_size[TDX_PS_1G]),
> +	};
> +
>  	/* Populate 'tdmr_sysinfo' fields using the mapping structure above: */
>  	return stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields), tdmr_sysinfo);

Same symbol namespace benefits from skipping the array indirection
altogether.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 02/10] x86/virt/tdx: Unbind global metadata read with 'struct tdx_tdmr_sysinfo'
  2024-08-05 23:32   ` Dan Williams
@ 2024-08-06  0:09     ` Huang, Kai
  2024-08-06  1:13       ` Dan Williams
  0 siblings, 1 reply; 44+ messages in thread
From: Huang, Kai @ 2024-08-06  0:09 UTC (permalink / raw)
  To: Williams, Dan J, Hansen, Dave, kirill.shutemov@linux.intel.com,
	bp@alien8.de, tglx@linutronix.de, peterz@infradead.org,
	mingo@redhat.com, hpa@zytor.com, seanjc@google.com,
	pbonzini@redhat.com
  Cc: x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Edgecombe, Rick P, Yamahata, Isaku, Gao, Chao,
	binbin.wu@linux.intel.com



On 6/08/2024 11:32 am, Williams, Dan J wrote:
> Kai Huang wrote:
>> The TDX module provides a set of "global metadata fields".  They report
>> things like TDX module version, supported features, and fields related
>> to create/run TDX guests and so on.
>>
>> For now the kernel only reads "TD Memory Region" (TDMR) related global
>> metadata fields to a 'struct tdx_tdmr_sysinfo' for initializing the TDX
>> module, and the metadata reading code can only work with that structure.
>>
>> Future changes will need to read other metadata fields that don't make
>> sense to populate to the "struct tdx_tdmr_sysinfo".  It's essential to
>> provide a generic metadata read infrastructure which is not bound to any
>> specific structure.
>>
>> To start providing such infrastructure, unbind the metadata reading code
>> with the 'struct tdx_tdmr_sysinfo'.
>>
>> Note the kernel has a helper macro, TD_SYSINFO_MAP(), for marshaling the
>> metadata into the 'struct tdx_tdmr_sysinfo', and currently the macro
>> hardcodes the structure name.  As part of unbinding the metadata reading
>> code with 'struct tdx_tdmr_sysinfo', it is extended to accept different
>> structures.
>>
>> Unfortunately, this will result in the usage of TD_SYSINFO_MAP() for
>> populating 'struct tdx_tdmr_sysinfo' to be changed to use the structure
>> name explicitly for each structure member and make the code longer.  Add
>> a wrapper macro which hides the 'struct tdx_tdmr_sysinfo' internally to
>> make the code shorter thus better readability.
>>
>> Signed-off-by: Kai Huang <kai.huang@intel.com>
>> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
>> ---
>>
>> v1 -> v2:
>>   - 'st_member' -> 'member'. (Nikolay)
>>
>> ---
>>   arch/x86/virt/vmx/tdx/tdx.c | 25 ++++++++++++++-----------
>>   1 file changed, 14 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
>> index d8fa9325bf5e..2ce03c3ea017 100644
>> --- a/arch/x86/virt/vmx/tdx/tdx.c
>> +++ b/arch/x86/virt/vmx/tdx/tdx.c
>> @@ -272,9 +272,9 @@ static int read_sys_metadata_field(u64 field_id, u64 *data)
>>   
>>   static int read_sys_metadata_field16(u64 field_id,
>>   				     int offset,
>> -				     struct tdx_tdmr_sysinfo *ts)
>> +				     void *stbuf)
> 
> The loss of all type-safety sticks out, and points to the fact that
> @offset was awkward to pass in from the beginning. I would have expected
> a calling convention like:
> 
> static int read_sys_metadata_field16(u64 field_id, u16 *val)
> 
> ...and make the caller calculate the buffer in a type-safe way.
> 
> The problem with the current code is that it feels like it is planning
> ahead for a dynamic metdata reading future, that is not coming, Instead
> it could be leaning further into initializing all metadata once.
> 
> In other words what is the point of defining:
> 
> static const struct field_mapping fields[]
> 
> ...only to throw away all type-safety and run it in a loop. Why not
> unroll the loop, skip the array, and the runtime warning with something
> like?
> 
> read_sys_metadata_field16(MD_FIELD_ID_MAX_TDMRS, &ts->max_tdmrs);
> read_sys_metadata_field16(MD_FIELD_ID_MAX_RESERVED_PER_TDMR, &ts->max_reserved_per_tdmr);
> ...etc
> 
> The unrolled loop is the same amount of work as maintaining @fields.

Hi Dan,

Thanks for the feedback.

AFAICT Dave didn't like this way:

https://lore.kernel.org/lkml/cover.1699527082.git.kai.huang@intel.com/T/#me6f615d7845215c278753b57a0bce1162960209d

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 02/10] x86/virt/tdx: Unbind global metadata read with 'struct tdx_tdmr_sysinfo'
  2024-08-06  0:09     ` Huang, Kai
@ 2024-08-06  1:13       ` Dan Williams
  2024-08-07 12:09         ` Huang, Kai
  0 siblings, 1 reply; 44+ messages in thread
From: Dan Williams @ 2024-08-06  1:13 UTC (permalink / raw)
  To: Huang, Kai, Williams, Dan J, Hansen, Dave,
	kirill.shutemov@linux.intel.com, bp@alien8.de, tglx@linutronix.de,
	peterz@infradead.org, mingo@redhat.com, hpa@zytor.com,
	seanjc@google.com, pbonzini@redhat.com
  Cc: x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Edgecombe, Rick P, Yamahata, Isaku, Gao, Chao,
	binbin.wu@linux.intel.com

Huang, Kai wrote:
[..]
> > The unrolled loop is the same amount of work as maintaining @fields.
> 
> Hi Dan,
> 
> Thanks for the feedback.
> 
> AFAICT Dave didn't like this way:
> 
> https://lore.kernel.org/lkml/cover.1699527082.git.kai.huang@intel.com/T/#me6f615d7845215c278753b57a0bce1162960209d

I agree with Dave that the original was unreadable. However, I also
think he glossed over the loss of type-safety and the silliness of
defining an array to precisely map fields only to turn around and do a
runtime check that the statically defined array was filled out
correctly. So I think lets solve the readability problem *and* make the
array definition identical in appearance to unrolled type-safe
execution, something like (UNTESTED!):


diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 4e2b2e2ac9f9..a177fb7332ae 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -270,60 +270,42 @@ static int read_sys_metadata_field(u64 field_id, u64 *data)
 	return 0;
 }
 
-static int read_sys_metadata_field16(u64 field_id,
-				     int offset,
-				     struct tdx_tdmr_sysinfo *ts)
+static int read_sys_metadata_field16(u64 field_id, u16 *val)
 {
-	u16 *ts_member = ((void *)ts) + offset;
 	u64 tmp;
 	int ret;
 
-	if (WARN_ON_ONCE(MD_FIELD_ID_ELE_SIZE_CODE(field_id) !=
-			MD_FIELD_ID_ELE_SIZE_16BIT))
-		return -EINVAL;
-
 	ret = read_sys_metadata_field(field_id, &tmp);
 	if (ret)
 		return ret;
 
-	*ts_member = tmp;
+	*val = tmp;
 
 	return 0;
 }
 
-struct field_mapping {
-	u64 field_id;
-	int offset;
-};
-
-#define TD_SYSINFO_MAP(_field_id, _offset) \
-	{ .field_id = MD_FIELD_ID_##_field_id,	   \
-	  .offset   = offsetof(struct tdx_tdmr_sysinfo, _offset) }
-
-/* Map TD_SYSINFO fields into 'struct tdx_tdmr_sysinfo': */
-static const struct field_mapping fields[] = {
-	TD_SYSINFO_MAP(MAX_TDMRS,	      max_tdmrs),
-	TD_SYSINFO_MAP(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr),
-	TD_SYSINFO_MAP(PAMT_4K_ENTRY_SIZE,    pamt_entry_size[TDX_PS_4K]),
-	TD_SYSINFO_MAP(PAMT_2M_ENTRY_SIZE,    pamt_entry_size[TDX_PS_2M]),
-	TD_SYSINFO_MAP(PAMT_1G_ENTRY_SIZE,    pamt_entry_size[TDX_PS_1G]),
-};
-
-static int get_tdx_tdmr_sysinfo(struct tdx_tdmr_sysinfo *tdmr_sysinfo)
+/*
+ * Assumes locally defined @ret and @ts to convey the error code and the
+ * 'struct tdx_tdmr_sysinfo' instance to fill out
+ */
+#define TD_SYSINFO_MAP(_field_id, _offset)                              \
+	({                                                              \
+		if (ret == 0)                                           \
+			ret = read_sys_metadata_field16(                \
+				MD_FIELD_ID_##_field_id, &ts->_offset); \
+	})
+
+static int get_tdx_tdmr_sysinfo(struct tdx_tdmr_sysinfo *ts)
 {
-	int ret;
-	int i;
+	int ret = 0;
 
-	/* Populate 'tdmr_sysinfo' fields using the mapping structure above: */
-	for (i = 0; i < ARRAY_SIZE(fields); i++) {
-		ret = read_sys_metadata_field16(fields[i].field_id,
-						fields[i].offset,
-						tdmr_sysinfo);
-		if (ret)
-			return ret;
-	}
+	TD_SYSINFO_MAP(MAX_TDMRS,	      max_tdmrs);
+	TD_SYSINFO_MAP(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr);
+	TD_SYSINFO_MAP(PAMT_4K_ENTRY_SIZE,    pamt_entry_size[TDX_PS_4K]);
+	TD_SYSINFO_MAP(PAMT_2M_ENTRY_SIZE,    pamt_entry_size[TDX_PS_2M]);
+	TD_SYSINFO_MAP(PAMT_1G_ENTRY_SIZE,    pamt_entry_size[TDX_PS_1G]);
 
-	return 0;
+	return ret;
 }
 
 /* Calculate the actual TDMR size */






^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 06/10] x86/virt/tdx: Refine a comment to reflect the latest TDX spec
  2024-07-17  3:40 ` [PATCH v2 06/10] x86/virt/tdx: Refine a comment to reflect the latest TDX spec Kai Huang
@ 2024-08-06  3:43   ` Dan Williams
  2024-08-06 11:23     ` Huang, Kai
  0 siblings, 1 reply; 44+ messages in thread
From: Dan Williams @ 2024-08-06  3:43 UTC (permalink / raw)
  To: Kai Huang, dave.hansen, kirill.shutemov, bp, tglx, peterz, mingo,
	hpa, seanjc, pbonzini, dan.j.williams
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu, kai.huang

Kai Huang wrote:
> The old versions of "Intel TDX Module v1.5 ABI Specification" contain
> the definitions of all global metadata field IDs directly in a table.
> 
> However, the latest spec moves those definitions to a dedicated
> 'global_metadata.json' file as part of a new (separate) "Intel TDX
> Module v1.5 ABI definitions" [1].
> 
> Update the comment to reflect this.
> 
> [1]: https://cdrdv2.intel.com/v1/dl/getContent/795381
> 
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
> 
> v1 -> v2:
>  - New patch to fix a comment spotted by Nikolay.
> 
> ---
>  arch/x86/virt/vmx/tdx/tdx.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index fdb879ef6c45..4e43cec19917 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -29,7 +29,7 @@
>  /*
>   * Global scope metadata field ID.
>   *
> - * See Table "Global Scope Metadata", TDX module 1.5 ABI spec.
> + * See the "global_metadata.json" in the "TDX 1.5 ABI definitions".
>   */
>  #define MD_FIELD_ID_MAX_TDMRS			0x9100000100000008ULL
>  #define MD_FIELD_ID_MAX_RESERVED_PER_TDMR	0x9100000100000009ULL

Given this is JSON any plan to just check-in "global_metadata.json"
somewhere in tools/ with a script that queries for a set of fields and
spits them out into a Linux data structure + set of TD_SYSINFO_*_MAP()
calls? Then no future review bandwidth needs to be spent on manually
checking offsets names and values, they will just be pulled from the
script.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 07/10] x86/virt/tdx: Start to track all global metadata in one structure
  2024-07-17  3:40 ` [PATCH v2 07/10] x86/virt/tdx: Start to track all global metadata in one structure Kai Huang
@ 2024-08-06  3:51   ` Dan Williams
  2024-08-06 11:29     ` Huang, Kai
  0 siblings, 1 reply; 44+ messages in thread
From: Dan Williams @ 2024-08-06  3:51 UTC (permalink / raw)
  To: Kai Huang, dave.hansen, kirill.shutemov, bp, tglx, peterz, mingo,
	hpa, seanjc, pbonzini, dan.j.williams
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu, kai.huang

Kai Huang wrote:
> The TDX module provides a set of "global metadata fields".  They report
> things like TDX module version, supported features, and fields related
> to create/run TDX guests and so on.
> 
> Currently the kernel only reads "TD Memory Region" (TDMR) related fields
> for module initialization.  There are immediate needs which require the
> TDX module initialization to read more global metadata including module
> version, supported features and "Convertible Memory Regions" (CMRs).
> 
> Also, KVM will need to read more metadata fields to support baseline TDX
> guests.  In the longer term, other TDX features like TDX Connect (which
> supports assigning trusted IO devices to TDX guest) may also require
> other kernel components such as pci/vt-d to access global metadata.
> 
> To meet all those requirements, the idea is the TDX host core-kernel to
> to provide a centralized, canonical, and read-only structure for the
> global metadata that comes out from the TDX module for all kernel
> components to use.
> 
> As the first step, introduce a new 'struct tdx_sysinfo' to track all
> global metadata fields.
> 
> TDX categories global metadata fields into different "Class"es.  E.g.,
> the current TDMR related fields are under class "TDMR Info".  Instead of
> making 'struct tdx_sysinfo' a plain structure to contain all metadata
> fields, organize them in smaller structures based on the "Class".
> 
> This allows those metadata fields to be used in finer granularity thus
> makes the code more clear.  E.g., the current construct_tdmr() can just
> take the structure which contains "TDMR Info" metadata fields.
> 
> Start with moving 'struct tdx_tdmr_sysinfo' to 'struct tdx_sysinfo', and
> rename 'struct tdx_tdmr_sysinfo' to 'struct tdx_sysinfo_tdmr_info' to
> make it consistent with the "class name".

How about 'struct tdx_sys_info' and 'struct tdx_sys_tdmr_info' to avoid
duplicating 'info' in the symbol name?

Do pure renames indpendent of logic changes to make patches like
this easier to read.

I would also move the pure rename to the front of the patches so the
reviewer spends as minimal amount of time with the deprecated name in
the set.

> Add a new function get_tdx_sysinfo() as the place to read all metadata
> fields, and call it at the beginning of init_tdx_module().  Move the
> existing get_tdx_tdmr_sysinfo() to get_tdx_sysinfo().
> 
> Note there is a functional change: get_tdx_tdmr_sysinfo() is moved from
> after build_tdx_memlist() to before it, but it is fine to do so.
> 
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>  arch/x86/virt/vmx/tdx/tdx.c | 29 +++++++++++++++++------------
>  arch/x86/virt/vmx/tdx/tdx.h | 32 +++++++++++++++++++++++++-------
>  2 files changed, 42 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 86c47db64e42..3253cdfa5207 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -320,11 +320,11 @@ static int stbuf_read_sysmd_multi(const struct field_mapping *fields,
>  }
>  
>  #define TD_SYSINFO_MAP_TDMR_INFO(_field_id, _member)	\
> -	TD_SYSINFO_MAP(_field_id, struct tdx_tdmr_sysinfo, _member)
> +	TD_SYSINFO_MAP(_field_id, struct tdx_sysinfo_tdmr_info, _member)
>  
> -static int get_tdx_tdmr_sysinfo(struct tdx_tdmr_sysinfo *tdmr_sysinfo)
> +static int get_tdx_tdmr_sysinfo(struct tdx_sysinfo_tdmr_info *tdmr_sysinfo)
>  {
> -	/* Map TD_SYSINFO fields into 'struct tdx_tdmr_sysinfo': */
> +	/* Map TD_SYSINFO fields into 'struct tdx_sysinfo_tdmr_info': */
>  	static const struct field_mapping fields[] = {
>  		TD_SYSINFO_MAP_TDMR_INFO(MAX_TDMRS,		max_tdmrs),
>  		TD_SYSINFO_MAP_TDMR_INFO(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr),
> @@ -337,6 +337,11 @@ static int get_tdx_tdmr_sysinfo(struct tdx_tdmr_sysinfo *tdmr_sysinfo)
>  	return stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields), tdmr_sysinfo);
>  }
>  
> +static int get_tdx_sysinfo(struct tdx_sysinfo *sysinfo)
> +{
> +	return get_tdx_tdmr_sysinfo(&sysinfo->tdmr_info);
> +}
> +
>  /* Calculate the actual TDMR size */
>  static int tdmr_size_single(u16 max_reserved_per_tdmr)
>  {
> @@ -353,7 +358,7 @@ static int tdmr_size_single(u16 max_reserved_per_tdmr)
>  }
>  
>  static int alloc_tdmr_list(struct tdmr_info_list *tdmr_list,
> -			   struct tdx_tdmr_sysinfo *tdmr_sysinfo)
> +			   struct tdx_sysinfo_tdmr_info *tdmr_sysinfo)
>  {
>  	size_t tdmr_sz, tdmr_array_sz;
>  	void *tdmr_array;
> @@ -936,7 +941,7 @@ static int tdmrs_populate_rsvd_areas_all(struct tdmr_info_list *tdmr_list,
>   */
>  static int construct_tdmrs(struct list_head *tmb_list,
>  			   struct tdmr_info_list *tdmr_list,
> -			   struct tdx_tdmr_sysinfo *tdmr_sysinfo)
> +			   struct tdx_sysinfo_tdmr_info *tdmr_sysinfo)
>  {
>  	int ret;
>  
> @@ -1109,9 +1114,13 @@ static int init_tdmrs(struct tdmr_info_list *tdmr_list)
>  
>  static int init_tdx_module(void)
>  {
> -	struct tdx_tdmr_sysinfo tdmr_sysinfo;
> +	struct tdx_sysinfo sysinfo;
>  	int ret;
>  
> +	ret = get_tdx_sysinfo(&sysinfo);
> +	if (ret)
> +		return ret;
> +
>  	/*
>  	 * To keep things simple, assume that all TDX-protected memory
>  	 * will come from the page allocator.  Make sure all pages in the
> @@ -1128,17 +1137,13 @@ static int init_tdx_module(void)
>  	if (ret)
>  		goto out_put_tdxmem;
>  
> -	ret = get_tdx_tdmr_sysinfo(&tdmr_sysinfo);
> -	if (ret)
> -		goto err_free_tdxmem;
> -
>  	/* Allocate enough space for constructing TDMRs */
> -	ret = alloc_tdmr_list(&tdx_tdmr_list, &tdmr_sysinfo);
> +	ret = alloc_tdmr_list(&tdx_tdmr_list, &sysinfo.tdmr_info);
>  	if (ret)
>  		goto err_free_tdxmem;
>  
>  	/* Cover all TDX-usable memory regions in TDMRs */
> -	ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &tdmr_sysinfo);
> +	ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &sysinfo.tdmr_info);
>  	if (ret)
>  		goto err_free_tdmrs;
>  
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 4e43cec19917..b5eb7c35f1dc 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -100,13 +100,6 @@ struct tdx_memblock {
>  	int nid;
>  };
>  
> -/* "TDMR info" part of "Global Scope Metadata" for constructing TDMRs */
> -struct tdx_tdmr_sysinfo {
> -	u16 max_tdmrs;
> -	u16 max_reserved_per_tdmr;
> -	u16 pamt_entry_size[TDX_PS_NR];
> -};
> -
>  /* Warn if kernel has less than TDMR_NR_WARN TDMRs after allocation */
>  #define TDMR_NR_WARN 4
>  
> @@ -119,4 +112,29 @@ struct tdmr_info_list {
>  	int max_tdmrs;	/* How many 'tdmr_info's are allocated */
>  };
>  
> +/*
> + * Kernel-defined structures to contain "Global Scope Metadata".
> + *
> + * TDX global metadata fields are categorized by "Class".  See the
> + * "global_metadata.json" in the "TDX 1.5 ABI Definitions".
> + *
> + * 'struct tdx_sysinfo' is the main structure to contain all metadata
> + * used by the kernel.  It contains sub-structures with each reflecting
> + * the "Class" in the 'global_metadata.json'.
> + *
> + * Note not all metadata fields in each class are defined, only those
> + * used by the kernel are.
> + */
> +
> +/* Class "TDMR Info" */
> +struct tdx_sysinfo_tdmr_info {
> +	u16 max_tdmrs;
> +	u16 max_reserved_per_tdmr;
> +	u16 pamt_entry_size[TDX_PS_NR];
> +};
> +
> +struct tdx_sysinfo {
> +	struct tdx_sysinfo_tdmr_info tdmr_info;
> +};

I would just call this member 'tdmr' since the 'info' is already applied
by being in tdx_sys_info.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 08/10] x86/virt/tdx: Print TDX module basic information
  2024-07-17  3:40 ` [PATCH v2 08/10] x86/virt/tdx: Print TDX module basic information Kai Huang
@ 2024-08-06  4:19   ` Dan Williams
  2024-08-06 11:51     ` Huang, Kai
  2024-08-08 10:31   ` Chenyi Qiang
  1 sibling, 1 reply; 44+ messages in thread
From: Dan Williams @ 2024-08-06  4:19 UTC (permalink / raw)
  To: Kai Huang, dave.hansen, kirill.shutemov, bp, tglx, peterz, mingo,
	hpa, seanjc, pbonzini, dan.j.williams
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu, kai.huang

Kai Huang wrote:
> Currently the kernel doesn't print any information regarding the TDX
> module itself, e.g. module version.  In practice such information is
> useful, especially to the developers.
> 
> For instance, there are a couple of use cases for dumping module basic
> information:
> 
> 1) When something goes wrong around using TDX, the information like TDX
>    module version, supported features etc could be helpful [1][2].
> 
> 2) For Linux, when the user wants to update the TDX module, one needs to
>    replace the old module in a specific location in the EFI partition
>    with the new one so that after reboot the BIOS can load it.  However,
>    after kernel boots, currently the user has no way to verify it is
>    indeed the new module that gets loaded and initialized (e.g., error
>    could happen when replacing the old module).  With the module version
>    dumped the user can verify this easily.
> 
> So dump the basic TDX module information:
> 
>  - TDX module version, and the build date.
>  - TDX module type: Debug or Production.
>  - TDX_FEATURES0: Supported TDX features.
> 
> And dump the information right after reading global metadata, so that
> this information is printed no matter whether module initialization
> fails or not.
> 
> The actual dmesg will look like:
> 
>   virt/tdx: Initializing TDX module: 1.5.00.00.0481 (build_date 20230323, Production module), TDX_FEATURES0 0xfbf
> 
> Link: https://lore.kernel.org/lkml/e2d844ad-182a-4fc0-a06a-d609c9cbef74@suse.com/T/#m352829aedf6680d4628c7e40dc40b332eda93355 [1]
> Link: https://lore.kernel.org/lkml/e2d844ad-182a-4fc0-a06a-d609c9cbef74@suse.com/T/#m351ebcbc006d2e5bc3e7650206a087cb2708d451 [2]
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
> 
> v1 -> v2 (Nikolay):
>  - Change the format to dump TDX basic info.
>  - Slightly improve changelog.
> 
> ---
>  arch/x86/virt/vmx/tdx/tdx.c | 64 +++++++++++++++++++++++++++++++++++++
>  arch/x86/virt/vmx/tdx/tdx.h | 33 ++++++++++++++++++-
>  2 files changed, 96 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 3253cdfa5207..5ac0c411f4f7 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -319,6 +319,58 @@ static int stbuf_read_sysmd_multi(const struct field_mapping *fields,
>  	return 0;
>  }
>  
> +#define TD_SYSINFO_MAP_MOD_INFO(_field_id, _member)	\
> +	TD_SYSINFO_MAP(_field_id, struct tdx_sysinfo_module_info, _member)
> +
> +static int get_tdx_module_info(struct tdx_sysinfo_module_info *modinfo)
> +{
> +	static const struct field_mapping fields[] = {
> +		TD_SYSINFO_MAP_MOD_INFO(SYS_ATTRIBUTES, sys_attributes),
> +		TD_SYSINFO_MAP_MOD_INFO(TDX_FEATURES0,  tdx_features0),
> +	};
> +
> +	return stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields), modinfo);
> +}
> +
> +#define TD_SYSINFO_MAP_MOD_VERSION(_field_id, _member)	\
> +	TD_SYSINFO_MAP(_field_id, struct tdx_sysinfo_module_version, _member)
> +
> +static int get_tdx_module_version(struct tdx_sysinfo_module_version *modver)
> +{
> +	static const struct field_mapping fields[] = {
> +		TD_SYSINFO_MAP_MOD_VERSION(MAJOR_VERSION,    major),
> +		TD_SYSINFO_MAP_MOD_VERSION(MINOR_VERSION,    minor),
> +		TD_SYSINFO_MAP_MOD_VERSION(UPDATE_VERSION,   update),
> +		TD_SYSINFO_MAP_MOD_VERSION(INTERNAL_VERSION, internal),
> +		TD_SYSINFO_MAP_MOD_VERSION(BUILD_NUM,	     build_num),
> +		TD_SYSINFO_MAP_MOD_VERSION(BUILD_DATE,	     build_date),
> +	};
> +
> +	return stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields), modver);

Looks good if stbuf_read_sysmd_multi() is replaced with the work being
done internal to TD_SYSINFO_MAP_MOD_VERSION().

> +}
> +
> +static void print_basic_sysinfo(struct tdx_sysinfo *sysinfo)
> +{
> +	struct tdx_sysinfo_module_version *modver = &sysinfo->module_version;
> +	struct tdx_sysinfo_module_info *modinfo = &sysinfo->module_info;
> +	bool debug = modinfo->sys_attributes & TDX_SYS_ATTR_DEBUG_MODULE;

Why is this casually checking for debug modules, but doing nothing with
that indication? Shouldn't the kernel have policy around whether it
wants to interoperate with a debug module? I would expect that kernel
operation with a debug module would need explicit opt-in consideration.

> +
> +	/*
> +	 * TDX module version encoding:
> +	 *
> +	 *   <major>.<minor>.<update>.<internal>.<build_num>
> +	 *
> +	 * When printed as text, <major> and <minor> are 1-digit,
> +	 * <update> and <internal> are 2-digits and <build_num>
> +	 * is 4-digits.
> +	 */
> +	pr_info("Initializing TDX module: %u.%u.%02u.%02u.%04u (build_date %u, %s module), TDX_FEATURES0 0x%llx\n",
> +			modver->major, modver->minor, modver->update,
> +			modver->internal, modver->build_num,
> +			modver->build_date, debug ? "Debug" : "Production",
> +			modinfo->tdx_features0);

Another nice thing about json scripting is that this flag fields could
be pretty-printed with symbolic names for the flags, but that can come
later.

> +}
> +
>  #define TD_SYSINFO_MAP_TDMR_INFO(_field_id, _member)	\
>  	TD_SYSINFO_MAP(_field_id, struct tdx_sysinfo_tdmr_info, _member)
>  
> @@ -339,6 +391,16 @@ static int get_tdx_tdmr_sysinfo(struct tdx_sysinfo_tdmr_info *tdmr_sysinfo)
>  
>  static int get_tdx_sysinfo(struct tdx_sysinfo *sysinfo)
>  {
> +	int ret;
> +
> +	ret = get_tdx_module_info(&sysinfo->module_info);
> +	if (ret)
> +		return ret;
> +
> +	ret = get_tdx_module_version(&sysinfo->module_version);
> +	if (ret)
> +		return ret;
> +
>  	return get_tdx_tdmr_sysinfo(&sysinfo->tdmr_info);
>  }
>  
> @@ -1121,6 +1183,8 @@ static int init_tdx_module(void)
>  	if (ret)
>  		return ret;
>  
> +	print_basic_sysinfo(&sysinfo);
> +
>  	/*
>  	 * To keep things simple, assume that all TDX-protected memory
>  	 * will come from the page allocator.  Make sure all pages in the
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index b5eb7c35f1dc..861ddf2c2e88 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -31,6 +31,15 @@
>   *
>   * See the "global_metadata.json" in the "TDX 1.5 ABI definitions".
>   */
> +#define MD_FIELD_ID_SYS_ATTRIBUTES		0x0A00000200000000ULL
> +#define MD_FIELD_ID_TDX_FEATURES0		0x0A00000300000008ULL
> +#define MD_FIELD_ID_BUILD_DATE			0x8800000200000001ULL
> +#define MD_FIELD_ID_BUILD_NUM			0x8800000100000002ULL
> +#define MD_FIELD_ID_MINOR_VERSION		0x0800000100000003ULL
> +#define MD_FIELD_ID_MAJOR_VERSION		0x0800000100000004ULL
> +#define MD_FIELD_ID_UPDATE_VERSION		0x0800000100000005ULL
> +#define MD_FIELD_ID_INTERNAL_VERSION		0x0800000100000006ULL

This is where I would rather not take your word for it, or go review
these constants myself if these were autogenerated from parsing json.

> +
>  #define MD_FIELD_ID_MAX_TDMRS			0x9100000100000008ULL
>  #define MD_FIELD_ID_MAX_RESERVED_PER_TDMR	0x9100000100000009ULL
>  #define MD_FIELD_ID_PAMT_4K_ENTRY_SIZE		0x9100000100000010ULL
> @@ -124,8 +133,28 @@ struct tdmr_info_list {
>   *
>   * Note not all metadata fields in each class are defined, only those
>   * used by the kernel are.
> + *
> + * Also note the "bit definitions" are architectural.
>   */
>  
> +/* Class "TDX Module Info" */
> +struct tdx_sysinfo_module_info {

This name feels too generic, perhaps 'tdx_sys_info_features' makes it
clearer?

> +	u32 sys_attributes;
> +	u64 tdx_features0;
> +};
> +
> +#define TDX_SYS_ATTR_DEBUG_MODULE	0x1
> +
> +/* Class "TDX Module Version" */
> +struct tdx_sysinfo_module_version {
> +	u16 major;
> +	u16 minor;
> +	u16 update;
> +	u16 internal;
> +	u16 build_num;
> +	u32 build_date;
> +};
> +
>  /* Class "TDMR Info" */
>  struct tdx_sysinfo_tdmr_info {
>  	u16 max_tdmrs;
> @@ -134,7 +163,9 @@ struct tdx_sysinfo_tdmr_info {
>  };
>  
>  struct tdx_sysinfo {
> -	struct tdx_sysinfo_tdmr_info tdmr_info;
> +	struct tdx_sysinfo_module_info		module_info;
> +	struct tdx_sysinfo_module_version	module_version;
> +	struct tdx_sysinfo_tdmr_info		tdmr_info;

Compare that to:

        struct tdx_sys_info {
                struct tdx_sys_info_features features;
                struct tdx_sys_info_version version;
                struct tdx_sys_info_tdmr tdmr;
        };

...and tell me which oine is easier to read.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 09/10] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes
  2024-07-17  3:40 ` [PATCH v2 09/10] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes Kai Huang
@ 2024-08-06  4:47   ` Dan Williams
  2024-08-06 12:17     ` Huang, Kai
  2024-08-20 18:38   ` Adrian Hunter
  1 sibling, 1 reply; 44+ messages in thread
From: Dan Williams @ 2024-08-06  4:47 UTC (permalink / raw)
  To: Kai Huang, dave.hansen, kirill.shutemov, bp, tglx, peterz, mingo,
	hpa, seanjc, pbonzini, dan.j.williams
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu, kai.huang

Kai Huang wrote:
> A TDX module initialization failure was reported on a Emerald Rapids
> platform:
> 
>   virt/tdx: initialization failed: TDMR [0x0, 0x80000000): reserved areas exhausted.
>   virt/tdx: module initialization failed (-28)
> 
> As part of initializing the TDX module, the kernel informs the TDX
> module of all "TDX-usable memory regions" using an array of TDX defined
> structure "TD Memory Region" (TDMR).  Each TDMR must be in 1GB aligned
> and in 1GB granularity, and all "non-TDX-usable memory holes" within a
> given TDMR must be marked as "reserved areas".  The TDX module reports a
> maximum number of reserved areas that can be supported per TDMR.
> 
> Currently, the kernel finds those "non-TDX-usable memory holes" within a
> given TDMR by walking over a list of "TDX-usable memory regions", which
> essentially reflects the "usable" regions in the e820 table (w/o memory
> hotplug operations precisely, but this is not relevant here).
> 
> As shown above, the root cause of this failure is when the kernel tries
> to construct a TDMR to cover address range [0x0, 0x80000000), there
> are too many memory holes within that range and the number of memory
> holes exceeds the maximum number of reserved areas.
> 
> The E820 table of that platform (see [1] below) reflects this: the
> number of memory holes among e820 "usable" entries exceeds 16, which is
> the maximum number of reserved areas TDX module supports in practice.
> 
> === Fix ===
> 
> There are two options to fix this: 1) reduce the number of memory holes
> when constructing a TDMR to save "reserved areas"; 2) reduce the TDMR's
> size to cover fewer memory regions, thus fewer memory holes.

What about option3? Fix the BIOS. If the BIOS scattershots low memory
holes why does the kernel need to suffer the burden of putting it back
together?

What about option4? Don't use for_each_mem_pfn_range() to populate TDMRs
and instead walk the resource tree asking if each resource is covered by
a CMR then add it to the TDMR list. In other words starting from
page-allocatable memory to populate the list seems like the wrong
starting point.

> Option 1) is possible, and in fact is easier and preferable:
> 
> TDX actually has a concept of "Convertible Memory Regions" (CMRs).  TDX
> reports a list of CMRs that meet TDX's security requirements on memory.
> TDX requires all the "TDX-usable memory regions" that the kernel passes
> to the module via TDMRs, a.k.a, all the "non-reserved regions in TDMRs",
> must be convertible memory.
> 
> In other words, if a memory hole is indeed CMR, then it's not mandatory
> for the kernel to add it to the reserved areas.  By doing so, the number
> of consumed reserved areas can be reduced w/o having any functional
> impact.  The kernel still allocates TDX memory from the page allocator.
> There's no harm if the kernel tells the TDX module some memory regions
> are "TDX-usable" but they will never be allocated by the kernel as TDX
> memory.
> 
> Note this doesn't have any security impact either because the kernel is
> out of TDX's TCB anyway.
> 
> This is feasible because in practice the CMRs just reflect the nature of
> whether the RAM can indeed be used by TDX, thus each CMR tends to be a
> large, uninterrupted range of memory, i.e., unlike the e820 table which
> contains numerous "ACPI *" entries in the first 2G range.  Refer to [2]
> for CMRs reported on the problematic platform using off-tree TDX code.
> 
> So for this particular module initialization failure, the memory holes
> that are within [0x0, 0x80000000) are mostly indeed CMR.  By not adding
> them to the reserved areas, the number of consumed reserved areas for
> the TDMR [0x0, 0x80000000) can be dramatically reduced.
> 
> Option 2) is also theoretically feasible, but it is not desired:
> 
> It requires more complicated logic to handle splitting TDMR into smaller
> ones, which isn't trivial.  There are limitations to splitting TDMR too,
> thus it may not always work: 1) The smallest TDMR is 1GB, and it cannot
> be split any further; 2) This also increases the total number of TDMRs,
> which also has a maximum value limited by the TDX module.
> 
> So, fix this issue by using option 1):
> 
> 1) reading out the CMRs from the TDX module global metadata, and
> 2) changing to find memory holes for a given TDMR based on CMRs, but not
>    based on the list of "TDX-usable memory regions".
> 
> Also dump the CMRs in dmesg.  They are helpful when something goes wrong
> around "constructing the TDMRs and configuring the TDX module with
> them".  Note there are no existing userspace tools that the user can get
> CMRs since they can only be read via SEAMCALL (no CPUID, MSR etc).
> 
> [1] BIOS-E820 table of the problematic platform:
> 
>   BIOS-e820: [mem 0x0000000000000000-0x000000000009efff] usable
>   BIOS-e820: [mem 0x000000000009f000-0x00000000000fffff] reserved
>   BIOS-e820: [mem 0x0000000000100000-0x000000005d168fff] usable
>   BIOS-e820: [mem 0x000000005d169000-0x000000005d22afff] ACPI data
>   BIOS-e820: [mem 0x000000005d22b000-0x000000005d3cefff] usable
>   BIOS-e820: [mem 0x000000005d3cf000-0x000000005d469fff] reserved
>   BIOS-e820: [mem 0x000000005d46a000-0x000000005e5b2fff] usable
>   BIOS-e820: [mem 0x000000005e5b3000-0x000000005e5c2fff] reserved
>   BIOS-e820: [mem 0x000000005e5c3000-0x000000005e5d2fff] usable
>   BIOS-e820: [mem 0x000000005e5d3000-0x000000005e5e4fff] reserved
>   BIOS-e820: [mem 0x000000005e5e5000-0x000000005eb57fff] usable
>   BIOS-e820: [mem 0x000000005eb58000-0x0000000061357fff] ACPI NVS
>   BIOS-e820: [mem 0x0000000061358000-0x000000006172afff] usable
>   BIOS-e820: [mem 0x000000006172b000-0x0000000061794fff] ACPI data
>   BIOS-e820: [mem 0x0000000061795000-0x00000000617fefff] usable
>   BIOS-e820: [mem 0x00000000617ff000-0x0000000061912fff] ACPI data
>   BIOS-e820: [mem 0x0000000061913000-0x0000000061998fff] usable
>   BIOS-e820: [mem 0x0000000061999000-0x00000000619dffff] ACPI data
>   BIOS-e820: [mem 0x00000000619e0000-0x00000000619e1fff] usable
>   BIOS-e820: [mem 0x00000000619e2000-0x00000000619e9fff] reserved
>   BIOS-e820: [mem 0x00000000619ea000-0x0000000061a26fff] usable
>   BIOS-e820: [mem 0x0000000061a27000-0x0000000061baefff] ACPI data
>   BIOS-e820: [mem 0x0000000061baf000-0x00000000623c2fff] usable
>   BIOS-e820: [mem 0x00000000623c3000-0x0000000062471fff] reserved
>   BIOS-e820: [mem 0x0000000062472000-0x0000000062823fff] usable
>   BIOS-e820: [mem 0x0000000062824000-0x0000000063a24fff] reserved
>   BIOS-e820: [mem 0x0000000063a25000-0x0000000063d57fff] usable
>   BIOS-e820: [mem 0x0000000063d58000-0x0000000064157fff] reserved
>   BIOS-e820: [mem 0x0000000064158000-0x0000000064158fff] usable
>   BIOS-e820: [mem 0x0000000064159000-0x0000000064194fff] reserved
>   BIOS-e820: [mem 0x0000000064195000-0x000000006e9cefff] usable
>   BIOS-e820: [mem 0x000000006e9cf000-0x000000006eccefff] reserved
>   BIOS-e820: [mem 0x000000006eccf000-0x000000006f6fefff] ACPI NVS
>   BIOS-e820: [mem 0x000000006f6ff000-0x000000006f7fefff] ACPI data
>   BIOS-e820: [mem 0x000000006f7ff000-0x000000006f7fffff] usable
>   BIOS-e820: [mem 0x000000006f800000-0x000000008fffffff] reserved
>   ......
> 
> [2] Convertible Memory Regions of the problematic platform:
> 
>   virt/tdx: CMR: [0x100000, 0x6f800000)
>   virt/tdx: CMR: [0x100000000, 0x107a000000)
>   virt/tdx: CMR: [0x1080000000, 0x207c000000)
>   virt/tdx: CMR: [0x2080000000, 0x307c000000)
>   virt/tdx: CMR: [0x3080000000, 0x407c000000)
> 
> Signed-off-by: Kai Huang <kai.huang@intel.com>

It bothers me that this fix buried behind a bunch of other cleanup, but
I guess that is ok if this issue is not urgent. If it *is* urgent then
maybe a fix without so many dependencies would be more appropriate.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 10/10] x86/virt/tdx: Don't initialize module that doesn't support NO_RBP_MOD feature
  2024-07-17  3:40 ` [PATCH v2 10/10] x86/virt/tdx: Don't initialize module that doesn't support NO_RBP_MOD feature Kai Huang
@ 2024-08-06  4:55   ` Dan Williams
  2024-08-06 12:18     ` Huang, Kai
  0 siblings, 1 reply; 44+ messages in thread
From: Dan Williams @ 2024-08-06  4:55 UTC (permalink / raw)
  To: Kai Huang, dave.hansen, kirill.shutemov, bp, tglx, peterz, mingo,
	hpa, seanjc, pbonzini, dan.j.williams
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu, kai.huang

Kai Huang wrote:
> Old TDX modules can clobber RBP in the TDH.VP.ENTER SEAMCALL.  However
> RBP is used as frame pointer in the x86_64 calling convention, and
> clobbering RBP could result in bad things like being unable to unwind
> the stack if any non-maskable exceptions (NMI, #MC etc) happens in that
> gap.
> 
> A new "NO_RBP_MOD" feature was introduced to more recent TDX modules to
> not clobber RBP.  This feature is reported in the TDX_FEATURES0 global
> metadata field via bit 18.
> 
> Don't initialize the TDX module if this feature is not supported [1].
> 
> Link: https://lore.kernel.org/all/c0067319-2653-4cbd-8fee-1ccf21b1e646@suse.com/T/#mef98469c51e2382ead2c537ea189752360bd2bef [1]
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
> ---
> 
> v1 -> v2:
>  - Add tag from Nikolay.
> 
> ---
>  arch/x86/virt/vmx/tdx/tdx.c | 17 +++++++++++++++++
>  arch/x86/virt/vmx/tdx/tdx.h |  1 +
>  2 files changed, 18 insertions(+)
> 
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 3c19295f1f8f..ec6156728423 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -484,6 +484,18 @@ static int get_tdx_sysinfo(struct tdx_sysinfo *sysinfo)
>  	return get_tdx_tdmr_sysinfo(&sysinfo->tdmr_info);
>  }
>  
> +static int check_module_compatibility(struct tdx_sysinfo *sysinfo)

How about check_features()? Almost everything having to do with TDX
concerns the TDX module, so using "module" in a symbol name rarely adds
any useful context.

> +{
> +	u64 tdx_features0 = sysinfo->module_info.tdx_features0;
> +
> +	if (!(tdx_features0 & TDX_FEATURES0_NO_RBP_MOD)) {
> +		pr_err("NO_RBP_MOD feature is not supported\n");

A user would have no idea with this error message how about something
like:

pr_err("frame pointer (RBP) clobber bug present, upgrade TDX module\n");

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 06/10] x86/virt/tdx: Refine a comment to reflect the latest TDX spec
  2024-08-06  3:43   ` Dan Williams
@ 2024-08-06 11:23     ` Huang, Kai
  2024-08-06 19:06       ` Dan Williams
  0 siblings, 1 reply; 44+ messages in thread
From: Huang, Kai @ 2024-08-06 11:23 UTC (permalink / raw)
  To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
	peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
	Williams, Dan J, kirill.shutemov@linux.intel.com,
	pbonzini@redhat.com, tglx@linutronix.de
  Cc: Gao, Chao, kvm@vger.kernel.org, binbin.wu@linux.intel.com,
	linux-kernel@vger.kernel.org, Edgecombe, Rick P, x86@kernel.org,
	Yamahata, Isaku

On Mon, 2024-08-05 at 20:43 -0700, Williams, Dan J wrote:
> Kai Huang wrote:
> > The old versions of "Intel TDX Module v1.5 ABI Specification" contain
> > the definitions of all global metadata field IDs directly in a table.
> > 
> > However, the latest spec moves those definitions to a dedicated
> > 'global_metadata.json' file as part of a new (separate) "Intel TDX
> > Module v1.5 ABI definitions" [1].
> > 
> > Update the comment to reflect this.
> > 
> > [1]: https://cdrdv2.intel.com/v1/dl/getContent/795381
> > 
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> > 
> > v1 -> v2:
> >  - New patch to fix a comment spotted by Nikolay.
> > 
> > ---
> >  arch/x86/virt/vmx/tdx/tdx.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> > index fdb879ef6c45..4e43cec19917 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.h
> > +++ b/arch/x86/virt/vmx/tdx/tdx.h
> > @@ -29,7 +29,7 @@
> >  /*
> >   * Global scope metadata field ID.
> >   *
> > - * See Table "Global Scope Metadata", TDX module 1.5 ABI spec.
> > + * See the "global_metadata.json" in the "TDX 1.5 ABI definitions".
> >   */
> >  #define MD_FIELD_ID_MAX_TDMRS			0x9100000100000008ULL
> >  #define MD_FIELD_ID_MAX_RESERVED_PER_TDMR	0x9100000100000009ULL
> 
> Given this is JSON any plan to just check-in "global_metadata.json"
> somewhere in tools/ with a script that queries for a set of fields and
> spits them out into a Linux data structure + set of TD_SYSINFO_*_MAP()
> calls? Then no future review bandwidth needs to be spent on manually
> checking offsets names and values, they will just be pulled from the
> script.

This seems a good idea.  I'll add this to my TODO list and evaluate it
first.

One minor issue is some metadata fields may need special handling.  E.g.,
MAX_VCPUS_PER_TD (which is u16) may not be supported by some old TDX
modules, but this isn't an error because we can just treats it as
U16_MAX.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 07/10] x86/virt/tdx: Start to track all global metadata in one structure
  2024-08-06  3:51   ` Dan Williams
@ 2024-08-06 11:29     ` Huang, Kai
  0 siblings, 0 replies; 44+ messages in thread
From: Huang, Kai @ 2024-08-06 11:29 UTC (permalink / raw)
  To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
	peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
	Williams, Dan J, kirill.shutemov@linux.intel.com,
	pbonzini@redhat.com, tglx@linutronix.de
  Cc: Gao, Chao, kvm@vger.kernel.org, binbin.wu@linux.intel.com,
	linux-kernel@vger.kernel.org, Edgecombe, Rick P, x86@kernel.org,
	Yamahata, Isaku

On Mon, 2024-08-05 at 20:51 -0700, Williams, Dan J wrote:
> Kai Huang wrote:
> > The TDX module provides a set of "global metadata fields".  They report
> > things like TDX module version, supported features, and fields related
> > to create/run TDX guests and so on.
> > 
> > Currently the kernel only reads "TD Memory Region" (TDMR) related fields
> > for module initialization.  There are immediate needs which require the
> > TDX module initialization to read more global metadata including module
> > version, supported features and "Convertible Memory Regions" (CMRs).
> > 
> > Also, KVM will need to read more metadata fields to support baseline TDX
> > guests.  In the longer term, other TDX features like TDX Connect (which
> > supports assigning trusted IO devices to TDX guest) may also require
> > other kernel components such as pci/vt-d to access global metadata.
> > 
> > To meet all those requirements, the idea is the TDX host core-kernel to
> > to provide a centralized, canonical, and read-only structure for the
> > global metadata that comes out from the TDX module for all kernel
> > components to use.
> > 
> > As the first step, introduce a new 'struct tdx_sysinfo' to track all
> > global metadata fields.
> > 
> > TDX categories global metadata fields into different "Class"es.  E.g.,
> > the current TDMR related fields are under class "TDMR Info".  Instead of
> > making 'struct tdx_sysinfo' a plain structure to contain all metadata
> > fields, organize them in smaller structures based on the "Class".
> > 
> > This allows those metadata fields to be used in finer granularity thus
> > makes the code more clear.  E.g., the current construct_tdmr() can just
> > take the structure which contains "TDMR Info" metadata fields.
> > 
> > Start with moving 'struct tdx_tdmr_sysinfo' to 'struct tdx_sysinfo', and
> > rename 'struct tdx_tdmr_sysinfo' to 'struct tdx_sysinfo_tdmr_info' to
> > make it consistent with the "class name".
> 
> How about 'struct tdx_sys_info' and 'struct tdx_sys_tdmr_info' to avoid
> duplicating 'info' in the symbol name?

Fine to me.  Will do.

> 
> Do pure renames indpendent of logic changes to make patches like
> this easier to read.
> 
> I would also move the pure rename to the front of the patches so the
> reviewer spends as minimal amount of time with the deprecated name in
> the set.

OK.  I'll split out the pure rename patch and move it to earlier place.


[...]

> > +
> > +/* Class "TDMR Info" */
> > +struct tdx_sysinfo_tdmr_info {
> > +	u16 max_tdmrs;
> > +	u16 max_reserved_per_tdmr;
> > +	u16 pamt_entry_size[TDX_PS_NR];
> > +};
> > +
> > +struct tdx_sysinfo {
> > +	struct tdx_sysinfo_tdmr_info tdmr_info;
> > +};
> 
> I would just call this member 'tdmr' since the 'info' is already applied
> by being in tdx_sys_info.

Yeah fine to me.  Thanks.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 08/10] x86/virt/tdx: Print TDX module basic information
  2024-08-06  4:19   ` Dan Williams
@ 2024-08-06 11:51     ` Huang, Kai
  2024-08-06 12:48       ` Huang, Kai
  2024-08-07 21:56       ` Dan Williams
  0 siblings, 2 replies; 44+ messages in thread
From: Huang, Kai @ 2024-08-06 11:51 UTC (permalink / raw)
  To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
	peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
	Williams, Dan J, kirill.shutemov@linux.intel.com,
	pbonzini@redhat.com, tglx@linutronix.de
  Cc: Gao, Chao, kvm@vger.kernel.org, binbin.wu@linux.intel.com,
	linux-kernel@vger.kernel.org, Edgecombe, Rick P, x86@kernel.org,
	Yamahata, Isaku

On Mon, 2024-08-05 at 21:19 -0700, Williams, Dan J wrote:
> Kai Huang wrote:
> > Currently the kernel doesn't print any information regarding the TDX
> > module itself, e.g. module version.  In practice such information is
> > useful, especially to the developers.
> > 
> > For instance, there are a couple of use cases for dumping module basic
> > information:
> > 
> > 1) When something goes wrong around using TDX, the information like TDX
> >    module version, supported features etc could be helpful [1][2].
> > 
> > 2) For Linux, when the user wants to update the TDX module, one needs to
> >    replace the old module in a specific location in the EFI partition
> >    with the new one so that after reboot the BIOS can load it.  However,
> >    after kernel boots, currently the user has no way to verify it is
> >    indeed the new module that gets loaded and initialized (e.g., error
> >    could happen when replacing the old module).  With the module version
> >    dumped the user can verify this easily.
> > 
> > So dump the basic TDX module information:
> > 
> >  - TDX module version, and the build date.
> >  - TDX module type: Debug or Production.
> >  - TDX_FEATURES0: Supported TDX features.
> > 
> > And dump the information right after reading global metadata, so that
> > this information is printed no matter whether module initialization
> > fails or not.
> > 
> > The actual dmesg will look like:
> > 
> >   virt/tdx: Initializing TDX module: 1.5.00.00.0481 (build_date 20230323, Production module), TDX_FEATURES0 0xfbf
> > 
> > Link: https://lore.kernel.org/lkml/e2d844ad-182a-4fc0-a06a-d609c9cbef74@suse.com/T/#m352829aedf6680d4628c7e40dc40b332eda93355 [1]
> > Link: https://lore.kernel.org/lkml/e2d844ad-182a-4fc0-a06a-d609c9cbef74@suse.com/T/#m351ebcbc006d2e5bc3e7650206a087cb2708d451 [2]
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> > 
> > v1 -> v2 (Nikolay):
> >  - Change the format to dump TDX basic info.
> >  - Slightly improve changelog.
> > 
> > ---
> >  arch/x86/virt/vmx/tdx/tdx.c | 64 +++++++++++++++++++++++++++++++++++++
> >  arch/x86/virt/vmx/tdx/tdx.h | 33 ++++++++++++++++++-
> >  2 files changed, 96 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index 3253cdfa5207..5ac0c411f4f7 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -319,6 +319,58 @@ static int stbuf_read_sysmd_multi(const struct field_mapping *fields,
> >  	return 0;
> >  }
> >  
> > +#define TD_SYSINFO_MAP_MOD_INFO(_field_id, _member)	\
> > +	TD_SYSINFO_MAP(_field_id, struct tdx_sysinfo_module_info, _member)
> > +
> > +static int get_tdx_module_info(struct tdx_sysinfo_module_info *modinfo)
> > +{
> > +	static const struct field_mapping fields[] = {
> > +		TD_SYSINFO_MAP_MOD_INFO(SYS_ATTRIBUTES, sys_attributes),
> > +		TD_SYSINFO_MAP_MOD_INFO(TDX_FEATURES0,  tdx_features0),
> > +	};
> > +
> > +	return stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields), modinfo);
> > +}
> > +
> > +#define TD_SYSINFO_MAP_MOD_VERSION(_field_id, _member)	\
> > +	TD_SYSINFO_MAP(_field_id, struct tdx_sysinfo_module_version, _member)
> > +
> > +static int get_tdx_module_version(struct tdx_sysinfo_module_version *modver)
> > +{
> > +	static const struct field_mapping fields[] = {
> > +		TD_SYSINFO_MAP_MOD_VERSION(MAJOR_VERSION,    major),
> > +		TD_SYSINFO_MAP_MOD_VERSION(MINOR_VERSION,    minor),
> > +		TD_SYSINFO_MAP_MOD_VERSION(UPDATE_VERSION,   update),
> > +		TD_SYSINFO_MAP_MOD_VERSION(INTERNAL_VERSION, internal),
> > +		TD_SYSINFO_MAP_MOD_VERSION(BUILD_NUM,	     build_num),
> > +		TD_SYSINFO_MAP_MOD_VERSION(BUILD_DATE,	     build_date),
> > +	};
> > +
> > +	return stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields), modver);
> 
> Looks good if stbuf_read_sysmd_multi() is replaced with the work being
> done internal to TD_SYSINFO_MAP_MOD_VERSION().
> 
> > +}
> > +
> > +static void print_basic_sysinfo(struct tdx_sysinfo *sysinfo)
> > +{
> > +	struct tdx_sysinfo_module_version *modver = &sysinfo->module_version;
> > +	struct tdx_sysinfo_module_info *modinfo = &sysinfo->module_info;
> > +	bool debug = modinfo->sys_attributes & TDX_SYS_ATTR_DEBUG_MODULE;
> 
> Why is this casually checking for debug modules, but doing nothing with
> that indication? Shouldn't the kernel have policy around whether it
> wants to interoperate with a debug module? I would expect that kernel
> operation with a debug module would need explicit opt-in consideration.

For now the purpose is just to print whether module is debug or
production in the dmesg to let the user easily see, just like the module
version info.

Currently Linux depends on the BIOS to load the TDX module.  For that we
need to put the module at /boot/efi/EFI/TDX/ and name it TDX-SEAM.so.  So
given a machine, it's hard for the user to know whether a module is debug
one (the user may be able to get such info from the BIOS log, but it is
not always available for the user).

Yes I agree we should have a policy in the kernel to handle debug module,
but I don't see urgent need of it.  So I would prefer to leave it as
future work when needed.

> 
> > +
> > +	/*
> > +	 * TDX module version encoding:
> > +	 *
> > +	 *   <major>.<minor>.<update>.<internal>.<build_num>
> > +	 *
> > +	 * When printed as text, <major> and <minor> are 1-digit,
> > +	 * <update> and <internal> are 2-digits and <build_num>
> > +	 * is 4-digits.
> > +	 */
> > +	pr_info("Initializing TDX module: %u.%u.%02u.%02u.%04u (build_date %u, %s module), TDX_FEATURES0 0x%llx\n",
> > +			modver->major, modver->minor, modver->update,
> > +			modver->internal, modver->build_num,
> > +			modver->build_date, debug ? "Debug" : "Production",
> > +			modinfo->tdx_features0);
> 
> Another nice thing about json scripting is that this flag fields could
> be pretty-printed with symbolic names for the flags, but that can come
> later.

I'll look into how to auto-generate based on JSON file.

> 
> > +}
> > +
> >  #define TD_SYSINFO_MAP_TDMR_INFO(_field_id, _member)	\
> >  	TD_SYSINFO_MAP(_field_id, struct tdx_sysinfo_tdmr_info, _member)
> >  
> > @@ -339,6 +391,16 @@ static int get_tdx_tdmr_sysinfo(struct tdx_sysinfo_tdmr_info *tdmr_sysinfo)
> >  
> >  static int get_tdx_sysinfo(struct tdx_sysinfo *sysinfo)
> >  {
> > +	int ret;
> > +
> > +	ret = get_tdx_module_info(&sysinfo->module_info);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = get_tdx_module_version(&sysinfo->module_version);
> > +	if (ret)
> > +		return ret;
> > +
> >  	return get_tdx_tdmr_sysinfo(&sysinfo->tdmr_info);
> >  }
> >  
> > @@ -1121,6 +1183,8 @@ static int init_tdx_module(void)
> >  	if (ret)
> >  		return ret;
> >  
> > +	print_basic_sysinfo(&sysinfo);
> > +
> >  	/*
> >  	 * To keep things simple, assume that all TDX-protected memory
> >  	 * will come from the page allocator.  Make sure all pages in the
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> > index b5eb7c35f1dc..861ddf2c2e88 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.h
> > +++ b/arch/x86/virt/vmx/tdx/tdx.h
> > @@ -31,6 +31,15 @@
> >   *
> >   * See the "global_metadata.json" in the "TDX 1.5 ABI definitions".
> >   */
> > +#define MD_FIELD_ID_SYS_ATTRIBUTES		0x0A00000200000000ULL
> > +#define MD_FIELD_ID_TDX_FEATURES0		0x0A00000300000008ULL
> > +#define MD_FIELD_ID_BUILD_DATE			0x8800000200000001ULL
> > +#define MD_FIELD_ID_BUILD_NUM			0x8800000100000002ULL
> > +#define MD_FIELD_ID_MINOR_VERSION		0x0800000100000003ULL
> > +#define MD_FIELD_ID_MAJOR_VERSION		0x0800000100000004ULL
> > +#define MD_FIELD_ID_UPDATE_VERSION		0x0800000100000005ULL
> > +#define MD_FIELD_ID_INTERNAL_VERSION		0x0800000100000006ULL
> 
> This is where I would rather not take your word for it, or go review
> these constants myself if these were autogenerated from parsing json.

I'll look into how to auto-generate based on JSON file.

> 
> > +
> >  #define MD_FIELD_ID_MAX_TDMRS			0x9100000100000008ULL
> >  #define MD_FIELD_ID_MAX_RESERVED_PER_TDMR	0x9100000100000009ULL
> >  #define MD_FIELD_ID_PAMT_4K_ENTRY_SIZE		0x9100000100000010ULL
> > @@ -124,8 +133,28 @@ struct tdmr_info_list {
> >   *
> >   * Note not all metadata fields in each class are defined, only those
> >   * used by the kernel are.
> > + *
> > + * Also note the "bit definitions" are architectural.
> >   */
> >  
> > +/* Class "TDX Module Info" */
> > +struct tdx_sysinfo_module_info {
> 
> This name feels too generic, perhaps 'tdx_sys_info_features' makes it
> clearer?

I wanted to name the structure following the "Class" name in the JSON
file.  Both 'sys_attributes' and 'tdx_featueres0' are under class "Module
Info".

I guess "attributes" are not necessarily features.

> > +	u32 sys_attributes;
> > +	u64 tdx_features0;
> > +};
> > +
> > +#define TDX_SYS_ATTR_DEBUG_MODULE	0x1
> > +
> > +/* Class "TDX Module Version" */
> > +struct tdx_sysinfo_module_version {
> > +	u16 major;
> > +	u16 minor;
> > +	u16 update;
> > +	u16 internal;
> > +	u16 build_num;
> > +	u32 build_date;
> > +};
> > +
> >  /* Class "TDMR Info" */
> >  struct tdx_sysinfo_tdmr_info {
> >  	u16 max_tdmrs;
> > @@ -134,7 +163,9 @@ struct tdx_sysinfo_tdmr_info {
> >  };
> >  
> >  struct tdx_sysinfo {
> > -	struct tdx_sysinfo_tdmr_info tdmr_info;
> > +	struct tdx_sysinfo_module_info		module_info;
> > +	struct tdx_sysinfo_module_version	module_version;
> > +	struct tdx_sysinfo_tdmr_info		tdmr_info;
> 
> Compare that to:
> 
>         struct tdx_sys_info {
>                 struct tdx_sys_info_features features;
>                 struct tdx_sys_info_version version;
>                 struct tdx_sys_info_tdmr tdmr;
>         };
> 
> ...and tell me which oine is easier to read.

I agree this is easier to read if we don't look at the JSON file.  On the
other hand, following JSON file's "Class" names IMHO we can more easily
find which class to look at for a given member.

So I think they both have pros/cons, and I have no hard opinion on this.




^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 09/10] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes
  2024-08-06  4:47   ` Dan Williams
@ 2024-08-06 12:17     ` Huang, Kai
  0 siblings, 0 replies; 44+ messages in thread
From: Huang, Kai @ 2024-08-06 12:17 UTC (permalink / raw)
  To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
	peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
	Williams, Dan J, kirill.shutemov@linux.intel.com,
	pbonzini@redhat.com, tglx@linutronix.de
  Cc: Gao, Chao, kvm@vger.kernel.org, binbin.wu@linux.intel.com,
	linux-kernel@vger.kernel.org, Edgecombe, Rick P, x86@kernel.org,
	Yamahata, Isaku

On Mon, 2024-08-05 at 21:47 -0700, Williams, Dan J wrote:
> Kai Huang wrote:
> > A TDX module initialization failure was reported on a Emerald Rapids
> > platform:
> > 
> >   virt/tdx: initialization failed: TDMR [0x0, 0x80000000): reserved areas exhausted.
> >   virt/tdx: module initialization failed (-28)
> > 
> > As part of initializing the TDX module, the kernel informs the TDX
> > module of all "TDX-usable memory regions" using an array of TDX defined
> > structure "TD Memory Region" (TDMR).  Each TDMR must be in 1GB aligned
> > and in 1GB granularity, and all "non-TDX-usable memory holes" within a
> > given TDMR must be marked as "reserved areas".  The TDX module reports a
> > maximum number of reserved areas that can be supported per TDMR.
> > 
> > Currently, the kernel finds those "non-TDX-usable memory holes" within a
> > given TDMR by walking over a list of "TDX-usable memory regions", which
> > essentially reflects the "usable" regions in the e820 table (w/o memory
> > hotplug operations precisely, but this is not relevant here).
> > 
> > As shown above, the root cause of this failure is when the kernel tries
> > to construct a TDMR to cover address range [0x0, 0x80000000), there
> > are too many memory holes within that range and the number of memory
> > holes exceeds the maximum number of reserved areas.
> > 
> > The E820 table of that platform (see [1] below) reflects this: the
> > number of memory holes among e820 "usable" entries exceeds 16, which is
> > the maximum number of reserved areas TDX module supports in practice.
> > 
> > === Fix ===
> > 
> > There are two options to fix this: 1) reduce the number of memory holes
> > when constructing a TDMR to save "reserved areas"; 2) reduce the TDMR's
> > size to cover fewer memory regions, thus fewer memory holes.
> 
> What about option3? Fix the BIOS. If the BIOS scattershots low memory
> holes why does the kernel need to suffer the burden of putting it back
> together?

I don't feel we have strong justification to ask BIOS to fix this.  I
think it's arguable whether this is a true issue from BIOS's perspective,
because putting bunch of "ACPI data" in the first 2G area doesn't seem
illegitimate?

> 
> What about option4? Don't use for_each_mem_pfn_range() to populate TDMRs
> and instead walk the resource tree asking if each resource is covered by
> a CMR then add it to the TDMR list. In other words starting from
> page-allocatable memory to populate the list seems like the wrong
> starting point.

A "TDX-usable" memory region added to TDMR list must be covered by CMR,
so this is already the logic in the current upstream code.

IIUC the memblock and resource tree should both reflect system RAM
regions that are managed by the kernel MM, so there shouldn't have big
difference between using them.

The problem is not how we add TDX-usable memory regions (based on
memblock or resource tree), but how we find the memory holes.

> 
> > Option 1) is possible, and in fact is easier and preferable:
> > 
> > TDX actually has a concept of "Convertible Memory Regions" (CMRs).  TDX
> > reports a list of CMRs that meet TDX's security requirements on memory.
> > TDX requires all the "TDX-usable memory regions" that the kernel passes
> > to the module via TDMRs, a.k.a, all the "non-reserved regions in TDMRs",
> > must be convertible memory.
> > 
> > In other words, if a memory hole is indeed CMR, then it's not mandatory
> > for the kernel to add it to the reserved areas.  By doing so, the number
> > of consumed reserved areas can be reduced w/o having any functional
> > impact.  The kernel still allocates TDX memory from the page allocator.
> > There's no harm if the kernel tells the TDX module some memory regions
> > are "TDX-usable" but they will never be allocated by the kernel as TDX
> > memory.
> > 
> > Note this doesn't have any security impact either because the kernel is
> > out of TDX's TCB anyway.
> > 
> > This is feasible because in practice the CMRs just reflect the nature of
> > whether the RAM can indeed be used by TDX, thus each CMR tends to be a
> > large, uninterrupted range of memory, i.e., unlike the e820 table which
> > contains numerous "ACPI *" entries in the first 2G range.  Refer to [2]
> > for CMRs reported on the problematic platform using off-tree TDX code.
> > 
> > So for this particular module initialization failure, the memory holes
> > that are within [0x0, 0x80000000) are mostly indeed CMR.  By not adding
> > them to the reserved areas, the number of consumed reserved areas for
> > the TDMR [0x0, 0x80000000) can be dramatically reduced.
> > 
> > Option 2) is also theoretically feasible, but it is not desired:
> > 
> > It requires more complicated logic to handle splitting TDMR into smaller
> > ones, which isn't trivial.  There are limitations to splitting TDMR too,
> > thus it may not always work: 1) The smallest TDMR is 1GB, and it cannot
> > be split any further; 2) This also increases the total number of TDMRs,
> > which also has a maximum value limited by the TDX module.
> > 
> > So, fix this issue by using option 1):
> > 
> > 1) reading out the CMRs from the TDX module global metadata, and
> > 2) changing to find memory holes for a given TDMR based on CMRs, but not
> >    based on the list of "TDX-usable memory regions".
> > 
> > Also dump the CMRs in dmesg.  They are helpful when something goes wrong
> > around "constructing the TDMRs and configuring the TDX module with
> > them".  Note there are no existing userspace tools that the user can get
> > CMRs since they can only be read via SEAMCALL (no CPUID, MSR etc).
> > 
> > [1] BIOS-E820 table of the problematic platform:
> > 
> >   BIOS-e820: [mem 0x0000000000000000-0x000000000009efff] usable
> >   BIOS-e820: [mem 0x000000000009f000-0x00000000000fffff] reserved
> >   BIOS-e820: [mem 0x0000000000100000-0x000000005d168fff] usable
> >   BIOS-e820: [mem 0x000000005d169000-0x000000005d22afff] ACPI data
> >   BIOS-e820: [mem 0x000000005d22b000-0x000000005d3cefff] usable
> >   BIOS-e820: [mem 0x000000005d3cf000-0x000000005d469fff] reserved
> >   BIOS-e820: [mem 0x000000005d46a000-0x000000005e5b2fff] usable
> >   BIOS-e820: [mem 0x000000005e5b3000-0x000000005e5c2fff] reserved
> >   BIOS-e820: [mem 0x000000005e5c3000-0x000000005e5d2fff] usable
> >   BIOS-e820: [mem 0x000000005e5d3000-0x000000005e5e4fff] reserved
> >   BIOS-e820: [mem 0x000000005e5e5000-0x000000005eb57fff] usable
> >   BIOS-e820: [mem 0x000000005eb58000-0x0000000061357fff] ACPI NVS
> >   BIOS-e820: [mem 0x0000000061358000-0x000000006172afff] usable
> >   BIOS-e820: [mem 0x000000006172b000-0x0000000061794fff] ACPI data
> >   BIOS-e820: [mem 0x0000000061795000-0x00000000617fefff] usable
> >   BIOS-e820: [mem 0x00000000617ff000-0x0000000061912fff] ACPI data
> >   BIOS-e820: [mem 0x0000000061913000-0x0000000061998fff] usable
> >   BIOS-e820: [mem 0x0000000061999000-0x00000000619dffff] ACPI data
> >   BIOS-e820: [mem 0x00000000619e0000-0x00000000619e1fff] usable
> >   BIOS-e820: [mem 0x00000000619e2000-0x00000000619e9fff] reserved
> >   BIOS-e820: [mem 0x00000000619ea000-0x0000000061a26fff] usable
> >   BIOS-e820: [mem 0x0000000061a27000-0x0000000061baefff] ACPI data
> >   BIOS-e820: [mem 0x0000000061baf000-0x00000000623c2fff] usable
> >   BIOS-e820: [mem 0x00000000623c3000-0x0000000062471fff] reserved
> >   BIOS-e820: [mem 0x0000000062472000-0x0000000062823fff] usable
> >   BIOS-e820: [mem 0x0000000062824000-0x0000000063a24fff] reserved
> >   BIOS-e820: [mem 0x0000000063a25000-0x0000000063d57fff] usable
> >   BIOS-e820: [mem 0x0000000063d58000-0x0000000064157fff] reserved
> >   BIOS-e820: [mem 0x0000000064158000-0x0000000064158fff] usable
> >   BIOS-e820: [mem 0x0000000064159000-0x0000000064194fff] reserved
> >   BIOS-e820: [mem 0x0000000064195000-0x000000006e9cefff] usable
> >   BIOS-e820: [mem 0x000000006e9cf000-0x000000006eccefff] reserved
> >   BIOS-e820: [mem 0x000000006eccf000-0x000000006f6fefff] ACPI NVS
> >   BIOS-e820: [mem 0x000000006f6ff000-0x000000006f7fefff] ACPI data
> >   BIOS-e820: [mem 0x000000006f7ff000-0x000000006f7fffff] usable
> >   BIOS-e820: [mem 0x000000006f800000-0x000000008fffffff] reserved
> >   ......
> > 
> > [2] Convertible Memory Regions of the problematic platform:
> > 
> >   virt/tdx: CMR: [0x100000, 0x6f800000)
> >   virt/tdx: CMR: [0x100000000, 0x107a000000)
> >   virt/tdx: CMR: [0x1080000000, 0x207c000000)
> >   virt/tdx: CMR: [0x2080000000, 0x307c000000)
> >   virt/tdx: CMR: [0x3080000000, 0x407c000000)
> > 
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> 
> It bothers me that this fix buried behind a bunch of other cleanup, but
> I guess that is ok if this issue is not urgent. If it *is* urgent then
> maybe a fix without so many dependencies would be more appropriate.

This is not urgent from upstream kernel's perpective, because currently
there's no in-kernel TDX code user despite we have TDX module
initialization code in the tree.

This issue can only be triggered with the off tree, upstreaming-ongoing
KVM TDX patchset.  However there are customers using the whole off tree
TDX code so theoretically they could meet this issue (in fact, this issue
was recently reported from one of the customers).


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 10/10] x86/virt/tdx: Don't initialize module that doesn't support NO_RBP_MOD feature
  2024-08-06  4:55   ` Dan Williams
@ 2024-08-06 12:18     ` Huang, Kai
  0 siblings, 0 replies; 44+ messages in thread
From: Huang, Kai @ 2024-08-06 12:18 UTC (permalink / raw)
  To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
	peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
	Williams, Dan J, kirill.shutemov@linux.intel.com,
	pbonzini@redhat.com, tglx@linutronix.de
  Cc: Gao, Chao, kvm@vger.kernel.org, binbin.wu@linux.intel.com,
	linux-kernel@vger.kernel.org, Edgecombe, Rick P, x86@kernel.org,
	Yamahata, Isaku

On Mon, 2024-08-05 at 21:55 -0700, Williams, Dan J wrote:
> Kai Huang wrote:
> > Old TDX modules can clobber RBP in the TDH.VP.ENTER SEAMCALL.  However
> > RBP is used as frame pointer in the x86_64 calling convention, and
> > clobbering RBP could result in bad things like being unable to unwind
> > the stack if any non-maskable exceptions (NMI, #MC etc) happens in that
> > gap.
> > 
> > A new "NO_RBP_MOD" feature was introduced to more recent TDX modules to
> > not clobber RBP.  This feature is reported in the TDX_FEATURES0 global
> > metadata field via bit 18.
> > 
> > Don't initialize the TDX module if this feature is not supported [1].
> > 
> > Link: https://lore.kernel.org/all/c0067319-2653-4cbd-8fee-1ccf21b1e646@suse.com/T/#mef98469c51e2382ead2c537ea189752360bd2bef [1]
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
> > ---
> > 
> > v1 -> v2:
> >  - Add tag from Nikolay.
> > 
> > ---
> >  arch/x86/virt/vmx/tdx/tdx.c | 17 +++++++++++++++++
> >  arch/x86/virt/vmx/tdx/tdx.h |  1 +
> >  2 files changed, 18 insertions(+)
> > 
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index 3c19295f1f8f..ec6156728423 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -484,6 +484,18 @@ static int get_tdx_sysinfo(struct tdx_sysinfo *sysinfo)
> >  	return get_tdx_tdmr_sysinfo(&sysinfo->tdmr_info);
> >  }
> >  
> > +static int check_module_compatibility(struct tdx_sysinfo *sysinfo)
> 
> How about check_features()? Almost everything having to do with TDX
> concerns the TDX module, so using "module" in a symbol name rarely adds
> any useful context.

Yeah fine to me.  Will do.
> 
> > +{
> > +	u64 tdx_features0 = sysinfo->module_info.tdx_features0;
> > +
> > +	if (!(tdx_features0 & TDX_FEATURES0_NO_RBP_MOD)) {
> > +		pr_err("NO_RBP_MOD feature is not supported\n");
> 
> A user would have no idea with this error message how about something
> like:
> 
> pr_err("frame pointer (RBP) clobber bug present, upgrade TDX module\n");

Yeah this is certainly better.  Will do.  Thanks!

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 08/10] x86/virt/tdx: Print TDX module basic information
  2024-08-06 11:51     ` Huang, Kai
@ 2024-08-06 12:48       ` Huang, Kai
  2024-08-07 21:56       ` Dan Williams
  1 sibling, 0 replies; 44+ messages in thread
From: Huang, Kai @ 2024-08-06 12:48 UTC (permalink / raw)
  To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
	peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
	kirill.shutemov@linux.intel.com, Williams, Dan J,
	pbonzini@redhat.com, tglx@linutronix.de
  Cc: Edgecombe, Rick P, kvm@vger.kernel.org, Yamahata, Isaku,
	linux-kernel@vger.kernel.org, Gao, Chao,
	binbin.wu@linux.intel.com, x86@kernel.org


> > >  struct tdx_sysinfo {
> > > -	struct tdx_sysinfo_tdmr_info tdmr_info;
> > > +	struct tdx_sysinfo_module_info		module_info;
> > > +	struct tdx_sysinfo_module_version	module_version;
> > > +	struct tdx_sysinfo_tdmr_info		tdmr_info;
> > 
> > Compare that to:
> > 
> >         struct tdx_sys_info {
> >                 struct tdx_sys_info_features features;
> >                 struct tdx_sys_info_version version;
> >                 struct tdx_sys_info_tdmr tdmr;
> >         };
> > 
> > ...and tell me which oine is easier to read.
> 
> I agree this is easier to read if we don't look at the JSON file.  On the
> other hand, following JSON file's "Class" names IMHO we can more easily
> find which class to look at for a given member.
> 
> So I think they both have pros/cons, and I have no hard opinion on this.
> 

Hi Dan,

Btw, if we aim (either now or eventually) to auto generate all metadata
fields based on JSON file, I think it would be easier to name the
structures based on the "Class" names.  Otherwise we will need to do some
class-specific tweaks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 06/10] x86/virt/tdx: Refine a comment to reflect the latest TDX spec
  2024-08-06 11:23     ` Huang, Kai
@ 2024-08-06 19:06       ` Dan Williams
  2024-08-06 21:01         ` Huang, Kai
  0 siblings, 1 reply; 44+ messages in thread
From: Dan Williams @ 2024-08-06 19:06 UTC (permalink / raw)
  To: Huang, Kai, Hansen, Dave, seanjc@google.com, bp@alien8.de,
	peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
	Williams, Dan J, kirill.shutemov@linux.intel.com,
	pbonzini@redhat.com, tglx@linutronix.de
  Cc: Gao, Chao, kvm@vger.kernel.org, binbin.wu@linux.intel.com,
	linux-kernel@vger.kernel.org, Edgecombe, Rick P, x86@kernel.org,
	Yamahata, Isaku

Huang, Kai wrote:
[..]
> > Given this is JSON any plan to just check-in "global_metadata.json"
> > somewhere in tools/ with a script that queries for a set of fields and
> > spits them out into a Linux data structure + set of TD_SYSINFO_*_MAP()
> > calls? Then no future review bandwidth needs to be spent on manually
> > checking offsets names and values, they will just be pulled from the
> > script.
> 
> This seems a good idea.  I'll add this to my TODO list and evaluate it
> first.
> 
> One minor issue is some metadata fields may need special handling.  E.g.,
> MAX_VCPUS_PER_TD (which is u16) may not be supported by some old TDX
> modules, but this isn't an error because we can just treats it as
> U16_MAX.

TDX Module had better not be breaking us when they remove metadata
fields. So if you know of fields that get removed the module absolutely
cannot cause existing code paths. Linux could maybe grant that some
values start returning an explicit "deprecated" error code in the future
and Linux adds handling for that common case. Outside of that metadata
fields are forever and the module needs to ship placeholder values that
fail gracefully on older kernels.

OS software should not be expected to keep up with the whims of metadata
field removals without an explicit plan to make those future removals
benign to legacy kernels.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 06/10] x86/virt/tdx: Refine a comment to reflect the latest TDX spec
  2024-08-06 19:06       ` Dan Williams
@ 2024-08-06 21:01         ` Huang, Kai
  0 siblings, 0 replies; 44+ messages in thread
From: Huang, Kai @ 2024-08-06 21:01 UTC (permalink / raw)
  To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
	peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
	Williams, Dan J, kirill.shutemov@linux.intel.com,
	pbonzini@redhat.com, tglx@linutronix.de
  Cc: Edgecombe, Rick P, kvm@vger.kernel.org, Yamahata, Isaku,
	linux-kernel@vger.kernel.org, Gao, Chao,
	binbin.wu@linux.intel.com, x86@kernel.org

On Tue, 2024-08-06 at 12:06 -0700, Williams, Dan J wrote:
> Huang, Kai wrote:
> [..]
> > > Given this is JSON any plan to just check-in "global_metadata.json"
> > > somewhere in tools/ with a script that queries for a set of fields and
> > > spits them out into a Linux data structure + set of TD_SYSINFO_*_MAP()
> > > calls? Then no future review bandwidth needs to be spent on manually
> > > checking offsets names and values, they will just be pulled from the
> > > script.
> > 
> > This seems a good idea.  I'll add this to my TODO list and evaluate it
> > first.
> > 
> > One minor issue is some metadata fields may need special handling.  E.g.,
> > MAX_VCPUS_PER_TD (which is u16) may not be supported by some old TDX
> > modules, but this isn't an error because we can just treats it as
> > U16_MAX.
> 
> TDX Module had better not be breaking us when they remove metadata
> fields. So if you know of fields that get removed the module absolutely
> cannot cause existing code paths. 
> 

I don't think they will remove any metadata field.  They may add more
field(s) when new feature is added to a new version module, but not
remove.

One thing we might need to confirm is an new version of module should
always support the features that old modules support.  But IMHO this
should not be relevant if we have a policy below.

> Linux could maybe grant that some
> values start returning an explicit "deprecated" error code in the future
> and Linux adds handling for that common case. Outside of that metadata
> fields are forever and the module needs to ship placeholder values that
> fail gracefully on older kernels.
> 
> OS software should not be expected to keep up with the whims of metadata
> field removals without an explicit plan to make those future removals
> benign to legacy kernels.

I think we can have a simple rule like below?

The kernel decides which metadata fields are essential, and which are
optional.  If any essential field is missing, then kernel cannot use TDX.

In this way the kernel doesn't depend on whims of TDX module version
changes.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 02/10] x86/virt/tdx: Unbind global metadata read with 'struct tdx_tdmr_sysinfo'
  2024-08-06  1:13       ` Dan Williams
@ 2024-08-07 12:09         ` Huang, Kai
  2024-08-26 15:38           ` Adrian Hunter
  0 siblings, 1 reply; 44+ messages in thread
From: Huang, Kai @ 2024-08-07 12:09 UTC (permalink / raw)
  To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
	peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
	Williams, Dan J, kirill.shutemov@linux.intel.com,
	pbonzini@redhat.com, tglx@linutronix.de
  Cc: Gao, Chao, kvm@vger.kernel.org, binbin.wu@linux.intel.com,
	linux-kernel@vger.kernel.org, Edgecombe, Rick P, x86@kernel.org,
	Yamahata, Isaku

On Mon, 2024-08-05 at 18:13 -0700, Dan Williams wrote:
> Huang, Kai wrote:
> [..]
> > > The unrolled loop is the same amount of work as maintaining @fields.
> > 
> > Hi Dan,
> > 
> > Thanks for the feedback.
> > 
> > AFAICT Dave didn't like this way:
> > 
> > https://lore.kernel.org/lkml/cover.1699527082.git.kai.huang@intel.com/T/#me6f615d7845215c278753b57a0bce1162960209d
> 
> I agree with Dave that the original was unreadable. However, I also
> think he glossed over the loss of type-safety and the silliness of
> defining an array to precisely map fields only to turn around and do a
> runtime check that the statically defined array was filled out
> correctly. So I think lets solve the readability problem *and* make the
> array definition identical in appearance to unrolled type-safe
> execution, something like (UNTESTED!):
> 
> 
[...]

> +/*
> + * Assumes locally defined @ret and @ts to convey the error code and the
> + * 'struct tdx_tdmr_sysinfo' instance to fill out
> + */
> +#define TD_SYSINFO_MAP(_field_id, _offset)                              \
> +	({                                                              \
> +		if (ret == 0)                                           \
> +			ret = read_sys_metadata_field16(                \
> +				MD_FIELD_ID_##_field_id, &ts->_offset); \
> +	})
> +

We need to support u16/u32/u64 metadata field sizes, but not just u16.

E.g.:

struct tdx_sysinfo_module_info {                                        
        u32 sys_attributes;                                             
        u64 tdx_features0;                                              
};

has both u32 and u64 in one structure.

To achieve type-safety for all field sizes, I think we need one helper
for each field size.  E.g.,

#define READ_SYSMD_FIELD_FUNC(_size)                            \
static inline int                                               \
read_sys_metadata_field##_size(u64 field_id, u##_size *data)    \
{                                                               \
        u64 tmp;                                                \
        int ret;                                                \
                                                                \
        ret = read_sys_metadata_field(field_id, &tmp);          \
        if (ret)                                                \
                return ret;                                     \
                                                                \
        *data = tmp;                                            \
        return 0;                                               \
}                                                                       

/* For now only u16/u32/u64 are needed */
READ_SYSMD_FIELD_FUNC(16)                                               
READ_SYSMD_FIELD_FUNC(32)                                               
READ_SYSMD_FIELD_FUNC(64)                                               

Is this what you were thinking?

(Btw, I recall that I tried this before for internal review, but AFAICT
Dave didn't like this.)

For the build time check as you replied to the next patch, I agree it's
better than the runtime warning check as done in the current code.

If we still use the type-less 'void *stbuf' function to read metadata
fields for all sizes, then I think we can do below:

/*
 * Read one global metadata field and store the data to a location of a 
 * given buffer specified by the offset and size (in bytes).            
 */
static int stbuf_read_sysmd_field(u64 field_id, void *stbuf, int offset,
                                  int size)                             
{       
        void *member = stbuf + offset;                                  
        u64 tmp;                                                        
        int ret;                                                        

        ret = read_sys_metadata_field(field_id, &tmp);                  
        if (ret)
                return ret;                                             
        
        memcpy(member, &tmp, size);                                     
        
        return 0;                                                       
}                                                                       

/* Wrapper to read one metadata field to u8/u16/u32/u64 */              
#define stbuf_read_sysmd_single(_field_id, _pdata)      \
        stbuf_read_sysmd_field(_field_id, _pdata, 0, 	\
		sizeof(typeof(*(_pdata)))) 

#define CHECK_MD_FIELD_SIZE(_field_id, _st, _member)    \
        BUILD_BUG_ON(MD_FIELD_ELE_SIZE(MD_FIELD_ID_##_field_id) != \
                        sizeof(_st->_member))

#define TD_SYSINFO_MAP_TEST(_field_id, _st, _member)                    \
        ({                                                              \
                if (ret) {                                              \
                        CHECK_MD_FIELD_SIZE(_field_id, _st, _member);   \
                        ret = stbuf_read_sysmd_single(                  \
                                        MD_FIELD_ID_##_field_id,        \
                                        &_st->_member);                 \
                }                                                       \
         })

static int get_tdx_module_info(struct tdx_sysinfo_module_info *modinfo)
{
        int ret = 0;

#define TD_SYSINFO_MAP_MOD_INFO(_field_id, _member)     \
        TD_SYSINFO_MAP_TEST(_field_id, modinfo, _member)

        TD_SYSINFO_MAP_MOD_INFO(SYS_ATTRIBUTES, sys_attributes);
        TD_SYSINFO_MAP_MOD_INFO(TDX_FEATURES0,  tdx_features0);

        return ret;
}

With the build time check above, I think it's OK to lose the type-safe
inside the stbuf_read_sysmd_field(), and the code is simpler IMHO.

Any comments?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 08/10] x86/virt/tdx: Print TDX module basic information
  2024-08-06 11:51     ` Huang, Kai
  2024-08-06 12:48       ` Huang, Kai
@ 2024-08-07 21:56       ` Dan Williams
  2024-08-07 22:32         ` Huang, Kai
  1 sibling, 1 reply; 44+ messages in thread
From: Dan Williams @ 2024-08-07 21:56 UTC (permalink / raw)
  To: Huang, Kai, Hansen, Dave, seanjc@google.com, bp@alien8.de,
	peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
	Williams, Dan J, kirill.shutemov@linux.intel.com,
	pbonzini@redhat.com, tglx@linutronix.de
  Cc: Gao, Chao, kvm@vger.kernel.org, binbin.wu@linux.intel.com,
	linux-kernel@vger.kernel.org, Edgecombe, Rick P, x86@kernel.org,
	Yamahata, Isaku

Huang, Kai wrote:
> On Mon, 2024-08-05 at 21:19 -0700, Williams, Dan J wrote:
> > Kai Huang wrote:
> > > Currently the kernel doesn't print any information regarding the TDX
> > > module itself, e.g. module version.  In practice such information is
> > > useful, especially to the developers.
> > > 
> > > For instance, there are a couple of use cases for dumping module basic
> > > information:
> > > 
> > > 1) When something goes wrong around using TDX, the information like TDX
> > >    module version, supported features etc could be helpful [1][2].
> > > 
> > > 2) For Linux, when the user wants to update the TDX module, one needs to
> > >    replace the old module in a specific location in the EFI partition
> > >    with the new one so that after reboot the BIOS can load it.  However,
> > >    after kernel boots, currently the user has no way to verify it is
> > >    indeed the new module that gets loaded and initialized (e.g., error
> > >    could happen when replacing the old module).  With the module version
> > >    dumped the user can verify this easily.
> > > 
> > > So dump the basic TDX module information:
> > > 
> > >  - TDX module version, and the build date.
> > >  - TDX module type: Debug or Production.
> > >  - TDX_FEATURES0: Supported TDX features.
> > > 
> > > And dump the information right after reading global metadata, so that
> > > this information is printed no matter whether module initialization
> > > fails or not.
> > > 
> > > The actual dmesg will look like:
> > > 
> > >   virt/tdx: Initializing TDX module: 1.5.00.00.0481 (build_date 20230323, Production module), TDX_FEATURES0 0xfbf
> > > 
> > > Link: https://lore.kernel.org/lkml/e2d844ad-182a-4fc0-a06a-d609c9cbef74@suse.com/T/#m352829aedf6680d4628c7e40dc40b332eda93355 [1]
> > > Link: https://lore.kernel.org/lkml/e2d844ad-182a-4fc0-a06a-d609c9cbef74@suse.com/T/#m351ebcbc006d2e5bc3e7650206a087cb2708d451 [2]
> > > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > > ---
> > > 
> > > v1 -> v2 (Nikolay):
> > >  - Change the format to dump TDX basic info.
> > >  - Slightly improve changelog.
> > > 
> > > ---
> > >  arch/x86/virt/vmx/tdx/tdx.c | 64 +++++++++++++++++++++++++++++++++++++
> > >  arch/x86/virt/vmx/tdx/tdx.h | 33 ++++++++++++++++++-
> > >  2 files changed, 96 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > > index 3253cdfa5207..5ac0c411f4f7 100644
> > > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > > @@ -319,6 +319,58 @@ static int stbuf_read_sysmd_multi(const struct field_mapping *fields,
> > >  	return 0;
> > >  }
> > >  
> > > +#define TD_SYSINFO_MAP_MOD_INFO(_field_id, _member)	\
> > > +	TD_SYSINFO_MAP(_field_id, struct tdx_sysinfo_module_info, _member)
> > > +
> > > +static int get_tdx_module_info(struct tdx_sysinfo_module_info *modinfo)
> > > +{
> > > +	static const struct field_mapping fields[] = {
> > > +		TD_SYSINFO_MAP_MOD_INFO(SYS_ATTRIBUTES, sys_attributes),
> > > +		TD_SYSINFO_MAP_MOD_INFO(TDX_FEATURES0,  tdx_features0),
> > > +	};
> > > +
> > > +	return stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields), modinfo);
> > > +}
> > > +
> > > +#define TD_SYSINFO_MAP_MOD_VERSION(_field_id, _member)	\
> > > +	TD_SYSINFO_MAP(_field_id, struct tdx_sysinfo_module_version, _member)
> > > +
> > > +static int get_tdx_module_version(struct tdx_sysinfo_module_version *modver)
> > > +{
> > > +	static const struct field_mapping fields[] = {
> > > +		TD_SYSINFO_MAP_MOD_VERSION(MAJOR_VERSION,    major),
> > > +		TD_SYSINFO_MAP_MOD_VERSION(MINOR_VERSION,    minor),
> > > +		TD_SYSINFO_MAP_MOD_VERSION(UPDATE_VERSION,   update),
> > > +		TD_SYSINFO_MAP_MOD_VERSION(INTERNAL_VERSION, internal),
> > > +		TD_SYSINFO_MAP_MOD_VERSION(BUILD_NUM,	     build_num),
> > > +		TD_SYSINFO_MAP_MOD_VERSION(BUILD_DATE,	     build_date),
> > > +	};
> > > +
> > > +	return stbuf_read_sysmd_multi(fields, ARRAY_SIZE(fields), modver);
> > 
> > Looks good if stbuf_read_sysmd_multi() is replaced with the work being
> > done internal to TD_SYSINFO_MAP_MOD_VERSION().
> > 
> > > +}
> > > +
> > > +static void print_basic_sysinfo(struct tdx_sysinfo *sysinfo)
> > > +{
> > > +	struct tdx_sysinfo_module_version *modver = &sysinfo->module_version;
> > > +	struct tdx_sysinfo_module_info *modinfo = &sysinfo->module_info;
> > > +	bool debug = modinfo->sys_attributes & TDX_SYS_ATTR_DEBUG_MODULE;
> > 
> > Why is this casually checking for debug modules, but doing nothing with
> > that indication? Shouldn't the kernel have policy around whether it
> > wants to interoperate with a debug module? I would expect that kernel
> > operation with a debug module would need explicit opt-in consideration.
> 
> For now the purpose is just to print whether module is debug or
> production in the dmesg to let the user easily see, just like the module
> version info.
> 
> Currently Linux depends on the BIOS to load the TDX module.  For that we
> need to put the module at /boot/efi/EFI/TDX/ and name it TDX-SEAM.so.  So
> given a machine, it's hard for the user to know whether a module is debug
> one (the user may be able to get such info from the BIOS log, but it is
> not always available for the user).
> 
> Yes I agree we should have a policy in the kernel to handle debug module,
> but I don't see urgent need of it.  So I would prefer to leave it as
> future work when needed.

Then lets leave printing it as future work as well. It has no value
outside of folks that can get their hands on a platform and a
module-build that enables debug and to my knowledge that capability is
not openly available.

In the meantime I assume TDs will just need to be careful to check for
this detail in their attestation report. It serves no real purpose to
the VMM kernel.

[..]
> > This name feels too generic, perhaps 'tdx_sys_info_features' makes it
> > clearer?
> 
> I wanted to name the structure following the "Class" name in the JSON
> file.  Both 'sys_attributes' and 'tdx_featueres0' are under class "Module
> Info".

I am not sure how far we need to take fidelity to the naming choices
that the TDX module makes. It would likely be sufficient to
note the class name in a comment for the origin of the fields, i.e. the
script has some mapping like:

{ class name, field name } => { linux struct name, linux attribute name }

...where they are mostly 1:1, but Linux has the option of picking more
relevant names, especially since the class names are not directly
reusable as Linux data type names.

> I guess "attributes" are not necessarily features.

Sure, but given that attributes have no real value to the VMM kernel at
this point and features do, then name the data structure by its primary
use.

> > > +	u32 sys_attributes;
> > > +	u64 tdx_features0;
> > > +};
> > > +
> > > +#define TDX_SYS_ATTR_DEBUG_MODULE	0x1
> > > +
> > > +/* Class "TDX Module Version" */
> > > +struct tdx_sysinfo_module_version {
> > > +	u16 major;
> > > +	u16 minor;
> > > +	u16 update;
> > > +	u16 internal;
> > > +	u16 build_num;
> > > +	u32 build_date;
> > > +};
> > > +
> > >  /* Class "TDMR Info" */
> > >  struct tdx_sysinfo_tdmr_info {
> > >  	u16 max_tdmrs;
> > > @@ -134,7 +163,9 @@ struct tdx_sysinfo_tdmr_info {
> > >  };
> > >  
> > >  struct tdx_sysinfo {
> > > -	struct tdx_sysinfo_tdmr_info tdmr_info;
> > > +	struct tdx_sysinfo_module_info		module_info;
> > > +	struct tdx_sysinfo_module_version	module_version;
> > > +	struct tdx_sysinfo_tdmr_info		tdmr_info;
> > 
> > Compare that to:
> > 
> >         struct tdx_sys_info {
> >                 struct tdx_sys_info_features features;
> >                 struct tdx_sys_info_version version;
> >                 struct tdx_sys_info_tdmr tdmr;
> >         };
> > 
> > ...and tell me which oine is easier to read.
> 
> I agree this is easier to read if we don't look at the JSON file.  On the
> other hand, following JSON file's "Class" names IMHO we can more easily
> find which class to look at for a given member.
> 
> So I think they both have pros/cons, and I have no hard opinion on this.

Yeah, it is arbitrary. All I can offer is this quote from Ingo when I
did the initial ACPI NFIT enabling and spilled all of its awkward
terminology into the Linux implementation [1]:

"So why on earth is this whole concept and the naming itself
('drivers/block/nd/' stands for 'NFIT Defined', apparently) revolving
around a specific 'firmware' mindset and revolving around specific,
weirdly named, overly complicated looking firmware interfaces that come
with their own new weird glossary??"

The TDX "Class" names are not completely unreasonable, but if they only
get replicated as part of kdoc comments on the data structures I think
that's ok. 

[1]: http://lore.kernel.org/20150420070624.GB13876@gmail.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 08/10] x86/virt/tdx: Print TDX module basic information
  2024-08-07 21:56       ` Dan Williams
@ 2024-08-07 22:32         ` Huang, Kai
  0 siblings, 0 replies; 44+ messages in thread
From: Huang, Kai @ 2024-08-07 22:32 UTC (permalink / raw)
  To: Dan Williams, Hansen, Dave, seanjc@google.com, bp@alien8.de,
	peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
	kirill.shutemov@linux.intel.com, pbonzini@redhat.com,
	tglx@linutronix.de
  Cc: Gao, Chao, kvm@vger.kernel.org, binbin.wu@linux.intel.com,
	linux-kernel@vger.kernel.org, Edgecombe, Rick P, x86@kernel.org,
	Yamahata, Isaku


>>>> +static void print_basic_sysinfo(struct tdx_sysinfo *sysinfo)
>>>> +{
>>>> +	struct tdx_sysinfo_module_version *modver = &sysinfo->module_version;
>>>> +	struct tdx_sysinfo_module_info *modinfo = &sysinfo->module_info;
>>>> +	bool debug = modinfo->sys_attributes & TDX_SYS_ATTR_DEBUG_MODULE;
>>>
>>> Why is this casually checking for debug modules, but doing nothing with
>>> that indication? Shouldn't the kernel have policy around whether it
>>> wants to interoperate with a debug module? I would expect that kernel
>>> operation with a debug module would need explicit opt-in consideration.
>>
>> For now the purpose is just to print whether module is debug or
>> production in the dmesg to let the user easily see, just like the module
>> version info.
>>
>> Currently Linux depends on the BIOS to load the TDX module.  For that we
>> need to put the module at /boot/efi/EFI/TDX/ and name it TDX-SEAM.so.  So
>> given a machine, it's hard for the user to know whether a module is debug
>> one (the user may be able to get such info from the BIOS log, but it is
>> not always available for the user).
>>
>> Yes I agree we should have a policy in the kernel to handle debug module,
>> but I don't see urgent need of it.  So I would prefer to leave it as
>> future work when needed.
> 
> Then lets leave printing it as future work as well. It has no value
> outside of folks that can get their hands on a platform and a
> module-build that enables debug and to my knowledge that capability is
> not openly available.
> 
> In the meantime I assume TDs will just need to be careful to check for
> this detail in their attestation report. It serves no real purpose to
> the VMM kernel.

Sure I'll remove.

It's basically for kernel developers and customers who are trying to 
integrating TDX to their environment to easily find some basic module 
info when something went wrong or they just want to check.

So if we don't print debug, then the 'sys_attributes' member is no 
longer needed, that means if we want to keep 'struct 
tdx_sysinfo_module_info' (or a better name in the next version) then it 
will only have one member, which is 'tdx_features0'.

In the long term, we might need to query other 'tdx_featuresN' fields 
since TDX module actually provides a metadata field 'NUM_TDX_FEATURES' 
to report how many fields like 'TDX_FEATURES0' the module has.  But I 
don't see that coming in any near future.

So perhaps we don't need to restrictly follow 1:1 between 'linux 
structure' <-> 'TDX class', and put the 'tdx_features0' together with 
TDX module version members and rename that one to 'struct 
tdx_sys_module_info'?

> 
> [..]
>>> This name feels too generic, perhaps 'tdx_sys_info_features' makes it
>>> clearer?
>>
>> I wanted to name the structure following the "Class" name in the JSON
>> file.  Both 'sys_attributes' and 'tdx_featueres0' are under class "Module
>> Info".
> 
> I am not sure how far we need to take fidelity to the naming choices
> that the TDX module makes. It would likely be sufficient to
> note the class name in a comment for the origin of the fields, i.e. the
> script has some mapping like:
> 
> { class name, field name } => { linux struct name, linux attribute name }
> 
> ...where they are mostly 1:1, but Linux has the option of picking more
> relevant names, especially since the class names are not directly
> reusable as Linux data type names.

Yes this seems better.

> 
>> I guess "attributes" are not necessarily features.
> 
> Sure, but given that attributes have no real value to the VMM kernel at
> this point and features do, then name the data structure by its primary
> use.

Sure.

> 
>>>> +	u32 sys_attributes;
>>>> +	u64 tdx_features0;
>>>> +};
>>>> +
>>>> +#define TDX_SYS_ATTR_DEBUG_MODULE	0x1
>>>> +
>>>> +/* Class "TDX Module Version" */
>>>> +struct tdx_sysinfo_module_version {
>>>> +	u16 major;
>>>> +	u16 minor;
>>>> +	u16 update;
>>>> +	u16 internal;
>>>> +	u16 build_num;
>>>> +	u32 build_date;
>>>> +};
>>>> +
>>>>   /* Class "TDMR Info" */
>>>>   struct tdx_sysinfo_tdmr_info {
>>>>   	u16 max_tdmrs;
>>>> @@ -134,7 +163,9 @@ struct tdx_sysinfo_tdmr_info {
>>>>   };
>>>>   
>>>>   struct tdx_sysinfo {
>>>> -	struct tdx_sysinfo_tdmr_info tdmr_info;
>>>> +	struct tdx_sysinfo_module_info		module_info;
>>>> +	struct tdx_sysinfo_module_version	module_version;
>>>> +	struct tdx_sysinfo_tdmr_info		tdmr_info;
>>>
>>> Compare that to:
>>>
>>>          struct tdx_sys_info {
>>>                  struct tdx_sys_info_features features;
>>>                  struct tdx_sys_info_version version;
>>>                  struct tdx_sys_info_tdmr tdmr;
>>>          };
>>>
>>> ...and tell me which oine is easier to read.
>>
>> I agree this is easier to read if we don't look at the JSON file.  On the
>> other hand, following JSON file's "Class" names IMHO we can more easily
>> find which class to look at for a given member.
>>
>> So I think they both have pros/cons, and I have no hard opinion on this.
> 
> Yeah, it is arbitrary. All I can offer is this quote from Ingo when I
> did the initial ACPI NFIT enabling and spilled all of its awkward
> terminology into the Linux implementation [1]:
> 
> "So why on earth is this whole concept and the naming itself
> ('drivers/block/nd/' stands for 'NFIT Defined', apparently) revolving
> around a specific 'firmware' mindset and revolving around specific,
> weirdly named, overly complicated looking firmware interfaces that come
> with their own new weird glossary??"
> 
> The TDX "Class" names are not completely unreasonable, but if they only
> get replicated as part of kdoc comments on the data structures I think
> that's ok.
> 
> [1]: http://lore.kernel.org/20150420070624.GB13876@gmail.com

Thanks for the info!

I agree we don't need exactly follow TDX "class" to name linux 
structures. We can add a comment to mention which structure/member 
corresponds to which class/member in TDX spec when needed.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 00/10] TDX host: metadata reading tweaks, bug fix and info dump
  2024-08-05 22:36 ` Dan Williams
@ 2024-08-08  0:19   ` Huang, Kai
  0 siblings, 0 replies; 44+ messages in thread
From: Huang, Kai @ 2024-08-08  0:19 UTC (permalink / raw)
  To: Dan Williams, dave.hansen, kirill.shutemov, bp, tglx, peterz,
	mingo, hpa, seanjc, pbonzini
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu



On 6/08/2024 10:36 am, Dan Williams wrote:
> Kai Huang wrote:
>> TL;DR:
>>
>> This series does necessary tweaks to TDX host "global metadata" reading
>> code to fix some immediate issues in the TDX module initialization code,
>> with intention to also provide a flexible code base to support sharing
>> global metadata to KVM (and other kernel components) for future needs.
>>
>> This series, and additional patches to initialize TDX when loading KVM
>> module and read essential metadata fields for KVM TDX can be found at:
>>
>> https://github.com/intel/tdx/commits/kvm-tdxinit/
>>
>> Dear maintainers,
>>
>> This series targets x86 tip.  I also added Dan, KVM maintainers and KVM
>> list so people can review and comment.  Thanks for your time.
>>
>> v1 -> v2:
>>    - Fix comments from Chao and Nikolay.
>>    - A new patch to refine an out-dated comment by Nikolay.
>>    - Collect tags from Nikolay (thanks!).
>>
>> v1: https://lore.kernel.org/linux-kernel/cover.1718538552.git.kai.huang@intel.com/T/
>>
>> === More info ===
>>
>> TDX module provides a set of "global metadata fields" for software to
>> query.  They report things like TDX module version, supported features
>> fields required for creating TDX guests and so on.
>>
>> Today the TDX host code already reads "TD Memory Region" (TDMR) related
>> metadata fields for module initialization.  There are immediate needs
>> that require TDX host code to read more metadata fields:
>>
>>   - Dump basic TDX module info [1];
>>   - Reject module with no NO_RBP_MOD feature support [2];
>>   - Read CMR info to fix a module initialization failure bug [3].
>>
>> Also, the upstreaming-on-going KVM TDX support [4] requires to read more
>> global metadata fields.  In the longer term, the TDX Connect [5] (which
>> supports assigning trusted IO devices to TDX guest) may also require
>> other kernel components (e.g., pci/vt-d) to access more metadata.
>>
>> To meet all of those, the idea is the TDX host core-kernel to provide a
>> centralized, canonical, and read-only structure to contain all global
>> metadata that comes out of TDX module for all kernel components to use.
>>
>> There is an "alternative option to manage global metadata" (see below)
>> but it is not as straightforward as this.
>>
>> This series starts to track all global metadata fields into a single
>> 'struct tdx_sysinfo', and reads more metadata fields to that structure
>> to address the immediate needs as mentioned above.
>>
>> More fields will be added in the near future to support KVM TDX, and the
>> actual sharing/export the "read-only" global metadata for KVM will also
>> be sent out in the near future when that becomes immediate (also see
>> "Share global metadata to KVM" below).
> 
> I think it is important to share why this unified data structure
> proposal reached escape velocity from internal review. The idea that x86
> gets to review growth to this structure over time is an asset for
> maintainability and oversight of what is happening in the downstream
> consumers like KVM and TSM (for TDX Connect).
> 
> A dynamic retrieval API removes that natural auditing of data structure
> patches from tip.git.
> 
> Yes, it requires more touches than letting use cases consume new
> metadata fields at will, but that's net positive for maintainence of the
> kernel and the feedback loop to the TDX module.

Thanks Dan for this.  I think I can somehow integrate your words above 
into the changelog.  (It took me bit of time to digest though due to my 
bad english.)

> 
>> Note, the first couple of patches in this series were from the old
>> patchset "TDX host: Provide TDX module metadata reading APIs" [6].
>>
>> === Further read ===
>>
>> 1) Altertive option to manage global metadata
>>
>> The TDX host core-kernel could also expose/export APIs for reading
>> metadata out of TDX module directly, and all in-kernel TDX users use
>> these APIs and manage their own metadata fields.
>>
>> However this isn't as straightforward as exposing/exporting structure,
>> because the API to read multi fields to a structure requires the caller
>> to build a "mapping table" between field ID to structure member:
>>
>> 	struct kvm_used_metadata {
>> 		u64 member1;
>> 		...
>> 	};
>>
>> 	#define TD_SYSINFO_KVM_MAP(_field_id, _member)	\
>> 		TD_SYSINFO_MAP(_field_id, struct kvm_used_metadata, \
>> 				_member)
>>
>> 	struct tdx_metadata_field_mapping fields[] = {
>> 		TD_SYSINFO_KVM_MAP(FIELD_ID1, member1),
>> 		...
>> 	};
>>
>> 	ret = tdx_sysmd_read_multi(fields, ARRAY_SIZE(fields), buf);
>>
>> Another problem is some metadata field may be accessed by multiple
>> kernel components, e.g., the one reports TDX module features, in which
>> case there will be duplicated code comparing to exposing structure
>> directly.
> 
> A full explanation of what this patch is not doing is a bit overkill.

I'll remove the structure/macro/function details to make it more concise.

> 
>> 2) Share global metadata to KVM
>>
>> To achieve "read-only" centralized global metadata structure, the idea
>> way is to use __ro_after_init.  However currently all global metadata
>> are read by tdx_enable(), which is supposed to be called at any time at
>> runtime thus isn't annotated with __init.
>>
>> The __ro_after_init can be done eventually, but it can only be done
>> after moving VMXON out of KVM to the core-kernel: after that we can
>> read all metadata during kernel boot (thus __ro_after_init), but
>> doesn't necessarily have to do it in tdx_enable().
>>
>> However moving VMXON out of KVM is NOT considered as dependency for the
>> initial KVM TDX support [7].  Thus for the initial support, the idea is
>> TDX host to export a function which returns a "const struct pointer" so
>> KVM won't be able to modify any global metadata.
> 
> For now I think it is sufficient to say that metadata just gets
> populated to a central data structure. Follow on work to protect that
> data structure against post-init updates can come later.

OK.  Will do.

> 
>> 3) TDH.SYS.RD vs TDH.SYS.RDALL
>>
>> The kernel can use two SEAMCALLs to read global metadata: TDH.SYS.RD and
>> TDH.SYS.RDALL.  The former simply reads one metadata field to a 'u64'.
>> The latter tries to read all fields to a 4KB buffer.
>>
>> Currently the kernel only uses the former to read metadata, and this
>> series doesn't choose to use TDH.SYS.RDALL.
>>
>> The main reason is the "layout of all fields in the 4KB buffer" that
>> returned by TDH.SYS.RDALL isn't architectural consistent among different
>> TDX module versions.
>>
>> E.g., some metadata fields may not be supported by the old module, thus
>> they may or may not be in the 4KB buffer depending on module version.
>> And it is impractical to know whether those fields are in the buffer or
>> not.
>>
>> TDH.SYS.RDALL may be useful to read one small set of metadata fields,
>> e.g., fields in one "Class" (TDX categories all global metadata fields
>> in different "Class"es).  But this is only an optimization even if
>> TDH.SYS.RDALL can be used, so leave this to future consideration.
> 
> I appreciate the effort to include some of the discussions had while
> boiling this patchset down to its simplest near term form, but this
> much text makes the simple patches look much more controversial than
> they are. This TDH.SYS.RDALL consideration is not relevant to the
> current proposal.

I'll remove this section.

Again thanks for your feedback.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 08/10] x86/virt/tdx: Print TDX module basic information
  2024-07-17  3:40 ` [PATCH v2 08/10] x86/virt/tdx: Print TDX module basic information Kai Huang
  2024-08-06  4:19   ` Dan Williams
@ 2024-08-08 10:31   ` Chenyi Qiang
  2024-08-08 23:52     ` Huang, Kai
  1 sibling, 1 reply; 44+ messages in thread
From: Chenyi Qiang @ 2024-08-08 10:31 UTC (permalink / raw)
  To: Kai Huang, dave.hansen, kirill.shutemov, bp, tglx, peterz, mingo,
	hpa, seanjc, pbonzini, dan.j.williams
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu



On 7/17/2024 11:40 AM, Kai Huang wrote:
> Currently the kernel doesn't print any information regarding the TDX
> module itself, e.g. module version.  In practice such information is
> useful, especially to the developers.
> 
> For instance, there are a couple of use cases for dumping module basic
> information:
> 
> 1) When something goes wrong around using TDX, the information like TDX
>    module version, supported features etc could be helpful [1][2].
> 
> 2) For Linux, when the user wants to update the TDX module, one needs to
>    replace the old module in a specific location in the EFI partition
>    with the new one so that after reboot the BIOS can load it.  However,
>    after kernel boots, currently the user has no way to verify it is
>    indeed the new module that gets loaded and initialized (e.g., error
>    could happen when replacing the old module).  With the module version
>    dumped the user can verify this easily.
> 
> So dump the basic TDX module information:
> 
>  - TDX module version, and the build date.
>  - TDX module type: Debug or Production.
>  - TDX_FEATURES0: Supported TDX features.
> 
> And dump the information right after reading global metadata, so that
> this information is printed no matter whether module initialization
> fails or not.
> 
> The actual dmesg will look like:
> 
>   virt/tdx: Initializing TDX module: 1.5.00.00.0481 (build_date 20230323, Production module), TDX_FEATURES0 0xfbf
> 
> Link: https://lore.kernel.org/lkml/e2d844ad-182a-4fc0-a06a-d609c9cbef74@suse.com/T/#m352829aedf6680d4628c7e40dc40b332eda93355 [1]
> Link: https://lore.kernel.org/lkml/e2d844ad-182a-4fc0-a06a-d609c9cbef74@suse.com/T/#m351ebcbc006d2e5bc3e7650206a087cb2708d451 [2]
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
> 
> v1 -> v2 (Nikolay):
>  - Change the format to dump TDX basic info.
>  - Slightly improve changelog.
> 
> ---
>  arch/x86/virt/vmx/tdx/tdx.c | 64 +++++++++++++++++++++++++++++++++++++
>  arch/x86/virt/vmx/tdx/tdx.h | 33 ++++++++++++++++++-
>  2 files changed, 96 insertions(+), 1 deletion(-)
> 

[...]

> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index b5eb7c35f1dc..861ddf2c2e88 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -31,6 +31,15 @@
>   *
>   * See the "global_metadata.json" in the "TDX 1.5 ABI definitions".
>   */
> +#define MD_FIELD_ID_SYS_ATTRIBUTES		0x0A00000200000000ULL
> +#define MD_FIELD_ID_TDX_FEATURES0		0x0A00000300000008ULL
> +#define MD_FIELD_ID_BUILD_DATE			0x8800000200000001ULL
> +#define MD_FIELD_ID_BUILD_NUM			0x8800000100000002ULL
> +#define MD_FIELD_ID_MINOR_VERSION		0x0800000100000003ULL
> +#define MD_FIELD_ID_MAJOR_VERSION		0x0800000100000004ULL
> +#define MD_FIELD_ID_UPDATE_VERSION		0x0800000100000005ULL
> +#define MD_FIELD_ID_INTERNAL_VERSION		0x0800000100000006ULL
> +
>  #define MD_FIELD_ID_MAX_TDMRS			0x9100000100000008ULL
>  #define MD_FIELD_ID_MAX_RESERVED_PER_TDMR	0x9100000100000009ULL
>  #define MD_FIELD_ID_PAMT_4K_ENTRY_SIZE		0x9100000100000010ULL
> @@ -124,8 +133,28 @@ struct tdmr_info_list {
>   *
>   * Note not all metadata fields in each class are defined, only those
>   * used by the kernel are.
> + *
> + * Also note the "bit definitions" are architectural.
>   */
>  
> +/* Class "TDX Module Info" */
> +struct tdx_sysinfo_module_info {
> +	u32 sys_attributes;
> +	u64 tdx_features0;
> +};
> +
> +#define TDX_SYS_ATTR_DEBUG_MODULE	0x1

One minor issue, TDX_SYS_ATTR_DEBUG_MODULE is indicated by bit 31 of
sys_attributes.

> +
> +/* Class "TDX Module Version" */
> +struct tdx_sysinfo_module_version {
> +	u16 major;
> +	u16 minor;
> +	u16 update;
> +	u16 internal;
> +	u16 build_num;
> +	u32 build_date;
> +};
> +
>  /* Class "TDMR Info" */
>  struct tdx_sysinfo_tdmr_info {
>  	u16 max_tdmrs;
> @@ -134,7 +163,9 @@ struct tdx_sysinfo_tdmr_info {
>  };
>  
>  struct tdx_sysinfo {
> -	struct tdx_sysinfo_tdmr_info tdmr_info;
> +	struct tdx_sysinfo_module_info		module_info;
> +	struct tdx_sysinfo_module_version	module_version;
> +	struct tdx_sysinfo_tdmr_info		tdmr_info;
>  };
>  
>  #endif

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 08/10] x86/virt/tdx: Print TDX module basic information
  2024-08-08 10:31   ` Chenyi Qiang
@ 2024-08-08 23:52     ` Huang, Kai
  0 siblings, 0 replies; 44+ messages in thread
From: Huang, Kai @ 2024-08-08 23:52 UTC (permalink / raw)
  To: Hansen, Dave, seanjc@google.com, bp@alien8.de, Qiang, Chenyi,
	peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
	kirill.shutemov@linux.intel.com, tglx@linutronix.de,
	pbonzini@redhat.com, Williams, Dan J
  Cc: Gao, Chao, kvm@vger.kernel.org, binbin.wu@linux.intel.com,
	linux-kernel@vger.kernel.org, Edgecombe, Rick P, x86@kernel.org,
	Yamahata, Isaku

On Thu, 2024-08-08 at 18:31 +0800, Chenyi Qiang wrote:
> > +
> > +#define TDX_SYS_ATTR_DEBUG_MODULE	0x1
> 
> One minor issue, TDX_SYS_ATTR_DEBUG_MODULE is indicated by bit 31 of
> sys_attributes.

Facepalm :( Thanks!

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 09/10] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes
  2024-07-17  3:40 ` [PATCH v2 09/10] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes Kai Huang
  2024-08-06  4:47   ` Dan Williams
@ 2024-08-20 18:38   ` Adrian Hunter
  2024-08-27  7:24     ` Huang, Kai
  1 sibling, 1 reply; 44+ messages in thread
From: Adrian Hunter @ 2024-08-20 18:38 UTC (permalink / raw)
  To: Kai Huang, dave.hansen, kirill.shutemov, bp, tglx, peterz, mingo,
	hpa, seanjc, pbonzini, dan.j.williams
  Cc: x86, kvm, linux-kernel, rick.p.edgecombe, isaku.yamahata,
	chao.gao, binbin.wu

> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 861ddf2c2e88..4b43eb774ffa 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -40,6 +40,10 @@
>  #define MD_FIELD_ID_UPDATE_VERSION		0x0800000100000005ULL
>  #define MD_FIELD_ID_INTERNAL_VERSION		0x0800000100000006ULL
>  
> +#define MD_FIELD_ID_NUM_CMRS			0x9000000100000000ULL
> +#define MD_FIELD_ID_CMR_BASE0			0x9000000300000080ULL
> +#define MD_FIELD_ID_CMR_SIZE0			0x9000000300000100ULL

For scripted checking against "global_metadata.json" it might be better
to stick to the same field names e.g.

	MD_FIELD_ID_CMR_BASE0 -> MD_FIELD_ID_CMR_BASE




^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 02/10] x86/virt/tdx: Unbind global metadata read with 'struct tdx_tdmr_sysinfo'
  2024-08-07 12:09         ` Huang, Kai
@ 2024-08-26 15:38           ` Adrian Hunter
  2024-08-26 22:40             ` Huang, Kai
  0 siblings, 1 reply; 44+ messages in thread
From: Adrian Hunter @ 2024-08-26 15:38 UTC (permalink / raw)
  To: Huang, Kai, Hansen, Dave, seanjc@google.com, bp@alien8.de,
	peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
	Williams, Dan J, kirill.shutemov@linux.intel.com,
	pbonzini@redhat.com, tglx@linutronix.de
  Cc: Gao, Chao, kvm@vger.kernel.org, binbin.wu@linux.intel.com,
	linux-kernel@vger.kernel.org, Edgecombe, Rick P, x86@kernel.org,
	Yamahata, Isaku

On 7/08/24 15:09, Huang, Kai wrote:
> On Mon, 2024-08-05 at 18:13 -0700, Dan Williams wrote:
>> Huang, Kai wrote:
>> [..]
>>>> The unrolled loop is the same amount of work as maintaining @fields.
>>>
>>> Hi Dan,
>>>
>>> Thanks for the feedback.
>>>
>>> AFAICT Dave didn't like this way:
>>>
>>> https://lore.kernel.org/lkml/cover.1699527082.git.kai.huang@intel.com/T/#me6f615d7845215c278753b57a0bce1162960209d
>>
>> I agree with Dave that the original was unreadable. However, I also
>> think he glossed over the loss of type-safety and the silliness of
>> defining an array to precisely map fields only to turn around and do a
>> runtime check that the statically defined array was filled out
>> correctly. So I think lets solve the readability problem *and* make the
>> array definition identical in appearance to unrolled type-safe
>> execution, something like (UNTESTED!):
>>
>>
> [...]
> 
>> +/*
>> + * Assumes locally defined @ret and @ts to convey the error code and the
>> + * 'struct tdx_tdmr_sysinfo' instance to fill out
>> + */
>> +#define TD_SYSINFO_MAP(_field_id, _offset)                              \
>> +	({                                                              \
>> +		if (ret == 0)                                           \
>> +			ret = read_sys_metadata_field16(                \
>> +				MD_FIELD_ID_##_field_id, &ts->_offset); \
>> +	})
>> +
> 
> We need to support u16/u32/u64 metadata field sizes, but not just u16.
> 
> E.g.:
> 
> struct tdx_sysinfo_module_info {                                        
>         u32 sys_attributes;                                             
>         u64 tdx_features0;                                              
> };
> 
> has both u32 and u64 in one structure.
> 
> To achieve type-safety for all field sizes, I think we need one helper
> for each field size.  E.g.,
> 
> #define READ_SYSMD_FIELD_FUNC(_size)                            \
> static inline int                                               \
> read_sys_metadata_field##_size(u64 field_id, u##_size *data)    \
> {                                                               \
>         u64 tmp;                                                \
>         int ret;                                                \
>                                                                 \
>         ret = read_sys_metadata_field(field_id, &tmp);          \
>         if (ret)                                                \
>                 return ret;                                     \
>                                                                 \
>         *data = tmp;                                            \
>         return 0;                                               \
> }                                                                       
> 
> /* For now only u16/u32/u64 are needed */
> READ_SYSMD_FIELD_FUNC(16)                                               
> READ_SYSMD_FIELD_FUNC(32)                                               
> READ_SYSMD_FIELD_FUNC(64)                                               
> 
> Is this what you were thinking?
> 
> (Btw, I recall that I tried this before for internal review, but AFAICT
> Dave didn't like this.)
> 
> For the build time check as you replied to the next patch, I agree it's
> better than the runtime warning check as done in the current code.
> 
> If we still use the type-less 'void *stbuf' function to read metadata
> fields for all sizes, then I think we can do below:
> 
> /*
>  * Read one global metadata field and store the data to a location of a 
>  * given buffer specified by the offset and size (in bytes).            
>  */
> static int stbuf_read_sysmd_field(u64 field_id, void *stbuf, int offset,
>                                   int size)                             
> {       
>         void *member = stbuf + offset;                                  
>         u64 tmp;                                                        
>         int ret;                                                        
> 
>         ret = read_sys_metadata_field(field_id, &tmp);                  
>         if (ret)
>                 return ret;                                             
>         
>         memcpy(member, &tmp, size);                                     
>         
>         return 0;                                                       
> }                                                                       
> 
> /* Wrapper to read one metadata field to u8/u16/u32/u64 */              
> #define stbuf_read_sysmd_single(_field_id, _pdata)      \
>         stbuf_read_sysmd_field(_field_id, _pdata, 0, 	\
> 		sizeof(typeof(*(_pdata)))) 
> 
> #define CHECK_MD_FIELD_SIZE(_field_id, _st, _member)    \
>         BUILD_BUG_ON(MD_FIELD_ELE_SIZE(MD_FIELD_ID_##_field_id) != \
>                         sizeof(_st->_member))
> 
> #define TD_SYSINFO_MAP_TEST(_field_id, _st, _member)                    \
>         ({                                                              \
>                 if (ret) {                                              \
>                         CHECK_MD_FIELD_SIZE(_field_id, _st, _member);   \
>                         ret = stbuf_read_sysmd_single(                  \
>                                         MD_FIELD_ID_##_field_id,        \
>                                         &_st->_member);                 \
>                 }                                                       \
>          })
> 
> static int get_tdx_module_info(struct tdx_sysinfo_module_info *modinfo)
> {
>         int ret = 0;
> 
> #define TD_SYSINFO_MAP_MOD_INFO(_field_id, _member)     \
>         TD_SYSINFO_MAP_TEST(_field_id, modinfo, _member)
> 
>         TD_SYSINFO_MAP_MOD_INFO(SYS_ATTRIBUTES, sys_attributes);
>         TD_SYSINFO_MAP_MOD_INFO(TDX_FEATURES0,  tdx_features0);
> 
>         return ret;
> }
> 
> With the build time check above, I think it's OK to lose the type-safe
> inside the stbuf_read_sysmd_field(), and the code is simpler IMHO.
> 
> Any comments?

BUILD_BUG_ON() requires a function, but it is still
be possible to add a build time check in TD_SYSINFO_MAP
e.g.

#define TD_SYSINFO_CHECK_SIZE(_field_id, _size)			\
	__builtin_choose_expr(MD_FIELD_ELE_SIZE(_field_id) == _size, _size, (void)0)

#define _TD_SYSINFO_MAP(_field_id, _offset, _size)		\
	{ .field_id = _field_id,				\
	  .offset   = _offset,					\
	  .size	    = TD_SYSINFO_CHECK_SIZE(_field_id, _size) }

#define TD_SYSINFO_MAP(_field_id, _struct, _member)		\
	_TD_SYSINFO_MAP(MD_FIELD_ID_##_field_id,		\
			offsetof(_struct, _member),		\
			sizeof(typeof(((_struct *)0)->_member)))



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 02/10] x86/virt/tdx: Unbind global metadata read with 'struct tdx_tdmr_sysinfo'
  2024-08-26 15:38           ` Adrian Hunter
@ 2024-08-26 22:40             ` Huang, Kai
  2024-08-27  4:54               ` Adrian Hunter
  0 siblings, 1 reply; 44+ messages in thread
From: Huang, Kai @ 2024-08-26 22:40 UTC (permalink / raw)
  To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
	peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
	Hunter, Adrian, Williams, Dan J, pbonzini@redhat.com,
	kirill.shutemov@linux.intel.com, tglx@linutronix.de
  Cc: Edgecombe, Rick P, kvm@vger.kernel.org, Yamahata, Isaku,
	linux-kernel@vger.kernel.org, Gao, Chao,
	binbin.wu@linux.intel.com, x86@kernel.org

On Mon, 2024-08-26 at 18:38 +0300, Adrian Hunter wrote:
> On 7/08/24 15:09, Huang, Kai wrote:
> > On Mon, 2024-08-05 at 18:13 -0700, Dan Williams wrote:
> > > Huang, Kai wrote:
> > > [..]
> > > > > The unrolled loop is the same amount of work as maintaining @fields.
> > > > 
> > > > Hi Dan,
> > > > 
> > > > Thanks for the feedback.
> > > > 
> > > > AFAICT Dave didn't like this way:
> > > > 
> > > > https://lore.kernel.org/lkml/cover.1699527082.git.kai.huang@intel.com/T/#me6f615d7845215c278753b57a0bce1162960209d
> > > 
> > > I agree with Dave that the original was unreadable. However, I also
> > > think he glossed over the loss of type-safety and the silliness of
> > > defining an array to precisely map fields only to turn around and do a
> > > runtime check that the statically defined array was filled out
> > > correctly. So I think lets solve the readability problem *and* make the
> > > array definition identical in appearance to unrolled type-safe
> > > execution, something like (UNTESTED!):
> > > 
> > > 
> > [...]
> > 
> > > +/*
> > > + * Assumes locally defined @ret and @ts to convey the error code and the
> > > + * 'struct tdx_tdmr_sysinfo' instance to fill out
> > > + */
> > > +#define TD_SYSINFO_MAP(_field_id, _offset)                              \
> > > +	({                                                              \
> > > +		if (ret == 0)                                           \
> > > +			ret = read_sys_metadata_field16(                \
> > > +				MD_FIELD_ID_##_field_id, &ts->_offset); \
> > > +	})
> > > +
> > 
> > We need to support u16/u32/u64 metadata field sizes, but not just u16.
> > 
> > E.g.:
> > 
> > struct tdx_sysinfo_module_info {                                        
> >         u32 sys_attributes;                                             
> >         u64 tdx_features0;                                              
> > };
> > 
> > has both u32 and u64 in one structure.
> > 
> > To achieve type-safety for all field sizes, I think we need one helper
> > for each field size.  E.g.,
> > 
> > #define READ_SYSMD_FIELD_FUNC(_size)                            \
> > static inline int                                               \
> > read_sys_metadata_field##_size(u64 field_id, u##_size *data)    \
> > {                                                               \
> >         u64 tmp;                                                \
> >         int ret;                                                \
> >                                                                 \
> >         ret = read_sys_metadata_field(field_id, &tmp);          \
> >         if (ret)                                                \
> >                 return ret;                                     \
> >                                                                 \
> >         *data = tmp;                                            \
> >         return 0;                                               \
> > }                                                                       
> > 
> > /* For now only u16/u32/u64 are needed */
> > READ_SYSMD_FIELD_FUNC(16)                                               
> > READ_SYSMD_FIELD_FUNC(32)                                               
> > READ_SYSMD_FIELD_FUNC(64)                                               
> > 
> > Is this what you were thinking?
> > 
> > (Btw, I recall that I tried this before for internal review, but AFAICT
> > Dave didn't like this.)
> > 
> > For the build time check as you replied to the next patch, I agree it's
> > better than the runtime warning check as done in the current code.
> > 
> > If we still use the type-less 'void *stbuf' function to read metadata
> > fields for all sizes, then I think we can do below:
> > 
> > /*
> >  * Read one global metadata field and store the data to a location of a 
> >  * given buffer specified by the offset and size (in bytes).            
> >  */
> > static int stbuf_read_sysmd_field(u64 field_id, void *stbuf, int offset,
> >                                   int size)                             
> > {       
> >         void *member = stbuf + offset;                                  
> >         u64 tmp;                                                        
> >         int ret;                                                        
> > 
> >         ret = read_sys_metadata_field(field_id, &tmp);                  
> >         if (ret)
> >                 return ret;                                             
> >         
> >         memcpy(member, &tmp, size);                                     
> >         
> >         return 0;                                                       
> > }                                                                       
> > 
> > /* Wrapper to read one metadata field to u8/u16/u32/u64 */              
> > #define stbuf_read_sysmd_single(_field_id, _pdata)      \
> >         stbuf_read_sysmd_field(_field_id, _pdata, 0, 	\
> > 		sizeof(typeof(*(_pdata)))) 
> > 
> > #define CHECK_MD_FIELD_SIZE(_field_id, _st, _member)    \
> >         BUILD_BUG_ON(MD_FIELD_ELE_SIZE(MD_FIELD_ID_##_field_id) != \
> >                         sizeof(_st->_member))
> > 
> > #define TD_SYSINFO_MAP_TEST(_field_id, _st, _member)                    \
> >         ({                                                              \
> >                 if (ret) {                                              \
> >                         CHECK_MD_FIELD_SIZE(_field_id, _st, _member);   \
> >                         ret = stbuf_read_sysmd_single(                  \
> >                                         MD_FIELD_ID_##_field_id,        \
> >                                         &_st->_member);                 \
> >                 }                                                       \
> >          })
> > 
> > static int get_tdx_module_info(struct tdx_sysinfo_module_info *modinfo)
> > {
> >         int ret = 0;
> > 
> > #define TD_SYSINFO_MAP_MOD_INFO(_field_id, _member)     \
> >         TD_SYSINFO_MAP_TEST(_field_id, modinfo, _member)
> > 
> >         TD_SYSINFO_MAP_MOD_INFO(SYS_ATTRIBUTES, sys_attributes);
> >         TD_SYSINFO_MAP_MOD_INFO(TDX_FEATURES0,  tdx_features0);
> > 
> >         return ret;
> > }
> > 
> > With the build time check above, I think it's OK to lose the type-safe
> > inside the stbuf_read_sysmd_field(), and the code is simpler IMHO.
> > 
> > Any comments?
> 
> BUILD_BUG_ON() requires a function, but it is still
> be possible to add a build time check in TD_SYSINFO_MAP
> e.g.
> 
> #define TD_SYSINFO_CHECK_SIZE(_field_id, _size)			\
> 	__builtin_choose_expr(MD_FIELD_ELE_SIZE(_field_id) == _size, _size, (void)0)
> 
> #define _TD_SYSINFO_MAP(_field_id, _offset, _size)		\
> 	{ .field_id = _field_id,				\
> 	  .offset   = _offset,					\
> 	  .size	    = TD_SYSINFO_CHECK_SIZE(_field_id, _size) }
> 
> #define TD_SYSINFO_MAP(_field_id, _struct, _member)		\
> 	_TD_SYSINFO_MAP(MD_FIELD_ID_##_field_id,		\
> 			offsetof(_struct, _member),		\
> 			sizeof(typeof(((_struct *)0)->_member)))
> 
> 

Thanks for the comment, but I don't think this meets for our purpose.

We want a build time "error" when the "MD_FIELD_ELE_SIZE(_field_id) == _size"
fails, but not "still initializing the size to 0".  Otherwise, we might get
some unexpected issue (due to size is 0) at runtime, which is worse IMHO than
a runtime check as done in the current upstream code.

I have been trying to add a BUILD_BUG_ON() to the field_mapping structure
initializer, but I haven't found a reliable way to do so.

For now I have completed the new version based on Dan's suggestion, but still
need to work on changelog/coverletter etc, so I think I can send the new
version out and see whether people like it.  We can revert back if that's not
what people want.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 02/10] x86/virt/tdx: Unbind global metadata read with 'struct tdx_tdmr_sysinfo'
  2024-08-26 22:40             ` Huang, Kai
@ 2024-08-27  4:54               ` Adrian Hunter
  2024-08-27  7:22                 ` Huang, Kai
  0 siblings, 1 reply; 44+ messages in thread
From: Adrian Hunter @ 2024-08-27  4:54 UTC (permalink / raw)
  To: Huang, Kai, Hansen, Dave, seanjc@google.com, bp@alien8.de,
	peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
	Williams, Dan J, pbonzini@redhat.com,
	kirill.shutemov@linux.intel.com, tglx@linutronix.de
  Cc: Edgecombe, Rick P, kvm@vger.kernel.org, Yamahata, Isaku,
	linux-kernel@vger.kernel.org, Gao, Chao,
	binbin.wu@linux.intel.com, x86@kernel.org

On 27/08/24 01:40, Huang, Kai wrote:
> On Mon, 2024-08-26 at 18:38 +0300, Adrian Hunter wrote:
>> On 7/08/24 15:09, Huang, Kai wrote:
>>> On Mon, 2024-08-05 at 18:13 -0700, Dan Williams wrote:
>>>> Huang, Kai wrote:
>>>> [..]
>>>>>> The unrolled loop is the same amount of work as maintaining @fields.
>>>>>
>>>>> Hi Dan,
>>>>>
>>>>> Thanks for the feedback.
>>>>>
>>>>> AFAICT Dave didn't like this way:
>>>>>
>>>>> https://lore.kernel.org/lkml/cover.1699527082.git.kai.huang@intel.com/T/#me6f615d7845215c278753b57a0bce1162960209d
>>>>
>>>> I agree with Dave that the original was unreadable. However, I also
>>>> think he glossed over the loss of type-safety and the silliness of
>>>> defining an array to precisely map fields only to turn around and do a
>>>> runtime check that the statically defined array was filled out
>>>> correctly. So I think lets solve the readability problem *and* make the
>>>> array definition identical in appearance to unrolled type-safe
>>>> execution, something like (UNTESTED!):
>>>>
>>>>
>>> [...]
>>>
>>>> +/*
>>>> + * Assumes locally defined @ret and @ts to convey the error code and the
>>>> + * 'struct tdx_tdmr_sysinfo' instance to fill out
>>>> + */
>>>> +#define TD_SYSINFO_MAP(_field_id, _offset)                              \
>>>> + ({                                                              \
>>>> +         if (ret == 0)                                           \
>>>> +                 ret = read_sys_metadata_field16(                \
>>>> +                         MD_FIELD_ID_##_field_id, &ts->_offset); \
>>>> + })
>>>> +
>>>
>>> We need to support u16/u32/u64 metadata field sizes, but not just u16.
>>>
>>> E.g.:
>>>
>>> struct tdx_sysinfo_module_info {
>>>         u32 sys_attributes;
>>>         u64 tdx_features0;
>>> };
>>>
>>> has both u32 and u64 in one structure.
>>>
>>> To achieve type-safety for all field sizes, I think we need one helper
>>> for each field size.  E.g.,
>>>
>>> #define READ_SYSMD_FIELD_FUNC(_size)                            \
>>> static inline int                                               \
>>> read_sys_metadata_field##_size(u64 field_id, u##_size *data)    \
>>> {                                                               \
>>>         u64 tmp;                                                \
>>>         int ret;                                                \
>>>                                                                 \
>>>         ret = read_sys_metadata_field(field_id, &tmp);          \
>>>         if (ret)                                                \
>>>                 return ret;                                     \
>>>                                                                 \
>>>         *data = tmp;                                            \
>>>         return 0;                                               \
>>> }
>>>
>>> /* For now only u16/u32/u64 are needed */
>>> READ_SYSMD_FIELD_FUNC(16)
>>> READ_SYSMD_FIELD_FUNC(32)
>>> READ_SYSMD_FIELD_FUNC(64)
>>>
>>> Is this what you were thinking?
>>>
>>> (Btw, I recall that I tried this before for internal review, but AFAICT
>>> Dave didn't like this.)
>>>
>>> For the build time check as you replied to the next patch, I agree it's
>>> better than the runtime warning check as done in the current code.
>>>
>>> If we still use the type-less 'void *stbuf' function to read metadata
>>> fields for all sizes, then I think we can do below:
>>>
>>> /*
>>>  * Read one global metadata field and store the data to a location of a
>>>  * given buffer specified by the offset and size (in bytes).
>>>  */
>>> static int stbuf_read_sysmd_field(u64 field_id, void *stbuf, int offset,
>>>                                   int size)
>>> {
>>>         void *member = stbuf + offset;
>>>         u64 tmp;
>>>         int ret;
>>>
>>>         ret = read_sys_metadata_field(field_id, &tmp);
>>>         if (ret)
>>>                 return ret;
>>>
>>>         memcpy(member, &tmp, size);
>>>
>>>         return 0;
>>> }
>>>
>>> /* Wrapper to read one metadata field to u8/u16/u32/u64 */
>>> #define stbuf_read_sysmd_single(_field_id, _pdata)      \
>>>         stbuf_read_sysmd_field(_field_id, _pdata, 0,        \
>>>             sizeof(typeof(*(_pdata))))
>>>
>>> #define CHECK_MD_FIELD_SIZE(_field_id, _st, _member)    \
>>>         BUILD_BUG_ON(MD_FIELD_ELE_SIZE(MD_FIELD_ID_##_field_id) != \
>>>                         sizeof(_st->_member))
>>>
>>> #define TD_SYSINFO_MAP_TEST(_field_id, _st, _member)                    \
>>>         ({                                                              \
>>>                 if (ret) {                                              \
>>>                         CHECK_MD_FIELD_SIZE(_field_id, _st, _member);   \
>>>                         ret = stbuf_read_sysmd_single(                  \
>>>                                         MD_FIELD_ID_##_field_id,        \
>>>                                         &_st->_member);                 \
>>>                 }                                                       \
>>>          })
>>>
>>> static int get_tdx_module_info(struct tdx_sysinfo_module_info *modinfo)
>>> {
>>>         int ret = 0;
>>>
>>> #define TD_SYSINFO_MAP_MOD_INFO(_field_id, _member)     \
>>>         TD_SYSINFO_MAP_TEST(_field_id, modinfo, _member)
>>>
>>>         TD_SYSINFO_MAP_MOD_INFO(SYS_ATTRIBUTES, sys_attributes);
>>>         TD_SYSINFO_MAP_MOD_INFO(TDX_FEATURES0,  tdx_features0);
>>>
>>>         return ret;
>>> }
>>>
>>> With the build time check above, I think it's OK to lose the type-safe
>>> inside the stbuf_read_sysmd_field(), and the code is simpler IMHO.
>>>
>>> Any comments?
>>
>> BUILD_BUG_ON() requires a function, but it is still
>> be possible to add a build time check in TD_SYSINFO_MAP
>> e.g.
>>
>> #define TD_SYSINFO_CHECK_SIZE(_field_id, _size)                       \
>>       __builtin_choose_expr(MD_FIELD_ELE_SIZE(_field_id) == _size, _size, (void)0)
>>
>> #define _TD_SYSINFO_MAP(_field_id, _offset, _size)            \
>>       { .field_id = _field_id,                                \
>>         .offset   = _offset,                                  \
>>         .size     = TD_SYSINFO_CHECK_SIZE(_field_id, _size) }
>>
>> #define TD_SYSINFO_MAP(_field_id, _struct, _member)           \
>>       _TD_SYSINFO_MAP(MD_FIELD_ID_##_field_id,                \
>>                       offsetof(_struct, _member),             \
>>                       sizeof(typeof(((_struct *)0)->_member)))
>>
>>
> 
> Thanks for the comment, but I don't think this meets for our purpose.
> 
> We want a build time "error" when the "MD_FIELD_ELE_SIZE(_field_id) == _size"
> fails, but not "still initializing the size to 0".

FWIW, it isn't 0, it is void.  Assignment to void is an error.  Could use
anything that is correct syntax but would produce a compile-time error
e.g. (1 / 0).

> Otherwise, we might get
> some unexpected issue (due to size is 0) at runtime, which is worse IMHO than
> a runtime check as done in the current upstream code.
> 
> I have been trying to add a BUILD_BUG_ON() to the field_mapping structure
> initializer, but I haven't found a reliable way to do so.
> 
> For now I have completed the new version based on Dan's suggestion, but still
> need to work on changelog/coverletter etc, so I think I can send the new
> version out and see whether people like it.  We can revert back if that's not
> what people want.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 02/10] x86/virt/tdx: Unbind global metadata read with 'struct tdx_tdmr_sysinfo'
  2024-08-27  4:54               ` Adrian Hunter
@ 2024-08-27  7:22                 ` Huang, Kai
  0 siblings, 0 replies; 44+ messages in thread
From: Huang, Kai @ 2024-08-27  7:22 UTC (permalink / raw)
  To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
	peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
	Hunter, Adrian, Williams, Dan J, pbonzini@redhat.com,
	kirill.shutemov@linux.intel.com, tglx@linutronix.de
  Cc: Gao, Chao, kvm@vger.kernel.org, binbin.wu@linux.intel.com,
	linux-kernel@vger.kernel.org, Edgecombe, Rick P, Yamahata, Isaku,
	x86@kernel.org

On Tue, 2024-08-27 at 07:54 +0300, Adrian Hunter wrote:
> > > 
> > > BUILD_BUG_ON() requires a function, but it is still
> > > be possible to add a build time check in TD_SYSINFO_MAP
> > > e.g.
> > > 
> > > #define TD_SYSINFO_CHECK_SIZE(_field_id, _size)                       \
> > >        __builtin_choose_expr(MD_FIELD_ELE_SIZE(_field_id) == _size, _size, (void)0)
> > > 
> > > #define _TD_SYSINFO_MAP(_field_id, _offset, _size)            \
> > >        { .field_id = _field_id,                                \
> > >          .offset   = _offset,                                  \
> > >          .size     = TD_SYSINFO_CHECK_SIZE(_field_id, _size) }
> > > 
> > > #define TD_SYSINFO_MAP(_field_id, _struct, _member)           \
> > >        _TD_SYSINFO_MAP(MD_FIELD_ID_##_field_id,                \
> > >                        offsetof(_struct, _member),             \
> > >                        sizeof(typeof(((_struct *)0)->_member)))
> > > 
> > > 
> > 
> > Thanks for the comment, but I don't think this meets for our purpose.
> > 
> > We want a build time "error" when the "MD_FIELD_ELE_SIZE(_field_id) == _size"
> > fails, but not "still initializing the size to 0".
> 
> FWIW, it isn't 0, it is void.  Assignment to void is an error.  Could use
> anything that is correct syntax but would produce a compile-time error
> e.g. (1 / 0).

Ah I missed the '(void)'.  I didn't thought this way (and yet to try out). 
Thanks for the insight.

I already sent out the v3 based on Dan's suggestion.  Besides the pros
mentioned by Dan, I also found Dan's suggestion yields less LoC of the final
tdx.c despite it is trivial.  So let's continue on the v3.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 09/10] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes
  2024-08-20 18:38   ` Adrian Hunter
@ 2024-08-27  7:24     ` Huang, Kai
  0 siblings, 0 replies; 44+ messages in thread
From: Huang, Kai @ 2024-08-27  7:24 UTC (permalink / raw)
  To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
	peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
	kirill.shutemov@linux.intel.com, tglx@linutronix.de,
	pbonzini@redhat.com, Hunter, Adrian, Williams, Dan J
  Cc: Gao, Chao, kvm@vger.kernel.org, binbin.wu@linux.intel.com,
	linux-kernel@vger.kernel.org, Edgecombe, Rick P, x86@kernel.org,
	Yamahata, Isaku

On Tue, 2024-08-20 at 21:38 +0300, Adrian Hunter wrote:
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> > index 861ddf2c2e88..4b43eb774ffa 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.h
> > +++ b/arch/x86/virt/vmx/tdx/tdx.h
> > @@ -40,6 +40,10 @@
> >  #define MD_FIELD_ID_UPDATE_VERSION		0x0800000100000005ULL
> >  #define MD_FIELD_ID_INTERNAL_VERSION		0x0800000100000006ULL
> >  
> > 
> > 
> > 
> > +#define MD_FIELD_ID_NUM_CMRS			0x9000000100000000ULL
> > +#define MD_FIELD_ID_CMR_BASE0			0x9000000300000080ULL
> > +#define MD_FIELD_ID_CMR_SIZE0			0x9000000300000100ULL
> 
> For scripted checking against "global_metadata.json" it might be better
> to stick to the same field names e.g.
> 
> 	MD_FIELD_ID_CMR_BASE0 -> MD_FIELD_ID_CMR_BASE
> 
> 

I ended up with

  #define MD_FIELD_ID_CMR_BASE(_i)	(0x9000000300000080ULL + (u16)_i)
  #define MD_FIELD_ID_CMR_SIZE(_i)	(0x9000000300000100ULL + (u16)_i)

.. due to they are arrays anyway.

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2024-08-27  7:24 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-17  3:40 [PATCH v2 00/10] TDX host: metadata reading tweaks, bug fix and info dump Kai Huang
2024-07-17  3:40 ` [PATCH v2 01/10] x86/virt/tdx: Rename _offset to _member for TD_SYSINFO_MAP() macro Kai Huang
2024-08-05 22:37   ` Dan Williams
2024-07-17  3:40 ` [PATCH v2 02/10] x86/virt/tdx: Unbind global metadata read with 'struct tdx_tdmr_sysinfo' Kai Huang
2024-08-05 23:32   ` Dan Williams
2024-08-06  0:09     ` Huang, Kai
2024-08-06  1:13       ` Dan Williams
2024-08-07 12:09         ` Huang, Kai
2024-08-26 15:38           ` Adrian Hunter
2024-08-26 22:40             ` Huang, Kai
2024-08-27  4:54               ` Adrian Hunter
2024-08-27  7:22                 ` Huang, Kai
2024-07-17  3:40 ` [PATCH v2 03/10] x86/virt/tdx: Support global metadata read for all element sizes Kai Huang
2024-08-05 23:45   ` Dan Williams
2024-07-17  3:40 ` [PATCH v2 04/10] x86/virt/tdx: Abstract reading multiple global metadata fields as a helper Kai Huang
2024-07-17  3:40 ` [PATCH v2 05/10] x86/virt/tdx: Move field mapping table of getting TDMR info to function local Kai Huang
2024-08-05 23:48   ` Dan Williams
2024-07-17  3:40 ` [PATCH v2 06/10] x86/virt/tdx: Refine a comment to reflect the latest TDX spec Kai Huang
2024-08-06  3:43   ` Dan Williams
2024-08-06 11:23     ` Huang, Kai
2024-08-06 19:06       ` Dan Williams
2024-08-06 21:01         ` Huang, Kai
2024-07-17  3:40 ` [PATCH v2 07/10] x86/virt/tdx: Start to track all global metadata in one structure Kai Huang
2024-08-06  3:51   ` Dan Williams
2024-08-06 11:29     ` Huang, Kai
2024-07-17  3:40 ` [PATCH v2 08/10] x86/virt/tdx: Print TDX module basic information Kai Huang
2024-08-06  4:19   ` Dan Williams
2024-08-06 11:51     ` Huang, Kai
2024-08-06 12:48       ` Huang, Kai
2024-08-07 21:56       ` Dan Williams
2024-08-07 22:32         ` Huang, Kai
2024-08-08 10:31   ` Chenyi Qiang
2024-08-08 23:52     ` Huang, Kai
2024-07-17  3:40 ` [PATCH v2 09/10] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes Kai Huang
2024-08-06  4:47   ` Dan Williams
2024-08-06 12:17     ` Huang, Kai
2024-08-20 18:38   ` Adrian Hunter
2024-08-27  7:24     ` Huang, Kai
2024-07-17  3:40 ` [PATCH v2 10/10] x86/virt/tdx: Don't initialize module that doesn't support NO_RBP_MOD feature Kai Huang
2024-08-06  4:55   ` Dan Williams
2024-08-06 12:18     ` Huang, Kai
2024-08-05 12:03 ` [PATCH v2 00/10] TDX host: metadata reading tweaks, bug fix and info dump Huang, Kai
2024-08-05 22:36 ` Dan Williams
2024-08-08  0:19   ` Huang, Kai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox