* [PATCH v3 0/8] TDX host: metadata reading tweaks, bug fix and info dump
@ 2024-08-27 7:14 Kai Huang
2024-08-27 7:14 ` [PATCH v3 1/8] x86/virt/tdx: Rename 'struct tdx_tdmr_sysinfo' to reflect the spec better Kai Huang
` (7 more replies)
0 siblings, 8 replies; 42+ messages in thread
From: Kai Huang @ 2024-08-27 7:14 UTC (permalink / raw)
To: dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo, hpa,
dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu, adrian.hunter, kai.huang
TL;DR:
This series does necessary tweaks to TDX host "global metadata" reading
code to fix some immediate issues in the TDX module initialization code,
with intention to also provide a flexible code base to support sharing
global metadata to KVM (and other kernel components) for future needs.
This series, and additional patches to initialize TDX when loading KVM
module and read essential metadata fields for KVM TDX can be found at
[1].
Dear maintainers,
This series targets x86 tip. Also add Dan, KVM maintainers and KVM list
so people can also review and comment.
This is a pre-work of the "quite near future" KVM TDX support (see the
kvm-coco-queue branch [2], which already includes all the patches in the
v2 of this series). I appreciate if you can review, comment and give
tag if the patches look good to you.
v2 -> v3 (address comments from Dan):
- Replace the first couple of "metadata reading tweaks" patches with
two new patches using a different approach (removin the 'struct
field_mapping' and reimplement the TD_SYSINFO_MAP()):
https://lore.kernel.org/kvm/a107b067-861d-43f4-86b5-29271cb93dad@intel.com/T/#m7cfb3c146214d94b24e978eeb8708d92c0b14ac6
https://lore.kernel.org/kvm/a107b067-861d-43f4-86b5-29271cb93dad@intel.com/T/#mbe65f0903ff7835bc418a907f0d02d7a9e0b78be
https://lore.kernel.org/kvm/a107b067-861d-43f4-86b5-29271cb93dad@intel.com/T/#m80cde5e6504b3af74d933ea0cbfc3ca9d24697d3
- Split out the renaming of 'struct tdx_tdmr_sysinfo' as a separate
patch and place it at the beginning of this series.
https://lore.kernel.org/kvm/cover.1721186590.git.kai.huang@intel.com/T/#m8fec7c429242d640cf5e756eb68e3b822e6dff8b
- Refine this cover letter ("More info" section)
https://lore.kernel.org/kvm/cover.1721186590.git.kai.huang@intel.com/T/#m11868c9f486dcc4cfbbb690c7c18dfa4570e433f
- Address other comments. See changelog of individual patches.
v2: https://lore.kernel.org/kvm/cover.1721186590.git.kai.huang@intel.com/T/
v1 -> v2:
- Fix comments from Chao and Nikolay.
- A new patch to refine an out-dated comment by Nikolay.
- Collect tags from Nikolay (thanks!).
v1: https://lore.kernel.org/linux-kernel/cover.1718538552.git.kai.huang@intel.com/T/
=== More info ===
TDX module provides a set of "global metadata fields" for software to
query. They report things like TDX module version, supported features
fields required for creating TDX guests and so on.
Today the TDX host code already reads "TD Memory Region" (TDMR) related
metadata fields for module initialization. There are immediate needs
that require TDX host code to read more metadata fields:
- Dump basic TDX module info [3];
- Reject module with no NO_RBP_MOD feature support [4];
- Read CMR info to fix a module initialization failure bug [5].
Also, the "quite near future" KVM TDX support requires to read more
global metadata fields. In the longer term, the TDX Connect [6] (which
supports assigning trusted IO devices to TDX guest) may also require
other kernel components (e.g., pci/vt-d) to access more metadata.
To meet all of those, the idea is the TDX host core-kernel to provide a
centralized, canonical, and read-only structure to contain all global
metadata that comes out of TDX module for all kernel components to use.
An alternative way is to expose metadata reading API(s) for in-kernel
TDX users to use, but the reasons of choosing to provide a centural
structure are, quoted from Dan:
The idea that x86 gets to review growth to this structure over time is
an asset for maintainability and oversight of what is happening in the
downstream consumers like KVM and TSM (for TDX Connect).
A dynamic retrieval API removes that natural auditing of data structure
patches from tip.git.
Yes, it requires more touches than letting use cases consume new
metadata fields at will, but that's net positive for maintainence of
the kernel and the feedback loop to the TDX module.
This series starts to track all global metadata fields into a single
'struct tdx_sys_info', and reads more metadata fields to that structure
to address the immediate needs as mentioned above.
More fields will be added to support KVM TDX. For the initial support
all metadata fields are populated in this single structure and shared to
KVM via a 'const pointer' to that structure (see last patches in [1]).
[1] https://github.com/intel/tdx/commits/kvm-tdxinit-host-metadata-v3/
[2] https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=kvm-coco-queue
[3] https://lore.kernel.org/lkml/e2d844ad-182a-4fc0-a06a-d609c9cbef74@suse.com/T/#m352829aedf6680d4628c7e40dc40b332eda93355
[4] https://lore.kernel.org/lkml/e2d844ad-182a-4fc0-a06a-d609c9cbef74@suse.com/T/#mef98469c51e2382ead2c537ea189752360bd2bef
[5] https://github.com/canonical/tdx/issues/135#issuecomment-2151570238
[6] https://cdrdv2.intel.com/v1/dl/getContent/773614
Kai Huang (8):
x86/virt/tdx: Rename 'struct tdx_tdmr_sysinfo' to reflect the spec
better
x86/virt/tdx: Remove 'struct field_mapping' and implement
TD_SYSINFO_MAP() macro
x86/virt/tdx: Prepare to support reading other global metadata fields
x86/virt/tdx: Refine a comment to reflect the latest TDX spec
x86/virt/tdx: Start to track all global metadata in one structure
x86/virt/tdx: Print TDX module basic information
x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find
memory holes
x86/virt/tdx: Don't initialize module that doesn't support NO_RBP_MOD
feature
arch/x86/virt/vmx/tdx/tdx.c | 276 +++++++++++++++++++++++++++---------
arch/x86/virt/vmx/tdx/tdx.h | 89 ++++++++++--
2 files changed, 292 insertions(+), 73 deletions(-)
base-commit: e77f8f275278886d05ce6dfe9e3bc854e7bf0713
--
2.46.0
^ permalink raw reply [flat|nested] 42+ messages in thread
* [PATCH v3 1/8] x86/virt/tdx: Rename 'struct tdx_tdmr_sysinfo' to reflect the spec better
2024-08-27 7:14 [PATCH v3 0/8] TDX host: metadata reading tweaks, bug fix and info dump Kai Huang
@ 2024-08-27 7:14 ` Kai Huang
2024-08-27 13:10 ` Adrian Hunter
2024-08-30 22:05 ` Dan Williams
2024-08-27 7:14 ` [PATCH v3 2/8] x86/virt/tdx: Remove 'struct field_mapping' and implement TD_SYSINFO_MAP() macro Kai Huang
` (6 subsequent siblings)
7 siblings, 2 replies; 42+ messages in thread
From: Kai Huang @ 2024-08-27 7:14 UTC (permalink / raw)
To: dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo, hpa,
dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu, adrian.hunter, kai.huang
The TDX module provides a set of "global metadata fields". They report
things like TDX module version, supported features, and fields related
to create/run TDX guests and so on.
TDX organizes those metadata fields by "Class"es based on the meaning of
those fields. E.g., for now the kernel only reads "TD Memory Region"
(TDMR) related fields for module initialization. Those fields are
defined under class "TDMR Info".
There are both immediate needs to read more metadata fields for module
initialization and near-future needs for other kernel components like
KVM to run TDX guests. To meet all those requirements, the idea is the
TDX host core-kernel to provide a centralized, canonical, and read-only
structure for the global metadata that comes out from the TDX module for
all kernel components to use.
More specifically, the target is to end up with something like:
struct tdx_sys_info {
struct tdx_sys_info_classA a;
struct tdx_sys_info_classB b;
...
};
Currently the kernel organizes all fields under "TDMR Info" class in
'struct tdx_tdmr_sysinfo'. To prepare for the above target, rename the
structure to 'struct tdx_sys_info_tdmr' to follow the class name better.
No functional change intended.
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
v2 -> v3:
- Split out as a separate patch and place it at beginning:
https://lore.kernel.org/kvm/cover.1721186590.git.kai.huang@intel.com/T/#m8fec7c429242d640cf5e756eb68e3b822e6dff8b
- Rename to 'struct tdx_sys_info_tdmr':
https://lore.kernel.org/kvm/cover.1721186590.git.kai.huang@intel.com/T/#md73dd9b02a492acf4a6facae63e8d030e320967d
https://lore.kernel.org/kvm/cover.1721186590.git.kai.huang@intel.com/T/#m8fec7c429242d640cf5e756eb68e3b822e6dff8b
---
arch/x86/virt/vmx/tdx/tdx.c | 36 ++++++++++++++++++------------------
arch/x86/virt/vmx/tdx/tdx.h | 2 +-
2 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 4e2b2e2ac9f9..e979bf442929 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -272,7 +272,7 @@ static int read_sys_metadata_field(u64 field_id, u64 *data)
static int read_sys_metadata_field16(u64 field_id,
int offset,
- struct tdx_tdmr_sysinfo *ts)
+ struct tdx_sys_info_tdmr *ts)
{
u16 *ts_member = ((void *)ts) + offset;
u64 tmp;
@@ -298,9 +298,9 @@ struct field_mapping {
#define TD_SYSINFO_MAP(_field_id, _offset) \
{ .field_id = MD_FIELD_ID_##_field_id, \
- .offset = offsetof(struct tdx_tdmr_sysinfo, _offset) }
+ .offset = offsetof(struct tdx_sys_info_tdmr, _offset) }
-/* Map TD_SYSINFO fields into 'struct tdx_tdmr_sysinfo': */
+/* Map TD_SYSINFO fields into 'struct tdx_sys_info_tdmr': */
static const struct field_mapping fields[] = {
TD_SYSINFO_MAP(MAX_TDMRS, max_tdmrs),
TD_SYSINFO_MAP(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr),
@@ -309,16 +309,16 @@ static const struct field_mapping fields[] = {
TD_SYSINFO_MAP(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]),
};
-static int get_tdx_tdmr_sysinfo(struct tdx_tdmr_sysinfo *tdmr_sysinfo)
+static int get_tdx_sys_info_tdmr(struct tdx_sys_info_tdmr *sysinfo_tdmr)
{
int ret;
int i;
- /* Populate 'tdmr_sysinfo' fields using the mapping structure above: */
+ /* Populate 'sysinfo_tdmr' fields using the mapping structure above: */
for (i = 0; i < ARRAY_SIZE(fields); i++) {
ret = read_sys_metadata_field16(fields[i].field_id,
fields[i].offset,
- tdmr_sysinfo);
+ sysinfo_tdmr);
if (ret)
return ret;
}
@@ -342,13 +342,13 @@ static int tdmr_size_single(u16 max_reserved_per_tdmr)
}
static int alloc_tdmr_list(struct tdmr_info_list *tdmr_list,
- struct tdx_tdmr_sysinfo *tdmr_sysinfo)
+ struct tdx_sys_info_tdmr *sysinfo_tdmr)
{
size_t tdmr_sz, tdmr_array_sz;
void *tdmr_array;
- tdmr_sz = tdmr_size_single(tdmr_sysinfo->max_reserved_per_tdmr);
- tdmr_array_sz = tdmr_sz * tdmr_sysinfo->max_tdmrs;
+ tdmr_sz = tdmr_size_single(sysinfo_tdmr->max_reserved_per_tdmr);
+ tdmr_array_sz = tdmr_sz * sysinfo_tdmr->max_tdmrs;
/*
* To keep things simple, allocate all TDMRs together.
@@ -367,7 +367,7 @@ static int alloc_tdmr_list(struct tdmr_info_list *tdmr_list,
* at a given index in the TDMR list.
*/
tdmr_list->tdmr_sz = tdmr_sz;
- tdmr_list->max_tdmrs = tdmr_sysinfo->max_tdmrs;
+ tdmr_list->max_tdmrs = sysinfo_tdmr->max_tdmrs;
tdmr_list->nr_consumed_tdmrs = 0;
return 0;
@@ -921,11 +921,11 @@ static int tdmrs_populate_rsvd_areas_all(struct tdmr_info_list *tdmr_list,
/*
* Construct a list of TDMRs on the preallocated space in @tdmr_list
* to cover all TDX memory regions in @tmb_list based on the TDX module
- * TDMR global information in @tdmr_sysinfo.
+ * TDMR global information in @sysinfo_tdmr.
*/
static int construct_tdmrs(struct list_head *tmb_list,
struct tdmr_info_list *tdmr_list,
- struct tdx_tdmr_sysinfo *tdmr_sysinfo)
+ struct tdx_sys_info_tdmr *sysinfo_tdmr)
{
int ret;
@@ -934,12 +934,12 @@ static int construct_tdmrs(struct list_head *tmb_list,
return ret;
ret = tdmrs_set_up_pamt_all(tdmr_list, tmb_list,
- tdmr_sysinfo->pamt_entry_size);
+ sysinfo_tdmr->pamt_entry_size);
if (ret)
return ret;
ret = tdmrs_populate_rsvd_areas_all(tdmr_list, tmb_list,
- tdmr_sysinfo->max_reserved_per_tdmr);
+ sysinfo_tdmr->max_reserved_per_tdmr);
if (ret)
tdmrs_free_pamt_all(tdmr_list);
@@ -1098,7 +1098,7 @@ static int init_tdmrs(struct tdmr_info_list *tdmr_list)
static int init_tdx_module(void)
{
- struct tdx_tdmr_sysinfo tdmr_sysinfo;
+ struct tdx_sys_info_tdmr sysinfo_tdmr;
int ret;
/*
@@ -1117,17 +1117,17 @@ static int init_tdx_module(void)
if (ret)
goto out_put_tdxmem;
- ret = get_tdx_tdmr_sysinfo(&tdmr_sysinfo);
+ ret = get_tdx_sys_info_tdmr(&sysinfo_tdmr);
if (ret)
goto err_free_tdxmem;
/* Allocate enough space for constructing TDMRs */
- ret = alloc_tdmr_list(&tdx_tdmr_list, &tdmr_sysinfo);
+ ret = alloc_tdmr_list(&tdx_tdmr_list, &sysinfo_tdmr);
if (ret)
goto err_free_tdxmem;
/* Cover all TDX-usable memory regions in TDMRs */
- ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &tdmr_sysinfo);
+ ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &sysinfo_tdmr);
if (ret)
goto err_free_tdmrs;
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index b701f69485d3..148f9b4d1140 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -100,7 +100,7 @@ struct tdx_memblock {
};
/* "TDMR info" part of "Global Scope Metadata" for constructing TDMRs */
-struct tdx_tdmr_sysinfo {
+struct tdx_sys_info_tdmr {
u16 max_tdmrs;
u16 max_reserved_per_tdmr;
u16 pamt_entry_size[TDX_PS_NR];
--
2.46.0
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v3 2/8] x86/virt/tdx: Remove 'struct field_mapping' and implement TD_SYSINFO_MAP() macro
2024-08-27 7:14 [PATCH v3 0/8] TDX host: metadata reading tweaks, bug fix and info dump Kai Huang
2024-08-27 7:14 ` [PATCH v3 1/8] x86/virt/tdx: Rename 'struct tdx_tdmr_sysinfo' to reflect the spec better Kai Huang
@ 2024-08-27 7:14 ` Kai Huang
2024-08-29 7:20 ` Adrian Hunter
2024-09-06 20:21 ` Dan Williams
2024-08-27 7:14 ` [PATCH v3 3/8] x86/virt/tdx: Prepare to support reading other global metadata fields Kai Huang
` (5 subsequent siblings)
7 siblings, 2 replies; 42+ messages in thread
From: Kai Huang @ 2024-08-27 7:14 UTC (permalink / raw)
To: dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo, hpa,
dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu, adrian.hunter, kai.huang
TL;DR: Remove the 'struct field_mapping' structure and use another way
to implement the TD_SYSINFO_MAP() macro to improve the current metadata
reading code, e.g., switching to use build-time check for metadata field
size over the existing runtime check.
The TDX module provides a set of "global metadata fields". They report
things like TDX module version, supported features, and fields related
to create/run TDX guests and so on.
For now the kernel only reads "TD Memory Region" (TDMR) related global
metadata fields, and all of those fields are organized in one structure.
The kernel currently uses 'struct field_mapping' to facilitate reading
multiple metadata fields into one structure. The 'struct field_mapping'
records the mapping between the field ID and the offset of the structure
to fill out. The kernel initializes an array of 'struct field_mapping'
for each structure member (using the TD_SYSINFO_MAP() macro) and then
reads all metadata fields in a loop using that array.
Currently the kernel only reads TDMR related metadata fields into one
structure, and the function to read one metadata field takes the pointer
of that structure and the offset:
static int read_sys_metadata_field16(u64 field_id,
int offset,
struct tdx_sys_info_tdmr *ts)
{...}
Future changes will need to read more metadata fields into different
structures. To support this the above function will need to be changed
to take 'void *':
static int read_sys_metadata_field16(u64 field_id,
int offset,
void *stbuf)
{...}
This approach loses type-safety, as Dan suggested. A better way is to
make it be ..
static int read_sys_metadata_field16(u64 field_id, u16 *val) {...}
.. and let the caller calculate the buffer in a type-safe way [1].
Also, the using of the 'struct field_mapping' loses the ability to be
able to do build-time check about the metadata field size (encoded in
the field ID) due to the field ID is "indirectly" initialized to the
'struct field_mapping' and passed to the function to read. Thus the
current code uses runtime check instead.
Dan suggested to remove the 'struct field_mapping', unroll the loop,
skip the array, and call the read_sys_metadata_field16() directly with
build-time check [1][2]. And to improve the readability, reimplement
the TD_SYSINFO_MAP() [3].
The new TD_SYSINFO_MAP() isn't perfect. It requires the function that
uses it to define a local variable @ret to carry the error code and set
the initial value to 0. It also hard-codes the variable name of the
structure pointer used in the function. But overall the pros of this
approach beat the cons.
Link: https://lore.kernel.org/kvm/a107b067-861d-43f4-86b5-29271cb93dad@intel.com/T/#m7cfb3c146214d94b24e978eeb8708d92c0b14ac6 [1]
Link: https://lore.kernel.org/kvm/a107b067-861d-43f4-86b5-29271cb93dad@intel.com/T/#mbe65f0903ff7835bc418a907f0d02d7a9e0b78be [2]
Link: https://lore.kernel.org/kvm/a107b067-861d-43f4-86b5-29271cb93dad@intel.com/T/#m80cde5e6504b3af74d933ea0cbfc3ca9d24697d3 [3]
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
v2 -> v3:
- Remove 'struct field_mapping' and reimplement TD_SYSINFO_MAP().
---
arch/x86/virt/vmx/tdx/tdx.c | 57 ++++++++++++++-----------------------
1 file changed, 21 insertions(+), 36 deletions(-)
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index e979bf442929..7e75c1b10838 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -270,60 +270,45 @@ static int read_sys_metadata_field(u64 field_id, u64 *data)
return 0;
}
-static int read_sys_metadata_field16(u64 field_id,
- int offset,
- struct tdx_sys_info_tdmr *ts)
+static int read_sys_metadata_field16(u64 field_id, u16 *val)
{
- u16 *ts_member = ((void *)ts) + offset;
u64 tmp;
int ret;
- if (WARN_ON_ONCE(MD_FIELD_ID_ELE_SIZE_CODE(field_id) !=
- MD_FIELD_ID_ELE_SIZE_16BIT))
- return -EINVAL;
+ BUILD_BUG_ON(MD_FIELD_ID_ELE_SIZE_CODE(field_id) !=
+ MD_FIELD_ID_ELE_SIZE_16BIT);
ret = read_sys_metadata_field(field_id, &tmp);
if (ret)
return ret;
- *ts_member = tmp;
+ *val = tmp;
return 0;
}
-struct field_mapping {
- u64 field_id;
- int offset;
-};
-
-#define TD_SYSINFO_MAP(_field_id, _offset) \
- { .field_id = MD_FIELD_ID_##_field_id, \
- .offset = offsetof(struct tdx_sys_info_tdmr, _offset) }
-
-/* Map TD_SYSINFO fields into 'struct tdx_sys_info_tdmr': */
-static const struct field_mapping fields[] = {
- TD_SYSINFO_MAP(MAX_TDMRS, max_tdmrs),
- TD_SYSINFO_MAP(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr),
- TD_SYSINFO_MAP(PAMT_4K_ENTRY_SIZE, pamt_entry_size[TDX_PS_4K]),
- TD_SYSINFO_MAP(PAMT_2M_ENTRY_SIZE, pamt_entry_size[TDX_PS_2M]),
- TD_SYSINFO_MAP(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]),
-};
+/*
+ * Assumes locally defined @ret and @sysinfo_tdmr to convey the error
+ * code and the 'struct tdx_sys_info_tdmr' instance to fill out.
+ */
+#define TD_SYSINFO_MAP(_field_id, _member) \
+ ({ \
+ if (!ret) \
+ ret = read_sys_metadata_field16(MD_FIELD_ID_##_field_id, \
+ &sysinfo_tdmr->_member); \
+ })
static int get_tdx_sys_info_tdmr(struct tdx_sys_info_tdmr *sysinfo_tdmr)
{
- int ret;
- int i;
+ int ret = 0;
- /* Populate 'sysinfo_tdmr' fields using the mapping structure above: */
- for (i = 0; i < ARRAY_SIZE(fields); i++) {
- ret = read_sys_metadata_field16(fields[i].field_id,
- fields[i].offset,
- sysinfo_tdmr);
- if (ret)
- return ret;
- }
+ TD_SYSINFO_MAP(MAX_TDMRS, max_tdmrs);
+ TD_SYSINFO_MAP(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr);
+ TD_SYSINFO_MAP(PAMT_4K_ENTRY_SIZE, pamt_entry_size[TDX_PS_4K]);
+ TD_SYSINFO_MAP(PAMT_2M_ENTRY_SIZE, pamt_entry_size[TDX_PS_2M]);
+ TD_SYSINFO_MAP(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]);
- return 0;
+ return ret;
}
/* Calculate the actual TDMR size */
--
2.46.0
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v3 3/8] x86/virt/tdx: Prepare to support reading other global metadata fields
2024-08-27 7:14 [PATCH v3 0/8] TDX host: metadata reading tweaks, bug fix and info dump Kai Huang
2024-08-27 7:14 ` [PATCH v3 1/8] x86/virt/tdx: Rename 'struct tdx_tdmr_sysinfo' to reflect the spec better Kai Huang
2024-08-27 7:14 ` [PATCH v3 2/8] x86/virt/tdx: Remove 'struct field_mapping' and implement TD_SYSINFO_MAP() macro Kai Huang
@ 2024-08-27 7:14 ` Kai Huang
2024-08-30 6:43 ` Adrian Hunter
` (2 more replies)
2024-08-27 7:14 ` [PATCH v3 4/8] x86/virt/tdx: Refine a comment to reflect the latest TDX spec Kai Huang
` (4 subsequent siblings)
7 siblings, 3 replies; 42+ messages in thread
From: Kai Huang @ 2024-08-27 7:14 UTC (permalink / raw)
To: dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo, hpa,
dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu, adrian.hunter, kai.huang
The TDX module provides a set of "global metadata fields". They report
things like TDX module version, supported features, and fields related
to create/run TDX guests and so on.
For now the kernel only reads "TD Memory Region" (TDMR) related fields
for module initialization. There are both immediate needs to read more
fields for module initialization and near-future needs for other kernel
components like KVM to run TDX guests.
will be organized in different structures depending on their meanings.
For now the kernel only reads TDMR related fields. The TD_SYSINFO_MAP()
macro hard-codes the 'struct tdx_sys_info_tdmr' instance name. To make
it work with different instances of different structures, extend it to
take the structure instance name as an argument.
This also means the current code which uses TD_SYSINFO_MAP() must type
'struct tdx_sys_info_tdmr' instance name explicitly for each use. To
make the code easier to read, add a wrapper TD_SYSINFO_MAP_TDMR_INFO()
which hides the instance name.
TDX also support 8/16/32/64 bits metadata field element sizes. For now
all TDMR related fields are 16-bit long thus the kernel only has one
helper:
static int read_sys_metadata_field16(u64 field_id, u16 *val) {}
Future changes will need to read more metadata fields with different
element sizes. To make the code short, extend the helper to take a
'void *' buffer and the buffer size so it can work with all element
sizes.
Note in this way the helper loses the type safety and the build-time
check inside the helper cannot work anymore because the compiler cannot
determine the exact value of the buffer size.
To resolve those, add a wrapper of the helper which only works with
u8/u16/u32/u64 directly and do build-time check, where the compiler
can easily know both the element size (from field ID) and the buffer
size(using sizeof()), before calling the helper.
An alternative option is to provide one helper for each element size:
static int read_sys_metadata_field8(u64 field_id, u8 *val) {}
static int read_sys_metadata_field16(u64 field_id, u16 *val) {}
...
But this will result in duplicated code given those helpers will look
exactly the same except for the type of buffer pointer. It's feasible
to define a macro for the body of the helper and define one entry for
each element size to reduce the code, but it is a little bit
over-engineering.
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
v2 -> v3:
- Rename read_sys_metadata_field() to tdh_sys_rd() so the former can be
used as the high level wrapper. Get rid of "stbuf_" prefix since
people don't like it.
- Rewrite after removing 'struct field_mapping' and reimplementing
TD_SYSINFO_MAP().
---
arch/x86/virt/vmx/tdx/tdx.c | 45 +++++++++++++++++++++----------------
arch/x86/virt/vmx/tdx/tdx.h | 3 ++-
2 files changed, 28 insertions(+), 20 deletions(-)
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 7e75c1b10838..1cd9035c783f 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -250,7 +250,7 @@ static int build_tdx_memlist(struct list_head *tmb_list)
return ret;
}
-static int read_sys_metadata_field(u64 field_id, u64 *data)
+static int tdh_sys_rd(u64 field_id, u64 *data)
{
struct tdx_module_args args = {};
int ret;
@@ -270,43 +270,50 @@ static int read_sys_metadata_field(u64 field_id, u64 *data)
return 0;
}
-static int read_sys_metadata_field16(u64 field_id, u16 *val)
+static int __read_sys_metadata_field(u64 field_id, void *val, int size)
{
u64 tmp;
int ret;
- BUILD_BUG_ON(MD_FIELD_ID_ELE_SIZE_CODE(field_id) !=
- MD_FIELD_ID_ELE_SIZE_16BIT);
-
- ret = read_sys_metadata_field(field_id, &tmp);
+ ret = tdh_sys_rd(field_id, &tmp);
if (ret)
return ret;
- *val = tmp;
+ memcpy(val, &tmp, size);
return 0;
}
+/* Wrapper to read one global metadata to u8/u16/u32/u64 */
+#define read_sys_metadata_field(_field_id, _val) \
+ ({ \
+ BUILD_BUG_ON(MD_FIELD_ELE_SIZE(_field_id) != sizeof(typeof(*(_val)))); \
+ __read_sys_metadata_field(_field_id, _val, sizeof(typeof(*(_val)))); \
+ })
+
/*
- * Assumes locally defined @ret and @sysinfo_tdmr to convey the error
- * code and the 'struct tdx_sys_info_tdmr' instance to fill out.
+ * Read one global metadata field to a member of a structure instance,
+ * assuming locally defined @ret to convey the error code.
*/
-#define TD_SYSINFO_MAP(_field_id, _member) \
- ({ \
- if (!ret) \
- ret = read_sys_metadata_field16(MD_FIELD_ID_##_field_id, \
- &sysinfo_tdmr->_member); \
+#define TD_SYSINFO_MAP(_field_id, _stbuf, _member) \
+ ({ \
+ if (!ret) \
+ ret = read_sys_metadata_field(MD_FIELD_ID_##_field_id, \
+ &_stbuf->_member); \
})
static int get_tdx_sys_info_tdmr(struct tdx_sys_info_tdmr *sysinfo_tdmr)
{
int ret = 0;
- TD_SYSINFO_MAP(MAX_TDMRS, max_tdmrs);
- TD_SYSINFO_MAP(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr);
- TD_SYSINFO_MAP(PAMT_4K_ENTRY_SIZE, pamt_entry_size[TDX_PS_4K]);
- TD_SYSINFO_MAP(PAMT_2M_ENTRY_SIZE, pamt_entry_size[TDX_PS_2M]);
- TD_SYSINFO_MAP(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]);
+#define TD_SYSINFO_MAP_TDMR_INFO(_field_id, _member) \
+ TD_SYSINFO_MAP(_field_id, sysinfo_tdmr, _member)
+
+ TD_SYSINFO_MAP_TDMR_INFO(MAX_TDMRS, max_tdmrs);
+ TD_SYSINFO_MAP_TDMR_INFO(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr);
+ TD_SYSINFO_MAP_TDMR_INFO(PAMT_4K_ENTRY_SIZE, pamt_entry_size[TDX_PS_4K]);
+ TD_SYSINFO_MAP_TDMR_INFO(PAMT_2M_ENTRY_SIZE, pamt_entry_size[TDX_PS_2M]);
+ TD_SYSINFO_MAP_TDMR_INFO(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]);
return ret;
}
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 148f9b4d1140..7458f6717873 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -53,7 +53,8 @@
#define MD_FIELD_ID_ELE_SIZE_CODE(_field_id) \
(((_field_id) & GENMASK_ULL(33, 32)) >> 32)
-#define MD_FIELD_ID_ELE_SIZE_16BIT 1
+#define MD_FIELD_ELE_SIZE(_field_id) \
+ (1 << MD_FIELD_ID_ELE_SIZE_CODE(_field_id))
struct tdmr_reserved_area {
u64 offset;
--
2.46.0
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v3 4/8] x86/virt/tdx: Refine a comment to reflect the latest TDX spec
2024-08-27 7:14 [PATCH v3 0/8] TDX host: metadata reading tweaks, bug fix and info dump Kai Huang
` (2 preceding siblings ...)
2024-08-27 7:14 ` [PATCH v3 3/8] x86/virt/tdx: Prepare to support reading other global metadata fields Kai Huang
@ 2024-08-27 7:14 ` Kai Huang
2024-08-30 6:45 ` Adrian Hunter
2024-08-30 9:14 ` Nikolay Borisov
2024-08-27 7:14 ` [PATCH v3 5/8] x86/virt/tdx: Start to track all global metadata in one structure Kai Huang
` (3 subsequent siblings)
7 siblings, 2 replies; 42+ messages in thread
From: Kai Huang @ 2024-08-27 7:14 UTC (permalink / raw)
To: dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo, hpa,
dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu, adrian.hunter, kai.huang
The old versions of "Intel TDX Module v1.5 ABI Specification" contain
the definitions of all global metadata field IDs directly in a table.
However, the latest spec moves those definitions to a dedicated
'global_metadata.json' file as part of a new (separate) "Intel TDX
Module v1.5 ABI definitions" [1].
Update the comment to reflect this.
[1]: https://cdrdv2.intel.com/v1/dl/getContent/795381
Reported-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
arch/x86/virt/vmx/tdx/tdx.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 7458f6717873..8aabd03d8bf5 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -29,7 +29,7 @@
/*
* Global scope metadata field ID.
*
- * See Table "Global Scope Metadata", TDX module 1.5 ABI spec.
+ * See the "global_metadata.json" in the "TDX 1.5 ABI definitions".
*/
#define MD_FIELD_ID_MAX_TDMRS 0x9100000100000008ULL
#define MD_FIELD_ID_MAX_RESERVED_PER_TDMR 0x9100000100000009ULL
--
2.46.0
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v3 5/8] x86/virt/tdx: Start to track all global metadata in one structure
2024-08-27 7:14 [PATCH v3 0/8] TDX host: metadata reading tweaks, bug fix and info dump Kai Huang
` (3 preceding siblings ...)
2024-08-27 7:14 ` [PATCH v3 4/8] x86/virt/tdx: Refine a comment to reflect the latest TDX spec Kai Huang
@ 2024-08-27 7:14 ` Kai Huang
2024-08-30 7:02 ` Adrian Hunter
2024-08-30 11:01 ` Nikolay Borisov
2024-08-27 7:14 ` [PATCH v3 6/8] x86/virt/tdx: Print TDX module basic information Kai Huang
` (2 subsequent siblings)
7 siblings, 2 replies; 42+ messages in thread
From: Kai Huang @ 2024-08-27 7:14 UTC (permalink / raw)
To: dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo, hpa,
dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu, adrian.hunter, kai.huang
The TDX module provides a set of "global metadata fields". They report
things like TDX module version, supported features, and fields related
to create/run TDX guests and so on.
Currently the kernel only reads "TD Memory Region" (TDMR) related fields
for module initialization. There are immediate needs which require the
TDX module initialization to read more global metadata including module
version, supported features and "Convertible Memory Regions" (CMRs).
Also, KVM will need to read more metadata fields to support baseline TDX
guests. In the longer term, other TDX features like TDX Connect (which
supports assigning trusted IO devices to TDX guest) may also require
other kernel components such as pci/vt-d to access global metadata.
To meet all those requirements, the idea is the TDX host core-kernel to
to provide a centralized, canonical, and read-only structure for the
global metadata that comes out from the TDX module for all kernel
components to use.
As the first step, introduce a new 'struct tdx_sys_info' to track all
global metadata fields.
TDX categories global metadata fields into different "Class"es. E.g.,
the TDMR related fields are under class "TDMR Info". Instead of making
'struct tdx_sys_info' a plain structure to contain all metadata fields,
organize them in smaller structures based on the "Class".
This allows those metadata fields to be used in finer granularity thus
makes the code more clear. E.g., the construct_tdmr() can just take the
structure which contains "TDMR Info" metadata fields.
Add a new function get_tdx_sys_info() as the placeholder to read all
metadata fields, and call it at the beginning of init_tdx_module(). For
now it only calls get_tdx_sys_info_tdmr() to read TDMR related fields.
Note there is a functional change: get_tdx_sys_info_tdmr() is moved from
after build_tdx_memlist() to before it, but it is fine to do so.
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
v2 -> v3:
- Split out the part to rename 'struct tdx_tdmr_sysinfo' to 'struct
tdx_sys_info_tdmr'.
---
arch/x86/virt/vmx/tdx/tdx.c | 19 ++++++++++++-------
arch/x86/virt/vmx/tdx/tdx.h | 36 +++++++++++++++++++++++++++++-------
2 files changed, 41 insertions(+), 14 deletions(-)
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 1cd9035c783f..24eb289c80e8 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -318,6 +318,11 @@ static int get_tdx_sys_info_tdmr(struct tdx_sys_info_tdmr *sysinfo_tdmr)
return ret;
}
+static int get_tdx_sys_info(struct tdx_sys_info *sysinfo)
+{
+ return get_tdx_sys_info_tdmr(&sysinfo->tdmr);
+}
+
/* Calculate the actual TDMR size */
static int tdmr_size_single(u16 max_reserved_per_tdmr)
{
@@ -1090,9 +1095,13 @@ static int init_tdmrs(struct tdmr_info_list *tdmr_list)
static int init_tdx_module(void)
{
- struct tdx_sys_info_tdmr sysinfo_tdmr;
+ struct tdx_sys_info sysinfo;
int ret;
+ ret = get_tdx_sys_info(&sysinfo);
+ if (ret)
+ return ret;
+
/*
* To keep things simple, assume that all TDX-protected memory
* will come from the page allocator. Make sure all pages in the
@@ -1109,17 +1118,13 @@ static int init_tdx_module(void)
if (ret)
goto out_put_tdxmem;
- ret = get_tdx_sys_info_tdmr(&sysinfo_tdmr);
- if (ret)
- goto err_free_tdxmem;
-
/* Allocate enough space for constructing TDMRs */
- ret = alloc_tdmr_list(&tdx_tdmr_list, &sysinfo_tdmr);
+ ret = alloc_tdmr_list(&tdx_tdmr_list, &sysinfo.tdmr);
if (ret)
goto err_free_tdxmem;
/* Cover all TDX-usable memory regions in TDMRs */
- ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &sysinfo_tdmr);
+ ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &sysinfo.tdmr);
if (ret)
goto err_free_tdmrs;
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 8aabd03d8bf5..4cddbb035b9f 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -100,13 +100,6 @@ struct tdx_memblock {
int nid;
};
-/* "TDMR info" part of "Global Scope Metadata" for constructing TDMRs */
-struct tdx_sys_info_tdmr {
- u16 max_tdmrs;
- u16 max_reserved_per_tdmr;
- u16 pamt_entry_size[TDX_PS_NR];
-};
-
/* Warn if kernel has less than TDMR_NR_WARN TDMRs after allocation */
#define TDMR_NR_WARN 4
@@ -119,4 +112,33 @@ struct tdmr_info_list {
int max_tdmrs; /* How many 'tdmr_info's are allocated */
};
+/*
+ * Kernel-defined structures to contain "Global Scope Metadata".
+ *
+ * TDX global metadata fields are categorized by "Class"es. See the
+ * "global_metadata.json" in the "TDX 1.5 ABI Definitions".
+ *
+ * 'struct tdx_sys_info' is the main structure to contain all metadata
+ * used by the kernel. It contains sub-structures with each reflecting
+ * the "Class" in the 'global_metadata.json'.
+ *
+ * Note the structure name may not exactly follow the name of the
+ * "Class" in the TDX spec, but the comment of that structure always
+ * reflect that.
+ *
+ * Also note not all metadata fields in each class are defined, only
+ * those used by the kernel are.
+ */
+
+/* Class "TDMR info" */
+struct tdx_sys_info_tdmr {
+ u16 max_tdmrs;
+ u16 max_reserved_per_tdmr;
+ u16 pamt_entry_size[TDX_PS_NR];
+};
+
+struct tdx_sys_info {
+ struct tdx_sys_info_tdmr tdmr;
+};
+
#endif
--
2.46.0
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v3 6/8] x86/virt/tdx: Print TDX module basic information
2024-08-27 7:14 [PATCH v3 0/8] TDX host: metadata reading tweaks, bug fix and info dump Kai Huang
` (4 preceding siblings ...)
2024-08-27 7:14 ` [PATCH v3 5/8] x86/virt/tdx: Start to track all global metadata in one structure Kai Huang
@ 2024-08-27 7:14 ` Kai Huang
2024-09-06 22:46 ` Dan Williams
2024-08-27 7:14 ` [PATCH v3 7/8] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes Kai Huang
2024-08-27 7:14 ` [PATCH v3 8/8] x86/virt/tdx: Don't initialize module that doesn't support NO_RBP_MOD feature Kai Huang
7 siblings, 1 reply; 42+ messages in thread
From: Kai Huang @ 2024-08-27 7:14 UTC (permalink / raw)
To: dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo, hpa,
dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu, adrian.hunter, kai.huang
Currently the kernel doesn't print any information regarding the TDX
module itself, e.g. module version. In practice such information is
useful, especially to the developers.
For instance, there are a couple of use cases for dumping module basic
information:
1) When something goes wrong around using TDX, the information like TDX
module version, supported features etc could be helpful [1][2].
2) For Linux, when the user wants to update the TDX module, one needs to
replace the old module in a specific location in the EFI partition
with the new one so that after reboot the BIOS can load it. However,
after kernel boots, currently the user has no way to verify it is
indeed the new module that gets loaded and initialized (e.g., error
could happen when replacing the old module). With the module version
dumped the user can verify this easily.
So dump the basic TDX module information:
- TDX module version, and the build date.
- TDX_FEATURES0: Supported TDX features.
And dump the information right after reading global metadata, so that
this information is printed no matter whether module initialization
fails or not.
The actual dmesg will look like:
virt/tdx: Initializing TDX module: 1.5.00.00.0481 (build_date 20230323), TDX_FEATURES0 0xfbf
Link: https://lore.kernel.org/lkml/e2d844ad-182a-4fc0-a06a-d609c9cbef74@suse.com/T/#m352829aedf6680d4628c7e40dc40b332eda93355 [1]
Link: https://lore.kernel.org/lkml/e2d844ad-182a-4fc0-a06a-d609c9cbef74@suse.com/T/#m351ebcbc006d2e5bc3e7650206a087cb2708d451 [2]
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
v2 -> v3:
- 'struct tdx_sysinfo_module_info' -> 'struct tdx_sys_info_features'
- 'struct tdx_sysinfo_module_version' -> 'struct tdx_sys_info_version'
- Remove the 'sys_attributes' and the check of debug/production module.
https://lore.kernel.org/kvm/cover.1721186590.git.kai.huang@intel.com/T/#md73dd9b02a492acf4a6facae63e8d030e320967d
---
arch/x86/virt/vmx/tdx/tdx.c | 61 +++++++++++++++++++++++++++++++++++++
arch/x86/virt/vmx/tdx/tdx.h | 34 ++++++++++++++++++++-
2 files changed, 94 insertions(+), 1 deletion(-)
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 24eb289c80e8..0fb673dd43ed 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -302,6 +302,55 @@ static int __read_sys_metadata_field(u64 field_id, void *val, int size)
&_stbuf->_member); \
})
+static int get_tdx_sys_info_features(struct tdx_sys_info_features *sysinfo_features)
+{
+ int ret = 0;
+
+#define TD_SYSINFO_MAP_MOD_FEATURES(_field_id, _member) \
+ TD_SYSINFO_MAP(_field_id, sysinfo_features, _member)
+
+ TD_SYSINFO_MAP_MOD_FEATURES(TDX_FEATURES0, tdx_features0);
+
+ return ret;
+}
+
+static int get_tdx_sys_info_version(struct tdx_sys_info_version *sysinfo_version)
+{
+ int ret = 0;
+
+#define TD_SYSINFO_MAP_MOD_VERSION(_field_id, _member) \
+ TD_SYSINFO_MAP(_field_id, sysinfo_version, _member)
+
+ TD_SYSINFO_MAP_MOD_VERSION(MAJOR_VERSION, major);
+ TD_SYSINFO_MAP_MOD_VERSION(MINOR_VERSION, minor);
+ TD_SYSINFO_MAP_MOD_VERSION(UPDATE_VERSION, update);
+ TD_SYSINFO_MAP_MOD_VERSION(INTERNAL_VERSION, internal);
+ TD_SYSINFO_MAP_MOD_VERSION(BUILD_NUM, build_num);
+ TD_SYSINFO_MAP_MOD_VERSION(BUILD_DATE, build_date);
+
+ return ret;
+}
+
+static void print_basic_sys_info(struct tdx_sys_info *sysinfo)
+{
+ struct tdx_sys_info_features *features = &sysinfo->features;
+ struct tdx_sys_info_version *version = &sysinfo->version;
+
+ /*
+ * TDX module version encoding:
+ *
+ * <major>.<minor>.<update>.<internal>.<build_num>
+ *
+ * When printed as text, <major> and <minor> are 1-digit,
+ * <update> and <internal> are 2-digits and <build_num>
+ * is 4-digits.
+ */
+ pr_info("Initializing TDX module: %u.%u.%02u.%02u.%04u (build_date %u), TDX_FEATURES0 0x%llx\n",
+ version->major, version->minor, version->update,
+ version->internal, version->build_num,
+ version->build_date, features->tdx_features0);
+}
+
static int get_tdx_sys_info_tdmr(struct tdx_sys_info_tdmr *sysinfo_tdmr)
{
int ret = 0;
@@ -320,6 +369,16 @@ static int get_tdx_sys_info_tdmr(struct tdx_sys_info_tdmr *sysinfo_tdmr)
static int get_tdx_sys_info(struct tdx_sys_info *sysinfo)
{
+ int ret;
+
+ ret = get_tdx_sys_info_features(&sysinfo->features);
+ if (ret)
+ return ret;
+
+ ret = get_tdx_sys_info_version(&sysinfo->version);
+ if (ret)
+ return ret;
+
return get_tdx_sys_info_tdmr(&sysinfo->tdmr);
}
@@ -1102,6 +1161,8 @@ static int init_tdx_module(void)
if (ret)
return ret;
+ print_basic_sys_info(&sysinfo);
+
/*
* To keep things simple, assume that all TDX-protected memory
* will come from the page allocator. Make sure all pages in the
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 4cddbb035b9f..b422e8517e01 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -31,6 +31,15 @@
*
* See the "global_metadata.json" in the "TDX 1.5 ABI definitions".
*/
+#define MD_FIELD_ID_SYS_ATTRIBUTES 0x0A00000200000000ULL
+#define MD_FIELD_ID_TDX_FEATURES0 0x0A00000300000008ULL
+#define MD_FIELD_ID_BUILD_DATE 0x8800000200000001ULL
+#define MD_FIELD_ID_BUILD_NUM 0x8800000100000002ULL
+#define MD_FIELD_ID_MINOR_VERSION 0x0800000100000003ULL
+#define MD_FIELD_ID_MAJOR_VERSION 0x0800000100000004ULL
+#define MD_FIELD_ID_UPDATE_VERSION 0x0800000100000005ULL
+#define MD_FIELD_ID_INTERNAL_VERSION 0x0800000100000006ULL
+
#define MD_FIELD_ID_MAX_TDMRS 0x9100000100000008ULL
#define MD_FIELD_ID_MAX_RESERVED_PER_TDMR 0x9100000100000009ULL
#define MD_FIELD_ID_PAMT_4K_ENTRY_SIZE 0x9100000100000010ULL
@@ -130,6 +139,27 @@ struct tdmr_info_list {
* those used by the kernel are.
*/
+/*
+ * Class "TDX Module Info".
+ *
+ * This class also contains other fields like SYS_ATTRIBUTES and the
+ * NUM_TDX_FEATURES. For now only TDX_FEATURES0 is needed, but still
+ * keep the structure to follow the spec (and for future extension).
+ */
+struct tdx_sys_info_features {
+ u64 tdx_features0;
+};
+
+/* Class "TDX Module Version" */
+struct tdx_sys_info_version {
+ u16 major;
+ u16 minor;
+ u16 update;
+ u16 internal;
+ u16 build_num;
+ u32 build_date;
+};
+
/* Class "TDMR info" */
struct tdx_sys_info_tdmr {
u16 max_tdmrs;
@@ -138,7 +168,9 @@ struct tdx_sys_info_tdmr {
};
struct tdx_sys_info {
- struct tdx_sys_info_tdmr tdmr;
+ struct tdx_sys_info_features features;
+ struct tdx_sys_info_version version;
+ struct tdx_sys_info_tdmr tdmr;
};
#endif
--
2.46.0
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v3 7/8] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes
2024-08-27 7:14 [PATCH v3 0/8] TDX host: metadata reading tweaks, bug fix and info dump Kai Huang
` (5 preceding siblings ...)
2024-08-27 7:14 ` [PATCH v3 6/8] x86/virt/tdx: Print TDX module basic information Kai Huang
@ 2024-08-27 7:14 ` Kai Huang
2024-08-30 10:50 ` Adrian Hunter
` (2 more replies)
2024-08-27 7:14 ` [PATCH v3 8/8] x86/virt/tdx: Don't initialize module that doesn't support NO_RBP_MOD feature Kai Huang
7 siblings, 3 replies; 42+ messages in thread
From: Kai Huang @ 2024-08-27 7:14 UTC (permalink / raw)
To: dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo, hpa,
dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu, adrian.hunter, kai.huang
A TDX module initialization failure was reported on a Emerald Rapids
platform:
virt/tdx: initialization failed: TDMR [0x0, 0x80000000): reserved areas exhausted.
virt/tdx: module initialization failed (-28)
As part of initializing the TDX module, the kernel informs the TDX
module of all "TDX-usable memory regions" using an array of TDX defined
structure "TD Memory Region" (TDMR). Each TDMR must be in 1GB aligned
and in 1GB granularity, and all "non-TDX-usable memory holes" within a
given TDMR must be marked as "reserved areas". The TDX module reports a
maximum number of reserved areas that can be supported per TDMR.
Currently, the kernel finds those "non-TDX-usable memory holes" within a
given TDMR by walking over a list of "TDX-usable memory regions", which
essentially reflects the "usable" regions in the e820 table (w/o memory
hotplug operations precisely, but this is not relevant here).
As shown above, the root cause of this failure is when the kernel tries
to construct a TDMR to cover address range [0x0, 0x80000000), there
are too many memory holes within that range and the number of memory
holes exceeds the maximum number of reserved areas.
The E820 table of that platform (see [1] below) reflects this: the
number of memory holes among e820 "usable" entries exceeds 16, which is
the maximum number of reserved areas TDX module supports in practice.
=== Fix ===
There are two options to fix this: 1) reduce the number of memory holes
when constructing a TDMR to save "reserved areas"; 2) reduce the TDMR's
size to cover fewer memory regions, thus fewer memory holes.
Option 1) is possible, and in fact is easier and preferable:
TDX actually has a concept of "Convertible Memory Regions" (CMRs). TDX
reports a list of CMRs that meet TDX's security requirements on memory.
TDX requires all the "TDX-usable memory regions" that the kernel passes
to the module via TDMRs, a.k.a, all the "non-reserved regions in TDMRs",
must be convertible memory.
In other words, if a memory hole is indeed CMR, then it's not mandatory
for the kernel to add it to the reserved areas. By doing so, the number
of consumed reserved areas can be reduced w/o having any functional
impact. The kernel still allocates TDX memory from the page allocator.
There's no harm if the kernel tells the TDX module some memory regions
are "TDX-usable" but they will never be allocated by the kernel as TDX
memory.
Note this doesn't have any security impact either because the kernel is
out of TDX's TCB anyway.
This is feasible because in practice the CMRs just reflect the nature of
whether the RAM can indeed be used by TDX, thus each CMR tends to be a
large, uninterrupted range of memory, i.e., unlike the e820 table which
contains numerous "ACPI *" entries in the first 2G range. Refer to [2]
for CMRs reported on the problematic platform using off-tree TDX code.
So for this particular module initialization failure, the memory holes
that are within [0x0, 0x80000000) are mostly indeed CMR. By not adding
them to the reserved areas, the number of consumed reserved areas for
the TDMR [0x0, 0x80000000) can be dramatically reduced.
Option 2) is also theoretically feasible, but it is not desired:
It requires more complicated logic to handle splitting TDMR into smaller
ones, which isn't trivial. There are limitations to splitting TDMR too,
thus it may not always work: 1) The smallest TDMR is 1GB, and it cannot
be split any further; 2) This also increases the total number of TDMRs,
which also has a maximum value limited by the TDX module.
So, fix this issue by using option 1):
1) reading out the CMRs from the TDX module global metadata, and
2) changing to find memory holes for a given TDMR based on CMRs, but not
based on the list of "TDX-usable memory regions".
Also dump the CMRs in dmesg. They are helpful when something goes wrong
around "constructing the TDMRs and configuring the TDX module with
them". Note there are no existing userspace tools that the user can get
CMRs since they can only be read via SEAMCALL (no CPUID, MSR etc).
[1] BIOS-E820 table of the problematic platform:
BIOS-e820: [mem 0x0000000000000000-0x000000000009efff] usable
BIOS-e820: [mem 0x000000000009f000-0x00000000000fffff] reserved
BIOS-e820: [mem 0x0000000000100000-0x000000005d168fff] usable
BIOS-e820: [mem 0x000000005d169000-0x000000005d22afff] ACPI data
BIOS-e820: [mem 0x000000005d22b000-0x000000005d3cefff] usable
BIOS-e820: [mem 0x000000005d3cf000-0x000000005d469fff] reserved
BIOS-e820: [mem 0x000000005d46a000-0x000000005e5b2fff] usable
BIOS-e820: [mem 0x000000005e5b3000-0x000000005e5c2fff] reserved
BIOS-e820: [mem 0x000000005e5c3000-0x000000005e5d2fff] usable
BIOS-e820: [mem 0x000000005e5d3000-0x000000005e5e4fff] reserved
BIOS-e820: [mem 0x000000005e5e5000-0x000000005eb57fff] usable
BIOS-e820: [mem 0x000000005eb58000-0x0000000061357fff] ACPI NVS
BIOS-e820: [mem 0x0000000061358000-0x000000006172afff] usable
BIOS-e820: [mem 0x000000006172b000-0x0000000061794fff] ACPI data
BIOS-e820: [mem 0x0000000061795000-0x00000000617fefff] usable
BIOS-e820: [mem 0x00000000617ff000-0x0000000061912fff] ACPI data
BIOS-e820: [mem 0x0000000061913000-0x0000000061998fff] usable
BIOS-e820: [mem 0x0000000061999000-0x00000000619dffff] ACPI data
BIOS-e820: [mem 0x00000000619e0000-0x00000000619e1fff] usable
BIOS-e820: [mem 0x00000000619e2000-0x00000000619e9fff] reserved
BIOS-e820: [mem 0x00000000619ea000-0x0000000061a26fff] usable
BIOS-e820: [mem 0x0000000061a27000-0x0000000061baefff] ACPI data
BIOS-e820: [mem 0x0000000061baf000-0x00000000623c2fff] usable
BIOS-e820: [mem 0x00000000623c3000-0x0000000062471fff] reserved
BIOS-e820: [mem 0x0000000062472000-0x0000000062823fff] usable
BIOS-e820: [mem 0x0000000062824000-0x0000000063a24fff] reserved
BIOS-e820: [mem 0x0000000063a25000-0x0000000063d57fff] usable
BIOS-e820: [mem 0x0000000063d58000-0x0000000064157fff] reserved
BIOS-e820: [mem 0x0000000064158000-0x0000000064158fff] usable
BIOS-e820: [mem 0x0000000064159000-0x0000000064194fff] reserved
BIOS-e820: [mem 0x0000000064195000-0x000000006e9cefff] usable
BIOS-e820: [mem 0x000000006e9cf000-0x000000006eccefff] reserved
BIOS-e820: [mem 0x000000006eccf000-0x000000006f6fefff] ACPI NVS
BIOS-e820: [mem 0x000000006f6ff000-0x000000006f7fefff] ACPI data
BIOS-e820: [mem 0x000000006f7ff000-0x000000006f7fffff] usable
BIOS-e820: [mem 0x000000006f800000-0x000000008fffffff] reserved
......
[2] Convertible Memory Regions of the problematic platform:
virt/tdx: CMR: [0x100000, 0x6f800000)
virt/tdx: CMR: [0x100000000, 0x107a000000)
virt/tdx: CMR: [0x1080000000, 0x207c000000)
virt/tdx: CMR: [0x2080000000, 0x307c000000)
virt/tdx: CMR: [0x3080000000, 0x407c000000)
Fixes: dde3b60d572c9 ("x86/virt/tdx: Designate reserved areas for all TDMRs")
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
v2 -> v3:
- Add the Fixes tag, although this patch depends on previous patches.
- CMR_BASE0 -> CMR_BASE(_i), CMR_SIZE0 -> CMR_SIZE(_i) to silence the
build-check error.
v1 -> v2:
- Change to walk over CMRs directly to find out memory holes, instead
of walking over TDX memory blocks and explicitly check whether a hole
is subregion of CMR. (Chao)
- Mention any constant macro definitions in global metadata structures
are TDX architectural. (Binbin)
- Slightly improve the changelog.
---
arch/x86/virt/vmx/tdx/tdx.c | 105 ++++++++++++++++++++++++++++++------
arch/x86/virt/vmx/tdx/tdx.h | 13 +++++
2 files changed, 102 insertions(+), 16 deletions(-)
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 0fb673dd43ed..fa335ab1ae92 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -331,6 +331,72 @@ static int get_tdx_sys_info_version(struct tdx_sys_info_version *sysinfo_version
return ret;
}
+/* Update the @sysinfo_cmr->num_cmrs to trim tail empty CMRs */
+static void trim_empty_tail_cmrs(struct tdx_sys_info_cmr *sysinfo_cmr)
+{
+ int i;
+
+ for (i = 0; i < sysinfo_cmr->num_cmrs; i++) {
+ u64 cmr_base = sysinfo_cmr->cmr_base[i];
+ u64 cmr_size = sysinfo_cmr->cmr_size[i];
+
+ if (!cmr_size) {
+ WARN_ON_ONCE(cmr_base);
+ break;
+ }
+
+ /* TDX architecture: CMR must be 4KB aligned */
+ WARN_ON_ONCE(!PAGE_ALIGNED(cmr_base) ||
+ !PAGE_ALIGNED(cmr_size));
+ }
+
+ sysinfo_cmr->num_cmrs = i;
+}
+
+static int get_tdx_sys_info_cmr(struct tdx_sys_info_cmr *sysinfo_cmr)
+{
+ int i, ret = 0;
+
+#define TD_SYSINFO_MAP_CMR_INFO(_field_id, _member) \
+ TD_SYSINFO_MAP(_field_id, sysinfo_cmr, _member)
+
+ TD_SYSINFO_MAP_CMR_INFO(NUM_CMRS, num_cmrs);
+
+ if (ret)
+ return ret;
+
+ for (i = 0; i < sysinfo_cmr->num_cmrs; i++) {
+ TD_SYSINFO_MAP_CMR_INFO(CMR_BASE(i), cmr_base[i]);
+ TD_SYSINFO_MAP_CMR_INFO(CMR_SIZE(i), cmr_size[i]);
+ }
+
+ if (ret)
+ return ret;
+
+ /*
+ * The TDX module may just report the maximum number of CMRs that
+ * TDX architecturally supports as the actual number of CMRs,
+ * despite the latter is smaller. In this case all the tail
+ * CMRs will be empty. Trim them away.
+ */
+ trim_empty_tail_cmrs(sysinfo_cmr);
+
+ return 0;
+}
+
+static void print_sys_info_cmr(struct tdx_sys_info_cmr *sysinfo_cmr)
+{
+ int i;
+
+ for (i = 0; i < sysinfo_cmr->num_cmrs; i++) {
+ u64 cmr_base = sysinfo_cmr->cmr_base[i];
+ u64 cmr_size = sysinfo_cmr->cmr_size[i];
+
+ pr_info("CMR[%d]: [0x%llx, 0x%llx)\n", i, cmr_base,
+ cmr_base + cmr_size);
+ }
+}
+
static void print_basic_sys_info(struct tdx_sys_info *sysinfo)
{
struct tdx_sys_info_features *features = &sysinfo->features;
@@ -349,6 +415,8 @@ static void print_basic_sys_info(struct tdx_sys_info *sysinfo)
version->major, version->minor, version->update,
version->internal, version->build_num,
version->build_date, features->tdx_features0);
+
+ print_sys_info_cmr(&sysinfo->cmr);
}
static int get_tdx_sys_info_tdmr(struct tdx_sys_info_tdmr *sysinfo_tdmr)
@@ -379,6 +447,10 @@ static int get_tdx_sys_info(struct tdx_sys_info *sysinfo)
if (ret)
return ret;
+ ret = get_tdx_sys_info_cmr(&sysinfo->cmr);
+ if (ret)
+ return ret;
+
return get_tdx_sys_info_tdmr(&sysinfo->tdmr);
}
@@ -803,29 +875,28 @@ static int tdmr_add_rsvd_area(struct tdmr_info *tdmr, int *p_idx, u64 addr,
}
/*
- * Go through @tmb_list to find holes between memory areas. If any of
+ * Go through all CMRs in @sysinfo_cmr to find memory holes. If any of
* those holes fall within @tdmr, set up a TDMR reserved area to cover
* the hole.
*/
-static int tdmr_populate_rsvd_holes(struct list_head *tmb_list,
+static int tdmr_populate_rsvd_holes(struct tdx_sys_info_cmr *sysinfo_cmr,
struct tdmr_info *tdmr,
int *rsvd_idx,
u16 max_reserved_per_tdmr)
{
- struct tdx_memblock *tmb;
u64 prev_end;
- int ret;
+ int i, ret;
/*
* Start looking for reserved blocks at the
* beginning of the TDMR.
*/
prev_end = tdmr->base;
- list_for_each_entry(tmb, tmb_list, list) {
+ for (i = 0; i < sysinfo_cmr->num_cmrs; i++) {
u64 start, end;
- start = PFN_PHYS(tmb->start_pfn);
- end = PFN_PHYS(tmb->end_pfn);
+ start = sysinfo_cmr->cmr_base[i];
+ end = start + sysinfo_cmr->cmr_size[i];
/* Break if this region is after the TDMR */
if (start >= tdmr_end(tdmr))
@@ -926,16 +997,16 @@ static int rsvd_area_cmp_func(const void *a, const void *b)
/*
* Populate reserved areas for the given @tdmr, including memory holes
- * (via @tmb_list) and PAMTs (via @tdmr_list).
+ * (via @sysinfo_cmr) and PAMTs (via @tdmr_list).
*/
static int tdmr_populate_rsvd_areas(struct tdmr_info *tdmr,
- struct list_head *tmb_list,
+ struct tdx_sys_info_cmr *sysinfo_cmr,
struct tdmr_info_list *tdmr_list,
u16 max_reserved_per_tdmr)
{
int ret, rsvd_idx = 0;
- ret = tdmr_populate_rsvd_holes(tmb_list, tdmr, &rsvd_idx,
+ ret = tdmr_populate_rsvd_holes(sysinfo_cmr, tdmr, &rsvd_idx,
max_reserved_per_tdmr);
if (ret)
return ret;
@@ -954,10 +1025,10 @@ static int tdmr_populate_rsvd_areas(struct tdmr_info *tdmr,
/*
* Populate reserved areas for all TDMRs in @tdmr_list, including memory
- * holes (via @tmb_list) and PAMTs.
+ * holes (via @sysinfo_cmr) and PAMTs.
*/
static int tdmrs_populate_rsvd_areas_all(struct tdmr_info_list *tdmr_list,
- struct list_head *tmb_list,
+ struct tdx_sys_info_cmr *sysinfo_cmr,
u16 max_reserved_per_tdmr)
{
int i;
@@ -966,7 +1037,7 @@ static int tdmrs_populate_rsvd_areas_all(struct tdmr_info_list *tdmr_list,
int ret;
ret = tdmr_populate_rsvd_areas(tdmr_entry(tdmr_list, i),
- tmb_list, tdmr_list, max_reserved_per_tdmr);
+ sysinfo_cmr, tdmr_list, max_reserved_per_tdmr);
if (ret)
return ret;
}
@@ -981,7 +1052,8 @@ static int tdmrs_populate_rsvd_areas_all(struct tdmr_info_list *tdmr_list,
*/
static int construct_tdmrs(struct list_head *tmb_list,
struct tdmr_info_list *tdmr_list,
- struct tdx_sys_info_tdmr *sysinfo_tdmr)
+ struct tdx_sys_info_tdmr *sysinfo_tdmr,
+ struct tdx_sys_info_cmr *sysinfo_cmr)
{
int ret;
@@ -994,7 +1066,7 @@ static int construct_tdmrs(struct list_head *tmb_list,
if (ret)
return ret;
- ret = tdmrs_populate_rsvd_areas_all(tdmr_list, tmb_list,
+ ret = tdmrs_populate_rsvd_areas_all(tdmr_list, sysinfo_cmr,
sysinfo_tdmr->max_reserved_per_tdmr);
if (ret)
tdmrs_free_pamt_all(tdmr_list);
@@ -1185,7 +1257,8 @@ static int init_tdx_module(void)
goto err_free_tdxmem;
/* Cover all TDX-usable memory regions in TDMRs */
- ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &sysinfo.tdmr);
+ ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &sysinfo.tdmr,
+ &sysinfo.cmr);
if (ret)
goto err_free_tdmrs;
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index b422e8517e01..e7bed9e717c7 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -40,6 +40,10 @@
#define MD_FIELD_ID_UPDATE_VERSION 0x0800000100000005ULL
#define MD_FIELD_ID_INTERNAL_VERSION 0x0800000100000006ULL
+#define MD_FIELD_ID_NUM_CMRS 0x9000000100000000ULL
+#define MD_FIELD_ID_CMR_BASE(_i) (0x9000000300000080ULL + (u16)_i)
+#define MD_FIELD_ID_CMR_SIZE(_i) (0x9000000300000100ULL + (u16)_i)
+
#define MD_FIELD_ID_MAX_TDMRS 0x9100000100000008ULL
#define MD_FIELD_ID_MAX_RESERVED_PER_TDMR 0x9100000100000009ULL
#define MD_FIELD_ID_PAMT_4K_ENTRY_SIZE 0x9100000100000010ULL
@@ -160,6 +164,14 @@ struct tdx_sys_info_version {
u32 build_date;
};
+/* Class "CMR Info" */
+#define TDX_MAX_CMRS 32 /* architectural */
+struct tdx_sys_info_cmr {
+ u16 num_cmrs;
+ u64 cmr_base[TDX_MAX_CMRS];
+ u64 cmr_size[TDX_MAX_CMRS];
+};
+
/* Class "TDMR info" */
struct tdx_sys_info_tdmr {
u16 max_tdmrs;
@@ -170,6 +182,7 @@ struct tdx_sys_info_tdmr {
struct tdx_sys_info {
struct tdx_sys_info_features features;
struct tdx_sys_info_version version;
+ struct tdx_sys_info_cmr cmr;
struct tdx_sys_info_tdmr tdmr;
};
--
2.46.0
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v3 8/8] x86/virt/tdx: Don't initialize module that doesn't support NO_RBP_MOD feature
2024-08-27 7:14 [PATCH v3 0/8] TDX host: metadata reading tweaks, bug fix and info dump Kai Huang
` (6 preceding siblings ...)
2024-08-27 7:14 ` [PATCH v3 7/8] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes Kai Huang
@ 2024-08-27 7:14 ` Kai Huang
2024-08-30 11:01 ` Adrian Hunter
2024-09-06 23:36 ` Dan Williams
7 siblings, 2 replies; 42+ messages in thread
From: Kai Huang @ 2024-08-27 7:14 UTC (permalink / raw)
To: dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo, hpa,
dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu, adrian.hunter, kai.huang
Old TDX modules can clobber RBP in the TDH.VP.ENTER SEAMCALL. However
RBP is used as frame pointer in the x86_64 calling convention, and
clobbering RBP could result in bad things like being unable to unwind
the stack if any non-maskable exceptions (NMI, #MC etc) happens in that
gap.
A new "NO_RBP_MOD" feature was introduced to more recent TDX modules to
not clobber RBP. This feature is reported in the TDX_FEATURES0 global
metadata field via bit 18.
Don't initialize the TDX module if this feature is not supported [1].
Link: https://lore.kernel.org/all/c0067319-2653-4cbd-8fee-1ccf21b1e646@suse.com/T/#mef98469c51e2382ead2c537ea189752360bd2bef [1]
Signed-off-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
---
v2 -> v3:
- check_module_compatibility() -> check_features().
- Improve error message.
https://lore.kernel.org/kvm/cover.1721186590.git.kai.huang@intel.com/T/#md9e2eeef927838cbf20d7b361cdbea518b8aec50
---
arch/x86/virt/vmx/tdx/tdx.c | 17 +++++++++++++++++
arch/x86/virt/vmx/tdx/tdx.h | 3 +++
2 files changed, 20 insertions(+)
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index fa335ab1ae92..032a53ddf5bc 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -454,6 +454,18 @@ static int get_tdx_sys_info(struct tdx_sys_info *sysinfo)
return get_tdx_sys_info_tdmr(&sysinfo->tdmr);
}
+static int check_features(struct tdx_sys_info *sysinfo)
+{
+ u64 tdx_features0 = sysinfo->features.tdx_features0;
+
+ if (!(tdx_features0 & TDX_FEATURES0_NO_RBP_MOD)) {
+ pr_err("frame pointer (RBP) clobber bug present, upgrade TDX module\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
/* Calculate the actual TDMR size */
static int tdmr_size_single(u16 max_reserved_per_tdmr)
{
@@ -1235,6 +1247,11 @@ static int init_tdx_module(void)
print_basic_sys_info(&sysinfo);
+ /* Check whether the kernel can support this module */
+ ret = check_features(&sysinfo);
+ if (ret)
+ return ret;
+
/*
* To keep things simple, assume that all TDX-protected memory
* will come from the page allocator. Make sure all pages in the
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index e7bed9e717c7..831361e6d0fb 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -154,6 +154,9 @@ struct tdx_sys_info_features {
u64 tdx_features0;
};
+/* Architectural bit definitions of TDX_FEATURES0 metadata field */
+#define TDX_FEATURES0_NO_RBP_MOD _BITULL(18)
+
/* Class "TDX Module Version" */
struct tdx_sys_info_version {
u16 major;
--
2.46.0
^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH v3 1/8] x86/virt/tdx: Rename 'struct tdx_tdmr_sysinfo' to reflect the spec better
2024-08-27 7:14 ` [PATCH v3 1/8] x86/virt/tdx: Rename 'struct tdx_tdmr_sysinfo' to reflect the spec better Kai Huang
@ 2024-08-27 13:10 ` Adrian Hunter
2024-08-27 22:20 ` Huang, Kai
2024-08-30 22:05 ` Dan Williams
1 sibling, 1 reply; 42+ messages in thread
From: Adrian Hunter @ 2024-08-27 13:10 UTC (permalink / raw)
To: Kai Huang, dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo,
hpa, dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu
On 27/08/24 10:14, Kai Huang wrote:
"to reflect the spec better" is a bit vague. How about:
x86/virt/tdx: Rename 'struct tdx_tdmr_sysinfo' to 'struct tdx_sys_info_tdmr'
Rename 'struct tdx_tdmr_sysinfo' to 'struct tdx_sys_info_tdmr' to
prepare for adding similar structures that will all be prefixed by
'tdx_sys_info_'.
> The TDX module provides a set of "global metadata fields". They report
Since it is a name of something, could capitalize "Global Metadata Fields"
> things like TDX module version, supported features, and fields related
> to create/run TDX guests and so on.
>
> TDX organizes those metadata fields by "Class"es based on the meaning of
by "Class"es -> into "Classes"
> those fields. E.g., for now the kernel only reads "TD Memory Region"
> (TDMR) related fields for module initialization. Those fields are
> defined under class "TDMR Info".
>
> There are both immediate needs to read more metadata fields for module
> initialization and near-future needs for other kernel components like
> KVM to run TDX guests. To meet all those requirements, the idea is the
> TDX host core-kernel to provide a centralized, canonical, and read-only
> structure for the global metadata that comes out from the TDX module for
> all kernel components to use.
>
> More specifically, the target is to end up with something like:
>
> struct tdx_sys_info {
> struct tdx_sys_info_classA a;
> struct tdx_sys_info_classB b;
> ...
> };
>
> Currently the kernel organizes all fields under "TDMR Info" class in
> 'struct tdx_tdmr_sysinfo'. To prepare for the above target, rename the
> structure to 'struct tdx_sys_info_tdmr' to follow the class name better.
>
> No functional change intended.
>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
> v2 -> v3:
> - Split out as a separate patch and place it at beginning:
>
> https://lore.kernel.org/kvm/cover.1721186590.git.kai.huang@intel.com/T/#m8fec7c429242d640cf5e756eb68e3b822e6dff8b
>
> - Rename to 'struct tdx_sys_info_tdmr':
>
> https://lore.kernel.org/kvm/cover.1721186590.git.kai.huang@intel.com/T/#md73dd9b02a492acf4a6facae63e8d030e320967d
> https://lore.kernel.org/kvm/cover.1721186590.git.kai.huang@intel.com/T/#m8fec7c429242d640cf5e756eb68e3b822e6dff8b
>
> ---
> arch/x86/virt/vmx/tdx/tdx.c | 36 ++++++++++++++++++------------------
> arch/x86/virt/vmx/tdx/tdx.h | 2 +-
> 2 files changed, 19 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 4e2b2e2ac9f9..e979bf442929 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -272,7 +272,7 @@ static int read_sys_metadata_field(u64 field_id, u64 *data)
>
> static int read_sys_metadata_field16(u64 field_id,
> int offset,
> - struct tdx_tdmr_sysinfo *ts)
> + struct tdx_sys_info_tdmr *ts)
> {
> u16 *ts_member = ((void *)ts) + offset;
> u64 tmp;
> @@ -298,9 +298,9 @@ struct field_mapping {
>
> #define TD_SYSINFO_MAP(_field_id, _offset) \
> { .field_id = MD_FIELD_ID_##_field_id, \
> - .offset = offsetof(struct tdx_tdmr_sysinfo, _offset) }
> + .offset = offsetof(struct tdx_sys_info_tdmr, _offset) }
>
> -/* Map TD_SYSINFO fields into 'struct tdx_tdmr_sysinfo': */
> +/* Map TD_SYSINFO fields into 'struct tdx_sys_info_tdmr': */
> static const struct field_mapping fields[] = {
> TD_SYSINFO_MAP(MAX_TDMRS, max_tdmrs),
> TD_SYSINFO_MAP(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr),
> @@ -309,16 +309,16 @@ static const struct field_mapping fields[] = {
> TD_SYSINFO_MAP(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]),
> };
>
> -static int get_tdx_tdmr_sysinfo(struct tdx_tdmr_sysinfo *tdmr_sysinfo)
> +static int get_tdx_sys_info_tdmr(struct tdx_sys_info_tdmr *sysinfo_tdmr)
> {
> int ret;
> int i;
>
> - /* Populate 'tdmr_sysinfo' fields using the mapping structure above: */
> + /* Populate 'sysinfo_tdmr' fields using the mapping structure above: */
> for (i = 0; i < ARRAY_SIZE(fields); i++) {
> ret = read_sys_metadata_field16(fields[i].field_id,
> fields[i].offset,
> - tdmr_sysinfo);
> + sysinfo_tdmr);
> if (ret)
> return ret;
> }
> @@ -342,13 +342,13 @@ static int tdmr_size_single(u16 max_reserved_per_tdmr)
> }
>
> static int alloc_tdmr_list(struct tdmr_info_list *tdmr_list,
> - struct tdx_tdmr_sysinfo *tdmr_sysinfo)
> + struct tdx_sys_info_tdmr *sysinfo_tdmr)
> {
> size_t tdmr_sz, tdmr_array_sz;
> void *tdmr_array;
>
> - tdmr_sz = tdmr_size_single(tdmr_sysinfo->max_reserved_per_tdmr);
> - tdmr_array_sz = tdmr_sz * tdmr_sysinfo->max_tdmrs;
> + tdmr_sz = tdmr_size_single(sysinfo_tdmr->max_reserved_per_tdmr);
> + tdmr_array_sz = tdmr_sz * sysinfo_tdmr->max_tdmrs;
>
> /*
> * To keep things simple, allocate all TDMRs together.
> @@ -367,7 +367,7 @@ static int alloc_tdmr_list(struct tdmr_info_list *tdmr_list,
> * at a given index in the TDMR list.
> */
> tdmr_list->tdmr_sz = tdmr_sz;
> - tdmr_list->max_tdmrs = tdmr_sysinfo->max_tdmrs;
> + tdmr_list->max_tdmrs = sysinfo_tdmr->max_tdmrs;
> tdmr_list->nr_consumed_tdmrs = 0;
>
> return 0;
> @@ -921,11 +921,11 @@ static int tdmrs_populate_rsvd_areas_all(struct tdmr_info_list *tdmr_list,
> /*
> * Construct a list of TDMRs on the preallocated space in @tdmr_list
> * to cover all TDX memory regions in @tmb_list based on the TDX module
> - * TDMR global information in @tdmr_sysinfo.
> + * TDMR global information in @sysinfo_tdmr.
> */
> static int construct_tdmrs(struct list_head *tmb_list,
> struct tdmr_info_list *tdmr_list,
> - struct tdx_tdmr_sysinfo *tdmr_sysinfo)
> + struct tdx_sys_info_tdmr *sysinfo_tdmr)
> {
> int ret;
>
> @@ -934,12 +934,12 @@ static int construct_tdmrs(struct list_head *tmb_list,
> return ret;
>
> ret = tdmrs_set_up_pamt_all(tdmr_list, tmb_list,
> - tdmr_sysinfo->pamt_entry_size);
> + sysinfo_tdmr->pamt_entry_size);
> if (ret)
> return ret;
>
> ret = tdmrs_populate_rsvd_areas_all(tdmr_list, tmb_list,
> - tdmr_sysinfo->max_reserved_per_tdmr);
> + sysinfo_tdmr->max_reserved_per_tdmr);
> if (ret)
> tdmrs_free_pamt_all(tdmr_list);
>
> @@ -1098,7 +1098,7 @@ static int init_tdmrs(struct tdmr_info_list *tdmr_list)
>
> static int init_tdx_module(void)
> {
> - struct tdx_tdmr_sysinfo tdmr_sysinfo;
> + struct tdx_sys_info_tdmr sysinfo_tdmr;
> int ret;
>
> /*
> @@ -1117,17 +1117,17 @@ static int init_tdx_module(void)
> if (ret)
> goto out_put_tdxmem;
>
> - ret = get_tdx_tdmr_sysinfo(&tdmr_sysinfo);
> + ret = get_tdx_sys_info_tdmr(&sysinfo_tdmr);
> if (ret)
> goto err_free_tdxmem;
>
> /* Allocate enough space for constructing TDMRs */
> - ret = alloc_tdmr_list(&tdx_tdmr_list, &tdmr_sysinfo);
> + ret = alloc_tdmr_list(&tdx_tdmr_list, &sysinfo_tdmr);
> if (ret)
> goto err_free_tdxmem;
>
> /* Cover all TDX-usable memory regions in TDMRs */
> - ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &tdmr_sysinfo);
> + ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &sysinfo_tdmr);
> if (ret)
> goto err_free_tdmrs;
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index b701f69485d3..148f9b4d1140 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -100,7 +100,7 @@ struct tdx_memblock {
> };
>
> /* "TDMR info" part of "Global Scope Metadata" for constructing TDMRs */
> -struct tdx_tdmr_sysinfo {
> +struct tdx_sys_info_tdmr {
> u16 max_tdmrs;
> u16 max_reserved_per_tdmr;
> u16 pamt_entry_size[TDX_PS_NR];
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 1/8] x86/virt/tdx: Rename 'struct tdx_tdmr_sysinfo' to reflect the spec better
2024-08-27 13:10 ` Adrian Hunter
@ 2024-08-27 22:20 ` Huang, Kai
2024-09-24 11:39 ` Huang, Kai
0 siblings, 1 reply; 42+ messages in thread
From: Huang, Kai @ 2024-08-27 22:20 UTC (permalink / raw)
To: Adrian Hunter, dave.hansen, kirill.shutemov, tglx, bp, peterz,
mingo, hpa, dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu
On 28/08/2024 1:10 am, Adrian Hunter wrote:
> On 27/08/24 10:14, Kai Huang wrote:
>
> "to reflect the spec better" is a bit vague. How about:
>
> x86/virt/tdx: Rename 'struct tdx_tdmr_sysinfo' to 'struct tdx_sys_info_tdmr'
>
> Rename 'struct tdx_tdmr_sysinfo' to 'struct tdx_sys_info_tdmr' to
> prepare for adding similar structures that will all be prefixed by
> 'tdx_sys_info_'.
>
>> The TDX module provides a set of "global metadata fields". They report
>
> Since it is a name of something, could capitalize "Global Metadata Fields"
>
>> things like TDX module version, supported features, and fields related
>> to create/run TDX guests and so on.
>>
>> TDX organizes those metadata fields by "Class"es based on the meaning of
>
> by "Class"es -> into "Classes"
>
>> those fields. E.g., for now the kernel only reads "TD Memory Region"
>> (TDMR) related fields for module initialization. Those fields are
>> defined under class "TDMR Info".
>>
>> There are both immediate needs to read more metadata fields for module
>> initialization and near-future needs for other kernel components like
>> KVM to run TDX guests. To meet all those requirements, the idea is the
>> TDX host core-kernel to provide a centralized, canonical, and read-only
>> structure for the global metadata that comes out from the TDX module for
>> all kernel components to use.
>>
>> More specifically, the target is to end up with something like:
>>
>> struct tdx_sys_info {
>> struct tdx_sys_info_classA a;
>> struct tdx_sys_info_classB b;
>> ...
>> };
>>
>> Currently the kernel organizes all fields under "TDMR Info" class in
>> 'struct tdx_tdmr_sysinfo'. To prepare for the above target, rename the
>> structure to 'struct tdx_sys_info_tdmr' to follow the class name better.
>>
>> No functional change intended.
>>
>> Signed-off-by: Kai Huang <kai.huang@intel.com>
>
> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com
Thanks for the review.
All comments above look good to me. Will do.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 2/8] x86/virt/tdx: Remove 'struct field_mapping' and implement TD_SYSINFO_MAP() macro
2024-08-27 7:14 ` [PATCH v3 2/8] x86/virt/tdx: Remove 'struct field_mapping' and implement TD_SYSINFO_MAP() macro Kai Huang
@ 2024-08-29 7:20 ` Adrian Hunter
2024-08-30 10:52 ` Huang, Kai
2024-09-06 21:30 ` Dan Williams
2024-09-06 20:21 ` Dan Williams
1 sibling, 2 replies; 42+ messages in thread
From: Adrian Hunter @ 2024-08-29 7:20 UTC (permalink / raw)
To: Kai Huang, dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo,
hpa, dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu
On 27/08/24 10:14, Kai Huang wrote:
Subject could be "Simplify the reading of Global Metadata Fields"
> TL;DR: Remove the 'struct field_mapping' structure and use another way
> to implement the TD_SYSINFO_MAP() macro to improve the current metadata
> reading code, e.g., switching to use build-time check for metadata field
> size over the existing runtime check.
Perhaps:
Remove 'struct field_mapping' and let read_sys_metadata_field16() accept
the member variable address, simplifying the code in preparation for adding
support for more metadata structures and field sizes.
>
> The TDX module provides a set of "global metadata fields". They report
Global Metadata Fields
> things like TDX module version, supported features, and fields related
> to create/run TDX guests and so on.
>
> For now the kernel only reads "TD Memory Region" (TDMR) related global
> metadata fields, and all of those fields are organized in one structure.
The patch is self-evidently simpler (21 insertions(+), 36 deletions(-))
so there doesn't seem to be any need for further elaboration. Perhaps
just round it off and stop there.
... and all of those fields are organized in one structure,
but that will change in the near future.
>
> The kernel currently uses 'struct field_mapping' to facilitate reading
> multiple metadata fields into one structure. The 'struct field_mapping'
> records the mapping between the field ID and the offset of the structure
> to fill out. The kernel initializes an array of 'struct field_mapping'
> for each structure member (using the TD_SYSINFO_MAP() macro) and then
> reads all metadata fields in a loop using that array.
>
> Currently the kernel only reads TDMR related metadata fields into one
> structure, and the function to read one metadata field takes the pointer
> of that structure and the offset:
>
> static int read_sys_metadata_field16(u64 field_id,
> int offset,
> struct tdx_sys_info_tdmr *ts)
> {...}
>
> Future changes will need to read more metadata fields into different
> structures. To support this the above function will need to be changed
> to take 'void *':
>
> static int read_sys_metadata_field16(u64 field_id,
> int offset,
> void *stbuf)
> {...}
>
> This approach loses type-safety, as Dan suggested. A better way is to
> make it be ..
>
> static int read_sys_metadata_field16(u64 field_id, u16 *val) {...}
>
> .. and let the caller calculate the buffer in a type-safe way [1].
>
> Also, the using of the 'struct field_mapping' loses the ability to be
> able to do build-time check about the metadata field size (encoded in
> the field ID) due to the field ID is "indirectly" initialized to the
> 'struct field_mapping' and passed to the function to read. Thus the
> current code uses runtime check instead.
>
> Dan suggested to remove the 'struct field_mapping', unroll the loop,
> skip the array, and call the read_sys_metadata_field16() directly with
> build-time check [1][2]. And to improve the readability, reimplement
> the TD_SYSINFO_MAP() [3].
>
> The new TD_SYSINFO_MAP() isn't perfect. It requires the function that
> uses it to define a local variable @ret to carry the error code and set
> the initial value to 0. It also hard-codes the variable name of the
> structure pointer used in the function. But overall the pros of this
> approach beat the cons.
>
> Link: https://lore.kernel.org/kvm/a107b067-861d-43f4-86b5-29271cb93dad@intel.com/T/#m7cfb3c146214d94b24e978eeb8708d92c0b14ac6 [1]
> Link: https://lore.kernel.org/kvm/a107b067-861d-43f4-86b5-29271cb93dad@intel.com/T/#mbe65f0903ff7835bc418a907f0d02d7a9e0b78be [2]
> Link: https://lore.kernel.org/kvm/a107b067-861d-43f4-86b5-29271cb93dad@intel.com/T/#m80cde5e6504b3af74d933ea0cbfc3ca9d24697d3 [3]
Probably just one link would suffice, say the permalink to Dan's
comment:
https://lore.kernel.org/kvm/66b16121c48f4_4fc729424@dwillia2-xfh.jf.intel.com.notmuch/
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>
> v2 -> v3:
> - Remove 'struct field_mapping' and reimplement TD_SYSINFO_MAP().
>
> ---
> arch/x86/virt/vmx/tdx/tdx.c | 57 ++++++++++++++-----------------------
> 1 file changed, 21 insertions(+), 36 deletions(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index e979bf442929..7e75c1b10838 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -270,60 +270,45 @@ static int read_sys_metadata_field(u64 field_id, u64 *data)
> return 0;
> }
>
> -static int read_sys_metadata_field16(u64 field_id,
> - int offset,
> - struct tdx_sys_info_tdmr *ts)
> +static int read_sys_metadata_field16(u64 field_id, u16 *val)
> {
> - u16 *ts_member = ((void *)ts) + offset;
> u64 tmp;
> int ret;
>
> - if (WARN_ON_ONCE(MD_FIELD_ID_ELE_SIZE_CODE(field_id) !=
> - MD_FIELD_ID_ELE_SIZE_16BIT))
> - return -EINVAL;
> + BUILD_BUG_ON(MD_FIELD_ID_ELE_SIZE_CODE(field_id) !=
> + MD_FIELD_ID_ELE_SIZE_16BIT);
This gets removed next patch, so why do it
>
> ret = read_sys_metadata_field(field_id, &tmp);
> if (ret)
> return ret;
>
> - *ts_member = tmp;
> + *val = tmp;
>
> return 0;
> }
>
> -struct field_mapping {
> - u64 field_id;
> - int offset;
> -};
> -
> -#define TD_SYSINFO_MAP(_field_id, _offset) \
> - { .field_id = MD_FIELD_ID_##_field_id, \
> - .offset = offsetof(struct tdx_sys_info_tdmr, _offset) }
> -
> -/* Map TD_SYSINFO fields into 'struct tdx_sys_info_tdmr': */
> -static const struct field_mapping fields[] = {
> - TD_SYSINFO_MAP(MAX_TDMRS, max_tdmrs),
> - TD_SYSINFO_MAP(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr),
> - TD_SYSINFO_MAP(PAMT_4K_ENTRY_SIZE, pamt_entry_size[TDX_PS_4K]),
> - TD_SYSINFO_MAP(PAMT_2M_ENTRY_SIZE, pamt_entry_size[TDX_PS_2M]),
> - TD_SYSINFO_MAP(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]),
> -};
> +/*
> + * Assumes locally defined @ret and @sysinfo_tdmr to convey the error
> + * code and the 'struct tdx_sys_info_tdmr' instance to fill out.
> + */
> +#define TD_SYSINFO_MAP(_field_id, _member) \
"MAP" made sense when it was in a struct whereas
now it is reading.
> + ({ \
> + if (!ret) \
> + ret = read_sys_metadata_field16(MD_FIELD_ID_##_field_id, \
> + &sysinfo_tdmr->_member); \
> + })
>
> static int get_tdx_sys_info_tdmr(struct tdx_sys_info_tdmr *sysinfo_tdmr)
> {
> - int ret;
> - int i;
> + int ret = 0;
>
> - /* Populate 'sysinfo_tdmr' fields using the mapping structure above: */
> - for (i = 0; i < ARRAY_SIZE(fields); i++) {
> - ret = read_sys_metadata_field16(fields[i].field_id,
> - fields[i].offset,
> - sysinfo_tdmr);
> - if (ret)
> - return ret;
> - }
> + TD_SYSINFO_MAP(MAX_TDMRS, max_tdmrs);
> + TD_SYSINFO_MAP(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr);
> + TD_SYSINFO_MAP(PAMT_4K_ENTRY_SIZE, pamt_entry_size[TDX_PS_4K]);
> + TD_SYSINFO_MAP(PAMT_2M_ENTRY_SIZE, pamt_entry_size[TDX_PS_2M]);
> + TD_SYSINFO_MAP(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]);
Another possibility is to put the macro at the invocation site:
#define READ_SYS_INFO(_field_id, _member) \
ret = ret ?: read_sys_metadata_field16(MD_FIELD_ID_##_field_id, \
&sysinfo_tdmr->_member)
READ_SYS_INFO(MAX_TDMRS, max_tdmrs);
READ_SYS_INFO(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr);
READ_SYS_INFO(PAMT_4K_ENTRY_SIZE, pamt_entry_size[TDX_PS_4K]);
READ_SYS_INFO(PAMT_2M_ENTRY_SIZE, pamt_entry_size[TDX_PS_2M]);
READ_SYS_INFO(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]);
#undef READ_SYS_INFO
And so on in later patches:
#define READ_SYS_INFO(_field_id, _member) \
ret = ret ?: read_sys_metadata_field(MD_FIELD_ID_##_field_id, \
&sysinfo_version->_member)
READ_SYS_INFO(MAJOR_VERSION, major);
READ_SYS_INFO(MINOR_VERSION, minor);
READ_SYS_INFO(UPDATE_VERSION, update);
READ_SYS_INFO(INTERNAL_VERSION, internal);
READ_SYS_INFO(BUILD_NUM, build_num);
READ_SYS_INFO(BUILD_DATE, build_date);
#undef READ_SYS_INFO
>
> - return 0;
> + return ret;
> }
>
> /* Calculate the actual TDMR size */
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 3/8] x86/virt/tdx: Prepare to support reading other global metadata fields
2024-08-27 7:14 ` [PATCH v3 3/8] x86/virt/tdx: Prepare to support reading other global metadata fields Kai Huang
@ 2024-08-30 6:43 ` Adrian Hunter
2024-08-30 11:02 ` Huang, Kai
2024-08-30 9:05 ` Nikolay Borisov
2024-09-06 21:34 ` Dan Williams
2 siblings, 1 reply; 42+ messages in thread
From: Adrian Hunter @ 2024-08-30 6:43 UTC (permalink / raw)
To: Kai Huang, dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo,
hpa, dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu
On 27/08/24 10:14, Kai Huang wrote:
> The TDX module provides a set of "global metadata fields". They report
Global Metadata Fields
> things like TDX module version, supported features, and fields related
> to create/run TDX guests and so on.
>
> For now the kernel only reads "TD Memory Region" (TDMR) related fields
> for module initialization. There are both immediate needs to read more
> fields for module initialization and near-future needs for other kernel
> components like KVM to run TDX guests.
> will be organized in different structures depending on their meanings.
Line above seems lost
>
> For now the kernel only reads TDMR related fields. The TD_SYSINFO_MAP()
> macro hard-codes the 'struct tdx_sys_info_tdmr' instance name. To make
> it work with different instances of different structures, extend it to
> take the structure instance name as an argument.
>
> This also means the current code which uses TD_SYSINFO_MAP() must type
> 'struct tdx_sys_info_tdmr' instance name explicitly for each use. To
> make the code easier to read, add a wrapper TD_SYSINFO_MAP_TDMR_INFO()
> which hides the instance name.
Note, were you to accept my suggestion for patch 2, TD_SYSINFO_MAP() would
have gone away, and no changes would be needed to get_tdx_sys_info_tdmr().
So the above 2 paragraphs could be dropped.
>
> TDX also support 8/16/32/64 bits metadata field element sizes. For now
> all TDMR related fields are 16-bit long thus the kernel only has one
> helper:
>
> static int read_sys_metadata_field16(u64 field_id, u16 *val) {}
>
> Future changes will need to read more metadata fields with different
> element sizes. To make the code short, extend the helper to take a
> 'void *' buffer and the buffer size so it can work with all element
> sizes.
>
> Note in this way the helper loses the type safety and the build-time
> check inside the helper cannot work anymore because the compiler cannot
> determine the exact value of the buffer size.
>
> To resolve those, add a wrapper of the helper which only works with
> u8/u16/u32/u64 directly and do build-time check, where the compiler
> can easily know both the element size (from field ID) and the buffer
> size(using sizeof()), before calling the helper.
>
> An alternative option is to provide one helper for each element size:
IMHO, it is not necessary to describe alternative options. Better to
explain why the actual code change is good, rather than why something
that wasn't done, wasn't so good. That discussion can stay on the
mailing list. For example, this commit message can just have:
Add a build-time check that the element size is correct, which
enables a common function to safely read data of different sizes.
Perhaps end up with:
TDX supports 8/16/32/64 bit metadata field element sizes. For now all
TDMR related fields are 16-bits long. Future changes will need to read
more metadata fields with different element sizes.
So add a build-time check that the element size is correct, which
enables a common function to safely read data of different sizes.
>
> static int read_sys_metadata_field8(u64 field_id, u8 *val) {}
> static int read_sys_metadata_field16(u64 field_id, u16 *val) {}
> ...
>
> But this will result in duplicated code given those helpers will look
> exactly the same except for the type of buffer pointer. It's feasible
> to define a macro for the body of the helper and define one entry for
> each element size to reduce the code, but it is a little bit
> over-engineering.
>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>
> v2 -> v3:
> - Rename read_sys_metadata_field() to tdh_sys_rd() so the former can be
> used as the high level wrapper. Get rid of "stbuf_" prefix since
> people don't like it.
>
> - Rewrite after removing 'struct field_mapping' and reimplementing
> TD_SYSINFO_MAP().
>
> ---
> arch/x86/virt/vmx/tdx/tdx.c | 45 +++++++++++++++++++++----------------
> arch/x86/virt/vmx/tdx/tdx.h | 3 ++-
> 2 files changed, 28 insertions(+), 20 deletions(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 7e75c1b10838..1cd9035c783f 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -250,7 +250,7 @@ static int build_tdx_memlist(struct list_head *tmb_list)
> return ret;
> }
>
> -static int read_sys_metadata_field(u64 field_id, u64 *data)
> +static int tdh_sys_rd(u64 field_id, u64 *data)
> {
> struct tdx_module_args args = {};
> int ret;
> @@ -270,43 +270,50 @@ static int read_sys_metadata_field(u64 field_id, u64 *data)
> return 0;
> }
>
> -static int read_sys_metadata_field16(u64 field_id, u16 *val)
> +static int __read_sys_metadata_field(u64 field_id, void *val, int size)
> {
> u64 tmp;
> int ret;
>
> - BUILD_BUG_ON(MD_FIELD_ID_ELE_SIZE_CODE(field_id) !=
> - MD_FIELD_ID_ELE_SIZE_16BIT);
> -
> - ret = read_sys_metadata_field(field_id, &tmp);
> + ret = tdh_sys_rd(field_id, &tmp);
> if (ret)
> return ret;
>
> - *val = tmp;
> + memcpy(val, &tmp, size);
>
> return 0;
> }
>
> +/* Wrapper to read one global metadata to u8/u16/u32/u64 */
> +#define read_sys_metadata_field(_field_id, _val) \
> + ({ \
> + BUILD_BUG_ON(MD_FIELD_ELE_SIZE(_field_id) != sizeof(typeof(*(_val)))); \
> + __read_sys_metadata_field(_field_id, _val, sizeof(typeof(*(_val)))); \
> + })
> +
> /*
> - * Assumes locally defined @ret and @sysinfo_tdmr to convey the error
> - * code and the 'struct tdx_sys_info_tdmr' instance to fill out.
> + * Read one global metadata field to a member of a structure instance,
> + * assuming locally defined @ret to convey the error code.
> */
> -#define TD_SYSINFO_MAP(_field_id, _member) \
> - ({ \
> - if (!ret) \
> - ret = read_sys_metadata_field16(MD_FIELD_ID_##_field_id, \
> - &sysinfo_tdmr->_member); \
> +#define TD_SYSINFO_MAP(_field_id, _stbuf, _member) \
> + ({ \
> + if (!ret) \
> + ret = read_sys_metadata_field(MD_FIELD_ID_##_field_id, \
> + &_stbuf->_member); \
> })
>
> static int get_tdx_sys_info_tdmr(struct tdx_sys_info_tdmr *sysinfo_tdmr)
> {
> int ret = 0;
>
> - TD_SYSINFO_MAP(MAX_TDMRS, max_tdmrs);
> - TD_SYSINFO_MAP(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr);
> - TD_SYSINFO_MAP(PAMT_4K_ENTRY_SIZE, pamt_entry_size[TDX_PS_4K]);
> - TD_SYSINFO_MAP(PAMT_2M_ENTRY_SIZE, pamt_entry_size[TDX_PS_2M]);
> - TD_SYSINFO_MAP(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]);
> +#define TD_SYSINFO_MAP_TDMR_INFO(_field_id, _member) \
> + TD_SYSINFO_MAP(_field_id, sysinfo_tdmr, _member)
> +
> + TD_SYSINFO_MAP_TDMR_INFO(MAX_TDMRS, max_tdmrs);
> + TD_SYSINFO_MAP_TDMR_INFO(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr);
> + TD_SYSINFO_MAP_TDMR_INFO(PAMT_4K_ENTRY_SIZE, pamt_entry_size[TDX_PS_4K]);
> + TD_SYSINFO_MAP_TDMR_INFO(PAMT_2M_ENTRY_SIZE, pamt_entry_size[TDX_PS_2M]);
> + TD_SYSINFO_MAP_TDMR_INFO(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]);
>
> return ret;
> }
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 148f9b4d1140..7458f6717873 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -53,7 +53,8 @@
> #define MD_FIELD_ID_ELE_SIZE_CODE(_field_id) \
> (((_field_id) & GENMASK_ULL(33, 32)) >> 32)
>
> -#define MD_FIELD_ID_ELE_SIZE_16BIT 1
> +#define MD_FIELD_ELE_SIZE(_field_id) \
> + (1 << MD_FIELD_ID_ELE_SIZE_CODE(_field_id))
>
> struct tdmr_reserved_area {
> u64 offset;
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 4/8] x86/virt/tdx: Refine a comment to reflect the latest TDX spec
2024-08-27 7:14 ` [PATCH v3 4/8] x86/virt/tdx: Refine a comment to reflect the latest TDX spec Kai Huang
@ 2024-08-30 6:45 ` Adrian Hunter
2024-08-30 9:14 ` Nikolay Borisov
1 sibling, 0 replies; 42+ messages in thread
From: Adrian Hunter @ 2024-08-30 6:45 UTC (permalink / raw)
To: Kai Huang, dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo,
hpa, dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu
On 27/08/24 10:14, Kai Huang wrote:
> The old versions of "Intel TDX Module v1.5 ABI Specification" contain
> the definitions of all global metadata field IDs directly in a table.
>
> However, the latest spec moves those definitions to a dedicated
> 'global_metadata.json' file as part of a new (separate) "Intel TDX
> Module v1.5 ABI definitions" [1].
>
> Update the comment to reflect this.
>
> [1]: https://cdrdv2.intel.com/v1/dl/getContent/795381
>
> Reported-by: Nikolay Borisov <nik.borisov@suse.com>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
> arch/x86/virt/vmx/tdx/tdx.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 7458f6717873..8aabd03d8bf5 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -29,7 +29,7 @@
> /*
> * Global scope metadata field ID.
> *
> - * See Table "Global Scope Metadata", TDX module 1.5 ABI spec.
> + * See the "global_metadata.json" in the "TDX 1.5 ABI definitions".
> */
> #define MD_FIELD_ID_MAX_TDMRS 0x9100000100000008ULL
> #define MD_FIELD_ID_MAX_RESERVED_PER_TDMR 0x9100000100000009ULL
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 5/8] x86/virt/tdx: Start to track all global metadata in one structure
2024-08-27 7:14 ` [PATCH v3 5/8] x86/virt/tdx: Start to track all global metadata in one structure Kai Huang
@ 2024-08-30 7:02 ` Adrian Hunter
2024-08-30 11:10 ` Huang, Kai
2024-08-30 11:01 ` Nikolay Borisov
1 sibling, 1 reply; 42+ messages in thread
From: Adrian Hunter @ 2024-08-30 7:02 UTC (permalink / raw)
To: Kai Huang, dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo,
hpa, dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu
On 27/08/24 10:14, Kai Huang wrote:
> The TDX module provides a set of "global metadata fields". They report
Global Metadata Fields
> things like TDX module version, supported features, and fields related
> to create/run TDX guests and so on.
>
> Currently the kernel only reads "TD Memory Region" (TDMR) related fields
> for module initialization. There are immediate needs which require the
> TDX module initialization to read more global metadata including module
> version, supported features and "Convertible Memory Regions" (CMRs).
>
> Also, KVM will need to read more metadata fields to support baseline TDX
> guests. In the longer term, other TDX features like TDX Connect (which
> supports assigning trusted IO devices to TDX guest) may also require
> other kernel components such as pci/vt-d to access global metadata.
>
> To meet all those requirements, the idea is the TDX host core-kernel to
> to provide a centralized, canonical, and read-only structure for the
> global metadata that comes out from the TDX module for all kernel
> components to use.
>
> As the first step, introduce a new 'struct tdx_sys_info' to track all
> global metadata fields.
>
> TDX categories global metadata fields into different "Class"es. E.g.,
"Classes"
> the TDMR related fields are under class "TDMR Info". Instead of making
> 'struct tdx_sys_info' a plain structure to contain all metadata fields,
> organize them in smaller structures based on the "Class".
>
> This allows those metadata fields to be used in finer granularity thus
> makes the code more clear. E.g., the construct_tdmr() can just take the
> structure which contains "TDMR Info" metadata fields.
>
> Add a new function get_tdx_sys_info() as the placeholder to read all
> metadata fields, and call it at the beginning of init_tdx_module(). For
> now it only calls get_tdx_sys_info_tdmr() to read TDMR related fields.
>
> Note there is a functional change: get_tdx_sys_info_tdmr() is moved from
> after build_tdx_memlist() to before it, but it is fine to do so.
>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
Notwithstanding minor cosmetic tweaks:
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>
> v2 -> v3:
> - Split out the part to rename 'struct tdx_tdmr_sysinfo' to 'struct
> tdx_sys_info_tdmr'.
>
>
> ---
> arch/x86/virt/vmx/tdx/tdx.c | 19 ++++++++++++-------
> arch/x86/virt/vmx/tdx/tdx.h | 36 +++++++++++++++++++++++++++++-------
> 2 files changed, 41 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 1cd9035c783f..24eb289c80e8 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -318,6 +318,11 @@ static int get_tdx_sys_info_tdmr(struct tdx_sys_info_tdmr *sysinfo_tdmr)
> return ret;
> }
>
> +static int get_tdx_sys_info(struct tdx_sys_info *sysinfo)
> +{
> + return get_tdx_sys_info_tdmr(&sysinfo->tdmr);
> +}
> +
> /* Calculate the actual TDMR size */
> static int tdmr_size_single(u16 max_reserved_per_tdmr)
> {
> @@ -1090,9 +1095,13 @@ static int init_tdmrs(struct tdmr_info_list *tdmr_list)
>
> static int init_tdx_module(void)
> {
> - struct tdx_sys_info_tdmr sysinfo_tdmr;
> + struct tdx_sys_info sysinfo;
> int ret;
>
> + ret = get_tdx_sys_info(&sysinfo);
> + if (ret)
> + return ret;
> +
> /*
> * To keep things simple, assume that all TDX-protected memory
> * will come from the page allocator. Make sure all pages in the
> @@ -1109,17 +1118,13 @@ static int init_tdx_module(void)
> if (ret)
> goto out_put_tdxmem;
>
> - ret = get_tdx_sys_info_tdmr(&sysinfo_tdmr);
> - if (ret)
> - goto err_free_tdxmem;
> -
> /* Allocate enough space for constructing TDMRs */
> - ret = alloc_tdmr_list(&tdx_tdmr_list, &sysinfo_tdmr);
> + ret = alloc_tdmr_list(&tdx_tdmr_list, &sysinfo.tdmr);
> if (ret)
> goto err_free_tdxmem;
>
> /* Cover all TDX-usable memory regions in TDMRs */
> - ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &sysinfo_tdmr);
> + ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &sysinfo.tdmr);
> if (ret)
> goto err_free_tdmrs;
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 8aabd03d8bf5..4cddbb035b9f 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -100,13 +100,6 @@ struct tdx_memblock {
> int nid;
> };
>
> -/* "TDMR info" part of "Global Scope Metadata" for constructing TDMRs */
> -struct tdx_sys_info_tdmr {
> - u16 max_tdmrs;
> - u16 max_reserved_per_tdmr;
> - u16 pamt_entry_size[TDX_PS_NR];
> -};
> -
> /* Warn if kernel has less than TDMR_NR_WARN TDMRs after allocation */
> #define TDMR_NR_WARN 4
>
> @@ -119,4 +112,33 @@ struct tdmr_info_list {
> int max_tdmrs; /* How many 'tdmr_info's are allocated */
> };
>
> +/*
> + * Kernel-defined structures to contain "Global Scope Metadata".
> + *
> + * TDX global metadata fields are categorized by "Class"es. See the
"Classes"
> + * "global_metadata.json" in the "TDX 1.5 ABI Definitions".
> + *
> + * 'struct tdx_sys_info' is the main structure to contain all metadata
> + * used by the kernel. It contains sub-structures with each reflecting
> + * the "Class" in the 'global_metadata.json'.
> + *
> + * Note the structure name may not exactly follow the name of the
> + * "Class" in the TDX spec, but the comment of that structure always
> + * reflect that.
> + *
> + * Also note not all metadata fields in each class are defined, only
> + * those used by the kernel are.
> + */
> +
> +/* Class "TDMR info" */
> +struct tdx_sys_info_tdmr {
> + u16 max_tdmrs;
> + u16 max_reserved_per_tdmr;
> + u16 pamt_entry_size[TDX_PS_NR];
> +};
> +
> +struct tdx_sys_info {
> + struct tdx_sys_info_tdmr tdmr;
> +};
> +
> #endif
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 3/8] x86/virt/tdx: Prepare to support reading other global metadata fields
2024-08-27 7:14 ` [PATCH v3 3/8] x86/virt/tdx: Prepare to support reading other global metadata fields Kai Huang
2024-08-30 6:43 ` Adrian Hunter
@ 2024-08-30 9:05 ` Nikolay Borisov
2024-08-30 11:09 ` Huang, Kai
2024-09-06 21:34 ` Dan Williams
2 siblings, 1 reply; 42+ messages in thread
From: Nikolay Borisov @ 2024-08-30 9:05 UTC (permalink / raw)
To: Kai Huang, dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo,
hpa, dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu, adrian.hunter
On 27.08.24 г. 10:14 ч., Kai Huang wrote:
> The TDX module provides a set of "global metadata fields". They report
> things like TDX module version, supported features, and fields related
> to create/run TDX guests and so on.
>
> For now the kernel only reads "TD Memory Region" (TDMR) related fields
> for module initialization. There are both immediate needs to read more
> fields for module initialization and near-future needs for other kernel
> components like KVM to run TDX guests.
> will be organized in different structures depending on their meanings.
>
> For now the kernel only reads TDMR related fields. The TD_SYSINFO_MAP()
> macro hard-codes the 'struct tdx_sys_info_tdmr' instance name. To make
> it work with different instances of different structures, extend it to
> take the structure instance name as an argument.
>
> This also means the current code which uses TD_SYSINFO_MAP() must type
> 'struct tdx_sys_info_tdmr' instance name explicitly for each use. To
> make the code easier to read, add a wrapper TD_SYSINFO_MAP_TDMR_INFO()
> which hides the instance name.
>
> TDX also support 8/16/32/64 bits metadata field element sizes. For now
> all TDMR related fields are 16-bit long thus the kernel only has one
> helper:
>
> static int read_sys_metadata_field16(u64 field_id, u16 *val) {}
>
> Future changes will need to read more metadata fields with different
> element sizes. To make the code short, extend the helper to take a
> 'void *' buffer and the buffer size so it can work with all element
> sizes.
>
> Note in this way the helper loses the type safety and the build-time
> check inside the helper cannot work anymore because the compiler cannot
> determine the exact value of the buffer size.
>
> To resolve those, add a wrapper of the helper which only works with
> u8/u16/u32/u64 directly and do build-time check, where the compiler
> can easily know both the element size (from field ID) and the buffer
> size(using sizeof()), before calling the helper.
>
> An alternative option is to provide one helper for each element size:
>
> static int read_sys_metadata_field8(u64 field_id, u8 *val) {}
> static int read_sys_metadata_field16(u64 field_id, u16 *val) {}
> ...
>
> But this will result in duplicated code given those helpers will look
> exactly the same except for the type of buffer pointer. It's feasible
> to define a macro for the body of the helper and define one entry for
> each element size to reduce the code, but it is a little bit
> over-engineering.
>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>
> v2 -> v3:
> - Rename read_sys_metadata_field() to tdh_sys_rd() so the former can be
> used as the high level wrapper. Get rid of "stbuf_" prefix since
> people don't like it.
>
> - Rewrite after removing 'struct field_mapping' and reimplementing
> TD_SYSINFO_MAP().
>
> ---
> arch/x86/virt/vmx/tdx/tdx.c | 45 +++++++++++++++++++++----------------
> arch/x86/virt/vmx/tdx/tdx.h | 3 ++-
> 2 files changed, 28 insertions(+), 20 deletions(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 7e75c1b10838..1cd9035c783f 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -250,7 +250,7 @@ static int build_tdx_memlist(struct list_head *tmb_list)
> return ret;
> }
>
> -static int read_sys_metadata_field(u64 field_id, u64 *data)
> +static int tdh_sys_rd(u64 field_id, u64 *data)
> {
> struct tdx_module_args args = {};
> int ret;
> @@ -270,43 +270,50 @@ static int read_sys_metadata_field(u64 field_id, u64 *data)
> return 0;
> }
>
> -static int read_sys_metadata_field16(u64 field_id, u16 *val)
> +static int __read_sys_metadata_field(u64 field_id, void *val, int size)
The type of 'size' should be size_t.
> {
> u64 tmp;
> int ret;
>
> - BUILD_BUG_ON(MD_FIELD_ID_ELE_SIZE_CODE(field_id) !=
> - MD_FIELD_ID_ELE_SIZE_16BIT);
> -
> - ret = read_sys_metadata_field(field_id, &tmp);
> + ret = tdh_sys_rd(field_id, &tmp);
> if (ret)
> return ret;
>
> - *val = tmp;
> + memcpy(val, &tmp, size);
>
> return 0;
> }
>
> +/* Wrapper to read one global metadata to u8/u16/u32/u64 */
> +#define read_sys_metadata_field(_field_id, _val) \
> + ({ \
> + BUILD_BUG_ON(MD_FIELD_ELE_SIZE(_field_id) != sizeof(typeof(*(_val)))); \
> + __read_sys_metadata_field(_field_id, _val, sizeof(typeof(*(_val)))); \
> + })
> +
> /*
> - * Assumes locally defined @ret and @sysinfo_tdmr to convey the error
> - * code and the 'struct tdx_sys_info_tdmr' instance to fill out.
> + * Read one global metadata field to a member of a structure instance,
> + * assuming locally defined @ret to convey the error code.
> */
> -#define TD_SYSINFO_MAP(_field_id, _member) \
> - ({ \
> - if (!ret) \
> - ret = read_sys_metadata_field16(MD_FIELD_ID_##_field_id, \
> - &sysinfo_tdmr->_member); \
> +#define TD_SYSINFO_MAP(_field_id, _stbuf, _member) \
> + ({ \
> + if (!ret) \
> + ret = read_sys_metadata_field(MD_FIELD_ID_##_field_id, \
> + &_stbuf->_member); \
> })
>
> static int get_tdx_sys_info_tdmr(struct tdx_sys_info_tdmr *sysinfo_tdmr)
> {
> int ret = 0;
>
> - TD_SYSINFO_MAP(MAX_TDMRS, max_tdmrs);
> - TD_SYSINFO_MAP(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr);
> - TD_SYSINFO_MAP(PAMT_4K_ENTRY_SIZE, pamt_entry_size[TDX_PS_4K]);
> - TD_SYSINFO_MAP(PAMT_2M_ENTRY_SIZE, pamt_entry_size[TDX_PS_2M]);
> - TD_SYSINFO_MAP(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]);
> +#define TD_SYSINFO_MAP_TDMR_INFO(_field_id, _member) \
> + TD_SYSINFO_MAP(_field_id, sysinfo_tdmr, _member)
nit: I guess its a personal preference but honestly I think the amount
of macro indirection (3 levels) here is crazy, despite each being rather
simple. Just use TD_SYSINFO_MAP directly, saving the typing of
"sysinfo_tdmr" doesn't seem like a big deal.
You can probably take it even a bit further and simply opencode
read_sys_metadata_field macro inside TD_SYSINFO_MAP and be left with
just it, no ? No other patch in this series uses read_sys_metadata_field
stand alone, if anything factoring it out could be deferred until the
first users gets introduced.
> +
> + TD_SYSINFO_MAP_TDMR_INFO(MAX_TDMRS, max_tdmrs);
> + TD_SYSINFO_MAP_TDMR_INFO(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr);
> + TD_SYSINFO_MAP_TDMR_INFO(PAMT_4K_ENTRY_SIZE, pamt_entry_size[TDX_PS_4K]);
> + TD_SYSINFO_MAP_TDMR_INFO(PAMT_2M_ENTRY_SIZE, pamt_entry_size[TDX_PS_2M]);
> + TD_SYSINFO_MAP_TDMR_INFO(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]);
>
> return ret;
> }
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 148f9b4d1140..7458f6717873 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -53,7 +53,8 @@
> #define MD_FIELD_ID_ELE_SIZE_CODE(_field_id) \
> (((_field_id) & GENMASK_ULL(33, 32)) >> 32)
>
> -#define MD_FIELD_ID_ELE_SIZE_16BIT 1
> +#define MD_FIELD_ELE_SIZE(_field_id) \
That ELE seems a bit ambiguous, ELEM seems more natural and is in line
with other macros in the kernel.
> + (1 << MD_FIELD_ID_ELE_SIZE_CODE(_field_id))
>
> struct tdmr_reserved_area {
> u64 offset;
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 4/8] x86/virt/tdx: Refine a comment to reflect the latest TDX spec
2024-08-27 7:14 ` [PATCH v3 4/8] x86/virt/tdx: Refine a comment to reflect the latest TDX spec Kai Huang
2024-08-30 6:45 ` Adrian Hunter
@ 2024-08-30 9:14 ` Nikolay Borisov
1 sibling, 0 replies; 42+ messages in thread
From: Nikolay Borisov @ 2024-08-30 9:14 UTC (permalink / raw)
To: Kai Huang, dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo,
hpa, dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu, adrian.hunter
On 27.08.24 г. 10:14 ч., Kai Huang wrote:
> The old versions of "Intel TDX Module v1.5 ABI Specification" contain
> the definitions of all global metadata field IDs directly in a table.
>
> However, the latest spec moves those definitions to a dedicated
> 'global_metadata.json' file as part of a new (separate) "Intel TDX
> Module v1.5 ABI definitions" [1].
>
> Update the comment to reflect this.
>
> [1]: https://cdrdv2.intel.com/v1/dl/getContent/795381
>
> Reported-by: Nikolay Borisov <nik.borisov@suse.com>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 7/8] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes
2024-08-27 7:14 ` [PATCH v3 7/8] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes Kai Huang
@ 2024-08-30 10:50 ` Adrian Hunter
2024-08-30 11:52 ` Huang, Kai
2024-08-30 12:27 ` Nikolay Borisov
2024-09-06 23:31 ` Dan Williams
2 siblings, 1 reply; 42+ messages in thread
From: Adrian Hunter @ 2024-08-30 10:50 UTC (permalink / raw)
To: Kai Huang, dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo,
hpa, dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu
On 27/08/24 10:14, Kai Huang wrote:
> A TDX module initialization failure was reported on a Emerald Rapids
> platform:
>
> virt/tdx: initialization failed: TDMR [0x0, 0x80000000): reserved areas exhausted.
> virt/tdx: module initialization failed (-28)
>
> As part of initializing the TDX module, the kernel informs the TDX
> module of all "TDX-usable memory regions" using an array of TDX defined
> structure "TD Memory Region" (TDMR). Each TDMR must be in 1GB aligned
> and in 1GB granularity, and all "non-TDX-usable memory holes" within a
> given TDMR must be marked as "reserved areas". The TDX module reports a
> maximum number of reserved areas that can be supported per TDMR.
The statement:
... all "non-TDX-usable memory holes" within a
given TDMR must be marked as "reserved areas".
is not exactly true, which is essentially the basis of this fix.
The relevant requirements are (from the spec):
Any non-reserved 4KB page within a TDMR must be convertible
i.e., it must be within a CMR
Reserved areas within a TDMR need not be within a CMR.
PAMT areas must not overlap with TDMR non-reserved areas;
however, they may reside within TDMR reserved areas
(as long as these are convertible).
>
> Currently, the kernel finds those "non-TDX-usable memory holes" within a
> given TDMR by walking over a list of "TDX-usable memory regions", which
> essentially reflects the "usable" regions in the e820 table (w/o memory
> hotplug operations precisely, but this is not relevant here).
But including e820 table regions that are not "usable" in the TDMR
reserved areas is not necessary - it is not one of the rules.
What confused me initially was that I did not realize the we already
require that the TDX Module does not touch memory in the TDMR
non-reserved areas not specifically allocated to it. So it makes no
difference to the TDX Module what the pages that have not been allocated
to it, are used for.
>
> As shown above, the root cause of this failure is when the kernel tries
> to construct a TDMR to cover address range [0x0, 0x80000000), there
> are too many memory holes within that range and the number of memory
> holes exceeds the maximum number of reserved areas.
>
> The E820 table of that platform (see [1] below) reflects this: the
> number of memory holes among e820 "usable" entries exceeds 16, which is
> the maximum number of reserved areas TDX module supports in practice.
>
> === Fix ===
>
> There are two options to fix this: 1) reduce the number of memory holes
> when constructing a TDMR to save "reserved areas"; 2) reduce the TDMR's
> size to cover fewer memory regions, thus fewer memory holes.
Probably better to try and get rid of this "two options" stuff and focus
on how this is a simple and effective fix.
>
> Option 1) is possible, and in fact is easier and preferable:
>
> TDX actually has a concept of "Convertible Memory Regions" (CMRs). TDX
> reports a list of CMRs that meet TDX's security requirements on memory.
> TDX requires all the "TDX-usable memory regions" that the kernel passes
> to the module via TDMRs, a.k.a, all the "non-reserved regions in TDMRs",
> must be convertible memory.
>
> In other words, if a memory hole is indeed CMR, then it's not mandatory
> for the kernel to add it to the reserved areas. By doing so, the number
> of consumed reserved areas can be reduced w/o having any functional
> impact. The kernel still allocates TDX memory from the page allocator.
> There's no harm if the kernel tells the TDX module some memory regions
> are "TDX-usable" but they will never be allocated by the kernel as TDX
> memory.
>
> Note this doesn't have any security impact either because the kernel is
> out of TDX's TCB anyway.
>
> This is feasible because in practice the CMRs just reflect the nature of
> whether the RAM can indeed be used by TDX, thus each CMR tends to be a
> large, uninterrupted range of memory, i.e., unlike the e820 table which
> contains numerous "ACPI *" entries in the first 2G range. Refer to [2]
> for CMRs reported on the problematic platform using off-tree TDX code.
>
> So for this particular module initialization failure, the memory holes
> that are within [0x0, 0x80000000) are mostly indeed CMR. By not adding
> them to the reserved areas, the number of consumed reserved areas for
> the TDMR [0x0, 0x80000000) can be dramatically reduced.
>
> Option 2) is also theoretically feasible, but it is not desired:
>
> It requires more complicated logic to handle splitting TDMR into smaller
> ones, which isn't trivial. There are limitations to splitting TDMR too,
> thus it may not always work: 1) The smallest TDMR is 1GB, and it cannot
> be split any further; 2) This also increases the total number of TDMRs,
> which also has a maximum value limited by the TDX module.
>
> So, fix this issue by using option 1):
>
> 1) reading out the CMRs from the TDX module global metadata, and
> 2) changing to find memory holes for a given TDMR based on CMRs, but not
> based on the list of "TDX-usable memory regions".
>
> Also dump the CMRs in dmesg. They are helpful when something goes wrong
> around "constructing the TDMRs and configuring the TDX module with
> them". Note there are no existing userspace tools that the user can get
> CMRs since they can only be read via SEAMCALL (no CPUID, MSR etc).
>
> [1] BIOS-E820 table of the problematic platform:
>
> BIOS-e820: [mem 0x0000000000000000-0x000000000009efff] usable
> BIOS-e820: [mem 0x000000000009f000-0x00000000000fffff] reserved
> BIOS-e820: [mem 0x0000000000100000-0x000000005d168fff] usable
> BIOS-e820: [mem 0x000000005d169000-0x000000005d22afff] ACPI data
> BIOS-e820: [mem 0x000000005d22b000-0x000000005d3cefff] usable
> BIOS-e820: [mem 0x000000005d3cf000-0x000000005d469fff] reserved
> BIOS-e820: [mem 0x000000005d46a000-0x000000005e5b2fff] usable
> BIOS-e820: [mem 0x000000005e5b3000-0x000000005e5c2fff] reserved
> BIOS-e820: [mem 0x000000005e5c3000-0x000000005e5d2fff] usable
> BIOS-e820: [mem 0x000000005e5d3000-0x000000005e5e4fff] reserved
> BIOS-e820: [mem 0x000000005e5e5000-0x000000005eb57fff] usable
> BIOS-e820: [mem 0x000000005eb58000-0x0000000061357fff] ACPI NVS
> BIOS-e820: [mem 0x0000000061358000-0x000000006172afff] usable
> BIOS-e820: [mem 0x000000006172b000-0x0000000061794fff] ACPI data
> BIOS-e820: [mem 0x0000000061795000-0x00000000617fefff] usable
> BIOS-e820: [mem 0x00000000617ff000-0x0000000061912fff] ACPI data
> BIOS-e820: [mem 0x0000000061913000-0x0000000061998fff] usable
> BIOS-e820: [mem 0x0000000061999000-0x00000000619dffff] ACPI data
> BIOS-e820: [mem 0x00000000619e0000-0x00000000619e1fff] usable
> BIOS-e820: [mem 0x00000000619e2000-0x00000000619e9fff] reserved
> BIOS-e820: [mem 0x00000000619ea000-0x0000000061a26fff] usable
> BIOS-e820: [mem 0x0000000061a27000-0x0000000061baefff] ACPI data
> BIOS-e820: [mem 0x0000000061baf000-0x00000000623c2fff] usable
> BIOS-e820: [mem 0x00000000623c3000-0x0000000062471fff] reserved
> BIOS-e820: [mem 0x0000000062472000-0x0000000062823fff] usable
> BIOS-e820: [mem 0x0000000062824000-0x0000000063a24fff] reserved
> BIOS-e820: [mem 0x0000000063a25000-0x0000000063d57fff] usable
> BIOS-e820: [mem 0x0000000063d58000-0x0000000064157fff] reserved
> BIOS-e820: [mem 0x0000000064158000-0x0000000064158fff] usable
> BIOS-e820: [mem 0x0000000064159000-0x0000000064194fff] reserved
> BIOS-e820: [mem 0x0000000064195000-0x000000006e9cefff] usable
> BIOS-e820: [mem 0x000000006e9cf000-0x000000006eccefff] reserved
> BIOS-e820: [mem 0x000000006eccf000-0x000000006f6fefff] ACPI NVS
> BIOS-e820: [mem 0x000000006f6ff000-0x000000006f7fefff] ACPI data
> BIOS-e820: [mem 0x000000006f7ff000-0x000000006f7fffff] usable
> BIOS-e820: [mem 0x000000006f800000-0x000000008fffffff] reserved
> ......
>
> [2] Convertible Memory Regions of the problematic platform:
>
> virt/tdx: CMR: [0x100000, 0x6f800000)
> virt/tdx: CMR: [0x100000000, 0x107a000000)
> virt/tdx: CMR: [0x1080000000, 0x207c000000)
> virt/tdx: CMR: [0x2080000000, 0x307c000000)
> virt/tdx: CMR: [0x3080000000, 0x407c000000)
>
> Fixes: dde3b60d572c9 ("x86/virt/tdx: Designate reserved areas for all TDMRs")
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>
> v2 -> v3:
>
> - Add the Fixes tag, although this patch depends on previous patches.
> - CMR_BASE0 -> CMR_BASE(_i), CMR_SIZE0 -> CMR_SIZE(_i) to silence the
> build-check error.
>
> v1 -> v2:
> - Change to walk over CMRs directly to find out memory holes, instead
> of walking over TDX memory blocks and explicitly check whether a hole
> is subregion of CMR. (Chao)
> - Mention any constant macro definitions in global metadata structures
> are TDX architectural. (Binbin)
> - Slightly improve the changelog.
>
> ---
> arch/x86/virt/vmx/tdx/tdx.c | 105 ++++++++++++++++++++++++++++++------
> arch/x86/virt/vmx/tdx/tdx.h | 13 +++++
> 2 files changed, 102 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 0fb673dd43ed..fa335ab1ae92 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -331,6 +331,72 @@ static int get_tdx_sys_info_version(struct tdx_sys_info_version *sysinfo_version
> return ret;
> }
>
> +/* Update the @sysinfo_cmr->num_cmrs to trim tail empty CMRs */
> +static void trim_empty_tail_cmrs(struct tdx_sys_info_cmr *sysinfo_cmr)
> +{
> + int i;
> +
> + for (i = 0; i < sysinfo_cmr->num_cmrs; i++) {
> + u64 cmr_base = sysinfo_cmr->cmr_base[i];
> + u64 cmr_size = sysinfo_cmr->cmr_size[i];
> +
> + if (!cmr_size) {
> + WARN_ON_ONCE(cmr_base);
> + break;
> + }
> +
> + /* TDX architecture: CMR must be 4KB aligned */
> + WARN_ON_ONCE(!PAGE_ALIGNED(cmr_base) ||
> + !PAGE_ALIGNED(cmr_size));
> + }
> +
> + sysinfo_cmr->num_cmrs = i;
> +}
> +
> +static int get_tdx_sys_info_cmr(struct tdx_sys_info_cmr *sysinfo_cmr)
> +{
> + int i, ret = 0;
> +
> +#define TD_SYSINFO_MAP_CMR_INFO(_field_id, _member) \
> + TD_SYSINFO_MAP(_field_id, sysinfo_cmr, _member)
> +
> + TD_SYSINFO_MAP_CMR_INFO(NUM_CMRS, num_cmrs);
> +
> + if (ret)
> + return ret;
> +
> + for (i = 0; i < sysinfo_cmr->num_cmrs; i++) {
> + TD_SYSINFO_MAP_CMR_INFO(CMR_BASE(i), cmr_base[i]);
> + TD_SYSINFO_MAP_CMR_INFO(CMR_SIZE(i), cmr_size[i]);
> + }
> +
> + if (ret)
> + return ret;
> +
> + /*
> + * The TDX module may just report the maximum number of CMRs that
> + * TDX architecturally supports as the actual number of CMRs,
> + * despite the latter is smaller. In this case all the tail
> + * CMRs will be empty. Trim them away.
> + */
> + trim_empty_tail_cmrs(sysinfo_cmr);
> +
> + return 0;
> +}
> +
> +static void print_sys_info_cmr(struct tdx_sys_info_cmr *sysinfo_cmr)
> +{
> + int i;
> +
> + for (i = 0; i < sysinfo_cmr->num_cmrs; i++) {
> + u64 cmr_base = sysinfo_cmr->cmr_base[i];
> + u64 cmr_size = sysinfo_cmr->cmr_size[i];
> +
> + pr_info("CMR[%d]: [0x%llx, 0x%llx)\n", i, cmr_base,
> + cmr_base + cmr_size);
> + }
> +}
> +
> static void print_basic_sys_info(struct tdx_sys_info *sysinfo)
> {
> struct tdx_sys_info_features *features = &sysinfo->features;
> @@ -349,6 +415,8 @@ static void print_basic_sys_info(struct tdx_sys_info *sysinfo)
> version->major, version->minor, version->update,
> version->internal, version->build_num,
> version->build_date, features->tdx_features0);
> +
> + print_sys_info_cmr(&sysinfo->cmr);
> }
>
> static int get_tdx_sys_info_tdmr(struct tdx_sys_info_tdmr *sysinfo_tdmr)
> @@ -379,6 +447,10 @@ static int get_tdx_sys_info(struct tdx_sys_info *sysinfo)
> if (ret)
> return ret;
>
> + ret = get_tdx_sys_info_cmr(&sysinfo->cmr);
> + if (ret)
> + return ret;
> +
> return get_tdx_sys_info_tdmr(&sysinfo->tdmr);
> }
>
> @@ -803,29 +875,28 @@ static int tdmr_add_rsvd_area(struct tdmr_info *tdmr, int *p_idx, u64 addr,
> }
>
> /*
> - * Go through @tmb_list to find holes between memory areas. If any of
> + * Go through all CMRs in @sysinfo_cmr to find memory holes. If any of
> * those holes fall within @tdmr, set up a TDMR reserved area to cover
> * the hole.
> */
> -static int tdmr_populate_rsvd_holes(struct list_head *tmb_list,
> +static int tdmr_populate_rsvd_holes(struct tdx_sys_info_cmr *sysinfo_cmr,
> struct tdmr_info *tdmr,
> int *rsvd_idx,
> u16 max_reserved_per_tdmr)
> {
> - struct tdx_memblock *tmb;
> u64 prev_end;
> - int ret;
> + int i, ret;
>
> /*
> * Start looking for reserved blocks at the
> * beginning of the TDMR.
> */
> prev_end = tdmr->base;
> - list_for_each_entry(tmb, tmb_list, list) {
> + for (i = 0; i < sysinfo_cmr->num_cmrs; i++) {
> u64 start, end;
>
> - start = PFN_PHYS(tmb->start_pfn);
> - end = PFN_PHYS(tmb->end_pfn);
> + start = sysinfo_cmr->cmr_base[i];
> + end = start + sysinfo_cmr->cmr_size[i];
>
> /* Break if this region is after the TDMR */
> if (start >= tdmr_end(tdmr))
> @@ -926,16 +997,16 @@ static int rsvd_area_cmp_func(const void *a, const void *b)
>
> /*
> * Populate reserved areas for the given @tdmr, including memory holes
> - * (via @tmb_list) and PAMTs (via @tdmr_list).
> + * (via @sysinfo_cmr) and PAMTs (via @tdmr_list).
> */
> static int tdmr_populate_rsvd_areas(struct tdmr_info *tdmr,
> - struct list_head *tmb_list,
> + struct tdx_sys_info_cmr *sysinfo_cmr,
> struct tdmr_info_list *tdmr_list,
> u16 max_reserved_per_tdmr)
> {
> int ret, rsvd_idx = 0;
>
> - ret = tdmr_populate_rsvd_holes(tmb_list, tdmr, &rsvd_idx,
> + ret = tdmr_populate_rsvd_holes(sysinfo_cmr, tdmr, &rsvd_idx,
> max_reserved_per_tdmr);
> if (ret)
> return ret;
> @@ -954,10 +1025,10 @@ static int tdmr_populate_rsvd_areas(struct tdmr_info *tdmr,
>
> /*
> * Populate reserved areas for all TDMRs in @tdmr_list, including memory
> - * holes (via @tmb_list) and PAMTs.
> + * holes (via @sysinfo_cmr) and PAMTs.
> */
> static int tdmrs_populate_rsvd_areas_all(struct tdmr_info_list *tdmr_list,
> - struct list_head *tmb_list,
> + struct tdx_sys_info_cmr *sysinfo_cmr,
> u16 max_reserved_per_tdmr)
> {
> int i;
> @@ -966,7 +1037,7 @@ static int tdmrs_populate_rsvd_areas_all(struct tdmr_info_list *tdmr_list,
> int ret;
>
> ret = tdmr_populate_rsvd_areas(tdmr_entry(tdmr_list, i),
> - tmb_list, tdmr_list, max_reserved_per_tdmr);
> + sysinfo_cmr, tdmr_list, max_reserved_per_tdmr);
> if (ret)
> return ret;
> }
> @@ -981,7 +1052,8 @@ static int tdmrs_populate_rsvd_areas_all(struct tdmr_info_list *tdmr_list,
> */
> static int construct_tdmrs(struct list_head *tmb_list,
> struct tdmr_info_list *tdmr_list,
> - struct tdx_sys_info_tdmr *sysinfo_tdmr)
> + struct tdx_sys_info_tdmr *sysinfo_tdmr,
> + struct tdx_sys_info_cmr *sysinfo_cmr)
> {
> int ret;
>
> @@ -994,7 +1066,7 @@ static int construct_tdmrs(struct list_head *tmb_list,
> if (ret)
> return ret;
>
> - ret = tdmrs_populate_rsvd_areas_all(tdmr_list, tmb_list,
> + ret = tdmrs_populate_rsvd_areas_all(tdmr_list, sysinfo_cmr,
> sysinfo_tdmr->max_reserved_per_tdmr);
> if (ret)
> tdmrs_free_pamt_all(tdmr_list);
> @@ -1185,7 +1257,8 @@ static int init_tdx_module(void)
> goto err_free_tdxmem;
>
> /* Cover all TDX-usable memory regions in TDMRs */
> - ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &sysinfo.tdmr);
> + ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &sysinfo.tdmr,
> + &sysinfo.cmr);
> if (ret)
> goto err_free_tdmrs;
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index b422e8517e01..e7bed9e717c7 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -40,6 +40,10 @@
> #define MD_FIELD_ID_UPDATE_VERSION 0x0800000100000005ULL
> #define MD_FIELD_ID_INTERNAL_VERSION 0x0800000100000006ULL
>
> +#define MD_FIELD_ID_NUM_CMRS 0x9000000100000000ULL
> +#define MD_FIELD_ID_CMR_BASE(_i) (0x9000000300000080ULL + (u16)_i)
> +#define MD_FIELD_ID_CMR_SIZE(_i) (0x9000000300000100ULL + (u16)_i)
> +
> #define MD_FIELD_ID_MAX_TDMRS 0x9100000100000008ULL
> #define MD_FIELD_ID_MAX_RESERVED_PER_TDMR 0x9100000100000009ULL
> #define MD_FIELD_ID_PAMT_4K_ENTRY_SIZE 0x9100000100000010ULL
> @@ -160,6 +164,14 @@ struct tdx_sys_info_version {
> u32 build_date;
> };
>
> +/* Class "CMR Info" */
> +#define TDX_MAX_CMRS 32 /* architectural */
> +struct tdx_sys_info_cmr {
> + u16 num_cmrs;
> + u64 cmr_base[TDX_MAX_CMRS];
> + u64 cmr_size[TDX_MAX_CMRS];
> +};
> +
> /* Class "TDMR info" */
> struct tdx_sys_info_tdmr {
> u16 max_tdmrs;
> @@ -170,6 +182,7 @@ struct tdx_sys_info_tdmr {
> struct tdx_sys_info {
> struct tdx_sys_info_features features;
> struct tdx_sys_info_version version;
> + struct tdx_sys_info_cmr cmr;
> struct tdx_sys_info_tdmr tdmr;
> };
>
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 2/8] x86/virt/tdx: Remove 'struct field_mapping' and implement TD_SYSINFO_MAP() macro
2024-08-29 7:20 ` Adrian Hunter
@ 2024-08-30 10:52 ` Huang, Kai
2024-09-06 21:30 ` Dan Williams
1 sibling, 0 replies; 42+ messages in thread
From: Huang, Kai @ 2024-08-30 10:52 UTC (permalink / raw)
To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
kirill.shutemov@linux.intel.com, tglx@linutronix.de,
pbonzini@redhat.com, Hunter, Adrian, Williams, Dan J
Cc: Gao, Chao, kvm@vger.kernel.org, binbin.wu@linux.intel.com,
linux-kernel@vger.kernel.org, Edgecombe, Rick P, x86@kernel.org,
Yamahata, Isaku
On Thu, 2024-08-29 at 10:20 +0300, Adrian Hunter wrote:
> On 27/08/24 10:14, Kai Huang wrote:
>
> Subject could be "Simplify the reading of Global Metadata Fields"
OK.
>
> > TL;DR: Remove the 'struct field_mapping' structure and use another way
> > to implement the TD_SYSINFO_MAP() macro to improve the current metadata
> > reading code, e.g., switching to use build-time check for metadata field
> > size over the existing runtime check.
>
> Perhaps:
>
> Remove 'struct field_mapping' and let read_sys_metadata_field16() accept
> the member variable address, simplifying the code in preparation for adding
> support for more metadata structures and field sizes.
Fine to me. I would like also to mention there are improvements in the new
code (as suggested by Dan), so I will say:
"..., simplifying and improving the code in preparation ...".
>
> >
> > The TDX module provides a set of "global metadata fields". They report
>
> Global Metadata Fields
OK.
>
> > things like TDX module version, supported features, and fields related
> > to create/run TDX guests and so on.
> >
> > For now the kernel only reads "TD Memory Region" (TDMR) related global
> > metadata fields, and all of those fields are organized in one structure.
>
> The patch is self-evidently simpler (21 insertions(+), 36 deletions(-))
> so there doesn't seem to be any need for further elaboration. Perhaps
> just round it off and stop there.
>
> ... and all of those fields are organized in one structure,
> but that will change in the near future.
I think the code change here is not only to reduce the LoC (i.e., simplify the
code), but also improve the code, so I would like to keep the things mentioned
by Dan in the changelog.
>
> >
> > The kernel currently uses 'struct field_mapping' to facilitate reading
> > multiple metadata fields into one structure. The 'struct field_mapping'
> > records the mapping between the field ID and the offset of the structure
> > to fill out. The kernel initializes an array of 'struct field_mapping'
> > for each structure member (using the TD_SYSINFO_MAP() macro) and then
> > reads all metadata fields in a loop using that array.
> >
> > Currently the kernel only reads TDMR related metadata fields into one
> > structure, and the function to read one metadata field takes the pointer
> > of that structure and the offset:
> >
> > static int read_sys_metadata_field16(u64 field_id,
> > int offset,
> > struct tdx_sys_info_tdmr *ts)
> > {...}
> >
> > Future changes will need to read more metadata fields into different
> > structures. To support this the above function will need to be changed
> > to take 'void *':
> >
> > static int read_sys_metadata_field16(u64 field_id,
> > int offset,
> > void *stbuf)
> > {...}
> >
> > This approach loses type-safety, as Dan suggested. A better way is to
> > make it be ..
> >
> > static int read_sys_metadata_field16(u64 field_id, u16 *val) {...}
> >
> > .. and let the caller calculate the buffer in a type-safe way [1].
> >
> > Also, the using of the 'struct field_mapping' loses the ability to be
> > able to do build-time check about the metadata field size (encoded in
> > the field ID) due to the field ID is "indirectly" initialized to the
> > 'struct field_mapping' and passed to the function to read. Thus the
> > current code uses runtime check instead.
> >
> > Dan suggested to remove the 'struct field_mapping', unroll the loop,
> > skip the array, and call the read_sys_metadata_field16() directly with
> > build-time check [1][2]. And to improve the readability, reimplement
> > the TD_SYSINFO_MAP() [3].
> >
> > The new TD_SYSINFO_MAP() isn't perfect. It requires the function that
> > uses it to define a local variable @ret to carry the error code and set
> > the initial value to 0. It also hard-codes the variable name of the
> > structure pointer used in the function. But overall the pros of this
> > approach beat the cons.
> >
> > Link: https://lore.kernel.org/kvm/a107b067-861d-43f4-86b5-29271cb93dad@intel.com/T/#m7cfb3c146214d94b24e978eeb8708d92c0b14ac6 [1]
> > Link: https://lore.kernel.org/kvm/a107b067-861d-43f4-86b5-29271cb93dad@intel.com/T/#mbe65f0903ff7835bc418a907f0d02d7a9e0b78be [2]
> > Link: https://lore.kernel.org/kvm/a107b067-861d-43f4-86b5-29271cb93dad@intel.com/T/#m80cde5e6504b3af74d933ea0cbfc3ca9d24697d3 [3]
>
> Probably just one link would suffice, say the permalink to Dan's
> comment:
>
> https://lore.kernel.org/kvm/66b16121c48f4_4fc729424@dwillia2-xfh.jf.intel.com.notmuch/
[1] and [2] are actually replies to different patches, so I would like to keep
them. [1] and [3] are replies to the same patch, so I could remove [3], but I
think keeping all of them is also fine since it's more easy to find the exact
comment that I want to address.
>
> > Suggested-by: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> >
> > v2 -> v3:
> > - Remove 'struct field_mapping' and reimplement TD_SYSINFO_MAP().
> >
> > ---
> > arch/x86/virt/vmx/tdx/tdx.c | 57 ++++++++++++++-----------------------
> > 1 file changed, 21 insertions(+), 36 deletions(-)
> >
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index e979bf442929..7e75c1b10838 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -270,60 +270,45 @@ static int read_sys_metadata_field(u64 field_id, u64 *data)
> > return 0;
> > }
> >
> > -static int read_sys_metadata_field16(u64 field_id,
> > - int offset,
> > - struct tdx_sys_info_tdmr *ts)
> > +static int read_sys_metadata_field16(u64 field_id, u16 *val)
> > {
> > - u16 *ts_member = ((void *)ts) + offset;
> > u64 tmp;
> > int ret;
> >
> > - if (WARN_ON_ONCE(MD_FIELD_ID_ELE_SIZE_CODE(field_id) !=
> > - MD_FIELD_ID_ELE_SIZE_16BIT))
> > - return -EINVAL;
> > + BUILD_BUG_ON(MD_FIELD_ID_ELE_SIZE_CODE(field_id) !=
> > + MD_FIELD_ID_ELE_SIZE_16BIT);
>
> This gets removed next patch, so why do it
This patch I didn't add the build-time check in the (new) TD_SYSINFO_MAP(), so
I changed the runtime check to build-time check here since we can actually do
it here. I think even for this middle state patch we should do the build-time
check. I can move it to the (new) TD_SYSINFO_MAP() though if that's better.
>
> >
> > ret = read_sys_metadata_field(field_id, &tmp);
> > if (ret)
> > return ret;
> >
> > - *ts_member = tmp;
> > + *val = tmp;
> >
> > return 0;
> > }
> >
> > -struct field_mapping {
> > - u64 field_id;
> > - int offset;
> > -};
> > -
> > -#define TD_SYSINFO_MAP(_field_id, _offset) \
> > - { .field_id = MD_FIELD_ID_##_field_id, \
> > - .offset = offsetof(struct tdx_sys_info_tdmr, _offset) }
> > -
> > -/* Map TD_SYSINFO fields into 'struct tdx_sys_info_tdmr': */
> > -static const struct field_mapping fields[] = {
> > - TD_SYSINFO_MAP(MAX_TDMRS, max_tdmrs),
> > - TD_SYSINFO_MAP(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr),
> > - TD_SYSINFO_MAP(PAMT_4K_ENTRY_SIZE, pamt_entry_size[TDX_PS_4K]),
> > - TD_SYSINFO_MAP(PAMT_2M_ENTRY_SIZE, pamt_entry_size[TDX_PS_2M]),
> > - TD_SYSINFO_MAP(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]),
> > -};
> > +/*
> > + * Assumes locally defined @ret and @sysinfo_tdmr to convey the error
> > + * code and the 'struct tdx_sys_info_tdmr' instance to fill out.
> > + */
> > +#define TD_SYSINFO_MAP(_field_id, _member) \
>
> "MAP" made sense when it was in a struct whereas
> now it is reading.
>
> > + ({ \
> > + if (!ret) \
> > + ret = read_sys_metadata_field16(MD_FIELD_ID_##_field_id, \
> > + &sysinfo_tdmr->_member); \
> > + })
> >
> > static int get_tdx_sys_info_tdmr(struct tdx_sys_info_tdmr *sysinfo_tdmr)
> > {
> > - int ret;
> > - int i;
> > + int ret = 0;
> >
> > - /* Populate 'sysinfo_tdmr' fields using the mapping structure above: */
> > - for (i = 0; i < ARRAY_SIZE(fields); i++) {
> > - ret = read_sys_metadata_field16(fields[i].field_id,
> > - fields[i].offset,
> > - sysinfo_tdmr);
> > - if (ret)
> > - return ret;
> > - }
> > + TD_SYSINFO_MAP(MAX_TDMRS, max_tdmrs);
> > + TD_SYSINFO_MAP(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr);
> > + TD_SYSINFO_MAP(PAMT_4K_ENTRY_SIZE, pamt_entry_size[TDX_PS_4K]);
> > + TD_SYSINFO_MAP(PAMT_2M_ENTRY_SIZE, pamt_entry_size[TDX_PS_2M]);
> > + TD_SYSINFO_MAP(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]);
>
> Another possibility is to put the macro at the invocation site:
>
> #define READ_SYS_INFO(_field_id, _member) \
> ret = ret ?: read_sys_metadata_field16(MD_FIELD_ID_##_field_id, \
> &sysinfo_tdmr->_member)
>
> READ_SYS_INFO(MAX_TDMRS, max_tdmrs);
> READ_SYS_INFO(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr);
> READ_SYS_INFO(PAMT_4K_ENTRY_SIZE, pamt_entry_size[TDX_PS_4K]);
> READ_SYS_INFO(PAMT_2M_ENTRY_SIZE, pamt_entry_size[TDX_PS_2M]);
> READ_SYS_INFO(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]);
>
> #undef READ_SYS_INFO
>
> And so on in later patches:
>
> #define READ_SYS_INFO(_field_id, _member) \
> ret = ret ?: read_sys_metadata_field(MD_FIELD_ID_##_field_id, \
> &sysinfo_version->_member)
>
> READ_SYS_INFO(MAJOR_VERSION, major);
> READ_SYS_INFO(MINOR_VERSION, minor);
> READ_SYS_INFO(UPDATE_VERSION, update);
> READ_SYS_INFO(INTERNAL_VERSION, internal);
> READ_SYS_INFO(BUILD_NUM, build_num);
> READ_SYS_INFO(BUILD_DATE, build_date);
>
> #undef READ_SYS_INFO
>
This is fine to me. But AFAICT Kirill doesn't like the "#undef" part.
Kirill, do you have comments?
Otherwise I will go with what Adrian suggested.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 8/8] x86/virt/tdx: Don't initialize module that doesn't support NO_RBP_MOD feature
2024-08-27 7:14 ` [PATCH v3 8/8] x86/virt/tdx: Don't initialize module that doesn't support NO_RBP_MOD feature Kai Huang
@ 2024-08-30 11:01 ` Adrian Hunter
2024-09-06 23:36 ` Dan Williams
1 sibling, 0 replies; 42+ messages in thread
From: Adrian Hunter @ 2024-08-30 11:01 UTC (permalink / raw)
To: Kai Huang, dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo,
hpa, dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu
On 27/08/24 10:14, Kai Huang wrote:
> Old TDX modules can clobber RBP in the TDH.VP.ENTER SEAMCALL. However
> RBP is used as frame pointer in the x86_64 calling convention, and
> clobbering RBP could result in bad things like being unable to unwind
> the stack if any non-maskable exceptions (NMI, #MC etc) happens in that
> gap.
>
> A new "NO_RBP_MOD" feature was introduced to more recent TDX modules to
> not clobber RBP. This feature is reported in the TDX_FEATURES0 global
> metadata field via bit 18.
>
> Don't initialize the TDX module if this feature is not supported [1].
>
> Link: https://lore.kernel.org/all/c0067319-2653-4cbd-8fee-1ccf21b1e646@suse.com/T/#mef98469c51e2382ead2c537ea189752360bd2bef [1]
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>
> v2 -> v3:
> - check_module_compatibility() -> check_features().
> - Improve error message.
>
> https://lore.kernel.org/kvm/cover.1721186590.git.kai.huang@intel.com/T/#md9e2eeef927838cbf20d7b361cdbea518b8aec50
>
> ---
> arch/x86/virt/vmx/tdx/tdx.c | 17 +++++++++++++++++
> arch/x86/virt/vmx/tdx/tdx.h | 3 +++
> 2 files changed, 20 insertions(+)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index fa335ab1ae92..032a53ddf5bc 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -454,6 +454,18 @@ static int get_tdx_sys_info(struct tdx_sys_info *sysinfo)
> return get_tdx_sys_info_tdmr(&sysinfo->tdmr);
> }
>
> +static int check_features(struct tdx_sys_info *sysinfo)
> +{
> + u64 tdx_features0 = sysinfo->features.tdx_features0;
> +
> + if (!(tdx_features0 & TDX_FEATURES0_NO_RBP_MOD)) {
> + pr_err("frame pointer (RBP) clobber bug present, upgrade TDX module\n");
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> /* Calculate the actual TDMR size */
> static int tdmr_size_single(u16 max_reserved_per_tdmr)
> {
> @@ -1235,6 +1247,11 @@ static int init_tdx_module(void)
>
> print_basic_sys_info(&sysinfo);
>
> + /* Check whether the kernel can support this module */
> + ret = check_features(&sysinfo);
> + if (ret)
> + return ret;
> +
> /*
> * To keep things simple, assume that all TDX-protected memory
> * will come from the page allocator. Make sure all pages in the
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index e7bed9e717c7..831361e6d0fb 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -154,6 +154,9 @@ struct tdx_sys_info_features {
> u64 tdx_features0;
> };
>
> +/* Architectural bit definitions of TDX_FEATURES0 metadata field */
> +#define TDX_FEATURES0_NO_RBP_MOD _BITULL(18)
> +
> /* Class "TDX Module Version" */
> struct tdx_sys_info_version {
> u16 major;
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 5/8] x86/virt/tdx: Start to track all global metadata in one structure
2024-08-27 7:14 ` [PATCH v3 5/8] x86/virt/tdx: Start to track all global metadata in one structure Kai Huang
2024-08-30 7:02 ` Adrian Hunter
@ 2024-08-30 11:01 ` Nikolay Borisov
2024-08-30 11:57 ` Huang, Kai
1 sibling, 1 reply; 42+ messages in thread
From: Nikolay Borisov @ 2024-08-30 11:01 UTC (permalink / raw)
To: Kai Huang, dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo,
hpa, dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu, adrian.hunter
On 27.08.24 г. 10:14 ч., Kai Huang wrote:
> The TDX module provides a set of "global metadata fields". They report
> things like TDX module version, supported features, and fields related
> to create/run TDX guests and so on.
>
> Currently the kernel only reads "TD Memory Region" (TDMR) related fields
> for module initialization. There are immediate needs which require the
> TDX module initialization to read more global metadata including module
> version, supported features and "Convertible Memory Regions" (CMRs).
>
> Also, KVM will need to read more metadata fields to support baseline TDX
> guests. In the longer term, other TDX features like TDX Connect (which
> supports assigning trusted IO devices to TDX guest) may also require
> other kernel components such as pci/vt-d to access global metadata.
>
> To meet all those requirements, the idea is the TDX host core-kernel to
> to provide a centralized, canonical, and read-only structure for the
> global metadata that comes out from the TDX module for all kernel
> components to use.
>
> As the first step, introduce a new 'struct tdx_sys_info' to track all
> global metadata fields.
>
> TDX categories global metadata fields into different "Class"es. E.g.,
> the TDMR related fields are under class "TDMR Info". Instead of making
> 'struct tdx_sys_info' a plain structure to contain all metadata fields,
> organize them in smaller structures based on the "Class".
>
> This allows those metadata fields to be used in finer granularity thus
> makes the code more clear. E.g., the construct_tdmr() can just take the
> structure which contains "TDMR Info" metadata fields.
>
> Add a new function get_tdx_sys_info() as the placeholder to read all
> metadata fields, and call it at the beginning of init_tdx_module(). For
> now it only calls get_tdx_sys_info_tdmr() to read TDMR related fields.
>
> Note there is a functional change: get_tdx_sys_info_tdmr() is moved from
> after build_tdx_memlist() to before it, but it is fine to do so.
>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
>
> v2 -> v3:
> - Split out the part to rename 'struct tdx_tdmr_sysinfo' to 'struct
> tdx_sys_info_tdmr'.
>
>
> ---
> arch/x86/virt/vmx/tdx/tdx.c | 19 ++++++++++++-------
> arch/x86/virt/vmx/tdx/tdx.h | 36 +++++++++++++++++++++++++++++-------
> 2 files changed, 41 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 1cd9035c783f..24eb289c80e8 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -318,6 +318,11 @@ static int get_tdx_sys_info_tdmr(struct tdx_sys_info_tdmr *sysinfo_tdmr)
> return ret;
> }
>
> +static int get_tdx_sys_info(struct tdx_sys_info *sysinfo)
A more apt name for this function would be init_tdx_sys_info, because it
will be executed only once during module initialization and it's really
initialising those values.
Given how complex TDX turns out to be it will be best if one off init
functions are prefixed with 'init_'.
> +{
> + return get_tdx_sys_info_tdmr(&sysinfo->tdmr);
> +}
> +
> /* Calculate the actual TDMR size */
> static int tdmr_size_single(u16 max_reserved_per_tdmr)
> {
> @@ -1090,9 +1095,13 @@ static int init_tdmrs(struct tdmr_info_list *tdmr_list)
>
> static int init_tdx_module(void)
> {
> - struct tdx_sys_info_tdmr sysinfo_tdmr;
> + struct tdx_sys_info sysinfo;
> int ret;
>
> + ret = get_tdx_sys_info(&sysinfo);
> + if (ret)
> + return ret;
> +
> /*
> * To keep things simple, assume that all TDX-protected memory
> * will come from the page allocator. Make sure all pages in the
> @@ -1109,17 +1118,13 @@ static int init_tdx_module(void)
> if (ret)
> goto out_put_tdxmem;
>
> - ret = get_tdx_sys_info_tdmr(&sysinfo_tdmr);
> - if (ret)
> - goto err_free_tdxmem;
> -
> /* Allocate enough space for constructing TDMRs */
> - ret = alloc_tdmr_list(&tdx_tdmr_list, &sysinfo_tdmr);
> + ret = alloc_tdmr_list(&tdx_tdmr_list, &sysinfo.tdmr);
> if (ret)
> goto err_free_tdxmem;
>
> /* Cover all TDX-usable memory regions in TDMRs */
> - ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &sysinfo_tdmr);
> + ret = construct_tdmrs(&tdx_memlist, &tdx_tdmr_list, &sysinfo.tdmr);
> if (ret)
> goto err_free_tdmrs;
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 8aabd03d8bf5..4cddbb035b9f 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -100,13 +100,6 @@ struct tdx_memblock {
> int nid;
> };
>
> -/* "TDMR info" part of "Global Scope Metadata" for constructing TDMRs */
> -struct tdx_sys_info_tdmr {
> - u16 max_tdmrs;
> - u16 max_reserved_per_tdmr;
> - u16 pamt_entry_size[TDX_PS_NR];
> -};
> -
> /* Warn if kernel has less than TDMR_NR_WARN TDMRs after allocation */
> #define TDMR_NR_WARN 4
>
> @@ -119,4 +112,33 @@ struct tdmr_info_list {
> int max_tdmrs; /* How many 'tdmr_info's are allocated */
> };
>
> +/*
> + * Kernel-defined structures to contain "Global Scope Metadata".
> + *
> + * TDX global metadata fields are categorized by "Class"es. See the
> + * "global_metadata.json" in the "TDX 1.5 ABI Definitions".
> + *
> + * 'struct tdx_sys_info' is the main structure to contain all metadata
> + * used by the kernel. It contains sub-structures with each reflecting
> + * the "Class" in the 'global_metadata.json'.
> + *
> + * Note the structure name may not exactly follow the name of the
> + * "Class" in the TDX spec, but the comment of that structure always
> + * reflect that.
> + *
> + * Also note not all metadata fields in each class are defined, only
> + * those used by the kernel are.
> + */
> +
> +/* Class "TDMR info" */
> +struct tdx_sys_info_tdmr {
> + u16 max_tdmrs;
> + u16 max_reserved_per_tdmr;
> + u16 pamt_entry_size[TDX_PS_NR];
> +};
> +
> +struct tdx_sys_info {
> + struct tdx_sys_info_tdmr tdmr;
> +};
> +
> #endif
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 3/8] x86/virt/tdx: Prepare to support reading other global metadata fields
2024-08-30 6:43 ` Adrian Hunter
@ 2024-08-30 11:02 ` Huang, Kai
0 siblings, 0 replies; 42+ messages in thread
From: Huang, Kai @ 2024-08-30 11:02 UTC (permalink / raw)
To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
kirill.shutemov@linux.intel.com, tglx@linutronix.de,
pbonzini@redhat.com, Hunter, Adrian, Williams, Dan J
Cc: Gao, Chao, kvm@vger.kernel.org, binbin.wu@linux.intel.com,
linux-kernel@vger.kernel.org, Edgecombe, Rick P, x86@kernel.org,
Yamahata, Isaku
> >
> > For now the kernel only reads "TD Memory Region" (TDMR) related fields
> > for module initialization. There are both immediate needs to read more
> > fields for module initialization and near-future needs for other kernel
> > components like KVM to run TDX guests.
> > will be organized in different structures depending on their meanings.
>
> Line above seems lost
Hmm.. It should be removed. I thought I have done the spell check for all the
patches :-(
>
> >
> > For now the kernel only reads TDMR related fields. The TD_SYSINFO_MAP()
> > macro hard-codes the 'struct tdx_sys_info_tdmr' instance name. To make
> > it work with different instances of different structures, extend it to
> > take the structure instance name as an argument.
> >
> > This also means the current code which uses TD_SYSINFO_MAP() must type
> > 'struct tdx_sys_info_tdmr' instance name explicitly for each use. To
> > make the code easier to read, add a wrapper TD_SYSINFO_MAP_TDMR_INFO()
> > which hides the instance name.
>
> Note, were you to accept my suggestion for patch 2, TD_SYSINFO_MAP() would
> have gone away, and no changes would be needed to get_tdx_sys_info_tdmr().
> So the above 2 paragraphs could be dropped.
Yeah seems better to me. I'll use your way unless someone objects.
>
> >
> > TDX also support 8/16/32/64 bits metadata field element sizes. For now
> > all TDMR related fields are 16-bit long thus the kernel only has one
> > helper:
> >
> > static int read_sys_metadata_field16(u64 field_id, u16 *val) {}
> >
> > Future changes will need to read more metadata fields with different
> > element sizes. To make the code short, extend the helper to take a
> > 'void *' buffer and the buffer size so it can work with all element
> > sizes.
> >
> > Note in this way the helper loses the type safety and the build-time
> > check inside the helper cannot work anymore because the compiler cannot
> > determine the exact value of the buffer size.
> >
> > To resolve those, add a wrapper of the helper which only works with
> > u8/u16/u32/u64 directly and do build-time check, where the compiler
> > can easily know both the element size (from field ID) and the buffer
> > size(using sizeof()), before calling the helper.
> >
> > An alternative option is to provide one helper for each element size:
>
> IMHO, it is not necessary to describe alternative options.
>
I am not sure? My understanding is we should mention those alternatives in
the changelog so that the reviewers can have a better view?
[...]
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 3/8] x86/virt/tdx: Prepare to support reading other global metadata fields
2024-08-30 9:05 ` Nikolay Borisov
@ 2024-08-30 11:09 ` Huang, Kai
0 siblings, 0 replies; 42+ messages in thread
From: Huang, Kai @ 2024-08-30 11:09 UTC (permalink / raw)
To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
kirill.shutemov@linux.intel.com, tglx@linutronix.de,
pbonzini@redhat.com, Williams, Dan J, nik.borisov@suse.com
Cc: Gao, Chao, Edgecombe, Rick P, x86@kernel.org,
binbin.wu@linux.intel.com, linux-kernel@vger.kernel.org,
Hunter, Adrian, kvm@vger.kernel.org, Yamahata, Isaku
> > -static int read_sys_metadata_field16(u64 field_id, u16 *val)
> > +static int __read_sys_metadata_field(u64 field_id, void *val, int size)
>
> The type of 'size' should be size_t.
OK will do.
>
> > {
> > u64 tmp;
> > int ret;
> >
> > - BUILD_BUG_ON(MD_FIELD_ID_ELE_SIZE_CODE(field_id) !=
> > - MD_FIELD_ID_ELE_SIZE_16BIT);
> > -
> > - ret = read_sys_metadata_field(field_id, &tmp);
> > + ret = tdh_sys_rd(field_id, &tmp);
> > if (ret)
> > return ret;
> >
> > - *val = tmp;
> > + memcpy(val, &tmp, size);
> >
> > return 0;
> > }
> >
> > +/* Wrapper to read one global metadata to u8/u16/u32/u64 */
> > +#define read_sys_metadata_field(_field_id, _val) \
> > + ({ \
> > + BUILD_BUG_ON(MD_FIELD_ELE_SIZE(_field_id) != sizeof(typeof(*(_val)))); \
> > + __read_sys_metadata_field(_field_id, _val, sizeof(typeof(*(_val)))); \
> > + })
> > +
> > /*
> > - * Assumes locally defined @ret and @sysinfo_tdmr to convey the error
> > - * code and the 'struct tdx_sys_info_tdmr' instance to fill out.
> > + * Read one global metadata field to a member of a structure instance,
> > + * assuming locally defined @ret to convey the error code.
> > */
> > -#define TD_SYSINFO_MAP(_field_id, _member) \
> > - ({ \
> > - if (!ret) \
> > - ret = read_sys_metadata_field16(MD_FIELD_ID_##_field_id, \
> > - &sysinfo_tdmr->_member); \
> > +#define TD_SYSINFO_MAP(_field_id, _stbuf, _member) \
> > + ({ \
> > + if (!ret) \
> > + ret = read_sys_metadata_field(MD_FIELD_ID_##_field_id, \
> > + &_stbuf->_member); \
> > })
> >
> > static int get_tdx_sys_info_tdmr(struct tdx_sys_info_tdmr *sysinfo_tdmr)
> > {
> > int ret = 0;
> >
> > - TD_SYSINFO_MAP(MAX_TDMRS, max_tdmrs);
> > - TD_SYSINFO_MAP(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr);
> > - TD_SYSINFO_MAP(PAMT_4K_ENTRY_SIZE, pamt_entry_size[TDX_PS_4K]);
> > - TD_SYSINFO_MAP(PAMT_2M_ENTRY_SIZE, pamt_entry_size[TDX_PS_2M]);
> > - TD_SYSINFO_MAP(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]);
> > +#define TD_SYSINFO_MAP_TDMR_INFO(_field_id, _member) \
> > + TD_SYSINFO_MAP(_field_id, sysinfo_tdmr, _member)
>
> nit: I guess its a personal preference but honestly I think the amount
> of macro indirection (3 levels) here is crazy, despite each being rather
> simple. Just use TD_SYSINFO_MAP directly, saving the typing of
> "sysinfo_tdmr" doesn't seem like a big deal.
We have a different interpretation of "crazy" :-)
>
> You can probably take it even a bit further and simply opencode
> read_sys_metadata_field macro inside TD_SYSINFO_MAP and be left with
> just it, no ? No other patch in this series uses read_sys_metadata_field
> stand alone, if anything factoring it out could be deferred until the
> first users gets introduced.
Joke aside, anyway, with what Adrian suggested we just need the
TD_SYSINFO_MAP() but don't need those wrappers anymore. I'll go with this if
no one says differently.
>
> > +
> > + TD_SYSINFO_MAP_TDMR_INFO(MAX_TDMRS, max_tdmrs);
> > + TD_SYSINFO_MAP_TDMR_INFO(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr);
> > + TD_SYSINFO_MAP_TDMR_INFO(PAMT_4K_ENTRY_SIZE, pamt_entry_size[TDX_PS_4K]);
> > + TD_SYSINFO_MAP_TDMR_INFO(PAMT_2M_ENTRY_SIZE, pamt_entry_size[TDX_PS_2M]);
> > + TD_SYSINFO_MAP_TDMR_INFO(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]);
> >
> > return ret;
> > }
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> > index 148f9b4d1140..7458f6717873 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.h
> > +++ b/arch/x86/virt/vmx/tdx/tdx.h
> > @@ -53,7 +53,8 @@
> > #define MD_FIELD_ID_ELE_SIZE_CODE(_field_id) \
> > (((_field_id) & GENMASK_ULL(33, 32)) >> 32)
> >
> > -#define MD_FIELD_ID_ELE_SIZE_16BIT 1
> > +#define MD_FIELD_ELE_SIZE(_field_id) \
>
> That ELE seems a bit ambiguous, ELEM seems more natural and is in line
> with other macros in the kernel.
OK will do. Thanks for the review!
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 5/8] x86/virt/tdx: Start to track all global metadata in one structure
2024-08-30 7:02 ` Adrian Hunter
@ 2024-08-30 11:10 ` Huang, Kai
0 siblings, 0 replies; 42+ messages in thread
From: Huang, Kai @ 2024-08-30 11:10 UTC (permalink / raw)
To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
kirill.shutemov@linux.intel.com, tglx@linutronix.de,
pbonzini@redhat.com, Hunter, Adrian, Williams, Dan J
Cc: Gao, Chao, kvm@vger.kernel.org, binbin.wu@linux.intel.com,
linux-kernel@vger.kernel.org, Edgecombe, Rick P, x86@kernel.org,
Yamahata, Isaku
> >
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
>
> Notwithstanding minor cosmetic tweaks:
Will fix.
>
> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Thanks.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 7/8] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes
2024-08-30 10:50 ` Adrian Hunter
@ 2024-08-30 11:52 ` Huang, Kai
2024-08-30 12:02 ` Adrian Hunter
0 siblings, 1 reply; 42+ messages in thread
From: Huang, Kai @ 2024-08-30 11:52 UTC (permalink / raw)
To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
kirill.shutemov@linux.intel.com, tglx@linutronix.de,
pbonzini@redhat.com, Hunter, Adrian, Williams, Dan J
Cc: Gao, Chao, kvm@vger.kernel.org, binbin.wu@linux.intel.com,
linux-kernel@vger.kernel.org, Edgecombe, Rick P, x86@kernel.org,
Yamahata, Isaku
On Fri, 2024-08-30 at 13:50 +0300, Hunter, Adrian wrote:
> On 27/08/24 10:14, Kai Huang wrote:
> > A TDX module initialization failure was reported on a Emerald Rapids
> > platform:
> >
> > virt/tdx: initialization failed: TDMR [0x0, 0x80000000): reserved areas exhausted.
> > virt/tdx: module initialization failed (-28)
> >
> > As part of initializing the TDX module, the kernel informs the TDX
> > module of all "TDX-usable memory regions" using an array of TDX defined
> > structure "TD Memory Region" (TDMR). Each TDMR must be in 1GB aligned
> > and in 1GB granularity, and all "non-TDX-usable memory holes" within a
> > given TDMR must be marked as "reserved areas". The TDX module reports a
> > maximum number of reserved areas that can be supported per TDMR.
>
> The statement:
>
> ... all "non-TDX-usable memory holes" within a
> given TDMR must be marked as "reserved areas".
>
> is not exactly true, which is essentially the basis of this fix.
Hmm I think I see what you mean. Perhaps the "must be marked as" confuses
you?
The "TDX-usable memory" here means all pages that can potentially be used by
TDX. They don't have to be actually used by TDX, i.e., "TDX-usable memory" vs
"TDX-used memory".
And the "non-TDX-usable memory holes" means the memory regions that cannot be
possibly used by TDX at all.
Is below better if I change "must be marked as" to "are"?
As part of initializing the TDX module, the kernel informs the TDX
module of all "TDX-usable memory regions" using an array of TDX defined
structure "TD Memory Region" (TDMR). Each TDMR must be in 1GB aligned
and in 1GB granularity, and all "non-TDX-usable memory holes" within a
given TDMR are marked as "reserved areas". The TDX module reports a
maximum number of reserved areas that can be supported per TDMR.
Note in my logic here we don't need to mention CMR. Here I just want to tell
the fact that each TDMR has number of "reserved areas" and the maximum number
is reported by TDX module.
>
> The relevant requirements are (from the spec):
>
> Any non-reserved 4KB page within a TDMR must be convertible
> i.e., it must be within a CMR
Yes.
>
> Reserved areas within a TDMR need not be within a CMR.
Yes. They need not to be, but they can be.
>
> PAMT areas must not overlap with TDMR non-reserved areas;
> however, they may reside within TDMR reserved areas
> (as long as these are convertible).
Yes. However in implementation PAMTs are out of page allocator so they are
all within TDMRs thus need to be put to reserved areas.
Those are TDX architectural requirements. They are not all related to the fix
of this problem. The most important thing here is:
Any non-reserved memory within a TDMR must be within CMR.
That means as long as one memory region is CMR, it doesn't need to be in
"reserved area" from TDX architecture's prespective.
That means we can include more memory regions (even they cannot be used by TDX
at all) as "non-reserved" areas in TDMRs to reduce the number of "reserved
areas" as long as those regions are within CMR.
This is the logic behind this fix.
>
> >
> > Currently, the kernel finds those "non-TDX-usable memory holes" within a
> > given TDMR by walking over a list of "TDX-usable memory regions", which
> > essentially reflects the "usable" regions in the e820 table (w/o memory
> > hotplug operations precisely, but this is not relevant here).
>
> But including e820 table regions that are not "usable" in the TDMR
> reserved areas is not necessary - it is not one of the rules.
True. That's why we can do this fix.
>
> What confused me initially was that I did not realize the we already
> require that the TDX Module does not touch memory in the TDMR
> non-reserved areas not specifically allocated to it. So it makes no
> difference to the TDX Module what the pages that have not been allocated
> to it, are used for.
>
> >
> > As shown above, the root cause of this failure is when the kernel tries
> > to construct a TDMR to cover address range [0x0, 0x80000000), there
> > are too many memory holes within that range and the number of memory
> > holes exceeds the maximum number of reserved areas.
> >
> > The E820 table of that platform (see [1] below) reflects this: the
> > number of memory holes among e820 "usable" entries exceeds 16, which is
> > the maximum number of reserved areas TDX module supports in practice.
> >
> > === Fix ===
> >
> > There are two options to fix this: 1) reduce the number of memory holes
> > when constructing a TDMR to save "reserved areas"; 2) reduce the TDMR's
> > size to cover fewer memory regions, thus fewer memory holes.
>
> Probably better to try and get rid of this "two options" stuff and focus
> on how this is a simple and effective fix.
As I mentioned in another reply I would prefer to keep those options since I
believe they can provide a full view to the reviewers.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 5/8] x86/virt/tdx: Start to track all global metadata in one structure
2024-08-30 11:01 ` Nikolay Borisov
@ 2024-08-30 11:57 ` Huang, Kai
0 siblings, 0 replies; 42+ messages in thread
From: Huang, Kai @ 2024-08-30 11:57 UTC (permalink / raw)
To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
kirill.shutemov@linux.intel.com, tglx@linutronix.de,
pbonzini@redhat.com, Williams, Dan J, nik.borisov@suse.com
Cc: Gao, Chao, Edgecombe, Rick P, x86@kernel.org,
binbin.wu@linux.intel.com, linux-kernel@vger.kernel.org,
Hunter, Adrian, kvm@vger.kernel.org, Yamahata, Isaku
On Fri, 2024-08-30 at 14:01 +0300, Nikolay Borisov wrote:
>
> On 27.08.24 г. 10:14 ч., Kai Huang wrote:
> > The TDX module provides a set of "global metadata fields". They report
> > things like TDX module version, supported features, and fields related
> > to create/run TDX guests and so on.
> >
> > Currently the kernel only reads "TD Memory Region" (TDMR) related fields
> > for module initialization. There are immediate needs which require the
> > TDX module initialization to read more global metadata including module
> > version, supported features and "Convertible Memory Regions" (CMRs).
> >
> > Also, KVM will need to read more metadata fields to support baseline TDX
> > guests. In the longer term, other TDX features like TDX Connect (which
> > supports assigning trusted IO devices to TDX guest) may also require
> > other kernel components such as pci/vt-d to access global metadata.
> >
> > To meet all those requirements, the idea is the TDX host core-kernel to
> > to provide a centralized, canonical, and read-only structure for the
> > global metadata that comes out from the TDX module for all kernel
> > components to use.
> >
> > As the first step, introduce a new 'struct tdx_sys_info' to track all
> > global metadata fields.
> >
> > TDX categories global metadata fields into different "Class"es. E.g.,
> > the TDMR related fields are under class "TDMR Info". Instead of making
> > 'struct tdx_sys_info' a plain structure to contain all metadata fields,
> > organize them in smaller structures based on the "Class".
> >
> > This allows those metadata fields to be used in finer granularity thus
> > makes the code more clear. E.g., the construct_tdmr() can just take the
> > structure which contains "TDMR Info" metadata fields.
> >
> > Add a new function get_tdx_sys_info() as the placeholder to read all
> > metadata fields, and call it at the beginning of init_tdx_module(). For
> > now it only calls get_tdx_sys_info_tdmr() to read TDMR related fields.
> >
> > Note there is a functional change: get_tdx_sys_info_tdmr() is moved from
> > after build_tdx_memlist() to before it, but it is fine to do so.
> >
> > Signed-off-by: Kai Huang <kai.huang@intel.com>
> > ---
> >
> > v2 -> v3:
> > - Split out the part to rename 'struct tdx_tdmr_sysinfo' to 'struct
> > tdx_sys_info_tdmr'.
> >
> >
> > ---
> > arch/x86/virt/vmx/tdx/tdx.c | 19 ++++++++++++-------
> > arch/x86/virt/vmx/tdx/tdx.h | 36 +++++++++++++++++++++++++++++-------
> > 2 files changed, 41 insertions(+), 14 deletions(-)
> >
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index 1cd9035c783f..24eb289c80e8 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -318,6 +318,11 @@ static int get_tdx_sys_info_tdmr(struct tdx_sys_info_tdmr *sysinfo_tdmr)
> > return ret;
> > }
> >
> > +static int get_tdx_sys_info(struct tdx_sys_info *sysinfo)
>
> A more apt name for this function would be init_tdx_sys_info, because it
> will be executed only once during module initialization and it's really
> initialising those values.
>
> Given how complex TDX turns out to be it will be best if one off init
> functions are prefixed with 'init_'.
>
Since the function is only called once, I don't see big difference between
get_xx() vs init_xx().
Is init_xx() slightly better? Maybe. But to me we are not at that point that
get_xx() needs to be renamed.
One reason I don't want to do such change is the current upstream code is
already using get_tdx_tdmr_sysinfo(). I don't want to rename using init_xx().
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 7/8] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes
2024-08-30 11:52 ` Huang, Kai
@ 2024-08-30 12:02 ` Adrian Hunter
0 siblings, 0 replies; 42+ messages in thread
From: Adrian Hunter @ 2024-08-30 12:02 UTC (permalink / raw)
To: Huang, Kai, Hansen, Dave, seanjc@google.com, bp@alien8.de,
peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
kirill.shutemov@linux.intel.com, tglx@linutronix.de,
pbonzini@redhat.com, Williams, Dan J
Cc: Gao, Chao, kvm@vger.kernel.org, binbin.wu@linux.intel.com,
linux-kernel@vger.kernel.org, Edgecombe, Rick P, x86@kernel.org,
Yamahata, Isaku
On 30/08/24 14:52, Huang, Kai wrote:
> On Fri, 2024-08-30 at 13:50 +0300, Hunter, Adrian wrote:
>> On 27/08/24 10:14, Kai Huang wrote:
>>> A TDX module initialization failure was reported on a Emerald Rapids
>>> platform:
>>>
>>> virt/tdx: initialization failed: TDMR [0x0, 0x80000000): reserved areas exhausted.
>>> virt/tdx: module initialization failed (-28)
>>>
>>> As part of initializing the TDX module, the kernel informs the TDX
>>> module of all "TDX-usable memory regions" using an array of TDX defined
>>> structure "TD Memory Region" (TDMR). Each TDMR must be in 1GB aligned
>>> and in 1GB granularity, and all "non-TDX-usable memory holes" within a
>>> given TDMR must be marked as "reserved areas". The TDX module reports a
>>> maximum number of reserved areas that can be supported per TDMR.
>>
>> The statement:
>>
>> ... all "non-TDX-usable memory holes" within a
>> given TDMR must be marked as "reserved areas".
>>
>> is not exactly true, which is essentially the basis of this fix.
>
> Hmm I think I see what you mean. Perhaps the "must be marked as" confuses
> you?
>
> The "TDX-usable memory" here means all pages that can potentially be used by
> TDX. They don't have to be actually used by TDX, i.e., "TDX-usable memory" vs
> "TDX-used memory".
>
> And the "non-TDX-usable memory holes" means the memory regions that cannot be
> possibly used by TDX at all.
>
> Is below better if I change "must be marked as" to "are"?
>
> As part of initializing the TDX module, the kernel informs the TDX
> module of all "TDX-usable memory regions" using an array of TDX defined
> structure "TD Memory Region" (TDMR). Each TDMR must be in 1GB aligned
> and in 1GB granularity, and all "non-TDX-usable memory holes" within a
> given TDMR are marked as "reserved areas". The TDX module reports a
> maximum number of reserved areas that can be supported per TDMR.
Yes, that's better.
>
> Note in my logic here we don't need to mention CMR. Here I just want to tell
> the fact that each TDMR has number of "reserved areas" and the maximum number
> is reported by TDX module.
>
>>
>> The relevant requirements are (from the spec):
>>
>> Any non-reserved 4KB page within a TDMR must be convertible
>> i.e., it must be within a CMR
>
> Yes.
>
>>
>> Reserved areas within a TDMR need not be within a CMR.
>
> Yes. They need not to be, but they can be.
>
>>
>> PAMT areas must not overlap with TDMR non-reserved areas;
>> however, they may reside within TDMR reserved areas
>> (as long as these are convertible).
>
> Yes. However in implementation PAMTs are out of page allocator so they are
> all within TDMRs thus need to be put to reserved areas.
>
> Those are TDX architectural requirements. They are not all related to the fix
> of this problem. The most important thing here is:
>
> Any non-reserved memory within a TDMR must be within CMR.
>
> That means as long as one memory region is CMR, it doesn't need to be in
> "reserved area" from TDX architecture's prespective.
>
> That means we can include more memory regions (even they cannot be used by TDX
> at all) as "non-reserved" areas in TDMRs to reduce the number of "reserved
> areas" as long as those regions are within CMR.
>
> This is the logic behind this fix.
>
>>
>>>
>>> Currently, the kernel finds those "non-TDX-usable memory holes" within a
>>> given TDMR by walking over a list of "TDX-usable memory regions", which
>>> essentially reflects the "usable" regions in the e820 table (w/o memory
>>> hotplug operations precisely, but this is not relevant here).
>>
>> But including e820 table regions that are not "usable" in the TDMR
>> reserved areas is not necessary - it is not one of the rules.
>
> True. That's why we can do this fix.
>
>>
>> What confused me initially was that I did not realize the we already
>> require that the TDX Module does not touch memory in the TDMR
>> non-reserved areas not specifically allocated to it. So it makes no
>> difference to the TDX Module what the pages that have not been allocated
>> to it, are used for.
>>
>>>
>>> As shown above, the root cause of this failure is when the kernel tries
>>> to construct a TDMR to cover address range [0x0, 0x80000000), there
>>> are too many memory holes within that range and the number of memory
>>> holes exceeds the maximum number of reserved areas.
>>>
>>> The E820 table of that platform (see [1] below) reflects this: the
>>> number of memory holes among e820 "usable" entries exceeds 16, which is
>>> the maximum number of reserved areas TDX module supports in practice.
>>>
>>> === Fix ===
>>>
>>> There are two options to fix this: 1) reduce the number of memory holes
>>> when constructing a TDMR to save "reserved areas"; 2) reduce the TDMR's
>>> size to cover fewer memory regions, thus fewer memory holes.
>>
>> Probably better to try and get rid of this "two options" stuff and focus
>> on how this is a simple and effective fix.
>
> As I mentioned in another reply I would prefer to keep those options since I
> believe they can provide a full view to the reviewers.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 7/8] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes
2024-08-27 7:14 ` [PATCH v3 7/8] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes Kai Huang
2024-08-30 10:50 ` Adrian Hunter
@ 2024-08-30 12:27 ` Nikolay Borisov
2024-09-06 23:31 ` Dan Williams
2 siblings, 0 replies; 42+ messages in thread
From: Nikolay Borisov @ 2024-08-30 12:27 UTC (permalink / raw)
To: Kai Huang, dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo,
hpa, dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu, adrian.hunter
On 27.08.24 г. 10:14 ч., Kai Huang wrote:
<snip>
> [2] Convertible Memory Regions of the problematic platform:
>
> virt/tdx: CMR: [0x100000, 0x6f800000)
> virt/tdx: CMR: [0x100000000, 0x107a000000)
> virt/tdx: CMR: [0x1080000000, 0x207c000000)
> virt/tdx: CMR: [0x2080000000, 0x307c000000)
> virt/tdx: CMR: [0x3080000000, 0x407c000000)
>
> Fixes: dde3b60d572c9 ("x86/virt/tdx: Designate reserved areas for all TDMRs")
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
<snpi>
> ---
> arch/x86/virt/vmx/tdx/tdx.c | 105 ++++++++++++++++++++++++++++++------
> arch/x86/virt/vmx/tdx/tdx.h | 13 +++++
> 2 files changed, 102 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 0fb673dd43ed..fa335ab1ae92 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -331,6 +331,72 @@ static int get_tdx_sys_info_version(struct tdx_sys_info_version *sysinfo_version
> return ret;
> }
>
> +/* Update the @sysinfo_cmr->num_cmrs to trim tail empty CMRs */
> +static void trim_empty_tail_cmrs(struct tdx_sys_info_cmr *sysinfo_cmr)
> +{
> + int i;
> +
> + for (i = 0; i < sysinfo_cmr->num_cmrs; i++) {
> + u64 cmr_base = sysinfo_cmr->cmr_base[i];
> + u64 cmr_size = sysinfo_cmr->cmr_size[i];
> +
> + if (!cmr_size) {
> + WARN_ON_ONCE(cmr_base);
> + break;
> + }
> +
> + /* TDX architecture: CMR must be 4KB aligned */
> + WARN_ON_ONCE(!PAGE_ALIGNED(cmr_base) ||
> + !PAGE_ALIGNED(cmr_size));
> + }
> +
> + sysinfo_cmr->num_cmrs = i;
> +}
> +
> +static int get_tdx_sys_info_cmr(struct tdx_sys_info_cmr *sysinfo_cmr)
> +{
> + int i, ret = 0;
> +
> +#define TD_SYSINFO_MAP_CMR_INFO(_field_id, _member) \
> + TD_SYSINFO_MAP(_field_id, sysinfo_cmr, _member)
> +
> + TD_SYSINFO_MAP_CMR_INFO(NUM_CMRS, num_cmrs);
> +
> + if (ret)
> + return ret;
> +
> + for (i = 0; i < sysinfo_cmr->num_cmrs; i++) {
> + TD_SYSINFO_MAP_CMR_INFO(CMR_BASE(i), cmr_base[i]);
Yeah, this is just golden, not only are you going through 3 levels of
indirection to expand the TD_SYSINFO_MAP but you also top it with one
final one MD_FIELD_ID_CMR_BASE(i). I'm fine with the CMR_BASE/SIZE
macros, but the way you introduce 3 levels of macros just to avoid typing
TD_SYSINFO_MAP(foo, bar, zoo)
makes no sense.
<snip>
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 1/8] x86/virt/tdx: Rename 'struct tdx_tdmr_sysinfo' to reflect the spec better
2024-08-27 7:14 ` [PATCH v3 1/8] x86/virt/tdx: Rename 'struct tdx_tdmr_sysinfo' to reflect the spec better Kai Huang
2024-08-27 13:10 ` Adrian Hunter
@ 2024-08-30 22:05 ` Dan Williams
1 sibling, 0 replies; 42+ messages in thread
From: Dan Williams @ 2024-08-30 22:05 UTC (permalink / raw)
To: Kai Huang, dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo,
hpa, dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu, adrian.hunter, kai.huang
Kai Huang wrote:
> The TDX module provides a set of "global metadata fields". They report
> things like TDX module version, supported features, and fields related
> to create/run TDX guests and so on.
>
> TDX organizes those metadata fields by "Class"es based on the meaning of
> those fields. E.g., for now the kernel only reads "TD Memory Region"
> (TDMR) related fields for module initialization. Those fields are
> defined under class "TDMR Info".
>
> There are both immediate needs to read more metadata fields for module
> initialization and near-future needs for other kernel components like
> KVM to run TDX guests. To meet all those requirements, the idea is the
> TDX host core-kernel to provide a centralized, canonical, and read-only
> structure for the global metadata that comes out from the TDX module for
> all kernel components to use.
>
> More specifically, the target is to end up with something like:
>
> struct tdx_sys_info {
> struct tdx_sys_info_classA a;
> struct tdx_sys_info_classB b;
> ...
> };
>
> Currently the kernel organizes all fields under "TDMR Info" class in
> 'struct tdx_tdmr_sysinfo'. To prepare for the above target, rename the
> structure to 'struct tdx_sys_info_tdmr' to follow the class name better.
>
> No functional change intended.
>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> ---
> v2 -> v3:
> - Split out as a separate patch and place it at beginning:
>
> https://lore.kernel.org/kvm/cover.1721186590.git.kai.huang@intel.com/T/#m8fec7c429242d640cf5e756eb68e3b822e6dff8b
>
> - Rename to 'struct tdx_sys_info_tdmr':
>
> https://lore.kernel.org/kvm/cover.1721186590.git.kai.huang@intel.com/T/#md73dd9b02a492acf4a6facae63e8d030e320967d
> https://lore.kernel.org/kvm/cover.1721186590.git.kai.huang@intel.com/T/#m8fec7c429242d640cf5e756eb68e3b822e6dff8b
>
> ---
> arch/x86/virt/vmx/tdx/tdx.c | 36 ++++++++++++++++++------------------
> arch/x86/virt/vmx/tdx/tdx.h | 2 +-
> 2 files changed, 19 insertions(+), 19 deletions(-)
Looks good to me,
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 2/8] x86/virt/tdx: Remove 'struct field_mapping' and implement TD_SYSINFO_MAP() macro
2024-08-27 7:14 ` [PATCH v3 2/8] x86/virt/tdx: Remove 'struct field_mapping' and implement TD_SYSINFO_MAP() macro Kai Huang
2024-08-29 7:20 ` Adrian Hunter
@ 2024-09-06 20:21 ` Dan Williams
2024-09-09 9:59 ` Huang, Kai
1 sibling, 1 reply; 42+ messages in thread
From: Dan Williams @ 2024-09-06 20:21 UTC (permalink / raw)
To: Kai Huang, dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo,
hpa, dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu, adrian.hunter, kai.huang
I think the subject buries the lead on what this patch does which is
more like:
x86/virt/tdx: Rework TD_SYSINFO_MAP to support build-time verification
Kai Huang wrote:
> TL;DR: Remove the 'struct field_mapping' structure and use another way
I would drop the TL;DR: and just make the changelog more concise,
because as it stands now it requires the reader to fully appreciate the
direction of the v1 approach which this new proposal abandons:
Something like:
Dan noticed [1] that read_sys_metadata_field16() has a runtime warning
to validate that the metadata field size matches the passed in buffer
size. In turns out that all the information to perform that validation
is available at build time. Rework TD_SYSINFO_MAP() to stop providing
runtime data to read_sys_metadata_field16() and instead just pass typed
fields to read_sys_metadata_field16() and let the compiler catch any
mismatches.
The new TD_SYSINFO_MAP() has a couple quirks for readability. It
requires the function that uses it to define a local variable @ret to
carry the error code and set the initial value to 0. It also hard-codes
the variable name of the structure pointer used in the function, but it
is less code, build-time verfiable, and the same readability as the
former 'struct field_mapping' approach.
Link: http://lore.kernel.org/66b16121c48f4_4fc729424@dwillia2-xfh.jf.intel.com.notmuch [1]
[..]
> Link: https://lore.kernel.org/kvm/a107b067-861d-43f4-86b5-29271cb93dad@intel.com/T/#m7cfb3c146214d94b24e978eeb8708d92c0b14ac6 [1]
The expectation for lore links is to capture the message-id. Note the
differences with the "Link:" format above.
> v2 -> v3:
> - Remove 'struct field_mapping' and reimplement TD_SYSINFO_MAP().
>
> ---
> arch/x86/virt/vmx/tdx/tdx.c | 57 ++++++++++++++-----------------------
> 1 file changed, 21 insertions(+), 36 deletions(-)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index e979bf442929..7e75c1b10838 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -270,60 +270,45 @@ static int read_sys_metadata_field(u64 field_id, u64 *data)
> return 0;
> }
>
> -static int read_sys_metadata_field16(u64 field_id,
> - int offset,
> - struct tdx_sys_info_tdmr *ts)
> +static int read_sys_metadata_field16(u64 field_id, u16 *val)
> {
> - u16 *ts_member = ((void *)ts) + offset;
> u64 tmp;
> int ret;
>
> - if (WARN_ON_ONCE(MD_FIELD_ID_ELE_SIZE_CODE(field_id) !=
> - MD_FIELD_ID_ELE_SIZE_16BIT))
> - return -EINVAL;
> + BUILD_BUG_ON(MD_FIELD_ID_ELE_SIZE_CODE(field_id) !=
> + MD_FIELD_ID_ELE_SIZE_16BIT);
Perhaps just move this to TD_SYSINFO_MAP() directly?
Something like:
#define TD_SYSINFO_MAP(_field_id, _member, _size) \
({ \
BUILD_BUG_ON(MD_FIELD_ID_ELE_SIZE_CODE(field_id) != \
MD_FIELD_ID_ELE_SIZE_##_size##BIT); \
if (!ret) \
ret = read_sys_metadata_field##_size(MD_FIELD_ID_##_field_id, \
&sysinfo_tdmr->_member); \
})
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 2/8] x86/virt/tdx: Remove 'struct field_mapping' and implement TD_SYSINFO_MAP() macro
2024-08-29 7:20 ` Adrian Hunter
2024-08-30 10:52 ` Huang, Kai
@ 2024-09-06 21:30 ` Dan Williams
2024-09-09 9:59 ` Huang, Kai
1 sibling, 1 reply; 42+ messages in thread
From: Dan Williams @ 2024-09-06 21:30 UTC (permalink / raw)
To: Adrian Hunter, Kai Huang, dave.hansen, kirill.shutemov, tglx, bp,
peterz, mingo, hpa, dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu
Adrian Hunter wrote:
[..]
> Another possibility is to put the macro at the invocation site:
>
> #define READ_SYS_INFO(_field_id, _member) \
> ret = ret ?: read_sys_metadata_field16(MD_FIELD_ID_##_field_id, \
> &sysinfo_tdmr->_member)
>
> READ_SYS_INFO(MAX_TDMRS, max_tdmrs);
> READ_SYS_INFO(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr);
> READ_SYS_INFO(PAMT_4K_ENTRY_SIZE, pamt_entry_size[TDX_PS_4K]);
> READ_SYS_INFO(PAMT_2M_ENTRY_SIZE, pamt_entry_size[TDX_PS_2M]);
> READ_SYS_INFO(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]);
>
> #undef READ_SYS_INFO
>
> And so on in later patches:
>
> #define READ_SYS_INFO(_field_id, _member) \
> ret = ret ?: read_sys_metadata_field(MD_FIELD_ID_##_field_id, \
> &sysinfo_version->_member)
>
> READ_SYS_INFO(MAJOR_VERSION, major);
> READ_SYS_INFO(MINOR_VERSION, minor);
> READ_SYS_INFO(UPDATE_VERSION, update);
> READ_SYS_INFO(INTERNAL_VERSION, internal);
> READ_SYS_INFO(BUILD_NUM, build_num);
> READ_SYS_INFO(BUILD_DATE, build_date);
>
> #undef READ_SYS_INFO
Looks like a reasonable enhancement to me.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 3/8] x86/virt/tdx: Prepare to support reading other global metadata fields
2024-08-27 7:14 ` [PATCH v3 3/8] x86/virt/tdx: Prepare to support reading other global metadata fields Kai Huang
2024-08-30 6:43 ` Adrian Hunter
2024-08-30 9:05 ` Nikolay Borisov
@ 2024-09-06 21:34 ` Dan Williams
2024-09-09 12:28 ` Huang, Kai
2 siblings, 1 reply; 42+ messages in thread
From: Dan Williams @ 2024-09-06 21:34 UTC (permalink / raw)
To: Kai Huang, dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo,
hpa, dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu, adrian.hunter, kai.huang
Kai Huang wrote:
> Future changes will need to read more metadata fields with different
> element sizes. To make the code short, extend the helper to take a
> 'void *' buffer and the buffer size so it can work with all element
> sizes.
The whole point was to have compile time type safety for global struct
member and the read routine. So, no, using 'void *' is a step backwards.
Take some inspiration from build_mmio_read() and just have a macro that
takes the size as an argument to the macro, not the runtime routine. The
macro statically selects the right routine to call and does not need to
grow a size parameter itself.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 6/8] x86/virt/tdx: Print TDX module basic information
2024-08-27 7:14 ` [PATCH v3 6/8] x86/virt/tdx: Print TDX module basic information Kai Huang
@ 2024-09-06 22:46 ` Dan Williams
2024-09-09 10:18 ` Huang, Kai
0 siblings, 1 reply; 42+ messages in thread
From: Dan Williams @ 2024-09-06 22:46 UTC (permalink / raw)
To: Kai Huang, dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo,
hpa, dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu, adrian.hunter, kai.huang
Kai Huang wrote:
> Currently the kernel doesn't print any information regarding the TDX
> module itself, e.g. module version. In practice such information is
> useful, especially to the developers.
>
> For instance, there are a couple of use cases for dumping module basic
> information:
>
> 1) When something goes wrong around using TDX, the information like TDX
> module version, supported features etc could be helpful [1][2].
>
> 2) For Linux, when the user wants to update the TDX module, one needs to
> replace the old module in a specific location in the EFI partition
> with the new one so that after reboot the BIOS can load it. However,
> after kernel boots, currently the user has no way to verify it is
> indeed the new module that gets loaded and initialized (e.g., error
> could happen when replacing the old module). With the module version
> dumped the user can verify this easily.
For this specific use case the kernel log is less useful then finding
a place to put this in sysfs. This gets back to a proposal to have TDX
instantiate a "tdx_tsm" device which among other things could host this
version data.
The kernel log message is ok, but parsing the kernel log is not
sufficient for this update validation flow concern.
[..]
> +static void print_basic_sys_info(struct tdx_sys_info *sysinfo)
> +{
> + struct tdx_sys_info_features *features = &sysinfo->features;
> + struct tdx_sys_info_version *version = &sysinfo->version;
> +
> + /*
> + * TDX module version encoding:
> + *
> + * <major>.<minor>.<update>.<internal>.<build_num>
> + *
> + * When printed as text, <major> and <minor> are 1-digit,
> + * <update> and <internal> are 2-digits and <build_num>
> + * is 4-digits.
> + */
> + pr_info("Initializing TDX module: %u.%u.%02u.%02u.%04u (build_date %u), TDX_FEATURES0 0x%llx\n",
> + version->major, version->minor, version->update,
> + version->internal, version->build_num,
> + version->build_date, features->tdx_features0);
I do not see the value in dumping a raw features value in the log.
Either parse it or omit it. I would leave it for the tdx_tsm device to
emit.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 7/8] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes
2024-08-27 7:14 ` [PATCH v3 7/8] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes Kai Huang
2024-08-30 10:50 ` Adrian Hunter
2024-08-30 12:27 ` Nikolay Borisov
@ 2024-09-06 23:31 ` Dan Williams
2024-09-09 10:30 ` Huang, Kai
2 siblings, 1 reply; 42+ messages in thread
From: Dan Williams @ 2024-09-06 23:31 UTC (permalink / raw)
To: Kai Huang, dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo,
hpa, dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu, adrian.hunter, kai.huang
Kai Huang wrote:
> A TDX module initialization failure was reported on a Emerald Rapids
> platform:
>
> virt/tdx: initialization failed: TDMR [0x0, 0x80000000): reserved areas exhausted.
> virt/tdx: module initialization failed (-28)
>
[..]
I feel what I trimmed above is a lot of text just to say:
"build_tdx_memlist() tries to fulfill the TDX module requirement it be
told about holes in the TDMR space <insert pointer / reference to
requirement>. It turns out that the kernel's view of memory holes is too
fine grained and sometimes exceeds the number of holes (16) that the TDX
module can track per TDMR <insert problematic memory map>. Thankfully
the module also lists memory that is potentially convertible in a list
of CMRs. That coarser grained CMR list tends to track usable memory in
the memory map even if it might be reserved for host usage like 'ACPI
data'. Use that list to relax what the kernel considers unusable
memory. If it falls in a CMR no need to instantiate a hole, and rely on
the fact that kernel will keep what it considers 'reserved' out of the
page allocator."
...but don't spin the patch just to make the changelog more concise. It
took me reading a few times to pull out those salient details, that is,
if I understood it correctly?
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 0fb673dd43ed..fa335ab1ae92 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -331,6 +331,72 @@ static int get_tdx_sys_info_version(struct tdx_sys_info_version *sysinfo_version
> return ret;
> }
>
> +/* Update the @sysinfo_cmr->num_cmrs to trim tail empty CMRs */
> +static void trim_empty_tail_cmrs(struct tdx_sys_info_cmr *sysinfo_cmr)
> +{
> + int i;
> +
> + for (i = 0; i < sysinfo_cmr->num_cmrs; i++) {
> + u64 cmr_base = sysinfo_cmr->cmr_base[i];
> + u64 cmr_size = sysinfo_cmr->cmr_size[i];
> +
> + if (!cmr_size) {
> + WARN_ON_ONCE(cmr_base);
> + break;
> + }
> +
> + /* TDX architecture: CMR must be 4KB aligned */
> + WARN_ON_ONCE(!PAGE_ALIGNED(cmr_base) ||
> + !PAGE_ALIGNED(cmr_size));
Is it really required to let the TDX module taint and possibly crash the
kernel if it provides misaligned CMRs? Shouldn't these be validated
early and just turn off TDX support if the TDX module is this broken?
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 8/8] x86/virt/tdx: Don't initialize module that doesn't support NO_RBP_MOD feature
2024-08-27 7:14 ` [PATCH v3 8/8] x86/virt/tdx: Don't initialize module that doesn't support NO_RBP_MOD feature Kai Huang
2024-08-30 11:01 ` Adrian Hunter
@ 2024-09-06 23:36 ` Dan Williams
2024-09-09 10:21 ` Huang, Kai
1 sibling, 1 reply; 42+ messages in thread
From: Dan Williams @ 2024-09-06 23:36 UTC (permalink / raw)
To: Kai Huang, dave.hansen, kirill.shutemov, tglx, bp, peterz, mingo,
hpa, dan.j.williams, seanjc, pbonzini
Cc: x86, linux-kernel, kvm, rick.p.edgecombe, isaku.yamahata,
chao.gao, binbin.wu, adrian.hunter, kai.huang
How about:
Subject: x86/virt/tdx: Require the module to assert it has the NO_RBP_MOD mitigation
...to avoid the double negative.
Kai Huang wrote:
> Old TDX modules can clobber RBP in the TDH.VP.ENTER SEAMCALL. However
> RBP is used as frame pointer in the x86_64 calling convention, and
> clobbering RBP could result in bad things like being unable to unwind
> the stack if any non-maskable exceptions (NMI, #MC etc) happens in that
> gap.
>
> A new "NO_RBP_MOD" feature was introduced to more recent TDX modules to
> not clobber RBP. This feature is reported in the TDX_FEATURES0 global
> metadata field via bit 18.
>
> Don't initialize the TDX module if this feature is not supported [1].
>
> Link: https://lore.kernel.org/all/c0067319-2653-4cbd-8fee-1ccf21b1e646@suse.com/T/#mef98469c51e2382ead2c537ea189752360bd2bef [1]
Trim this to the direct message-id format, but otherwise:
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 2/8] x86/virt/tdx: Remove 'struct field_mapping' and implement TD_SYSINFO_MAP() macro
2024-09-06 20:21 ` Dan Williams
@ 2024-09-09 9:59 ` Huang, Kai
0 siblings, 0 replies; 42+ messages in thread
From: Huang, Kai @ 2024-09-09 9:59 UTC (permalink / raw)
To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
Williams, Dan J, kirill.shutemov@linux.intel.com,
pbonzini@redhat.com, tglx@linutronix.de
Cc: Gao, Chao, Edgecombe, Rick P, x86@kernel.org,
binbin.wu@linux.intel.com, linux-kernel@vger.kernel.org,
Hunter, Adrian, kvm@vger.kernel.org, Yamahata, Isaku
On Fri, 2024-09-06 at 13:21 -0700, Dan Williams wrote:
> I think the subject buries the lead on what this patch does which is
> more like:
>
> x86/virt/tdx: Rework TD_SYSINFO_MAP to support build-time verification
Yes this looks better. Thanks.
>
> Kai Huang wrote:
> > TL;DR: Remove the 'struct field_mapping' structure and use another way
>
> I would drop the TL;DR: and just make the changelog more concise,
> because as it stands now it requires the reader to fully appreciate the
> direction of the v1 approach which this new proposal abandons:
>
> Something like:
>
> Dan noticed [1] that read_sys_metadata_field16() has a runtime warning
> to validate that the metadata field size matches the passed in buffer
> size. In turns out that all the information to perform that validation
> is available at build time. Rework TD_SYSINFO_MAP() to stop providing
> runtime data to read_sys_metadata_field16() and instead just pass typed
> fields to read_sys_metadata_field16() and let the compiler catch any
> mismatches.
>
> The new TD_SYSINFO_MAP() has a couple quirks for readability. It
> requires the function that uses it to define a local variable @ret to
> carry the error code and set the initial value to 0. It also hard-codes
> the variable name of the structure pointer used in the function, but it
> is less code, build-time verfiable, and the same readability as the
> former 'struct field_mapping' approach.
>
> Link: http://lore.kernel.org/66b16121c48f4_4fc729424@dwillia2-xfh.jf.intel.com.notmuch [1]
Thanks. Will do.
Btw, Adrian suggested to rename TD_SYSINFO_MAP() to READ_SYS_INFO(), which is
reasonable to me, so I will also add "rename TD_SYSINFO_MAP() to
READ_SYS_INFO()" to your above text. Please let know if you have any comments.
>
> [..]
> > Link: https://lore.kernel.org/kvm/a107b067-861d-43f4-86b5-29271cb93dad@intel.com/T/#m7cfb3c146214d94b24e978eeb8708d92c0b14ac6 [1]
>
> The expectation for lore links is to capture the message-id. Note the
> differences with the "Link:" format above.
Oh I did not know this. What's the difference between message-id and a normal
link that I got by "google <patch name> + open that lore link"?
>
> > v2 -> v3:
> > - Remove 'struct field_mapping' and reimplement TD_SYSINFO_MAP().
> >
> > ---
> > arch/x86/virt/vmx/tdx/tdx.c | 57 ++++++++++++++-----------------------
> > 1 file changed, 21 insertions(+), 36 deletions(-)
> >
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index e979bf442929..7e75c1b10838 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -270,60 +270,45 @@ static int read_sys_metadata_field(u64 field_id, u64 *data)
> > return 0;
> > }
> >
> > -static int read_sys_metadata_field16(u64 field_id,
> > - int offset,
> > - struct tdx_sys_info_tdmr *ts)
> > +static int read_sys_metadata_field16(u64 field_id, u16 *val)
> > {
> > - u16 *ts_member = ((void *)ts) + offset;
> > u64 tmp;
> > int ret;
> >
> > - if (WARN_ON_ONCE(MD_FIELD_ID_ELE_SIZE_CODE(field_id) !=
> > - MD_FIELD_ID_ELE_SIZE_16BIT))
> > - return -EINVAL;
> > + BUILD_BUG_ON(MD_FIELD_ID_ELE_SIZE_CODE(field_id) !=
> > + MD_FIELD_ID_ELE_SIZE_16BIT);
>
> Perhaps just move this to TD_SYSINFO_MAP() directly?
Sure will do. It gets moved to TD_SYSINFO_MAP() in the next patch anyway.
>
> Something like:
>
> #define TD_SYSINFO_MAP(_field_id, _member, _size) \
> ({ \
> BUILD_BUG_ON(MD_FIELD_ID_ELE_SIZE_CODE(field_id) != \
> MD_FIELD_ID_ELE_SIZE_##_size##BIT); \
> if (!ret) \
> ret = read_sys_metadata_field##_size(MD_FIELD_ID_##_field_id, \
> &sysinfo_tdmr->_member); \
> })
Will do.
Btw this patch doesn't extend to support other sizes but only 16-bits, so I'll
defer the support of "_size" to the next patch.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 2/8] x86/virt/tdx: Remove 'struct field_mapping' and implement TD_SYSINFO_MAP() macro
2024-09-06 21:30 ` Dan Williams
@ 2024-09-09 9:59 ` Huang, Kai
0 siblings, 0 replies; 42+ messages in thread
From: Huang, Kai @ 2024-09-09 9:59 UTC (permalink / raw)
To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
Williams, Dan J, kirill.shutemov@linux.intel.com,
pbonzini@redhat.com, Hunter, Adrian, tglx@linutronix.de
Cc: Gao, Chao, kvm@vger.kernel.org, binbin.wu@linux.intel.com,
linux-kernel@vger.kernel.org, Edgecombe, Rick P, x86@kernel.org,
Yamahata, Isaku
On Fri, 2024-09-06 at 14:30 -0700, Dan Williams wrote:
> Adrian Hunter wrote:
> [..]
> > Another possibility is to put the macro at the invocation site:
> >
> > #define READ_SYS_INFO(_field_id, _member) \
> > ret = ret ?: read_sys_metadata_field16(MD_FIELD_ID_##_field_id, \
> > &sysinfo_tdmr->_member)
> >
> > READ_SYS_INFO(MAX_TDMRS, max_tdmrs);
> > READ_SYS_INFO(MAX_RESERVED_PER_TDMR, max_reserved_per_tdmr);
> > READ_SYS_INFO(PAMT_4K_ENTRY_SIZE, pamt_entry_size[TDX_PS_4K]);
> > READ_SYS_INFO(PAMT_2M_ENTRY_SIZE, pamt_entry_size[TDX_PS_2M]);
> > READ_SYS_INFO(PAMT_1G_ENTRY_SIZE, pamt_entry_size[TDX_PS_1G]);
> >
> > #undef READ_SYS_INFO
> >
> > And so on in later patches:
> >
> > #define READ_SYS_INFO(_field_id, _member) \
> > ret = ret ?: read_sys_metadata_field(MD_FIELD_ID_##_field_id, \
> > &sysinfo_version->_member)
> >
> > READ_SYS_INFO(MAJOR_VERSION, major);
> > READ_SYS_INFO(MINOR_VERSION, minor);
> > READ_SYS_INFO(UPDATE_VERSION, update);
> > READ_SYS_INFO(INTERNAL_VERSION, internal);
> > READ_SYS_INFO(BUILD_NUM, build_num);
> > READ_SYS_INFO(BUILD_DATE, build_date);
> >
> > #undef READ_SYS_INFO
>
> Looks like a reasonable enhancement to me.
Yeah will do. Thanks.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 6/8] x86/virt/tdx: Print TDX module basic information
2024-09-06 22:46 ` Dan Williams
@ 2024-09-09 10:18 ` Huang, Kai
0 siblings, 0 replies; 42+ messages in thread
From: Huang, Kai @ 2024-09-09 10:18 UTC (permalink / raw)
To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
Williams, Dan J, kirill.shutemov@linux.intel.com,
pbonzini@redhat.com, tglx@linutronix.de
Cc: Gao, Chao, Edgecombe, Rick P, x86@kernel.org,
binbin.wu@linux.intel.com, linux-kernel@vger.kernel.org,
Hunter, Adrian, kvm@vger.kernel.org, Yamahata, Isaku
On Fri, 2024-09-06 at 15:46 -0700, Dan Williams wrote:
> Kai Huang wrote:
> > Currently the kernel doesn't print any information regarding the TDX
> > module itself, e.g. module version. In practice such information is
> > useful, especially to the developers.
> >
> > For instance, there are a couple of use cases for dumping module basic
> > information:
> >
> > 1) When something goes wrong around using TDX, the information like TDX
> > module version, supported features etc could be helpful [1][2].
> >
> > 2) For Linux, when the user wants to update the TDX module, one needs to
> > replace the old module in a specific location in the EFI partition
> > with the new one so that after reboot the BIOS can load it. However,
> > after kernel boots, currently the user has no way to verify it is
> > indeed the new module that gets loaded and initialized (e.g., error
> > could happen when replacing the old module). With the module version
> > dumped the user can verify this easily.
>
> For this specific use case the kernel log is less useful then finding
> a place to put this in sysfs. This gets back to a proposal to have TDX
> instantiate a "tdx_tsm" device which among other things could host this
> version data.
>
> The kernel log message is ok, but parsing the kernel log is not
> sufficient for this update validation flow concern.
Agree we eventually need /sysfs files for TDX module info like these. For case
2) the developer who requested this use case to me just wanted to see the module
version dump in the kernel message so that he can verify, so I think this is at
least useful for developers.
Perhaps I can just trim the 2) to below?
User may want to quickly know TDX module version to see whether the
loaded module is the expected one.
>
> [..]
> > +static void print_basic_sys_info(struct tdx_sys_info *sysinfo)
> > +{
> > + struct tdx_sys_info_features *features = &sysinfo->features;
> > + struct tdx_sys_info_version *version = &sysinfo->version;
> > +
> > + /*
> > + * TDX module version encoding:
> > + *
> > + * <major>.<minor>.<update>.<internal>.<build_num>
> > + *
> > + * When printed as text, <major> and <minor> are 1-digit,
> > + * <update> and <internal> are 2-digits and <build_num>
> > + * is 4-digits.
> > + */
> > + pr_info("Initializing TDX module: %u.%u.%02u.%02u.%04u (build_date %u), TDX_FEATURES0 0x%llx\n",
> > + version->major, version->minor, version->update,
> > + version->internal, version->build_num,
> > + version->build_date, features->tdx_features0);
>
> I do not see the value in dumping a raw features value in the log.
> Either parse it or omit it. I would leave it for the tdx_tsm device to
> emit.
The raw features value is mainly for developers. They can decode the value
based on the spec. I think it will still be useful.
But sure I can remove it if you want. With this I'll also defer reading the
tdx_features0 to the patch ("x86/virt/tdx: Don't initialize module that doesn't
support NO_RBP_MOD feature"), because if we don't print it, then we don't need
to read it in this patch.
I think I can move that patch before this patch, so tdx_features0 is always
introduced in that patch. Then whether we decide to print tdx_features0 in this
patch doesn't impact that patch.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 8/8] x86/virt/tdx: Don't initialize module that doesn't support NO_RBP_MOD feature
2024-09-06 23:36 ` Dan Williams
@ 2024-09-09 10:21 ` Huang, Kai
0 siblings, 0 replies; 42+ messages in thread
From: Huang, Kai @ 2024-09-09 10:21 UTC (permalink / raw)
To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
Williams, Dan J, kirill.shutemov@linux.intel.com,
pbonzini@redhat.com, tglx@linutronix.de
Cc: Gao, Chao, Edgecombe, Rick P, x86@kernel.org,
binbin.wu@linux.intel.com, linux-kernel@vger.kernel.org,
Hunter, Adrian, kvm@vger.kernel.org, Yamahata, Isaku
On Fri, 2024-09-06 at 16:36 -0700, Dan Williams wrote:
> How about:
>
> Subject: x86/virt/tdx: Require the module to assert it has the NO_RBP_MOD mitigation
>
> ...to avoid the double negative.
Will do. Thanks.
>
> Kai Huang wrote:
> > Old TDX modules can clobber RBP in the TDH.VP.ENTER SEAMCALL. However
> > RBP is used as frame pointer in the x86_64 calling convention, and
> > clobbering RBP could result in bad things like being unable to unwind
> > the stack if any non-maskable exceptions (NMI, #MC etc) happens in that
> > gap.
> >
> > A new "NO_RBP_MOD" feature was introduced to more recent TDX modules to
> > not clobber RBP. This feature is reported in the TDX_FEATURES0 global
> > metadata field via bit 18.
> >
> > Don't initialize the TDX module if this feature is not supported [1].
> >
> > Link: https://lore.kernel.org/all/c0067319-2653-4cbd-8fee-1ccf21b1e646@suse.com/T/#mef98469c51e2382ead2c537ea189752360bd2bef [1]
>
> Trim this to the direct message-id format, but otherwise:
Will do. If I got it right, the link with message-id should be:
https://lore.kernel.org/all/fc0e8ab7-86d4-4428-be31-82e1ece6dd21@intel.com/
>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Thanks.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 7/8] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes
2024-09-06 23:31 ` Dan Williams
@ 2024-09-09 10:30 ` Huang, Kai
0 siblings, 0 replies; 42+ messages in thread
From: Huang, Kai @ 2024-09-09 10:30 UTC (permalink / raw)
To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
Williams, Dan J, kirill.shutemov@linux.intel.com,
pbonzini@redhat.com, tglx@linutronix.de
Cc: Gao, Chao, Edgecombe, Rick P, x86@kernel.org,
binbin.wu@linux.intel.com, linux-kernel@vger.kernel.org,
Hunter, Adrian, kvm@vger.kernel.org, Yamahata, Isaku
On Fri, 2024-09-06 at 16:31 -0700, Dan Williams wrote:
> Kai Huang wrote:
> > A TDX module initialization failure was reported on a Emerald Rapids
> > platform:
> >
> > virt/tdx: initialization failed: TDMR [0x0, 0x80000000): reserved areas exhausted.
> > virt/tdx: module initialization failed (-28)
> >
> [..]
>
> I feel what I trimmed above is a lot of text just to say:
>
> "build_tdx_memlist() tries to fulfill the TDX module requirement it be
> told about holes in the TDMR space <insert pointer / reference to
> requirement>. It turns out that the kernel's view of memory holes is too
> fine grained and sometimes exceeds the number of holes (16) that the TDX
> module can track per TDMR <insert problematic memory map>. Thankfully
> the module also lists memory that is potentially convertible in a list
> of CMRs. That coarser grained CMR list tends to track usable memory in
> the memory map even if it might be reserved for host usage like 'ACPI
> data'. Use that list to relax what the kernel considers unusable
> memory. If it falls in a CMR no need to instantiate a hole, and rely on
> the fact that kernel will keep what it considers 'reserved' out of the
> page allocator."
>
> ...but don't spin the patch just to make the changelog more concise. It
> took me reading a few times to pull out those salient details, that is,
> if I understood it correctly?
Thanks. Let me try to trim down the changelog since I need to send out another
version anyway.
>
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index 0fb673dd43ed..fa335ab1ae92 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -331,6 +331,72 @@ static int get_tdx_sys_info_version(struct tdx_sys_info_version *sysinfo_version
> > return ret;
> > }
> >
> > +/* Update the @sysinfo_cmr->num_cmrs to trim tail empty CMRs */
> > +static void trim_empty_tail_cmrs(struct tdx_sys_info_cmr *sysinfo_cmr)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < sysinfo_cmr->num_cmrs; i++) {
> > + u64 cmr_base = sysinfo_cmr->cmr_base[i];
> > + u64 cmr_size = sysinfo_cmr->cmr_size[i];
> > +
> > + if (!cmr_size) {
> > + WARN_ON_ONCE(cmr_base);
> > + break;
> > + }
> > +
> > + /* TDX architecture: CMR must be 4KB aligned */
> > + WARN_ON_ONCE(!PAGE_ALIGNED(cmr_base) ||
> > + !PAGE_ALIGNED(cmr_size));
>
> Is it really required to let the TDX module taint and possibly crash the
> kernel if it provides misaligned CMRs? Shouldn't these be validated
> early and just turn off TDX support if the TDX module is this broken?
We can make this function return error code for those error cases, and fail to
initialize TDX module.
But CMRs are actually verified by the MCHECK. If any WARN() above happens, TDX
won't get enabled in hardware. I think maybe we can just remove the WARN()s?
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 3/8] x86/virt/tdx: Prepare to support reading other global metadata fields
2024-09-06 21:34 ` Dan Williams
@ 2024-09-09 12:28 ` Huang, Kai
0 siblings, 0 replies; 42+ messages in thread
From: Huang, Kai @ 2024-09-09 12:28 UTC (permalink / raw)
To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
Williams, Dan J, kirill.shutemov@linux.intel.com,
pbonzini@redhat.com, tglx@linutronix.de
Cc: Gao, Chao, Edgecombe, Rick P, x86@kernel.org,
binbin.wu@linux.intel.com, linux-kernel@vger.kernel.org,
Hunter, Adrian, kvm@vger.kernel.org, Yamahata, Isaku
On Fri, 2024-09-06 at 14:34 -0700, Dan Williams wrote:
> Kai Huang wrote:
> > Future changes will need to read more metadata fields with different
> > element sizes. To make the code short, extend the helper to take a
> > 'void *' buffer and the buffer size so it can work with all element
> > sizes.
>
> The whole point was to have compile time type safety for global struct
> member and the read routine. So, no, using 'void *' is a step backwards.
>
> Take some inspiration from build_mmio_read() and just have a macro that
> takes the size as an argument to the macro, not the runtime routine.
>
AFAICT build_mmio_read() macro defines the body of the assembly to read
requested size from a port. But instead of a single macro which can be used
universally for all port reading, the kernel uses build_mmio_read() to define
multiple read functions:
build_mmio_read(readb, "b", unsigned char, "=q", :"memory")
build_mmio_read(readw, "w", unsigned short, "=r", :"memory")
build_mmio_read(readl, "l", unsigned int, "=r", :"memory")
...
So just to understand correctly, do you mean something like below?
#define build_sysmd_read(_bits) \
static __maybe_unused int \
__read_sys_metadata_field##_bits(u64 field_id, u##_bits *val) \
{ \
u64 tmp; \
int ret; \
\
ret = tdh_sys_rd(field_id, &tmp); \
if (ret) \
return ret; \
*val = tmp; \
return ret; \
}
build_sysmd_read(8)
build_sysmd_read(16)
build_sysmd_read(32)
build_sysmd_read(64)
> The
> macro statically selects the right routine to call and does not need to
> grow a size parameter itself.
Sorry for my ignorance. Do you mean using switch-case statement to select right
routine?
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v3 1/8] x86/virt/tdx: Rename 'struct tdx_tdmr_sysinfo' to reflect the spec better
2024-08-27 22:20 ` Huang, Kai
@ 2024-09-24 11:39 ` Huang, Kai
0 siblings, 0 replies; 42+ messages in thread
From: Huang, Kai @ 2024-09-24 11:39 UTC (permalink / raw)
To: Hansen, Dave, seanjc@google.com, bp@alien8.de,
peterz@infradead.org, hpa@zytor.com, mingo@redhat.com,
kirill.shutemov@linux.intel.com, tglx@linutronix.de,
pbonzini@redhat.com, Hunter, Adrian, Williams, Dan J
Cc: Gao, Chao, kvm@vger.kernel.org, binbin.wu@linux.intel.com,
linux-kernel@vger.kernel.org, Edgecombe, Rick P, x86@kernel.org,
Yamahata, Isaku
On Wed, 2024-08-28 at 10:20 +1200, Huang, Kai wrote:
>
> On 28/08/2024 1:10 am, Adrian Hunter wrote:
> > On 27/08/24 10:14, Kai Huang wrote:
> >
> > "to reflect the spec better" is a bit vague. How about:
> >
> > x86/virt/tdx: Rename 'struct tdx_tdmr_sysinfo' to 'struct tdx_sys_info_tdmr'
> >
> > Rename 'struct tdx_tdmr_sysinfo' to 'struct tdx_sys_info_tdmr' to
> > prepare for adding similar structures that will all be prefixed by
> > 'tdx_sys_info_'.
I decided not to do the patch title change and the above paragraph in v4, since
I believe they are nit and Dan already reviewed. Yeah I agree 'reflect spec
better' is a bit vague, but it kinda reflect the purpose. However IMHO albeit
"rename A to B" reflects the final code, it doesn't convey the purpose. So I
think the old title should also be fine.
Also the suggested paragraph is kinda duplicated with the last paragraph in the
current changelog so I didn't add it either.
^ permalink raw reply [flat|nested] 42+ messages in thread
end of thread, other threads:[~2024-09-24 11:40 UTC | newest]
Thread overview: 42+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-27 7:14 [PATCH v3 0/8] TDX host: metadata reading tweaks, bug fix and info dump Kai Huang
2024-08-27 7:14 ` [PATCH v3 1/8] x86/virt/tdx: Rename 'struct tdx_tdmr_sysinfo' to reflect the spec better Kai Huang
2024-08-27 13:10 ` Adrian Hunter
2024-08-27 22:20 ` Huang, Kai
2024-09-24 11:39 ` Huang, Kai
2024-08-30 22:05 ` Dan Williams
2024-08-27 7:14 ` [PATCH v3 2/8] x86/virt/tdx: Remove 'struct field_mapping' and implement TD_SYSINFO_MAP() macro Kai Huang
2024-08-29 7:20 ` Adrian Hunter
2024-08-30 10:52 ` Huang, Kai
2024-09-06 21:30 ` Dan Williams
2024-09-09 9:59 ` Huang, Kai
2024-09-06 20:21 ` Dan Williams
2024-09-09 9:59 ` Huang, Kai
2024-08-27 7:14 ` [PATCH v3 3/8] x86/virt/tdx: Prepare to support reading other global metadata fields Kai Huang
2024-08-30 6:43 ` Adrian Hunter
2024-08-30 11:02 ` Huang, Kai
2024-08-30 9:05 ` Nikolay Borisov
2024-08-30 11:09 ` Huang, Kai
2024-09-06 21:34 ` Dan Williams
2024-09-09 12:28 ` Huang, Kai
2024-08-27 7:14 ` [PATCH v3 4/8] x86/virt/tdx: Refine a comment to reflect the latest TDX spec Kai Huang
2024-08-30 6:45 ` Adrian Hunter
2024-08-30 9:14 ` Nikolay Borisov
2024-08-27 7:14 ` [PATCH v3 5/8] x86/virt/tdx: Start to track all global metadata in one structure Kai Huang
2024-08-30 7:02 ` Adrian Hunter
2024-08-30 11:10 ` Huang, Kai
2024-08-30 11:01 ` Nikolay Borisov
2024-08-30 11:57 ` Huang, Kai
2024-08-27 7:14 ` [PATCH v3 6/8] x86/virt/tdx: Print TDX module basic information Kai Huang
2024-09-06 22:46 ` Dan Williams
2024-09-09 10:18 ` Huang, Kai
2024-08-27 7:14 ` [PATCH v3 7/8] x86/virt/tdx: Reduce TDMR's reserved areas by using CMRs to find memory holes Kai Huang
2024-08-30 10:50 ` Adrian Hunter
2024-08-30 11:52 ` Huang, Kai
2024-08-30 12:02 ` Adrian Hunter
2024-08-30 12:27 ` Nikolay Borisov
2024-09-06 23:31 ` Dan Williams
2024-09-09 10:30 ` Huang, Kai
2024-08-27 7:14 ` [PATCH v3 8/8] x86/virt/tdx: Don't initialize module that doesn't support NO_RBP_MOD feature Kai Huang
2024-08-30 11:01 ` Adrian Hunter
2024-09-06 23:36 ` Dan Williams
2024-09-09 10:21 ` Huang, Kai
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox