* [PATCH V2 0/2] Fixes for recovery for machine check in TDX/SEAM non-root mode
@ 2025-08-12 12:28 Adrian Hunter
2025-08-12 12:28 ` [PATCH V2 1/2] x86/mce: Fix missing address mask in recovery for errors " Adrian Hunter
2025-08-12 12:28 ` [PATCH V2 2/2] x86/mce: Remove MCI_ADDR_PHYSADDR Adrian Hunter
0 siblings, 2 replies; 3+ messages in thread
From: Adrian Hunter @ 2025-08-12 12:28 UTC (permalink / raw)
To: Tony Luck, pbonzini, seanjc
Cc: vannapurve, Borislav Petkov, Thomas Gleixner, Ingo Molnar,
Dave Hansen, x86, H Peter Anvin, linux-edac, linux-kernel, kvm,
rick.p.edgecombe, kai.huang, reinette.chatre, xiaoyao.li,
tony.lindgren, binbin.wu, ira.weiny, isaku.yamahata, Fan Du,
Yazen Ghannam, yan.y.zhao, chao.gao
Hi
Here is V2 of a small fix related to recovery for machine check in TDX/SEAM
non-root mode, and a small tidy-up.
Changes in V2:
x86/mce: Fix missing address mask in recovery for errors in TDX/SEAM
non-root mode
Mask address when it is read
Amend struct mce addr description
KVM: TDX: Do not clear poisoned pages
Patch dropped
x86/mce: Remove MCI_ADDR_PHYSADDR
New patch
The issue was noticed as part of work to determine the conditions under
which TDX private memory needs to be cleared after being reclaimed.
For guests with a large amount of memory, clearing all private pages during
VM shutdown can take minutes, so we are looking at when that can be
skipped. A future patch will deal with that.
One thing that was investigated was the effect of deliberately corrupting a
TDX guest private page by writing to it on the host, and then reading it
on the guest, which results in a machine check as expected, but revealed
the issue addressed in patch 1.
There are 2 outstanding issues:
1. It is assumed that once the TDX VM is shutdown that the memory is
returned to the allocator. That is true at present, but may not be in the
future. Consider, for example, patch set "New KVM ioctl to link a gmem
inode to a new gmem file" :
https://lore.kernel.org/r/cover.1747368092.git.afranji@google.com/
2. Currently, KVM TDX does not cater for the TDX VM to enter a FATAL error
state, where the only operation permitted is to tear down the VM. KVM just
carries on, hitting various errors, but in particular, memory reclaim fails
because it is not following the teardown procedure, and all guest private
memory is leaked.
Adrian Hunter (2):
x86/mce: Fix missing address mask in recovery for errors in TDX/SEAM non-root mode
x86/mce: Remove MCI_ADDR_PHYSADDR
arch/x86/include/asm/mce.h | 3 ---
arch/x86/include/uapi/asm/mce.h | 2 +-
arch/x86/kernel/cpu/mce/core.c | 9 ++++++---
drivers/cxl/core/mce.c | 2 +-
drivers/edac/skx_common.c | 2 +-
5 files changed, 9 insertions(+), 9 deletions(-)
Regards
Adrian
^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH V2 1/2] x86/mce: Fix missing address mask in recovery for errors in TDX/SEAM non-root mode
2025-08-12 12:28 [PATCH V2 0/2] Fixes for recovery for machine check in TDX/SEAM non-root mode Adrian Hunter
@ 2025-08-12 12:28 ` Adrian Hunter
2025-08-12 12:28 ` [PATCH V2 2/2] x86/mce: Remove MCI_ADDR_PHYSADDR Adrian Hunter
1 sibling, 0 replies; 3+ messages in thread
From: Adrian Hunter @ 2025-08-12 12:28 UTC (permalink / raw)
To: Tony Luck, pbonzini, seanjc
Cc: vannapurve, Borislav Petkov, Thomas Gleixner, Ingo Molnar,
Dave Hansen, x86, H Peter Anvin, linux-edac, linux-kernel, kvm,
rick.p.edgecombe, kai.huang, reinette.chatre, xiaoyao.li,
tony.lindgren, binbin.wu, ira.weiny, isaku.yamahata, Fan Du,
Yazen Ghannam, yan.y.zhao, chao.gao
Commit 8a01ec97dc066 ("x86/mce: Mask out non-address bits from machine
check bank") introduced a new #define MCI_ADDR_PHYSADDR for the mask of
valid physical address bits within the machine check bank address register.
This is particularly needed in the case of errors in TDX/SEAM non-root mode
because the reported address contains the TDX KeyID. Refer to TDX and
TME-MK documentation for more information about KeyIDs.
Commit 7911f145de5fe ("x86/mce: Implement recovery for errors in TDX/SEAM
non-root mode") uses the address to mark the affected page as poisoned, but
omits to use the aforementioned mask.
Investigation of user space expectations has concluded it would be more
correct for the address to contain only address bits in the first place.
Refer https://lore.kernel.org/r/807ff02d-7af0-419d-8d14-a4d6c5d5420d@intel.com
Mask the address when it is read from the machine check bank address
register. Do not use MCI_ADDR_PHYSADDR because that will be removed in a
later patch.
It is assumed __log_error() in arch/x86/kernel/cpu/mce/amd.c does not need
similar treatment.
Amend struct mce addr member description slightly to reflect that it is
not, and never has been, an exact copy of the bank's MCi_ADDR MSR.
Fixes: 8a01ec97dc066 ("x86/mce: Mask out non-address bits from machine check bank")
Fixes: 7911f145de5fe ("x86/mce: Implement recovery for errors in TDX/SEAM non-root mode")
Link: https://lore.kernel.org/r/807ff02d-7af0-419d-8d14-a4d6c5d5420d@intel.com
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
Changes in V2:
Mask address when it is read
Amend struct mce addr description
arch/x86/include/uapi/asm/mce.h | 2 +-
arch/x86/kernel/cpu/mce/core.c | 3 +++
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/uapi/asm/mce.h b/arch/x86/include/uapi/asm/mce.h
index cb6b48a7c22b..abf6ee43f5f8 100644
--- a/arch/x86/include/uapi/asm/mce.h
+++ b/arch/x86/include/uapi/asm/mce.h
@@ -14,7 +14,7 @@
struct mce {
__u64 status; /* Bank's MCi_STATUS MSR */
__u64 misc; /* Bank's MCi_MISC MSR */
- __u64 addr; /* Bank's MCi_ADDR MSR */
+ __u64 addr; /* Address from bank's MCi_ADDR MSR */
__u64 mcgstatus; /* Machine Check Global Status MSR */
__u64 ip; /* Instruction Pointer when the error happened */
__u64 tsc; /* CPU time stamp counter */
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 4da4eab56c81..deb47463a75d 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -699,6 +699,9 @@ static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
}
smca_extract_err_addr(m);
+
+ /* Mask out non-address bits, such as TDX KeyID */
+ m->addr &= GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0);
}
if (mce_flags.smca) {
--
2.48.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH V2 2/2] x86/mce: Remove MCI_ADDR_PHYSADDR
2025-08-12 12:28 [PATCH V2 0/2] Fixes for recovery for machine check in TDX/SEAM non-root mode Adrian Hunter
2025-08-12 12:28 ` [PATCH V2 1/2] x86/mce: Fix missing address mask in recovery for errors " Adrian Hunter
@ 2025-08-12 12:28 ` Adrian Hunter
1 sibling, 0 replies; 3+ messages in thread
From: Adrian Hunter @ 2025-08-12 12:28 UTC (permalink / raw)
To: Tony Luck, pbonzini, seanjc
Cc: vannapurve, Borislav Petkov, Thomas Gleixner, Ingo Molnar,
Dave Hansen, x86, H Peter Anvin, linux-edac, linux-kernel, kvm,
rick.p.edgecombe, kai.huang, reinette.chatre, xiaoyao.li,
tony.lindgren, binbin.wu, ira.weiny, isaku.yamahata, Fan Du,
Yazen Ghannam, yan.y.zhao, chao.gao
Now that the address is masked when it is read from the machine check bank
address register (refer patch "x86/mce: Fix missing address mask in
recovery for errors in TDX/SEAM non-root mode"), the MCI_ADDR_PHYSADDR
macro is no longer needed. Remove it.
Note MCE address information also enters the kernel from APEI via the
Common Platform Error Record (CPER) Memory Error Section "Physical Address"
field (struct cper_sec_mem_err physical_addr), refer the UEFI
specification. It is assumed that field contains only the physical
address.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
Changes in V2:
New patch
arch/x86/include/asm/mce.h | 3 ---
arch/x86/kernel/cpu/mce/core.c | 6 +++---
drivers/cxl/core/mce.c | 2 +-
drivers/edac/skx_common.c | 2 +-
4 files changed, 5 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 6c77c03139f7..0cf8017dcae9 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -91,9 +91,6 @@
#define MCI_MISC_ADDR_MEM 3 /* memory address */
#define MCI_MISC_ADDR_GENERIC 7 /* generic */
-/* MCi_ADDR register defines */
-#define MCI_ADDR_PHYSADDR GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
-
/* CTL2 register defines */
#define MCI_CTL2_CMCI_EN BIT_ULL(30)
#define MCI_CTL2_CMCI_THRESHOLD_MASK 0x7fffULL
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index deb47463a75d..80e06d6728a7 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -642,7 +642,7 @@ static int uc_decode_notifier(struct notifier_block *nb, unsigned long val,
mce->severity != MCE_DEFERRED_SEVERITY)
return NOTIFY_DONE;
- pfn = (mce->addr & MCI_ADDR_PHYSADDR) >> PAGE_SHIFT;
+ pfn = mce->addr >> PAGE_SHIFT;
if (!memory_failure(pfn, 0)) {
set_mce_nospec(pfn);
mce->kflags |= MCE_HANDLED_UC;
@@ -1415,7 +1415,7 @@ static void kill_me_maybe(struct callback_head *cb)
if (!p->mce_ripv)
flags |= MF_MUST_KILL;
- pfn = (p->mce_addr & MCI_ADDR_PHYSADDR) >> PAGE_SHIFT;
+ pfn = p->mce_addr >> PAGE_SHIFT;
ret = memory_failure(pfn, flags);
if (!ret) {
set_mce_nospec(pfn);
@@ -1444,7 +1444,7 @@ static void kill_me_never(struct callback_head *cb)
p->mce_count = 0;
pr_err("Kernel accessed poison in user space at %llx\n", p->mce_addr);
- pfn = (p->mce_addr & MCI_ADDR_PHYSADDR) >> PAGE_SHIFT;
+ pfn = p->mce_addr >> PAGE_SHIFT;
if (!memory_failure(pfn, 0))
set_mce_nospec(pfn);
}
diff --git a/drivers/cxl/core/mce.c b/drivers/cxl/core/mce.c
index ff8d078c6ca1..4ba8b7ae3de7 100644
--- a/drivers/cxl/core/mce.c
+++ b/drivers/cxl/core/mce.c
@@ -24,7 +24,7 @@ static int cxl_handle_mce(struct notifier_block *nb, unsigned long val,
if (!endpoint)
return NOTIFY_DONE;
- spa = mce->addr & MCI_ADDR_PHYSADDR;
+ spa = mce->addr;
pfn = spa >> PAGE_SHIFT;
if (!pfn_valid(pfn))
diff --git a/drivers/edac/skx_common.c b/drivers/edac/skx_common.c
index 39c733dbc5b9..2de675958560 100644
--- a/drivers/edac/skx_common.c
+++ b/drivers/edac/skx_common.c
@@ -732,7 +732,7 @@ int skx_mce_check_error(struct notifier_block *nb, unsigned long val,
memset(&res, 0, sizeof(res));
res.mce = mce;
- res.addr = mce->addr & MCI_ADDR_PHYSADDR;
+ res.addr = mce->addr;
if (!pfn_to_online_page(res.addr >> PAGE_SHIFT) && !arch_is_platform_page(res.addr)) {
pr_err("Invalid address 0x%llx in IA32_MC%d_ADDR\n", mce->addr, mce->bank);
return NOTIFY_DONE;
--
2.48.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-08-12 12:29 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-12 12:28 [PATCH V2 0/2] Fixes for recovery for machine check in TDX/SEAM non-root mode Adrian Hunter
2025-08-12 12:28 ` [PATCH V2 1/2] x86/mce: Fix missing address mask in recovery for errors " Adrian Hunter
2025-08-12 12:28 ` [PATCH V2 2/2] x86/mce: Remove MCI_ADDR_PHYSADDR Adrian Hunter
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).