* [PATCH v5 0/3] Fix MCE handling on AMD hosts
@ 2024-06-03 19:36 John Allen
2024-06-03 19:36 ` [PATCH v5 1/3] i386: Fix MCE support for " John Allen
` (3 more replies)
0 siblings, 4 replies; 6+ messages in thread
From: John Allen @ 2024-06-03 19:36 UTC (permalink / raw)
To: qemu-devel
Cc: yazen.ghannam, michael.roth, babu.moger, pankaj.gupta,
william.roche, joao.m.martins, pbonzini, richard.henderson,
eduardo, John Allen
In the event that a guest process attempts to access memory that has
been poisoned in response to a deferred uncorrected MCE, an AMD system
will currently generate a SIGBUS error which will result in the entire
guest being shutdown. Ideally, we only want to kill the guest process
that accessed poisoned memory in this case.
This support has been included in qemu for Intel hosts for a long time,
but there are a couple of changes needed for AMD hosts. First, we will
need to expose the SUCCOR and overflow recovery cpuid bits to guests.
Second, we need to modify the MCE injection code to avoid Intel specific
behavior when we are running on an AMD host.
Version 5 of the series differs from previous versions in that it
handles AO (deferred) errors rather than ignoring them. This is made
possible by in progress kernel patches that utilize recently accepted
address translation capabilities on AMD platforms to translate
UMC relative normalized addresses received with a deferred error to
system physical addresses that can be used for memory error recovery.
While the bulk of the address translation code is upstream, the code
to use the new translation code in the event of a deferred error is
not, but can be seen here:
https://github.com/AMDESE/linux/commits/wip-mca/
This adds a new wrapper struct for MCEs and uses this wrapper to store
the translated physical address in the following commit:
https://github.com/AMDESE/linux/commit/76732c67cbf96c14f55ed1061804db9ff1505ea3
v2:
- Add "succor" feature word.
- Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.
v3:
- Reorder series. Only enable SUCCOR after bugs have been fixed.
- Introduce new patch ignoring AO errors.
v4:
- Remove redundant check for AO errors.
v5:
- Remove patch to ignore AO errors and introduce proper deferred
error support.
- Introduce new patch to support overflow recovery cpuid bits.
John Allen (3):
i386: Fix MCE support for AMD hosts
i386: Add support for SUCCOR feature
i386: Add support for overflow recovery
target/i386/cpu.c | 18 +++++++++++++++++-
target/i386/cpu.h | 7 +++++++
target/i386/helper.c | 4 ++++
target/i386/kvm/kvm.c | 41 +++++++++++++++++++++++++++++++++--------
4 files changed, 61 insertions(+), 9 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v5 1/3] i386: Fix MCE support for AMD hosts
2024-06-03 19:36 [PATCH v5 0/3] Fix MCE handling on AMD hosts John Allen
@ 2024-06-03 19:36 ` John Allen
2024-06-03 19:36 ` [PATCH v5 2/3] i386: Add support for SUCCOR feature John Allen
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: John Allen @ 2024-06-03 19:36 UTC (permalink / raw)
To: qemu-devel
Cc: yazen.ghannam, michael.roth, babu.moger, pankaj.gupta,
william.roche, joao.m.martins, pbonzini, richard.henderson,
eduardo, John Allen
For the most part, AMD hosts can use the same MCE injection code as Intel, but
there are instances where the qemu implementation is Intel specific. First, MCE
delivery works differently on AMD and does not support broadcast. Second,
kvm_mce_inject generates MCEs that include a number of Intel specific status
bits. Modify kvm_mce_inject to properly generate MCEs on AMD platforms.
Reported-by: William Roche <william.roche@oracle.com>
Signed-off-by: John Allen <john.allen@amd.com>
---
v3:
- Update to latest qemu code that introduces using MCG_STATUS_RIPV in the
case of a BUS_MCEERR_AR on a non-AMD machine.
v5:
- This version implements proper support for AO (deferred) errors. For
deferred errors, there are some differences in the flags that need to be
set in order to be handled correctly. First, the UC flag must not be set
to allow deferred errors to be picked up by the machine_check_poll
function in the guest kernel, mirroring baremetal behavior. If the UC
flag is set, the injected MCE will be sent as an interrupt and will be
handled in do_machine_check which currently is not expected to handle
deferred errors and would need significant modification to do so.
- Both the deferred (AO) and uncorrected (AR) error cases set the POISON
bit. For the uncorrected error case, this is expected and properly
mirrors baremetal behavior. For the deferred error case, this bit would
not be set by hardware, but is being reused as a convenient workaround
to give better behavior in the case that the MCE_KILL_EARLY flag is set
in the guest process. See code comment for more details.
---
target/i386/cpu.h | 2 ++
target/i386/helper.c | 4 ++++
target/i386/kvm/kvm.c | 39 +++++++++++++++++++++++++++++++--------
3 files changed, 37 insertions(+), 8 deletions(-)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index dfe43b8204..ddf53937b4 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -365,6 +365,8 @@ typedef enum X86Seg {
#define MCI_STATUS_PCC (1ULL<<57) /* processor context corrupt */
#define MCI_STATUS_S (1ULL<<56) /* Signaled machine check */
#define MCI_STATUS_AR (1ULL<<55) /* Action required */
+#define MCI_STATUS_DEFERRED (1ULL<<44) /* Deferred error */
+#define MCI_STATUS_POISON (1ULL<<43) /* Poisoned data consumed */
/* MISC register defines */
#define MCM_ADDR_SEGOFF 0 /* segment offset */
diff --git a/target/i386/helper.c b/target/i386/helper.c
index 2070dd0dda..6f6b0f02e0 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -91,6 +91,10 @@ int cpu_x86_support_mca_broadcast(CPUX86State *env)
int family = 0;
int model = 0;
+ if (IS_AMD_CPU(env)) {
+ return 0;
+ }
+
cpu_x86_version(env, &family, &model);
if ((family == 6 && model >= 14) || family > 6) {
return 1;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 42970ab046..810f586a59 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -582,17 +582,40 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int code)
{
CPUState *cs = CPU(cpu);
CPUX86State *env = &cpu->env;
- uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
- MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S;
- uint64_t mcg_status = MCG_STATUS_MCIP;
+ uint64_t status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_MISCV |
+ MCI_STATUS_ADDRV;
+ uint64_t mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV;
int flags = 0;
- if (code == BUS_MCEERR_AR) {
- status |= MCI_STATUS_AR | 0x134;
- mcg_status |= MCG_STATUS_RIPV | MCG_STATUS_EIPV;
+ if (!IS_AMD_CPU(env)) {
+ status |= MCI_STATUS_S | MCI_STATUS_UC;
+ if (code == BUS_MCEERR_AR) {
+ status |= MCI_STATUS_AR | 0x134;
+ mcg_status |= MCG_STATUS_EIPV;
+ } else {
+ status |= 0xc0;
+ }
} else {
- status |= 0xc0;
- mcg_status |= MCG_STATUS_RIPV;
+ if (code == BUS_MCEERR_AR) {
+ status |= MCI_STATUS_UC | MCI_STATUS_POISON;
+ mcg_status |= MCG_STATUS_EIPV;
+ } else {
+ /* Setting the POISON bit for deferred errors indicates to the
+ * guest kernel that the address provided by the MCE is valid
+ * and usable which will ensure that the guest kernel will send
+ * a SIGBUS_AO signal to the guest process. This allows for
+ * more desirable behavior in the case that the guest process
+ * with poisoned memory has set the MCE_KILL_EARLY prctl flag
+ * which indicates that the process would prefer to handle or
+ * shutdown due to the poisoned memory condition before the
+ * memory has been accessed.
+ *
+ * While the POISON bit would not be set in a deferred error
+ * sent from hardware, the bit is not meaningful for deferred
+ * errors and can be reused in this scenario.
+ */
+ status |= MCI_STATUS_DEFERRED | MCI_STATUS_POISON;
+ }
}
flags = cpu_x86_support_mca_broadcast(env) ? MCE_INJECT_BROADCAST : 0;
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v5 2/3] i386: Add support for SUCCOR feature
2024-06-03 19:36 [PATCH v5 0/3] Fix MCE handling on AMD hosts John Allen
2024-06-03 19:36 ` [PATCH v5 1/3] i386: Fix MCE support for " John Allen
@ 2024-06-03 19:36 ` John Allen
2024-06-03 19:36 ` [PATCH v5 3/3] i386: Add support for overflow recovery John Allen
2024-06-06 9:09 ` [PATCH v5 0/3] Fix MCE handling on AMD hosts Paolo Bonzini
3 siblings, 0 replies; 6+ messages in thread
From: John Allen @ 2024-06-03 19:36 UTC (permalink / raw)
To: qemu-devel
Cc: yazen.ghannam, michael.roth, babu.moger, pankaj.gupta,
william.roche, joao.m.martins, pbonzini, richard.henderson,
eduardo, John Allen
Add cpuid bit definition for the SUCCOR feature. This cpuid bit is required to
be exposed to guests to allow them to handle machine check exceptions on AMD
hosts.
----
v2:
- Add "succor" feature word.
- Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.
Reported-by: William Roche <william.roche@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: John Allen <john.allen@amd.com>
---
target/i386/cpu.c | 18 +++++++++++++++++-
target/i386/cpu.h | 4 ++++
target/i386/kvm/kvm.c | 2 ++
3 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index bca776e1fe..5fa2dde732 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1032,6 +1032,22 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
.tcg_features = TCG_APM_FEATURES,
.unmigratable_flags = CPUID_APM_INVTSC,
},
+ [FEAT_8000_0007_EBX] = {
+ .type = CPUID_FEATURE_WORD,
+ .feat_names = {
+ NULL, "succor", NULL, NULL,
+ NULL, NULL, NULL, NULL,
+ NULL, NULL, NULL, NULL,
+ NULL, NULL, NULL, NULL,
+ NULL, NULL, NULL, NULL,
+ NULL, NULL, NULL, NULL,
+ NULL, NULL, NULL, NULL,
+ NULL, NULL, NULL, NULL,
+ },
+ .cpuid = { .eax = 0x80000007, .reg = R_EBX, },
+ .tcg_features = 0,
+ .unmigratable_flags = 0,
+ },
[FEAT_8000_0008_EBX] = {
.type = CPUID_FEATURE_WORD,
.feat_names = {
@@ -6556,7 +6572,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
break;
case 0x80000007:
*eax = 0;
- *ebx = 0;
+ *ebx = env->features[FEAT_8000_0007_EBX];
*ecx = 0;
*edx = env->features[FEAT_8000_0007_EDX];
break;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index ddf53937b4..5dd41e3d69 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -607,6 +607,7 @@ typedef enum FeatureWord {
FEAT_7_1_EAX, /* CPUID[EAX=7,ECX=1].EAX */
FEAT_8000_0001_EDX, /* CPUID[8000_0001].EDX */
FEAT_8000_0001_ECX, /* CPUID[8000_0001].ECX */
+ FEAT_8000_0007_EBX, /* CPUID[8000_0007].EBX */
FEAT_8000_0007_EDX, /* CPUID[8000_0007].EDX */
FEAT_8000_0008_EBX, /* CPUID[8000_0008].EBX */
FEAT_8000_0021_EAX, /* CPUID[8000_0021].EAX */
@@ -953,6 +954,9 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
/* Packets which contain IP payload have LIP values */
#define CPUID_14_0_ECX_LIP (1U << 31)
+/* RAS Features */
+#define CPUID_8000_0007_EBX_SUCCOR (1U << 1)
+
/* CLZERO instruction */
#define CPUID_8000_0008_EBX_CLZERO (1U << 0)
/* Always save/restore FP error pointers */
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 810f586a59..384b702fef 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -476,6 +476,8 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
*/
cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
ret |= cpuid_1_edx & CPUID_EXT2_AMD_ALIASES;
+ } else if (function == 0x80000007 && reg == R_EBX) {
+ ret |= CPUID_8000_0007_EBX_SUCCOR;
} else if (function == KVM_CPUID_FEATURES && reg == R_EAX) {
/* kvm_pv_unhalt is reported by GET_SUPPORTED_CPUID, but it can't
* be enabled without the in-kernel irqchip
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v5 3/3] i386: Add support for overflow recovery
2024-06-03 19:36 [PATCH v5 0/3] Fix MCE handling on AMD hosts John Allen
2024-06-03 19:36 ` [PATCH v5 1/3] i386: Fix MCE support for " John Allen
2024-06-03 19:36 ` [PATCH v5 2/3] i386: Add support for SUCCOR feature John Allen
@ 2024-06-03 19:36 ` John Allen
2024-06-06 9:09 ` [PATCH v5 0/3] Fix MCE handling on AMD hosts Paolo Bonzini
3 siblings, 0 replies; 6+ messages in thread
From: John Allen @ 2024-06-03 19:36 UTC (permalink / raw)
To: qemu-devel
Cc: yazen.ghannam, michael.roth, babu.moger, pankaj.gupta,
william.roche, joao.m.martins, pbonzini, richard.henderson,
eduardo, John Allen
Add cpuid bit definition for overflow recovery. This is needed in the case
where a deferred error has been sent to the guest, a guest process accesses the
poisoned memory, but the machine_check_poll function has not yet handled the
original deferred error. If overflow recovery is not set in this case, when we
handle the uncorrected error from the poisoned memory access, the overflow bit
will be set and will result in the guest being shut down.
Signed-off-by: John Allen <john.allen@amd.com>
---
v5:
- New in v5.
---
target/i386/cpu.c | 2 +-
target/i386/cpu.h | 1 +
target/i386/kvm/kvm.c | 2 +-
3 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 5fa2dde732..5385b26d4a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1035,7 +1035,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
[FEAT_8000_0007_EBX] = {
.type = CPUID_FEATURE_WORD,
.feat_names = {
- NULL, "succor", NULL, NULL,
+ "overflow-recov", "succor", NULL, NULL,
NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL,
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 5dd41e3d69..d56cf631b5 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -955,6 +955,7 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
#define CPUID_14_0_ECX_LIP (1U << 31)
/* RAS Features */
+#define CPUID_8000_0007_EBX_OVERFLOW_RECOV (1U << 0)
#define CPUID_8000_0007_EBX_SUCCOR (1U << 1)
/* CLZERO instruction */
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 384b702fef..796d5e9e38 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -477,7 +477,7 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
ret |= cpuid_1_edx & CPUID_EXT2_AMD_ALIASES;
} else if (function == 0x80000007 && reg == R_EBX) {
- ret |= CPUID_8000_0007_EBX_SUCCOR;
+ ret |= CPUID_8000_0007_EBX_OVERFLOW_RECOV | CPUID_8000_0007_EBX_SUCCOR;
} else if (function == KVM_CPUID_FEATURES && reg == R_EAX) {
/* kvm_pv_unhalt is reported by GET_SUPPORTED_CPUID, but it can't
* be enabled without the in-kernel irqchip
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v5 0/3] Fix MCE handling on AMD hosts
2024-06-03 19:36 [PATCH v5 0/3] Fix MCE handling on AMD hosts John Allen
` (2 preceding siblings ...)
2024-06-03 19:36 ` [PATCH v5 3/3] i386: Add support for overflow recovery John Allen
@ 2024-06-06 9:09 ` Paolo Bonzini
2024-06-06 16:00 ` John Allen
3 siblings, 1 reply; 6+ messages in thread
From: Paolo Bonzini @ 2024-06-06 9:09 UTC (permalink / raw)
To: John Allen
Cc: qemu-devel, yazen.ghannam, michael.roth, babu.moger, pankaj.gupta,
william.roche, joao.m.martins, pbonzini, richard.henderson,
eduardo
Queued, thanks. I added a note to the commit message in the third patch:
By the time the MCE reaches the guest, the overflow has been handled
by the host and has not caused a shutdown, so include the bit unconditionally.
Advertising of SUCCOR and OVERFLOW_RECOV in KVM would still be nice. :)
Paolo
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v5 0/3] Fix MCE handling on AMD hosts
2024-06-06 9:09 ` [PATCH v5 0/3] Fix MCE handling on AMD hosts Paolo Bonzini
@ 2024-06-06 16:00 ` John Allen
0 siblings, 0 replies; 6+ messages in thread
From: John Allen @ 2024-06-06 16:00 UTC (permalink / raw)
To: Paolo Bonzini
Cc: qemu-devel, yazen.ghannam, michael.roth, babu.moger, pankaj.gupta,
william.roche, joao.m.martins, richard.henderson, eduardo
On Thu, Jun 06, 2024 at 11:09:05AM +0200, Paolo Bonzini wrote:
> Queued, thanks. I added a note to the commit message in the third patch:
Thanks, Paolo!
>
> By the time the MCE reaches the guest, the overflow has been handled
> by the host and has not caused a shutdown, so include the bit unconditionally.
I'm not sure I understand this additional comment. Is this talking about
the case where the host gets an overflow? If so, yes, if the host has
overflow recovery supported, it should handle the overflow and won't
require any overflow recovery on the part of the guest. For clarity, it
may be nice to prefix the above statement with something like:
"In the case of a host overflow, ..."
If we're going to bring up the host overflow case, it may be worth
clarifying further that host overflows should not propagate to the guest
and this patch is specifically intended to allow the guest to handle
overflows in the MCEs that are injected from qemu.
>
> Advertising of SUCCOR and OVERFLOW_RECOV in KVM would still be nice. :)
Sure, I will send a series for this.
Thanks,
John
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-06-06 16:07 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-03 19:36 [PATCH v5 0/3] Fix MCE handling on AMD hosts John Allen
2024-06-03 19:36 ` [PATCH v5 1/3] i386: Fix MCE support for " John Allen
2024-06-03 19:36 ` [PATCH v5 2/3] i386: Add support for SUCCOR feature John Allen
2024-06-03 19:36 ` [PATCH v5 3/3] i386: Add support for overflow recovery John Allen
2024-06-06 9:09 ` [PATCH v5 0/3] Fix MCE handling on AMD hosts Paolo Bonzini
2024-06-06 16:00 ` John Allen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).