[PATCH v2 0/2] Fix MCE handling on AMD hosts

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 0/2] Fix MCE handling on AMD hosts
@ 2023-07-26 20:41 John Allen
  2023-07-26 20:41 ` [PATCH v2 1/2] i386: Add support for SUCCOR feature John Allen
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: John Allen @ 2023-07-26 20:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: yazen.ghannam, michael.roth, babu.moger, william.roche,
	joao.m.martins, pbonzini, richard.henderson, eduardo, John Allen

In the event that a guest process attempts to access memory that has
been poisoned in response to a deferred uncorrected MCE, an AMD system
will currently generate a SIGBUS error which will result in the entire
guest being shutdown. Ideally, we only want to kill the guest process
that accessed poisoned memory in this case.

This support has been included in qemu for Intel hosts for a long time,
but there are a couple of changes needed for AMD hosts. First, we will
need to expose the SUCCOR cpuid bit to guests. Second, we need to modify
the MCE injection code to avoid Intel specific behavior when we are
running on an AMD host.

v2:
  - Add "succor" feature word.
  - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.

John Allen (2):
  i386: Add support for SUCCOR feature
  i386: Fix MCE support for AMD hosts

 target/i386/cpu.c     | 18 +++++++++++++++++-
 target/i386/cpu.h     |  4 ++++
 target/i386/helper.c  |  4 ++++
 target/i386/kvm/kvm.c | 19 +++++++++++++------
 4 files changed, 38 insertions(+), 7 deletions(-)

-- 
2.39.3

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/2] i386: Add support for SUCCOR feature
  2023-07-26 20:41 [PATCH v2 0/2] Fix MCE handling on AMD hosts John Allen
@ 2023-07-26 20:41 ` John Allen
  2023-09-01 10:30   ` Joao Martins
  2023-07-26 20:41 ` [PATCH v2 2/2] i386: Fix MCE support for AMD hosts John Allen
  2023-08-31 21:40 ` [PATCH v2 0/2] Fix MCE handling on " William Roche
  2 siblings, 1 reply; 10+ messages in thread
From: John Allen @ 2023-07-26 20:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: yazen.ghannam, michael.roth, babu.moger, william.roche,
	joao.m.martins, pbonzini, richard.henderson, eduardo, John Allen

Add cpuid bit definition for the SUCCOR feature. This cpuid bit is required to
be exposed to guests to allow them to handle machine check exceptions on AMD
hosts.

Reported-by: William Roche <william.roche@oracle.com>
Signed-off-by: John Allen <john.allen@amd.com>
---
v2:
  - Add "succor" feature word.
  - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.
---
 target/i386/cpu.c     | 18 +++++++++++++++++-
 target/i386/cpu.h     |  4 ++++
 target/i386/kvm/kvm.c |  2 ++
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 06009b80e8..9708785255 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -872,6 +872,22 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
         },
         .tcg_features = TCG_7_1_EAX_FEATURES,
     },
+    [FEAT_8000_0007_EBX] = {
+        .type = CPUID_FEATURE_WORD,
+        .feat_names = {
+            NULL, "succor", NULL, NULL,
+            NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL,
+            NULL, NULL, NULL, NULL,
+        },
+        .cpuid = { .eax = 0x80000007, .reg = R_EBX, },
+        .tcg_features = 0,
+        .unmigratable_flags = 0,
+    },
     [FEAT_8000_0007_EDX] = {
         .type = CPUID_FEATURE_WORD,
         .feat_names = {
@@ -5874,7 +5890,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         break;
     case 0x80000007:
         *eax = 0;
-        *ebx = 0;
+        *ebx = env->features[FEAT_8000_0007_EBX];
         *ecx = 0;
         *edx = env->features[FEAT_8000_0007_EDX];
         break;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 3edaad7688..fccb9b5a97 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -555,6 +555,7 @@ typedef enum FeatureWord {
     FEAT_7_1_EAX,       /* CPUID[EAX=7,ECX=1].EAX */
     FEAT_8000_0001_EDX, /* CPUID[8000_0001].EDX */
     FEAT_8000_0001_ECX, /* CPUID[8000_0001].ECX */
+    FEAT_8000_0007_EBX, /* CPUID[8000_0007].EBX */
     FEAT_8000_0007_EDX, /* CPUID[8000_0007].EDX */
     FEAT_8000_0008_EBX, /* CPUID[8000_0008].EBX */
     FEAT_C000_0001_EDX, /* CPUID[C000_0001].EDX */
@@ -856,6 +857,9 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS];
 /* Packets which contain IP payload have LIP values */
 #define CPUID_14_0_ECX_LIP              (1U << 31)
 
+/* RAS Features */
+#define CPUID_8000_0007_EBX_SUCCOR      (1U << 1)
+
 /* CLZERO instruction */
 #define CPUID_8000_0008_EBX_CLZERO      (1U << 0)
 /* Always save/restore FP error pointers */
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f25837f63f..4b62138459 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -417,6 +417,8 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
          */
         cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
         ret |= cpuid_1_edx & CPUID_EXT2_AMD_ALIASES;
+    } else if (function == 0x80000007 && reg == R_EBX) {
+        ret |= CPUID_8000_0007_EBX_SUCCOR;
     } else if (function == KVM_CPUID_FEATURES && reg == R_EAX) {
         /* kvm_pv_unhalt is reported by GET_SUPPORTED_CPUID, but it can't
          * be enabled without the in-kernel irqchip
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 2/2] i386: Fix MCE support for AMD hosts
  2023-07-26 20:41 [PATCH v2 0/2] Fix MCE handling on AMD hosts John Allen
  2023-07-26 20:41 ` [PATCH v2 1/2] i386: Add support for SUCCOR feature John Allen
@ 2023-07-26 20:41 ` John Allen
  2023-08-31 21:54   ` William Roche
  2023-08-31 21:40 ` [PATCH v2 0/2] Fix MCE handling on " William Roche
  2 siblings, 1 reply; 10+ messages in thread
From: John Allen @ 2023-07-26 20:41 UTC (permalink / raw)
  To: qemu-devel
  Cc: yazen.ghannam, michael.roth, babu.moger, william.roche,
	joao.m.martins, pbonzini, richard.henderson, eduardo, John Allen

For the most part, AMD hosts can use the same MCE injection code as Intel but,
there are instances where the qemu implementation is Intel specific. First, MCE
deliviery works differently on AMD and does not support broadcast. Second,
kvm_mce_inject generates MCEs that include a number of Intel specific status
bits. Modify kvm_mce_inject to properly generate MCEs on AMD platforms.

Reported-by: William Roche <william.roche@oracle.com>
Signed-off-by: John Allen <john.allen@amd.com>
---
 target/i386/helper.c  |  4 ++++
 target/i386/kvm/kvm.c | 17 +++++++++++------
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/target/i386/helper.c b/target/i386/helper.c
index 533b29cb91..a6523858e0 100644
--- a/target/i386/helper.c
+++ b/target/i386/helper.c
@@ -76,6 +76,10 @@ int cpu_x86_support_mca_broadcast(CPUX86State *env)
     int family = 0;
     int model = 0;
 
+    if (IS_AMD_CPU(env)) {
+        return 0;
+    }
+
     cpu_x86_version(env, &family, &model);
     if ((family == 6 && model >= 14) || family > 6) {
         return 1;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 4b62138459..87a50c8aaf 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -532,16 +532,21 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int code)
     CPUState *cs = CPU(cpu);
     CPUX86State *env = &cpu->env;
     uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
-                      MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S;
+                      MCI_STATUS_MISCV | MCI_STATUS_ADDRV;
     uint64_t mcg_status = MCG_STATUS_MCIP;
     int flags = 0;
 
-    if (code == BUS_MCEERR_AR) {
-        status |= MCI_STATUS_AR | 0x134;
-        mcg_status |= MCG_STATUS_EIPV;
+    if (!IS_AMD_CPU(env)) {
+        status |= MCI_STATUS_S;
+        if (code == BUS_MCEERR_AR) {
+            status |= MCI_STATUS_AR | 0x134;
+            mcg_status |= MCG_STATUS_EIPV;
+        } else {
+            status |= 0xc0;
+            mcg_status |= MCG_STATUS_RIPV;
+        }
     } else {
-        status |= 0xc0;
-        mcg_status |= MCG_STATUS_RIPV;
+        mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
     }
 
     flags = cpu_x86_support_mca_broadcast(env) ? MCE_INJECT_BROADCAST : 0;
-- 
2.39.3



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 0/2] Fix MCE handling on AMD hosts
  2023-07-26 20:41 [PATCH v2 0/2] Fix MCE handling on AMD hosts John Allen
  2023-07-26 20:41 ` [PATCH v2 1/2] i386: Add support for SUCCOR feature John Allen
  2023-07-26 20:41 ` [PATCH v2 2/2] i386: Fix MCE support for AMD hosts John Allen
@ 2023-08-31 21:40 ` William Roche
  2023-09-05 15:15   ` John Allen via
  2 siblings, 1 reply; 10+ messages in thread
From: William Roche @ 2023-08-31 21:40 UTC (permalink / raw)
  To: John Allen, qemu-devel
  Cc: yazen.ghannam, michael.roth, babu.moger, joao.m.martins, pbonzini,
	richard.henderson, eduardo

Hello John,

I could test your fixes and I can confirm that the BUS_MCEERR_AR is now 
working on AMD:

Before the fix, the VM panics with:

qemu-system-x86_64: Guest MCE Memory Error at QEMU addr 0x7f89573ce000 
and GUEST addr 0x10b5ce000 of type BUS_MCEERR_AR injected
[   83.562579] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 
Bank 1: a000000000000000
[   83.562585] mce: [Hardware Error]: RIP !INEXACT! 
10:<ffffffff81e8f6ff> {pv_native_safe_halt+0xf/0x20}
[   83.562592] mce: [Hardware Error]: TSC 3d39402bdc
[   83.562593] mce: [Hardware Error]: PROCESSOR 2:800f12 TIME 1693515449 
SOCKET 0 APIC 0 microcode 800126e
[   83.562596] mce: [Hardware Error]: Machine check: Uncorrected error 
without MCA Recovery
[   83.562597] Kernel panic - not syncing: Fatal local machine check
[   83.563401] Kernel Offset: disabled

With the fix, the same error injection doesn't kill the VM, but 
generates the following console messages:

qemu-system-x86_64: Guest MCE Memory Error at QEMU addr 0x7fa430ab9000 
and GUEST addr 0x118cb9000 of type BUS_MCEERR_AR injected
[  250.851996] Disabling lock debugging due to kernel taint
[  250.852928] mce: Uncorrected hardware memory error in user-access at 
118cb9000
[  250.853261] Memory failure: 0x118cb9: Sending SIGBUS to 
mce_process_rea:1227 due to hardware memory corruption
[  250.854933] mce: [Hardware Error]: Machine check events logged
[  250.855800] Memory failure: 0x118cb9: recovery action for dirty LRU 
page: Recovered
[  250.856661] mce: [Hardware Error]: CPU 2: Machine Check Exception: 7 
Bank 9: bc00000000000000
[  250.860552] mce: [Hardware Error]: RIP 33:<00007f56b9ecbee5>
[  250.861405] mce: [Hardware Error]: TSC 8c2c664410 ADDR 118cb9000 MISC 8c
[  250.862679] mce: [Hardware Error]: PROCESSOR 2:800f12 TIME 1693508937 
SOCKET 0 APIC 2 microcode 800126e


But a problem still exists with BUS_MCEERR_AO that kills the VM with:

qemu-system-x86_64: warning: Guest MCE Memory Error at QEMU addr 
0x7f1d108e5000 and GUEST addr 0x114ae5000 of type BUS_MCEERR_AO injected
[  157.392905] mce: [Hardware Error]: CPU 0: Machine Check Exception: 7 
Bank 9: bc00000000000000
[  157.392912] mce: [Hardware Error]: RIP 10:<ffffffff81e8f6ff> 
{pv_native_safe_halt+0xf/0x20}
[  157.392919] mce: [Hardware Error]: TSC 60b92a54d0 ADDR 114ae5000 MISC 8c
[  157.392921] mce: [Hardware Error]: PROCESSOR 2:800f12 TIME 1693500765 
SOCKET 0 APIC 0 microcode 800126e
[  157.392924] mce: [Hardware Error]: Machine check: Uncorrected 
unrecoverable error in kernel context
[  157.392925] Kernel panic - not syncing: Fatal local machine check
[  157.402582] Kernel Offset: disabled

As AMD guests can't currently deal with BUS_MCEERR_AO MCE injection, 
according to me the fix is not complete, the 'AO' case must be handled. 
The simplest way is probably to filter it at the qemu level, to only 
inject the 'AR' case -- and it also gives the possibility to let qemu 
provide a message about an ignored 'AO' error.

I would suggest to add a 3rd patch implementing this AMD specific filter:


commit bf8cc74df3fcc7bf958a7c42b876e9c059fe4d06
Author: William Roche <william.roche@oracle.com>
Date:   Thu Aug 31 18:54:57 2023 +0000

     i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

     AMD guests can't currently deal with BUS_MCEERR_AO MCE injection
     as it panics the VM kernel. We filter this event and provide a
     warning message.

     Signed-off-by: William Roche <william.roche@oracle.com>

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 9ca7187628..bd60d5697b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -606,6 +606,10 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr 
paddr, int code)
              mcg_status |= MCG_STATUS_RIPV;
          }
      } else {
+        if (code == BUS_MCEERR_AO) {
+            /* XXX we don't support BUS_MCEERR_AO injection on AMD yet */
+            return;
+        }
          mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
      }

@@ -657,7 +661,8 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, 
void *addr)
          if (ram_addr != RAM_ADDR_INVALID &&
              kvm_physical_memory_addr_from_host(c->kvm_state, addr, 
&paddr)) {
              kvm_hwpoison_page_add(ram_addr);
-            kvm_mce_inject(cpu, paddr, code);
+            if (!IS_AMD_CPU(env) || code != BUS_MCEERR_AO)
+                kvm_mce_inject(cpu, paddr, code);

              /*
               * Use different logging severity based on error type.
@@ -670,8 +675,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, 
void *addr)
                      addr, paddr, "BUS_MCEERR_AR");
              } else {
                   warn_report("Guest MCE Memory Error at QEMU addr %p and "
-                     "GUEST addr 0x%" HWADDR_PRIx " of type %s injected",
-                     addr, paddr, "BUS_MCEERR_AO");
+                     "GUEST addr 0x%" HWADDR_PRIx " of type %s %s",
+                     addr, paddr, "BUS_MCEERR_AO",
+                     IS_AMD_CPU(env) ? "ignored on AMD guest" : 
"injected");
              }

              return;
---


I hope this can help.

William.


On 7/26/23 22:41, John Allen wrote:
> In the event that a guest process attempts to access memory that has
> been poisoned in response to a deferred uncorrected MCE, an AMD system
> will currently generate a SIGBUS error which will result in the entire
> guest being shutdown. Ideally, we only want to kill the guest process
> that accessed poisoned memory in this case.
>
> This support has been included in qemu for Intel hosts for a long time,
> but there are a couple of changes needed for AMD hosts. First, we will
> need to expose the SUCCOR cpuid bit to guests. Second, we need to modify
> the MCE injection code to avoid Intel specific behavior when we are
> running on an AMD host.
>
> v2:
>    - Add "succor" feature word.
>    - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.
>
> John Allen (2):
>    i386: Add support for SUCCOR feature
>    i386: Fix MCE support for AMD hosts
>
>   target/i386/cpu.c     | 18 +++++++++++++++++-
>   target/i386/cpu.h     |  4 ++++
>   target/i386/helper.c  |  4 ++++
>   target/i386/kvm/kvm.c | 19 +++++++++++++------
>   4 files changed, 38 insertions(+), 7 deletions(-)
>


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/2] i386: Fix MCE support for AMD hosts
  2023-07-26 20:41 ` [PATCH v2 2/2] i386: Fix MCE support for AMD hosts John Allen
@ 2023-08-31 21:54   ` William Roche
  2023-09-05 15:03     ` John Allen via
  0 siblings, 1 reply; 10+ messages in thread
From: William Roche @ 2023-08-31 21:54 UTC (permalink / raw)
  To: John Allen, qemu-devel
  Cc: yazen.ghannam, michael.roth, babu.moger, joao.m.martins, pbonzini,
	richard.henderson, eduardo

John,

I also noticed something important about this specific code:

Qemu commit cb48748af7dfd7654323eb839d1f853ffa873652 introduced the use 
of the MCG_STATUS_RIPV in the case of a BUS_MCEERR_AR error, but it 
looks like your reference code is not having this flag.

According to me, we should keep this flag in the case of a non-AMD 
machine with a BUS_MCEERR_AR.

The patch should look something like that:

[...]
-    if (code == BUS_MCEERR_AR) {
-        status |= MCI_STATUS_AR | 0x134;
-        mcg_status |= MCG_STATUS_RIPV | MCG_STATUS_EIPV;
+    if (!IS_AMD_CPU(env)) {
+        status |= MCI_STATUS_S;
+        if (code == BUS_MCEERR_AR) {
+            status |= MCI_STATUS_AR | 0x134;
+            mcg_status |= MCG_STATUS_RIPV | MCG_STATUS_EIPV;
[...]


Cheers,
William.


On 7/26/23 22:41, John Allen wrote:
> For the most part, AMD hosts can use the same MCE injection code as Intel but,
> there are instances where the qemu implementation is Intel specific. First, MCE
> deliviery works differently on AMD and does not support broadcast. Second,
> kvm_mce_inject generates MCEs that include a number of Intel specific status
> bits. Modify kvm_mce_inject to properly generate MCEs on AMD platforms.
>
> Reported-by: William Roche <william.roche@oracle.com>
> Signed-off-by: John Allen <john.allen@amd.com>
> ---
>   target/i386/helper.c  |  4 ++++
>   target/i386/kvm/kvm.c | 17 +++++++++++------
>   2 files changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/target/i386/helper.c b/target/i386/helper.c
> index 533b29cb91..a6523858e0 100644
> --- a/target/i386/helper.c
> +++ b/target/i386/helper.c
> @@ -76,6 +76,10 @@ int cpu_x86_support_mca_broadcast(CPUX86State *env)
>       int family = 0;
>       int model = 0;
>   
> +    if (IS_AMD_CPU(env)) {
> +        return 0;
> +    }
> +
>       cpu_x86_version(env, &family, &model);
>       if ((family == 6 && model >= 14) || family > 6) {
>           return 1;
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 4b62138459..87a50c8aaf 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -532,16 +532,21 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int code)
>       CPUState *cs = CPU(cpu);
>       CPUX86State *env = &cpu->env;
>       uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
> -                      MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S;
> +                      MCI_STATUS_MISCV | MCI_STATUS_ADDRV;
>       uint64_t mcg_status = MCG_STATUS_MCIP;
>       int flags = 0;
>   
> -    if (code == BUS_MCEERR_AR) {
> -        status |= MCI_STATUS_AR | 0x134;
> -        mcg_status |= MCG_STATUS_EIPV;
> +    if (!IS_AMD_CPU(env)) {
> +        status |= MCI_STATUS_S;
> +        if (code == BUS_MCEERR_AR) {
> +            status |= MCI_STATUS_AR | 0x134;
> +            mcg_status |= MCG_STATUS_EIPV;
> +        } else {
> +            status |= 0xc0;
> +            mcg_status |= MCG_STATUS_RIPV;
> +        }
>       } else {
> -        status |= 0xc0;
> -        mcg_status |= MCG_STATUS_RIPV;
> +        mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
>       }
>   
>       flags = cpu_x86_support_mca_broadcast(env) ? MCE_INJECT_BROADCAST : 0;


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/2] i386: Add support for SUCCOR feature
  2023-07-26 20:41 ` [PATCH v2 1/2] i386: Add support for SUCCOR feature John Allen
@ 2023-09-01 10:30   ` Joao Martins
  2023-09-05 15:01     ` John Allen
  0 siblings, 1 reply; 10+ messages in thread
From: Joao Martins @ 2023-09-01 10:30 UTC (permalink / raw)
  To: John Allen
  Cc: yazen.ghannam, michael.roth, babu.moger, william.roche, pbonzini,
	richard.henderson, eduardo, qemu-devel

On 26/07/2023 21:41, John Allen wrote:
> Add cpuid bit definition for the SUCCOR feature. This cpuid bit is required to
> be exposed to guests to allow them to handle machine check exceptions on AMD
> hosts.
> 
> Reported-by: William Roche <william.roche@oracle.com>
> Signed-off-by: John Allen <john.allen@amd.com>

I think this is matching the last discussion:

	Reviewed-by: Joao Martins <joao.m.martins@oracle.com>

The patch ordering doesn't look correct though. Perhaps we should expose succor
only after MCE is fixed so this patch would be the second, not the first?

Also, this should in generally be OK for -cpu host, but might be missing a third
patch that adds "succor" to the AMD models e.g.

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 97ad229d8ba3..d132cb3bbbbe 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -4731,6 +4731,16 @@ static const X86CPUDefinition builtin_x86_defs[] = {
                     { /* end of list */ }
                 },
             },
+            {
+                .version = 5,
+                .props = (PropValue[]) {
+                    /* Erratum 1386 */
+                    { "model-id",
+                      "AMD EPYC-Rome-v5 Processor" },
+                    { "succor", "on" },
+                    { /* end of list */ }
+                },
+            },
             { /* end of list */ }
         }
     },
@@ -4806,6 +4816,16 @@ static const X86CPUDefinition builtin_x86_defs[] = {
                 },
                 .cache_info = &epyc_milan_v2_cache_info
             },
+            {
+                .version = 3,
+                .props = (PropValue[]) {
+                    /* Erratum 1386 */
+                    { "model-id",
+                      "AMD EPYC-Milan-v3 Processor" },
+                    { "succor", "on" },
+                    { /* end of list */ }
+                },
+            },
             { /* end of list */ }
         }
     },
@@ -4880,6 +4900,20 @@ static const X86CPUDefinition builtin_x86_defs[] = {
         .xlevel = 0x80000022,
         .model_id = "AMD EPYC-Genoa Processor",
         .cache_info = &epyc_genoa_cache_info,
+        .versions = (X86CPUVersionDefinition[]) {
+            { .version = 1 },
+            {
+                .version = 2,
+                .props = (PropValue[]) {
+                    /* Erratum 1386 */
+                    { "model-id",
+                      "AMD EPYC-Genoa-v2 Processor" },
+                    { "succor", "on" },
+                    { /* end of list */ }
+                },
+            },
+            { /* end of list */ }
+        }
     },


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/2] i386: Add support for SUCCOR feature
  2023-09-01 10:30   ` Joao Martins
@ 2023-09-05 15:01     ` John Allen
  2023-09-06 18:20       ` Moger, Babu
  0 siblings, 1 reply; 10+ messages in thread
From: John Allen @ 2023-09-05 15:01 UTC (permalink / raw)
  To: Joao Martins, babu.moger
  Cc: yazen.ghannam, michael.roth, william.roche, pbonzini,
	richard.henderson, eduardo, qemu-devel

On Fri, Sep 01, 2023 at 11:30:53AM +0100, Joao Martins wrote:
> On 26/07/2023 21:41, John Allen wrote:
> > Add cpuid bit definition for the SUCCOR feature. This cpuid bit is required to
> > be exposed to guests to allow them to handle machine check exceptions on AMD
> > hosts.
> > 
> > Reported-by: William Roche <william.roche@oracle.com>
> > Signed-off-by: John Allen <john.allen@amd.com>
> 
> I think this is matching the last discussion:
> 
> 	Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
> 
> The patch ordering doesn't look correct though. Perhaps we should expose succor
> only after MCE is fixed so this patch would be the second, not the first?

Yes, that makes sense. I will address this and send another version of
the series with the correct ordering.

> 
> Also, this should in generally be OK for -cpu host, but might be missing a third
> patch that adds "succor" to the AMD models e.g.

Babu,

I think we previously discussed adding this to the models later in a
separate series. Is this your preferred course of action or can we add
it with this series?

Thanks,
John

> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 97ad229d8ba3..d132cb3bbbbe 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -4731,6 +4731,16 @@ static const X86CPUDefinition builtin_x86_defs[] = {
>                      { /* end of list */ }
>                  },
>              },
> +            {
> +                .version = 5,
> +                .props = (PropValue[]) {
> +                    /* Erratum 1386 */
> +                    { "model-id",
> +                      "AMD EPYC-Rome-v5 Processor" },
> +                    { "succor", "on" },
> +                    { /* end of list */ }
> +                },
> +            },
>              { /* end of list */ }
>          }
>      },
> @@ -4806,6 +4816,16 @@ static const X86CPUDefinition builtin_x86_defs[] = {
>                  },
>                  .cache_info = &epyc_milan_v2_cache_info
>              },
> +            {
> +                .version = 3,
> +                .props = (PropValue[]) {
> +                    /* Erratum 1386 */
> +                    { "model-id",
> +                      "AMD EPYC-Milan-v3 Processor" },
> +                    { "succor", "on" },
> +                    { /* end of list */ }
> +                },
> +            },
>              { /* end of list */ }
>          }
>      },
> @@ -4880,6 +4900,20 @@ static const X86CPUDefinition builtin_x86_defs[] = {
>          .xlevel = 0x80000022,
>          .model_id = "AMD EPYC-Genoa Processor",
>          .cache_info = &epyc_genoa_cache_info,
> +        .versions = (X86CPUVersionDefinition[]) {
> +            { .version = 1 },
> +            {
> +                .version = 2,
> +                .props = (PropValue[]) {
> +                    /* Erratum 1386 */
> +                    { "model-id",
> +                      "AMD EPYC-Genoa-v2 Processor" },
> +                    { "succor", "on" },
> +                    { /* end of list */ }
> +                },
> +            },
> +            { /* end of list */ }
> +        }
>      },


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/2] i386: Fix MCE support for AMD hosts
  2023-08-31 21:54   ` William Roche
@ 2023-09-05 15:03     ` John Allen via
  0 siblings, 0 replies; 10+ messages in thread
From: John Allen via @ 2023-09-05 15:03 UTC (permalink / raw)
  To: William Roche
  Cc: qemu-devel, yazen.ghannam, michael.roth, babu.moger,
	joao.m.martins, pbonzini, richard.henderson, eduardo

On Thu, Aug 31, 2023 at 11:54:56PM +0200, William Roche wrote:
> John,
> 
> I also noticed something important about this specific code:
> 
> Qemu commit cb48748af7dfd7654323eb839d1f853ffa873652 introduced the use of
> the MCG_STATUS_RIPV in the case of a BUS_MCEERR_AR error, but it looks like
> your reference code is not having this flag.
> 
> According to me, we should keep this flag in the case of a non-AMD machine
> with a BUS_MCEERR_AR.
> 
> The patch should look something like that:
> 
> [...]
> -    if (code == BUS_MCEERR_AR) {
> -        status |= MCI_STATUS_AR | 0x134;
> -        mcg_status |= MCG_STATUS_RIPV | MCG_STATUS_EIPV;
> +    if (!IS_AMD_CPU(env)) {
> +        status |= MCI_STATUS_S;
> +        if (code == BUS_MCEERR_AR) {
> +            status |= MCI_STATUS_AR | 0x134;
> +            mcg_status |= MCG_STATUS_RIPV | MCG_STATUS_EIPV;
> [...]

Yes, that looks correct. I will fix this in the next version of the
series.

Thanks,
John

> 
> 
> Cheers,
> William.
> 
> 
> On 7/26/23 22:41, John Allen wrote:
> > For the most part, AMD hosts can use the same MCE injection code as Intel but,
> > there are instances where the qemu implementation is Intel specific. First, MCE
> > deliviery works differently on AMD and does not support broadcast. Second,
> > kvm_mce_inject generates MCEs that include a number of Intel specific status
> > bits. Modify kvm_mce_inject to properly generate MCEs on AMD platforms.
> > 
> > Reported-by: William Roche <william.roche@oracle.com>
> > Signed-off-by: John Allen <john.allen@amd.com>
> > ---
> >   target/i386/helper.c  |  4 ++++
> >   target/i386/kvm/kvm.c | 17 +++++++++++------
> >   2 files changed, 15 insertions(+), 6 deletions(-)
> > 
> > diff --git a/target/i386/helper.c b/target/i386/helper.c
> > index 533b29cb91..a6523858e0 100644
> > --- a/target/i386/helper.c
> > +++ b/target/i386/helper.c
> > @@ -76,6 +76,10 @@ int cpu_x86_support_mca_broadcast(CPUX86State *env)
> >       int family = 0;
> >       int model = 0;
> > +    if (IS_AMD_CPU(env)) {
> > +        return 0;
> > +    }
> > +
> >       cpu_x86_version(env, &family, &model);
> >       if ((family == 6 && model >= 14) || family > 6) {
> >           return 1;
> > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > index 4b62138459..87a50c8aaf 100644
> > --- a/target/i386/kvm/kvm.c
> > +++ b/target/i386/kvm/kvm.c
> > @@ -532,16 +532,21 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int code)
> >       CPUState *cs = CPU(cpu);
> >       CPUX86State *env = &cpu->env;
> >       uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
> > -                      MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S;
> > +                      MCI_STATUS_MISCV | MCI_STATUS_ADDRV;
> >       uint64_t mcg_status = MCG_STATUS_MCIP;
> >       int flags = 0;
> > -    if (code == BUS_MCEERR_AR) {
> > -        status |= MCI_STATUS_AR | 0x134;
> > -        mcg_status |= MCG_STATUS_EIPV;
> > +    if (!IS_AMD_CPU(env)) {
> > +        status |= MCI_STATUS_S;
> > +        if (code == BUS_MCEERR_AR) {
> > +            status |= MCI_STATUS_AR | 0x134;
> > +            mcg_status |= MCG_STATUS_EIPV;
> > +        } else {
> > +            status |= 0xc0;
> > +            mcg_status |= MCG_STATUS_RIPV;
> > +        }
> >       } else {
> > -        status |= 0xc0;
> > -        mcg_status |= MCG_STATUS_RIPV;
> > +        mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
> >       }
> >       flags = cpu_x86_support_mca_broadcast(env) ? MCE_INJECT_BROADCAST : 0;


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 0/2] Fix MCE handling on AMD hosts
  2023-08-31 21:40 ` [PATCH v2 0/2] Fix MCE handling on " William Roche
@ 2023-09-05 15:15   ` John Allen via
  0 siblings, 0 replies; 10+ messages in thread
From: John Allen via @ 2023-09-05 15:15 UTC (permalink / raw)
  To: William Roche
  Cc: qemu-devel, yazen.ghannam, michael.roth, babu.moger,
	joao.m.martins, pbonzini, richard.henderson, eduardo

On Thu, Aug 31, 2023 at 11:40:08PM +0200, William Roche wrote:
> Hello John,
> 
> I could test your fixes and I can confirm that the BUS_MCEERR_AR is now
> working on AMD:
> 
> Before the fix, the VM panics with:
> 
> qemu-system-x86_64: Guest MCE Memory Error at QEMU addr 0x7f89573ce000 and
> GUEST addr 0x10b5ce000 of type BUS_MCEERR_AR injected
> [   83.562579] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank
> 1: a000000000000000
> [   83.562585] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff81e8f6ff>
> {pv_native_safe_halt+0xf/0x20}
> [   83.562592] mce: [Hardware Error]: TSC 3d39402bdc
> [   83.562593] mce: [Hardware Error]: PROCESSOR 2:800f12 TIME 1693515449
> SOCKET 0 APIC 0 microcode 800126e
> [   83.562596] mce: [Hardware Error]: Machine check: Uncorrected error
> without MCA Recovery
> [   83.562597] Kernel panic - not syncing: Fatal local machine check
> [   83.563401] Kernel Offset: disabled
> 
> With the fix, the same error injection doesn't kill the VM, but generates
> the following console messages:
> 
> qemu-system-x86_64: Guest MCE Memory Error at QEMU addr 0x7fa430ab9000 and
> GUEST addr 0x118cb9000 of type BUS_MCEERR_AR injected
> [  250.851996] Disabling lock debugging due to kernel taint
> [  250.852928] mce: Uncorrected hardware memory error in user-access at
> 118cb9000
> [  250.853261] Memory failure: 0x118cb9: Sending SIGBUS to
> mce_process_rea:1227 due to hardware memory corruption
> [  250.854933] mce: [Hardware Error]: Machine check events logged
> [  250.855800] Memory failure: 0x118cb9: recovery action for dirty LRU page:
> Recovered
> [  250.856661] mce: [Hardware Error]: CPU 2: Machine Check Exception: 7 Bank
> 9: bc00000000000000
> [  250.860552] mce: [Hardware Error]: RIP 33:<00007f56b9ecbee5>
> [  250.861405] mce: [Hardware Error]: TSC 8c2c664410 ADDR 118cb9000 MISC 8c
> [  250.862679] mce: [Hardware Error]: PROCESSOR 2:800f12 TIME 1693508937
> SOCKET 0 APIC 2 microcode 800126e
> 
> 
> But a problem still exists with BUS_MCEERR_AO that kills the VM with:
> 
> qemu-system-x86_64: warning: Guest MCE Memory Error at QEMU addr
> 0x7f1d108e5000 and GUEST addr 0x114ae5000 of type BUS_MCEERR_AO injected
> [  157.392905] mce: [Hardware Error]: CPU 0: Machine Check Exception: 7 Bank
> 9: bc00000000000000
> [  157.392912] mce: [Hardware Error]: RIP 10:<ffffffff81e8f6ff>
> {pv_native_safe_halt+0xf/0x20}
> [  157.392919] mce: [Hardware Error]: TSC 60b92a54d0 ADDR 114ae5000 MISC 8c
> [  157.392921] mce: [Hardware Error]: PROCESSOR 2:800f12 TIME 1693500765
> SOCKET 0 APIC 0 microcode 800126e
> [  157.392924] mce: [Hardware Error]: Machine check: Uncorrected
> unrecoverable error in kernel context
> [  157.392925] Kernel panic - not syncing: Fatal local machine check
> [  157.402582] Kernel Offset: disabled
> 
> As AMD guests can't currently deal with BUS_MCEERR_AO MCE injection,
> according to me the fix is not complete, the 'AO' case must be handled. The
> simplest way is probably to filter it at the qemu level, to only inject the
> 'AR' case -- and it also gives the possibility to let qemu provide a message
> about an ignored 'AO' error.
> 
> I would suggest to add a 3rd patch implementing this AMD specific filter:
> 
> 
> commit bf8cc74df3fcc7bf958a7c42b876e9c059fe4d06
> Author: William Roche <william.roche@oracle.com>
> Date:   Thu Aug 31 18:54:57 2023 +0000
> 
>     i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest
> 
>     AMD guests can't currently deal with BUS_MCEERR_AO MCE injection
>     as it panics the VM kernel. We filter this event and provide a
>     warning message.
> 
>     Signed-off-by: William Roche <william.roche@oracle.com>
> 
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 9ca7187628..bd60d5697b 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -606,6 +606,10 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr,
> int code)
>              mcg_status |= MCG_STATUS_RIPV;
>          }
>      } else {
> +        if (code == BUS_MCEERR_AO) {
> +            /* XXX we don't support BUS_MCEERR_AO injection on AMD yet */
> +            return;
> +        }
>          mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
>      }
> 
> @@ -657,7 +661,8 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void
> *addr)
>          if (ram_addr != RAM_ADDR_INVALID &&
>              kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr))
> {
>              kvm_hwpoison_page_add(ram_addr);
> -            kvm_mce_inject(cpu, paddr, code);
> +            if (!IS_AMD_CPU(env) || code != BUS_MCEERR_AO)
> +                kvm_mce_inject(cpu, paddr, code);
> 
>              /*
>               * Use different logging severity based on error type.
> @@ -670,8 +675,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void
> *addr)
>                      addr, paddr, "BUS_MCEERR_AR");
>              } else {
>                   warn_report("Guest MCE Memory Error at QEMU addr %p and "
> -                     "GUEST addr 0x%" HWADDR_PRIx " of type %s injected",
> -                     addr, paddr, "BUS_MCEERR_AO");
> +                     "GUEST addr 0x%" HWADDR_PRIx " of type %s %s",
> +                     addr, paddr, "BUS_MCEERR_AO",
> +                     IS_AMD_CPU(env) ? "ignored on AMD guest" :
> "injected");
>              }
> 
>              return;
> ---

Thanks, I think this will be a good solution for now while we can't
fully support AO errors. I will test this and include in the next
version of the series.

Thanks,
John

> 
> 
> I hope this can help.
> 
> William.
> 
> 
> On 7/26/23 22:41, John Allen wrote:
> > In the event that a guest process attempts to access memory that has
> > been poisoned in response to a deferred uncorrected MCE, an AMD system
> > will currently generate a SIGBUS error which will result in the entire
> > guest being shutdown. Ideally, we only want to kill the guest process
> > that accessed poisoned memory in this case.
> > 
> > This support has been included in qemu for Intel hosts for a long time,
> > but there are a couple of changes needed for AMD hosts. First, we will
> > need to expose the SUCCOR cpuid bit to guests. Second, we need to modify
> > the MCE injection code to avoid Intel specific behavior when we are
> > running on an AMD host.
> > 
> > v2:
> >    - Add "succor" feature word.
> >    - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.
> > 
> > John Allen (2):
> >    i386: Add support for SUCCOR feature
> >    i386: Fix MCE support for AMD hosts
> > 
> >   target/i386/cpu.c     | 18 +++++++++++++++++-
> >   target/i386/cpu.h     |  4 ++++
> >   target/i386/helper.c  |  4 ++++
> >   target/i386/kvm/kvm.c | 19 +++++++++++++------
> >   4 files changed, 38 insertions(+), 7 deletions(-)
> > 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/2] i386: Add support for SUCCOR feature
  2023-09-05 15:01     ` John Allen
@ 2023-09-06 18:20       ` Moger, Babu
  0 siblings, 0 replies; 10+ messages in thread
From: Moger, Babu @ 2023-09-06 18:20 UTC (permalink / raw)
  To: John Allen, Joao Martins, babu.moger
  Cc: yazen.ghannam, michael.roth, william.roche, pbonzini,
	richard.henderson, eduardo, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1325 bytes --]

Hi John,

On 9/5/2023 10:01 AM, John Allen wrote:
> On Fri, Sep 01, 2023 at 11:30:53AM +0100, Joao Martins wrote:
>> On 26/07/2023 21:41, John Allen wrote:
>>> Add cpuid bit definition for the SUCCOR feature. This cpuid bit is required to
>>> be exposed to guests to allow them to handle machine check exceptions on AMD
>>> hosts.
>>>
>>> Reported-by: William Roche<william.roche@oracle.com>
>>> Signed-off-by: John Allen<john.allen@amd.com>
>> I think this is matching the last discussion:
>>
>> 	Reviewed-by: Joao Martins<joao.m.martins@oracle.com>
>>
>> The patch ordering doesn't look correct though. Perhaps we should expose succor
>> only after MCE is fixed so this patch would be the second, not the first?
> Yes, that makes sense. I will address this and send another version of
> the series with the correct ordering.
>
>> Also, this should in generally be OK for -cpu host, but might be missing a third
>> patch that adds "succor" to the AMD models e.g.
> Babu,
>
> I think we previously discussed adding this to the models later in a
> separate series. Is this your preferred course of action or can we add
> it with this series?


Yes. We can add it later as a separate series. We just added EPYC-Genoa. 
We don't want to add EPYC-Genoa-v2 at this point. We have few more 
features pending as well.

Thanks

Babu

[-- Attachment #2: Type: text/html, Size: 3026 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-09-06 20:08 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-26 20:41 [PATCH v2 0/2] Fix MCE handling on AMD hosts John Allen
2023-07-26 20:41 ` [PATCH v2 1/2] i386: Add support for SUCCOR feature John Allen
2023-09-01 10:30   ` Joao Martins
2023-09-05 15:01     ` John Allen
2023-09-06 18:20       ` Moger, Babu
2023-07-26 20:41 ` [PATCH v2 2/2] i386: Fix MCE support for AMD hosts John Allen
2023-08-31 21:54   ` William Roche
2023-09-05 15:03     ` John Allen via
2023-08-31 21:40 ` [PATCH v2 0/2] Fix MCE handling on " William Roche
2023-09-05 15:15   ` John Allen via

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).