* [PATCH v4 0/3] Fix MCE handling on AMD hosts
@ 2023-09-12 21:18 John Allen
  2023-09-12 21:18 ` [PATCH v4 1/3] i386: Fix MCE support for " John Allen
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: John Allen @ 2023-09-12 21:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: yazen.ghannam, michael.roth, babu.moger, william.roche,
      joao.m.martins, pbonzini, richard.henderson, eduardo, John Allen

In the event that a guest process attempts to access memory that has
been poisoned in response to a deferred uncorrected MCE, an AMD system
will currently generate a SIGBUS error, which results in the entire
guest being shut down. Ideally, we only want to kill the guest process
that accessed the poisoned memory in this case.

This support has been included in qemu for Intel hosts for a long time,
but there are a couple of changes needed for AMD hosts. First, we will
need to expose the SUCCOR cpuid bit to guests. Second, we need to modify
the MCE injection code to avoid Intel specific behavior when we are
running on an AMD host.

v2:
  - Add "succor" feature word.
  - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature.

v3:
  - Reorder series. Only enable SUCCOR after bugs have been fixed.
  - Introduce new patch ignoring AO errors.

v4:
  - Remove redundant check for AO errors.

John Allen (2):
  i386: Fix MCE support for AMD hosts
  i386: Add support for SUCCOR feature

William Roche (1):
  i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

 target/i386/cpu.c     | 18 +++++++++++++++++-
 target/i386/cpu.h     |  4 ++++
 target/i386/helper.c  |  4 ++++
 target/i386/kvm/kvm.c | 28 ++++++++++++++++++++--------
 4 files changed, 45 insertions(+), 9 deletions(-)

--
2.39.3

^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v4 1/3] i386: Fix MCE support for AMD hosts 2023-09-12 21:18 [PATCH v4 0/3] Fix MCE handling on AMD hosts John Allen @ 2023-09-12 21:18 ` John Allen 2023-09-12 21:18 ` [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest John Allen ` (2 subsequent siblings) 3 siblings, 0 replies; 15+ messages in thread From: John Allen @ 2023-09-12 21:18 UTC (permalink / raw) To: qemu-devel Cc: yazen.ghannam, michael.roth, babu.moger, william.roche, joao.m.martins, pbonzini, richard.henderson, eduardo, John Allen For the most part, AMD hosts can use the same MCE injection code as Intel, but there are instances where the qemu implementation is Intel specific. First, MCE delivery works differently on AMD and does not support broadcast. Second, kvm_mce_inject generates MCEs that include a number of Intel specific status bits. Modify kvm_mce_inject to properly generate MCEs on AMD platforms. Reported-by: William Roche <william.roche@oracle.com> Signed-off-by: John Allen <john.allen@amd.com> --- v3: - Update to latest qemu code that introduces using MCG_STATUS_RIPV in the case of a BUS_MCEERR_AR on a non-AMD machine. 
--- target/i386/helper.c | 4 ++++ target/i386/kvm/kvm.c | 17 +++++++++++------ 2 files changed, 15 insertions(+), 6 deletions(-) diff --git a/target/i386/helper.c b/target/i386/helper.c index 89aa696c6d..9547e2b09d 100644 --- a/target/i386/helper.c +++ b/target/i386/helper.c @@ -91,6 +91,10 @@ int cpu_x86_support_mca_broadcast(CPUX86State *env) int family = 0; int model = 0; + if (IS_AMD_CPU(env)) { + return 0; + } + cpu_x86_version(env, &family, &model); if ((family == 6 && model >= 14) || family > 6) { return 1; diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 639a242ad8..5fce74aac5 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -590,16 +590,21 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int code) CPUState *cs = CPU(cpu); CPUX86State *env = &cpu->env; uint64_t status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN | - MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S; + MCI_STATUS_MISCV | MCI_STATUS_ADDRV; uint64_t mcg_status = MCG_STATUS_MCIP; int flags = 0; - if (code == BUS_MCEERR_AR) { - status |= MCI_STATUS_AR | 0x134; - mcg_status |= MCG_STATUS_RIPV | MCG_STATUS_EIPV; + if (!IS_AMD_CPU(env)) { + status |= MCI_STATUS_S; + if (code == BUS_MCEERR_AR) { + status |= MCI_STATUS_AR | 0x134; + mcg_status |= MCG_STATUS_RIPV | MCG_STATUS_EIPV; + } else { + status |= 0xc0; + mcg_status |= MCG_STATUS_RIPV; + } } else { - status |= 0xc0; - mcg_status |= MCG_STATUS_RIPV; + mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV; } flags = cpu_x86_support_mca_broadcast(env) ? MCE_INJECT_BROADCAST : 0; -- 2.39.3 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest 2023-09-12 21:18 [PATCH v4 0/3] Fix MCE handling on AMD hosts John Allen 2023-09-12 21:18 ` [PATCH v4 1/3] i386: Fix MCE support for " John Allen @ 2023-09-12 21:18 ` John Allen 2023-09-13 3:22 ` Gupta, Pankaj 2023-09-18 22:00 ` William Roche 2023-09-12 21:18 ` [PATCH v4 3/3] i386: Add support for SUCCOR feature John Allen 2024-02-07 11:21 ` [PATCH v4 0/3] Fix MCE handling on AMD hosts Joao Martins 3 siblings, 2 replies; 15+ messages in thread From: John Allen @ 2023-09-12 21:18 UTC (permalink / raw) To: qemu-devel Cc: yazen.ghannam, michael.roth, babu.moger, william.roche, joao.m.martins, pbonzini, richard.henderson, eduardo From: William Roche <william.roche@oracle.com> AMD guests can't currently deal with BUS_MCEERR_AO MCE injection as it panics the VM kernel. We filter this event and provide a warning message. Signed-off-by: William Roche <william.roche@oracle.com> --- v3: - New patch v4: - Remove redundant check for AO errors --- target/i386/kvm/kvm.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 5fce74aac5..7e9fc0cac5 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -604,6 +604,10 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int code) mcg_status |= MCG_STATUS_RIPV; } } else { + if (code == BUS_MCEERR_AO) { + /* XXX we don't support BUS_MCEERR_AO injection on AMD yet */ + return; + } mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV; } @@ -668,8 +672,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr) addr, paddr, "BUS_MCEERR_AR"); } else { warn_report("Guest MCE Memory Error at QEMU addr %p and " - "GUEST addr 0x%" HWADDR_PRIx " of type %s injected", - addr, paddr, "BUS_MCEERR_AO"); + "GUEST addr 0x%" HWADDR_PRIx " of type %s %s", + addr, paddr, "BUS_MCEERR_AO", + IS_AMD_CPU(env) ? 
"ignored on AMD guest" : "injected"); } return; -- 2.39.3 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest 2023-09-12 21:18 ` [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest John Allen @ 2023-09-13 3:22 ` Gupta, Pankaj 2023-09-18 22:00 ` William Roche 1 sibling, 0 replies; 15+ messages in thread From: Gupta, Pankaj @ 2023-09-13 3:22 UTC (permalink / raw) To: John Allen, qemu-devel Cc: yazen.ghannam, michael.roth, babu.moger, william.roche, joao.m.martins, pbonzini, richard.henderson, eduardo > From: William Roche <william.roche@oracle.com> > > AMD guests can't currently deal with BUS_MCEERR_AO MCE injection > as it panics the VM kernel. We filter this event and provide a > warning message. > > Signed-off-by: William Roche <william.roche@oracle.com> > --- > v3: > - New patch > v4: > - Remove redundant check for AO errors > --- > target/i386/kvm/kvm.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c > index 5fce74aac5..7e9fc0cac5 100644 > --- a/target/i386/kvm/kvm.c > +++ b/target/i386/kvm/kvm.c > @@ -604,6 +604,10 @@ static void kvm_mce_inject(X86CPU *cpu, hwaddr paddr, int code) > mcg_status |= MCG_STATUS_RIPV; > } > } else { > + if (code == BUS_MCEERR_AO) { > + /* XXX we don't support BUS_MCEERR_AO injection on AMD yet */ > + return; > + } > mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV; > } > > @@ -668,8 +672,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr) > addr, paddr, "BUS_MCEERR_AR"); > } else { > warn_report("Guest MCE Memory Error at QEMU addr %p and " > - "GUEST addr 0x%" HWADDR_PRIx " of type %s injected", > - addr, paddr, "BUS_MCEERR_AO"); > + "GUEST addr 0x%" HWADDR_PRIx " of type %s %s", > + addr, paddr, "BUS_MCEERR_AO", > + IS_AMD_CPU(env) ? "ignored on AMD guest" : "injected"); > } > > return; Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> ^ permalink raw reply [flat|nested] 15+ messages in thread
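Taken together, patches 1/3 and 2/3 change which status bits kvm_mce_inject builds for each host vendor and SIGBUS code. The standalone sketch below restates that decision logic outside of QEMU so the four cases can be compared side by side. It is not the QEMU implementation: `struct mce_bits`, `mce_bits_for` and the `is_amd` parameter are hypothetical names, and the bit positions are assumed to mirror the definitions in target/i386/cpu.h and should be checked against the tree.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>

/* Fallbacks for C libraries that do not expose the Linux si_code values. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Assumption: these mirror the definitions in target/i386/cpu.h. */
#define MCG_STATUS_RIPV  (1ULL << 0)
#define MCG_STATUS_EIPV  (1ULL << 1)
#define MCG_STATUS_MCIP  (1ULL << 2)
#define MCI_STATUS_VAL   (1ULL << 63)
#define MCI_STATUS_UC    (1ULL << 61)
#define MCI_STATUS_EN    (1ULL << 60)
#define MCI_STATUS_MISCV (1ULL << 59)
#define MCI_STATUS_ADDRV (1ULL << 58)
#define MCI_STATUS_S     (1ULL << 56)
#define MCI_STATUS_AR    (1ULL << 55)

/* Hypothetical result type, not part of QEMU. */
struct mce_bits {
    bool inject;          /* false: the event is dropped (AMD + AO) */
    uint64_t status;      /* value for the bank's MCi_STATUS */
    uint64_t mcg_status;  /* value for MCG_STATUS */
};

/* Restates the branch structure of kvm_mce_inject after patches 1 and 2. */
static struct mce_bits mce_bits_for(bool is_amd, int code)
{
    struct mce_bits b = {
        .inject = true,
        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
                  MCI_STATUS_MISCV | MCI_STATUS_ADDRV,
        .mcg_status = MCG_STATUS_MCIP,
    };

    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);

    if (!is_amd) {
        /* Intel specific: signaling bit plus SRAR/SRAO error codes. */
        b.status |= MCI_STATUS_S;
        if (code == BUS_MCEERR_AR) {
            b.status |= MCI_STATUS_AR | 0x134;
            b.mcg_status |= MCG_STATUS_RIPV | MCG_STATUS_EIPV;
        } else {
            b.status |= 0xc0;
            b.mcg_status |= MCG_STATUS_RIPV;
        }
    } else if (code == BUS_MCEERR_AO) {
        /* Patch 2/3: AO is not injected on AMD; the bits are unused. */
        b.inject = false;
    } else {
        b.mcg_status |= MCG_STATUS_EIPV | MCG_STATUS_RIPV;
    }
    return b;
}
```

For example, `mce_bits_for(true, BUS_MCEERR_AO).inject` is false, while `mce_bits_for(true, BUS_MCEERR_AR)` carries neither MCI_STATUS_S nor MCI_STATUS_AR.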
* Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest
  2023-09-12 21:18 ` [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest John Allen
  2023-09-13  3:22   ` Gupta, Pankaj
@ 2023-09-18 22:00   ` William Roche
  2023-09-20 11:13     ` Joao Martins
  1 sibling, 1 reply; 15+ messages in thread
From: William Roche @ 2023-09-18 22:00 UTC (permalink / raw)
  To: John Allen, qemu-devel
  Cc: yazen.ghannam, michael.roth, babu.moger, joao.m.martins,
      pbonzini, richard.henderson, eduardo

Hi John,

I'd like to put the emphasis on the fact that ignoring the SRAO error
for a VM is a real problem at least for a specific (rare) case I'm
currently working on: The VM migration.

Context:

- In the case of a poisoned page in the VM address space, the migration
  can't read it and will skip this page, considering it as a zero-filled
  page. The VM kernel (that handled the vMCE) would have marked its
  associated page as poisoned, and if the VM touches the page, the VM
  kernel generates the associated MCE because it already knows about the
  poisoned page.

- When we ignore the vMCE in the case of a SIGBUS/BUS_MCEERR_AO error
  (what this patch does), we entirely rely on the Hypervisor to send an
  SRAR error to qemu when the page is touched: The AMD VM kernel will
  receive the SIGBUS/BUS_MCEERR_AR and deal with it, thanks to your
  changes here.

So it looks like the mechanism works fine... unless the VM has migrated
between the SRAO error and the first time it really touches the poisoned
page to get an SRAR error! In this case, its new address space
(created on the migration destination) will have a zero-page where we
had a poisoned page, and the AMD VM Kernel (that never dealt with the
SRAO) doesn't know about the poisoned page and will access the page
finding only zeros... We have a memory corruption!

It is a very rare window, but in order to fix it the most reasonable
course of action would be to make the AMD emulation deal with SRAO
errors, instead of ignoring them.

Do you agree with my analysis?
Would an AMD platform generate an SRAO signal to a process
(SIGBUS/BUS_MCEERR_AO) in case of a real hardware error?

Thanks,
William.

^ permalink raw reply [flat|nested] 15+ messages in thread
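The window described in this message can be illustrated with a toy model. This is only a sketch of the reasoning, not QEMU or kernel code: `struct host_mem`, `struct guest`, `migrate` and `guest_read` are hypothetical names, a "page" is reduced to one byte of payload, and the model assumes (as stated above) that migration sends zeros for pages the source knows are poisoned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 4

/* Toy model only: one byte of payload per page. */
struct host_mem {
    unsigned char data[NPAGES];
    bool hw_poison[NPAGES];      /* poison known to the host/hardware */
};

struct guest {
    struct host_mem mem;
    bool kernel_poison[NPAGES];  /* pages the guest kernel marked poisoned */
};

/*
 * Migration skips pages poisoned on the source and sends zeros instead.
 * Hardware poison does not travel to the destination; the guest kernel's
 * own bookkeeping does, because it lives in guest memory.
 */
static void migrate(const struct guest *src, struct guest *dst)
{
    for (size_t i = 0; i < NPAGES; i++) {
        dst->mem.data[i] = src->mem.hw_poison[i] ? 0 : src->mem.data[i];
        dst->mem.hw_poison[i] = false;
        dst->kernel_poison[i] = src->kernel_poison[i];
    }
}

enum read_result { READ_OK, READ_MCE };

/* A read raises an MCE if the guest kernel or the hardware knows the page is bad. */
static enum read_result guest_read(const struct guest *g, size_t page,
                                   unsigned char *out)
{
    assert(page < NPAGES);
    if (g->kernel_poison[page] || g->mem.hw_poison[page]) {
        return READ_MCE;
    }
    *out = g->mem.data[page];
    return READ_OK;
}
```

In this model, a guest that never received the AO event reads a silent zero from the poisoned page after migration, whereas a guest whose kernel recorded the poison still gets an MCE on the destination.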
* Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest 2023-09-18 22:00 ` William Roche @ 2023-09-20 11:13 ` Joao Martins 2023-09-21 17:41 ` Yazen Ghannam 0 siblings, 1 reply; 15+ messages in thread From: Joao Martins @ 2023-09-20 11:13 UTC (permalink / raw) To: William Roche, John Allen, qemu-devel Cc: yazen.ghannam, michael.roth, babu.moger, pbonzini, richard.henderson, eduardo On 18/09/2023 23:00, William Roche wrote: > Hi John, > > I'd like to put the emphasis on the fact that ignoring the SRAO error > for a VM is a real problem at least for a specific (rare) case I'm > currently working on: The VM migration. > > Context: > > - In the case of a poisoned page in the VM address space, the migration > can't read it and will skip this page, considering it as a zero-filled > page. The VM kernel (that handled the vMCE) would have marked it's > associated page as poisoned, and if the VM touches the page, the VM > kernel generates the associated MCE because it already knows about the > poisoned page. > > - When we ignore the vMCE in the case of a SIGBUS/BUS_MCEERR_AO error > (what this patch does), we entirely rely on the Hypervisor to send an > SRAR error to qemu when the page is touched: The AMD VM kernel will > receive the SIGBUS/BUS_MCEERR_AR and deal with it, thanks to your > changes here. > > So it looks like the mechanism works fine... unless the VM has migrated > between the SRAO error and the first time it really touches the poisoned > page to get an SRAR error ! In this case, its new address space > (created on the migration destination) will have a zero-page where we > had a poisoned page, and the AMD VM Kernel (that never dealt with the > SRAO) doesn't know about the poisoned page and will access the page > finding only zeros... We have a memory corruption ! 
> > It is a very rare window, but in order to fix it the most reasonable > course of action would be to make the AMD emulation deal with SRAO > errors, instead of ignoring them. > > Do you agree with my analysis ? Under the case that SRAO aren't handled well in the kernel today[*] for AMD, we could always add a migration blocker when we hit AO sigbus, in case ignoring is our only option. But this would be less than ideal to propagating the SRAO into the guest. [*] Meaning knowing that handling the SRAO would generate a crash in the guest Perhaps as an improvement, perhaps allow qemu to choose to propagate should this limitation be lifted via a new -action value and allow it to ignore/propagate or not e.g. -action mce=none # default on Intel to propagate all MCE events to the guest -action mce=ignore-optional # Ignore SRAO I suppose the second is also useful for ARM64 considering they currently ignore SRAO events too. > Would an AMD platform generate SRAO signal to a process > (SIGBUS/BUS_MCEERR_AO) in case of a real hardware error ? > This would be useful to confirm. > Thanks, > William. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest 2023-09-20 11:13 ` Joao Martins @ 2023-09-21 17:41 ` Yazen Ghannam 2023-09-22 8:36 ` William Roche 0 siblings, 1 reply; 15+ messages in thread From: Yazen Ghannam @ 2023-09-21 17:41 UTC (permalink / raw) To: Joao Martins, William Roche, John Allen, qemu-devel Cc: yazen.ghannam, michael.roth, babu.moger, pbonzini, richard.henderson, eduardo On 9/20/23 7:13 AM, Joao Martins wrote: > On 18/09/2023 23:00, William Roche wrote: >> Hi John, >> >> I'd like to put the emphasis on the fact that ignoring the SRAO error >> for a VM is a real problem at least for a specific (rare) case I'm >> currently working on: The VM migration. >> >> Context: >> >> - In the case of a poisoned page in the VM address space, the migration >> can't read it and will skip this page, considering it as a zero-filled >> page. The VM kernel (that handled the vMCE) would have marked it's >> associated page as poisoned, and if the VM touches the page, the VM >> kernel generates the associated MCE because it already knows about the >> poisoned page. >> >> - When we ignore the vMCE in the case of a SIGBUS/BUS_MCEERR_AO error >> (what this patch does), we entirely rely on the Hypervisor to send an >> SRAR error to qemu when the page is touched: The AMD VM kernel will >> receive the SIGBUS/BUS_MCEERR_AR and deal with it, thanks to your >> changes here. >> >> So it looks like the mechanism works fine... unless the VM has migrated >> between the SRAO error and the first time it really touches the poisoned >> page to get an SRAR error ! In this case, its new address space >> (created on the migration destination) will have a zero-page where we >> had a poisoned page, and the AMD VM Kernel (that never dealt with the >> SRAO) doesn't know about the poisoned page and will access the page >> finding only zeros... We have a memory corruption ! I don't understand this. Why would the page be zero? 
Even so, why would that affect poison?

Also, during page migration, does the data flow through the CPU core?
Sorry for the basic question. I haven't done a lot with virtualization.

Please note that current AMD systems use an internal poison marker on
memory. This cannot be cleared through normal memory operations. The
only exception, I think, is to use the CLZERO instruction. This will
completely wipe a cacheline including metadata like poison, etc.

So the hardware should not (by design) lose track of poisoned data.

>>
>> It is a very rare window, but in order to fix it the most reasonable
>> course of action would be to make the AMD emulation deal with SRAO
>> errors, instead of ignoring them.
>>
>> Do you agree with my analysis ?
>
> Under the case that SRAO aren't handled well in the kernel today[*] for AMD, we
> could always add a migration blocker when we hit AO sigbus, in case ignoring
> is our only option. But this would be less than ideal to propagating the
> SRAO into the guest.
>
> [*] Meaning knowing that handling the SRAO would generate a crash in the guest
>
> Perhaps as an improvement, perhaps allow qemu to choose to propagate should this
> limitation be lifted via a new -action value and allow it to ignore/propagate or
> not e.g.
>
> -action mce=none # default on Intel to propagate all MCE events to the guest
> -action mce=ignore-optional # Ignore SRAO
>
> I suppose the second is also useful for ARM64 considering they currently ignore
> SRAO events too.
>
>> Would an AMD platform generate SRAO signal to a process
>> (SIGBUS/BUS_MCEERR_AO) in case of a real hardware error ?
>>
> This would be useful to confirm.
>

There is no SRAO signal on AMD. The closest equivalent may be a
"Deferred" error interrupt. This is an x86 APIC LVT interrupt, and it's
sent when a deferred (uncorrectable non-urgent) error is detected by a
memory controller.

In this case, the CPU will get the interrupt and log the error (in the
host).

An enhancement will be to take the MCA error information collected
during the interrupt and extract useful data. For example, we'll need to
translate the reported address to a system physical address that can be
mapped to a page.

Once we have the page, then we can decide how we want to signal the
process(es). We could get a deferred/AO error in the host, and signal the
guest with an AR. So the guest handling could be the same in both cases.

Would this be okay? Or is it important that the guest can distinguish
between the AO/AR cases? IOW, will guests have their own policies on
when to take action? Or is it more about allowing the guest to handle
the error less urgently?

Thanks,
Yazen

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest 2023-09-21 17:41 ` Yazen Ghannam @ 2023-09-22 8:36 ` William Roche 2023-09-22 14:30 ` Yazen Ghannam 0 siblings, 1 reply; 15+ messages in thread From: William Roche @ 2023-09-22 8:36 UTC (permalink / raw) To: Yazen Ghannam, Joao Martins, John Allen, qemu-devel Cc: michael.roth, babu.moger, pbonzini, richard.henderson, eduardo On 9/21/23 19:41, Yazen Ghannam wrote: > On 9/20/23 7:13 AM, Joao Martins wrote: >> On 18/09/2023 23:00, William Roche wrote: >>> [...] >>> So it looks like the mechanism works fine... unless the VM has migrated >>> between the SRAO error and the first time it really touches the poisoned >>> page to get an SRAR error ! In this case, its new address space >>> (created on the migration destination) will have a zero-page where we >>> had a poisoned page, and the AMD VM Kernel (that never dealt with the >>> SRAO) doesn't know about the poisoned page and will access the page >>> finding only zeros... We have a memory corruption ! > > I don't understand this. Why would the page be zero? Even so, why would > that affect poison? The migration of a VM moves the memory content from a source platform to a destination. This is mainly the qemu processes reading the data and replicating it on the destination. The source qemu where a memory page is poisoned is(will be[*]) able to skip the poisoned pages it knows about to indicate to the destination machine to populate the associated page(s) with zeros as there is no "poison destination page" mechanism in place for this migration transfer. > > Also, during page migration, does the data flow through the CPU core? > Sorry for the basic question. I haven't done a lot with virtualization. Yes, in most cases (with the exception of RDMA) the data flow through the CPU cores because the migration verifies if the area to transfer has some empty pages. 
>
> Please note that current AMD systems use an internal poison marker on
> memory. This cannot be cleared through normal memory operations. The
> only exception, I think, is to use the CLZERO instruction. This will
> completely wipe a cacheline including metadata like poison, etc.
>
> So the hardware should not (by design) loose track of poisoned data.

This would be better, but virtualization migration currently loses
track of this.
This is not a problem for VMs where the kernel took note of the poison
and keeps track of it, because that kernel will handle the poison
locations it knows about, signaling when these poisoned locations are
touched.

>
>>>
>>> It is a very rare window, but in order to fix it the most reasonable
>>> course of action would be to make the AMD emulation deal with SRAO
>>> errors, instead of ignoring them.
>>>
>>> Do you agree with my analysis ?
>>
>> Under the case that SRAO aren't handled well in the kernel today[*] for AMD, we
>> could always add a migration blocker when we hit AO sigbus, in case ignoring
>> is our only option. But this would be less than ideal to propagating the
>> SRAO into the guest.
>>
>> [*] Meaning knowing that handling the SRAO would generate a crash in the guest
>>
>> Perhaps as an improvement, perhaps allow qemu to choose to propagate should this
>> limitation be lifted via a new -action value and allow it to ignore/propagate or
>> not e.g.
>>
>> -action mce=none # default on Intel to propagate all MCE events to the guest
>> -action mce=ignore-optional # Ignore SRAO

Yes, we may need to create something like that, but missing SRAO has
technical consequences too.

>>
>> I suppose the second is also useful for ARM64 considering they currently ignore
>> SRAO events too.
>>
>>> Would an AMD platform generate SRAO signal to a process
>>> (SIGBUS/BUS_MCEERR_AO) in case of a real hardware error ?
>>>
>> This would be useful to confirm.
>>
>
> There is no SRAO signal on AMD.
The closest equivalent may be a > "Deferred" error interrupt. This is an x86 APIC LVT interrupt, and it's > sent when a deferred (uncorrectable non-urgent) error is detected by a > memory controller. > > In this case, the CPU will get the interrupt and log the error (in the > host). > > An enhancement will be to take the MCA error information collected > during the interrupt and extract useful data. For example, we'll need to > translate the reported address to a system physical address that can be > mapped to a page. This would be great, as it would mean that a kernel running in a VM can get notified too. > > Once we have the page, then we can decide how we want to signal the > process(es). We could get a deferred/AO error in the host, and signal the > guest with an AR. So the guest handling could be the same in both cases. > > Would this be okay? Or is it important that the guest can distinguish > between the A0/AR cases? SIGBUS/BUS_MCEERR_AO and BUS_MCEERR_AR are not interchangeable, it is important to distinguish them. AO is an asynchronous signal that is only generated when the process asked for it -- indicating that an error has been detected in its address space but hasn't been touched yet. Most of the processes don't care about that (and don't get notified), they just continue to run, if the poisoned area is not touched, great. Otherwise a BUS_MCEERR_AR signal is generated when the area is touched, indicating that the execution thread can't access the location. > IOW, will guests have their own policies on > when to take action? Or is it more about allowing the guest to handle > the error less urgently? Yes to both questions. Any process can indicate if it is interested to be "early killed on MCE" or not. See proc(5) man page about /proc/sys/vm/memory_failure_early_kill, and prctl(2) about PR_MCE_KILL/PR_MCE_KILL_GET. Such a process could take actions before it's too late and it would need the poisoned data. 
Now if an AMD system doesn't warn a process when a Deferred error
occurs, and only generates SIGBUS/BUS_MCEERR_AR errors when the poison
is touched, it means that its processes don't benefit from an "early
kill" and can't take actions to anticipate a synchronous error.

In such a case, ignoring BUS_MCEERR_AO would just help qemu not to crash
in case of "fake/software/injected" signals. And the case of reading the
entire memory (like a migration) would need to be extra careful with a
more probable SIGBUS/BUS_MCEERR_AR signal, which makes the mechanism
more complicated, but would make more sense for AMD and ARM64 too.
(Note that there are still cases where a BUS_MCEERR_AO capable system
can miss an error that is revealed when reading the entire memory; in
this case we currently crash.)


[*] See my patch proposal for:
"Qemu crashes on VM migration after an handled memory error"

In other words, having the AMD kernel generate SIGBUS/BUS_MCEERR_AO
signals and making AMD qemu able to relay them to the VM kernel would
make things better for AMD platforms ;)

HTH,
William.

^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest 2023-09-22 8:36 ` William Roche @ 2023-09-22 14:30 ` Yazen Ghannam 2023-09-22 16:18 ` William Roche 2023-10-13 15:41 ` William Roche 0 siblings, 2 replies; 15+ messages in thread From: Yazen Ghannam @ 2023-09-22 14:30 UTC (permalink / raw) To: William Roche, Joao Martins, John Allen, qemu-devel Cc: yazen.ghannam, michael.roth, babu.moger, pbonzini, richard.henderson, eduardo On 9/22/23 4:36 AM, William Roche wrote: > On 9/21/23 19:41, Yazen Ghannam wrote: >> On 9/20/23 7:13 AM, Joao Martins wrote: >>> On 18/09/2023 23:00, William Roche wrote: >>>> [...] >>>> So it looks like the mechanism works fine... unless the VM has migrated >>>> between the SRAO error and the first time it really touches the poisoned >>>> page to get an SRAR error ! In this case, its new address space >>>> (created on the migration destination) will have a zero-page where we >>>> had a poisoned page, and the AMD VM Kernel (that never dealt with the >>>> SRAO) doesn't know about the poisoned page and will access the page >>>> finding only zeros... We have a memory corruption ! >> >> I don't understand this. Why would the page be zero? Even so, why would >> that affect poison? > > The migration of a VM moves the memory content from a source platform to > a destination. This is mainly the qemu processes reading the data and > replicating it on the destination. The source qemu where a memory page > is poisoned is(will be[*]) able to skip the poisoned pages it knows > about to indicate to the destination machine to populate the associated > page(s) with zeros as there is no "poison destination page" mechanism in > place for this migration transfer. > >> >> Also, during page migration, does the data flow through the CPU core? >> Sorry for the basic question. I haven't done a lot with virtualization. 
> > Yes, in most cases (with the exception of RDMA) the data flow through > the CPU cores because the migration verifies if the area to transfer has > some empty pages. > If the CPU moves the memory, then the data will pass through the core/L1 caches, correct? If so, then this will result in a MCE/poison consumption/AR event in that core. So it seems to me that migration will always cause an AR event, and the gap you describe will not occur. Does this make sense? Sorry if I misunderstood. In general, the hardware is designed to detect and mark poison, and to not let poison escape a system undetected. In the strictest case, the hardware will perform a system reset if poison is leaving the system. In a more graceful case, the hardware will continue to pass the poison marker with the data, so the destination hardware will receive it. In both cases, the goal is to avoid silent data corruption, and to do so in the hardware, i.e. without relying on firmware or software management. The hardware designers are very keen on this point. BTW, the RDMA case will need further discussion. I *think* this would fall under the "strictest" case. And likely, CPU-based migration will also. But I think we can test this and find out. :) >> >> Please note that current AMD systems use an internal poison marker on >> memory. This cannot be cleared through normal memory operations. The >> only exception, I think, is to use the CLZERO instruction. This will >> completely wipe a cacheline including metadata like poison, etc. >> >> So the hardware should not (by design) loose track of poisoned data. > > This would be better, but virtualization migration currently looses > track of this. > Which is not a problem for VMs where the kernel took note of the poison > and keeps track of it. Because this kernel will handle the poison > locations it knows about, signaling when these poisoned locations are > touched. > Can you please elaborate on this? 
I would expect the host kernel to do all the physical, including poison, memory management. Or do you mean in the nested poison case like this? 1) The host detects an "AO/deferred" error. 2) The host can try to recover the memory, if clean, etc. 3) Otherwise, the host passes the error info, with "AO/deferred" severity to the guest. 4) The guest, in nested fashion, can try to recover the memory, if clean, etc. Or signal its own processes with the AO SIGBUS. >> >>>> >>>> It is a very rare window, but in order to fix it the most reasonable >>>> course of action would be to make the AMD emulation deal with SRAO >>>> errors, instead of ignoring them. >>>> >>>> Do you agree with my analysis ? >>> >>> Under the case that SRAO aren't handled well in the kernel today[*] for AMD, we >>> could always add a migration blocker when we hit AO sigbus, in case ignoring >>> is our only option. But this would be less than ideal to propagating the >>> SRAO into the guest. >>> >>> [*] Meaning knowing that handling the SRAO would generate a crash in the guest >>> >>> Perhaps as an improvement, perhaps allow qemu to choose to propagate should this >>> limitation be lifted via a new -action value and allow it to ignore/propagate or >>> not e.g. >>> >>> -action mce=none # default on Intel to propagate all MCE events to the guest >>> -action mce=ignore-optional # Ignore SRAO > > Yes we may need to create something like that, but missing SRAO has > technical consequences too. > >>> >>> I suppose the second is also useful for ARM64 considering they currently ignore >>> SRAO events too. >>> >>>> Would an AMD platform generate SRAO signal to a process >>>> (SIGBUS/BUS_MCEERR_AO) in case of a real hardware error ? >>>> >>> This would be useful to confirm. >>> >> >> There is no SRAO signal on AMD. The closest equivalent may be a >> "Deferred" error interrupt. This is an x86 APIC LVT interrupt, and it's >> sent when a deferred (uncorrectable non-urgent) error is detected by a >> memory controller. 
>> >> In this case, the CPU will get the interrupt and log the error (in the >> host). >> >> An enhancement will be to take the MCA error information collected >> during the interrupt and extract useful data. For example, we'll need to >> translate the reported address to a system physical address that can be >> mapped to a page. > > This would be great, as it would mean that a kernel running in a VM can > get notified too. > Yes, I agree. >> >> Once we have the page, then we can decide how we want to signal the >> process(es). We could get a deferred/AO error in the host, and signal the >> guest with an AR. So the guest handling could be the same in both cases. > >> Would this be okay? Or is it important that the guest can distinguish >> between the A0/AR cases? > > > SIGBUS/BUS_MCEERR_AO and BUS_MCEERR_AR are not interchangeable, it is > important to distinguish them. > AO is an asynchronous signal that is only generated when the process > asked for it -- indicating that an error has been detected in its > address space but hasn't been touched yet. > Most of the processes don't care about that (and don't get notified), > they just continue to run, if the poisoned area is not touched, great. > Otherwise a BUS_MCEERR_AR signal is generated when the area is touched, > indicating that the execution thread can't access the location. > Yes, understood. > >> IOW, will guests have their own policies on >> when to take action? Or is it more about allowing the guest to handle >> the error less urgently? > > Yes to both questions. Any process can indicate if it is interested to > be "early killed on MCE" or not. See proc(5) man page about > /proc/sys/vm/memory_failure_early_kill, and prctl(2) about > PR_MCE_KILL/PR_MCE_KILL_GET. Such a process could take actions before > it's too late and it would need the poisoned data. > Yes, agree. I think the "nested" case above would fall under this. Also, an application, or software stack, with complex memory management could benefit. 
I'm thinking something like a long-running HPC application with multiple checkpoints or stages. It could choose to ensure its memory space is clean before starting a stage, or restart from an earlier checkpoint if some data was bad, etc. In any case, the entire application doesn't need to be killed if 4kB are bad within its entire 1TB address space, for example. > Now if an AMD system doesn't warn a process when a Deferred errors > occurs, and only generates SIGBUS/BUS_MCEERR_AR errors when the poison > is touched, it means that its processes don't benefit from an "early > kill" and can't take actions to anticipate a synchronous error. > > In such case, ignoring BUS_MCEERR_AO would just help qemu not to crash > in case of "fake/software/injected" signals. And the case of reading the > entire memory (like a migration) would need to be extra careful with a > more probable SIGBUS/BUS_MCEERR_AR signal, which makes the mechanism > more complicated, but would make more sense for AMD and ARM64 too. > (Note that there are still cases where a BUS_MCEERR_AO capable system > can miss an error that is revealed when reading the entire memory, in > this case we currently crash) > > > [*] See my patch proposal for: > "Qemu crashes on VM migration after an handled memory error" > > In other words, having the AMD kernel to generate SIGBUS/BUS_MCEERR_AO > signals and making AMD qemu able to relay them to the VM kernel would > make things better for AMD platforms ;) > Yes, I agree. :) Thanks, Yazen ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest 2023-09-22 14:30 ` Yazen Ghannam @ 2023-09-22 16:18 ` William Roche 2023-10-13 15:41 ` William Roche 1 sibling, 0 replies; 15+ messages in thread From: William Roche @ 2023-09-22 16:18 UTC (permalink / raw) To: Yazen Ghannam, Joao Martins, John Allen, qemu-devel Cc: michael.roth, babu.moger, pbonzini, richard.henderson, eduardo On 9/22/23 16:30, Yazen Ghannam wrote: > On 9/22/23 4:36 AM, William Roche wrote: >> On 9/21/23 19:41, Yazen Ghannam wrote: >>> [...] >>> Also, during page migration, does the data flow through the CPU core? >>> Sorry for the basic question. I haven't done a lot with virtualization. >> >> Yes, in most cases (with the exception of RDMA) the data flow through >> the CPU cores because the migration verifies if the area to transfer has >> some empty pages. >> > > If the CPU moves the memory, then the data will pass through the core/L1 > caches, correct? If so, then this will result in a MCE/poison > consumption/AR event in that core. That's the entire point of this other patch I was referring to: "Qemu crashes on VM migration after an handled memory error" an example of a direct link: https://www.mail-archive.com/qemu-devel@nongnu.org/msg990803.html The idea is to skip the pages we know are poisoned -- so we have a chance to complete the migration without getting AR events :) > > So it seems to me that migration will always cause an AR event, and the > gap you describe will not occur. Does this make sense? Sorry if I > misunderstood. > > In general, the hardware is designed to detect and mark poison, and to > not let poison escape a system undetected. In the strictest case, the > hardware will perform a system reset if poison is leaving the system. In > a more graceful case, the hardware will continue to pass the poison > marker with the data, so the destination hardware will receive it. 
> In both cases, the goal is to avoid silent data corruption, and to do so in
> the hardware, i.e. without relying on firmware or software management.
> The hardware designers are very keen on this point.

For the moment virtualization needs *several* enhancements just to deal
with memory errors -- what we are currently trying to fix is a good
example of that!

>
> BTW, the RDMA case will need further discussion. I *think* this would
> fall under the "strictest" case. And likely, CPU-based migration will
> also. But I think we can test this and find out. :)

The test has been done, and it showed that RDMA migration fails when
poison exists. But we are discussing aspects that are probably too far
from our main topic here.

>
>>>
>>> Please note that current AMD systems use an internal poison marker on
>>> memory. This cannot be cleared through normal memory operations. The
>>> only exception, I think, is to use the CLZERO instruction. This will
>>> completely wipe a cacheline including metadata like poison, etc.
>>>
>>> So the hardware should not (by design) lose track of poisoned data.
>>
>> This would be better, but virtualization migration currently loses
>> track of this.
>> Which is not a problem for VMs where the kernel took note of the poison
>> and keeps track of it. Because this kernel will handle the poison
>> locations it knows about, signaling when these poisoned locations are
>> touched.
>>
>
> Can you please elaborate on this? I would expect the host kernel to do
> all the physical, including poison, memory management.

Yes, the host kernel does that, and the VM kernel does it too for its own
address space.

>
> Or do you mean in the nested poison case like this?
> 1) The host detects an "AO/deferred" error.

The host kernel is notified by the hardware of an SRAO/deferred error.

> 2) The host can try to recover the memory, if clean, etc.
From my understanding, this is an uncorrectable error: in the standard case
the kernel can't "clean" the error, but it keeps track of it and tries to
signal the user of the impacted memory page every time it's needed.

> 3) Otherwise, the host passes the error info, with "AO/deferred" severity
> to the guest.

Yes. When a guest VM is impacted, qemu has asked to be informed of AO
events, so the host kernel should signal them to qemu. Qemu then relays
the information (creating a virtual MCE event), which the VM kernel
receives and deals with.

> 4) The guest, in nested fashion, can try to recover the memory, if
> clean, etc. Or signal its own processes with the AO SIGBUS.

Here again there is no recovery: the VM kernel does the same thing as the
host kernel: memory management, possible signals, etc.

>>> An enhancement will be to take the MCA error information collected
>>> during the interrupt and extract useful data. For example, we'll need to
>>> translate the reported address to a system physical address that can be
>>> mapped to a page.
>>
>> This would be great, as it would mean that a kernel running in a VM can
>> get notified too.
>>
>
> Yes, I agree.
>
>>>
>>> Once we have the page, then we can decide how we want to signal the
>>> process(es). We could get a deferred/AO error in the host, and signal the
>>> guest with an AR. So the guest handling could be the same in both cases.
>
>>> Would this be okay? Or is it important that the guest can distinguish
>>> between the AO/AR cases?
>>
>>
>> SIGBUS/BUS_MCEERR_AO and BUS_MCEERR_AR are not interchangeable, it is
>> important to distinguish them.
>> AO is an asynchronous signal that is only generated when the process
>> asked for it -- indicating that an error has been detected in its
>> address space but hasn't been touched yet.
>> Most of the processes don't care about that (and don't get notified),
>> they just continue to run, if the poisoned area is not touched, great.
>> Otherwise a BUS_MCEERR_AR signal is generated when the area is touched,
>> indicating that the execution thread can't access the location.
>>
>
> Yes, understood.
>
>>
>>> IOW, will guests have their own policies on
>>> when to take action? Or is it more about allowing the guest to handle
>>> the error less urgently?
>>
>> Yes to both questions. Any process can indicate if it is interested to
>> be "early killed on MCE" or not. See proc(5) man page about
>> /proc/sys/vm/memory_failure_early_kill, and prctl(2) about
>> PR_MCE_KILL/PR_MCE_KILL_GET. Such a process could take actions before
>> it's too late and it would need the poisoned data.
>>
>
> Yes, agree. I think the "nested" case above would fall under this. Also,
> an application, or software stack, with complex memory management could
> benefit.

Sure -- some databases, for example, already take advantage of this
mechanism ;)

>> In other words, having the AMD kernel to generate SIGBUS/BUS_MCEERR_AO
>> signals and making AMD qemu able to relay them to the VM kernel would
>> make things better for AMD platforms ;)
>>
>
> Yes, I agree. :)

So in my opinion, for the moment we should integrate the 3 proposed
patches, and continue working so that:
- the AMD kernel deals better with SRAO on both the host and the VM sides,
- together with another qemu enhancement to relay the BUS_MCEERR_AO
  signal, so that the VM kernel deals with it too.

The reason I started this conversation was to find out whether there is a
simple way to already inform the VM kernel of an AO signal (without
crashing it), even if it is not yet able to relay the event to its own
processes. This would prepare qemu so that, when the kernel is enhanced,
it may not be necessary to modify qemu again.

The patches we are currently focusing on (Fix MCE handling on AMD hosts)
help to deal better with the BUS_MCEERR_AR signal instead of crashing --
this looks like a necessary step to me.

HTH,
William.
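For readers less familiar with the AO/AR distinction discussed above, here is a minimal userspace sketch (not QEMU code) of a process opting in to "early kill" through prctl(2) and telling the two SIGBUS codes apart. The names classify_sigbus, request_early_kill and the mce_action enum are hypothetical and exist only for this illustration; the prctl(2) and sigaction(2) calls and the BUS_MCEERR_* codes are the standard Linux ones.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>

/* Hypothetical classification used only in this sketch. */
enum mce_action {
    MCE_NOT_MEMORY_ERROR,   /* some other SIGBUS cause */
    MCE_HANDLE_LATER,       /* AO: poison detected, not consumed yet */
    MCE_HANDLE_NOW          /* AR: this thread touched the poison */
};

/* Map a SIGBUS si_code to the urgency described in the thread. */
enum mce_action classify_sigbus(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AO:
        return MCE_HANDLE_LATER;
    case BUS_MCEERR_AR:
        return MCE_HANDLE_NOW;
    default:
        return MCE_NOT_MEMORY_ERROR;
    }
}

/* Last classification seen by the handler; polled by the main loop. */
static volatile sig_atomic_t last_action = MCE_NOT_MEMORY_ERROR;

static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    /* info->si_addr holds the faulting address; a real program would
     * record it to know which page to discard or rebuild. */
    last_action = classify_sigbus(info->si_code);
}

/*
 * Install the SIGBUS handler and ask the kernel for early (AO)
 * notification. Returns 0 on success, -1 on failure (for instance when
 * the kernel was built without memory-failure support).
 */
int request_early_kill(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, NULL) != 0) {
        return -1;
    }
    /* arg4 and arg5 must be zero for PR_MCE_KILL. */
    return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
}
```

A program would call request_early_kill() once at startup. Without that opt-in (or the memory_failure_early_kill sysctl), only the AR case is expected to reach it, which is the behaviour described above for most processes.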
* Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest
  2023-09-22 14:30 ` Yazen Ghannam
  2023-09-22 16:18 ` William Roche
@ 2023-10-13 15:41 ` William Roche
  1 sibling, 0 replies; 15+ messages in thread
From: William Roche @ 2023-10-13 15:41 UTC (permalink / raw)
  To: qemu-devel, John Allen, Yazen Ghannam, Joao Martins
  Cc: michael.roth, babu.moger, pbonzini, richard.henderson, eduardo

Just a note to inform you that I've submitted a new patch on a separate
thread -- dealing with VM live migration after receiving memory errors:

https://lore.kernel.org/qemu-devel/20231013150839.867164-3-william.roche@oracle.com/

This patch belongs to a two-patch set that should fix migration when
memory errors have been received and handled by the VM before the
migration request.

For the moment this other patch only fixes the ARM case ignoring
SIGBUS/BUS_MCEERR_AO errors, but the same mechanism should be used with
AMD ignoring SIGBUS/BUS_MCEERR_AO too. That would mean using the same new
parameter of the kvm_hwpoison_page_add function in
kvm_arch_on_sigbus_vcpu, with:

    kvm_hwpoison_page_add(ram_addr, (code == BUS_MCEERR_AR));

Of course we'll have to wait for the above patch to be integrated first.

HTH,
William.

On 9/19/23 00:00, William Roche wrote:
> Hi John,
>
> I'd like to put the emphasis on the fact that ignoring the SRAO error
> for a VM is a real problem at least for a specific (rare) case I'm
> currently working on: The VM migration.
>
> Context:
>
> - In the case of a poisoned page in the VM address space, the migration
> can't read it and will skip this page, considering it as a zero-filled
> page. The VM kernel (that handled the vMCE) would have marked its
> associated page as poisoned, and if the VM touches the page, the VM
> kernel generates the associated MCE because it already knows about the
> poisoned page.
> > - When we ignore the vMCE in the case of a SIGBUS/BUS_MCEERR_AO error > (what this patch does), we entirely rely on the Hypervisor to send an > SRAR error to qemu when the page is touched: The AMD VM kernel will > receive the SIGBUS/BUS_MCEERR_AR and deal with it, thanks to your > changes here. > > So it looks like the mechanism works fine... unless the VM has migrated > between the SRAO error and the first time it really touches the poisoned > page to get an SRAR error ! In this case, its new address space > (created on the migration destination) will have a zero-page where we > had a poisoned page, and the AMD VM Kernel (that never dealt with the > SRAO) doesn't know about the poisoned page and will access the page > finding only zeros... We have a memory corruption ! > > It is a very rare window, but in order to fix it the most reasonable > course of action would be to make the AMD emulation deal with SRAO > errors, instead of ignoring them. > > Do you agree with my analysis ? > Would an AMD platform generate SRAO signal to a process > (SIGBUS/BUS_MCEERR_AO) in case of a real hardware error ? > > Thanks, > William. ^ permalink raw reply [flat|nested] 15+ messages in thread
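To make the migration window described in this message concrete, here is a small standalone model (not QEMU code). It assumes, purely for illustration, that each recorded poisoned page carries a flag saying whether the guest kernel was told about it. The meaning of the extra boolean in the kvm_hwpoison_page_add() call quoted above is not spelled out in this thread, so the guest_notified field, the poison_list type and both helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_POISONED_PAGES 16   /* arbitrary bound for this sketch */

/* Hypothetical record: one poisoned page and whether a vMCE reached the guest. */
struct poisoned_page {
    uint64_t ram_addr;
    bool guest_notified;
};

struct poison_list {
    struct poisoned_page pages[MAX_POISONED_PAGES];
    size_t count;
};

/*
 * Record a poisoned page. If the page is already known, only upgrade the
 * flag (an AO that was ignored may later be followed by an AR that is
 * relayed). Returns false when the list is full.
 */
bool poison_list_add(struct poison_list *list, uint64_t ram_addr,
                     bool guest_notified)
{
    assert(list != NULL);
    for (size_t i = 0; i < list->count; i++) {
        if (list->pages[i].ram_addr == ram_addr) {
            if (guest_notified) {
                list->pages[i].guest_notified = true;
            }
            return true;
        }
    }
    if (list->count == MAX_POISONED_PAGES) {
        return false;
    }
    list->pages[list->count].ram_addr = ram_addr;
    list->pages[list->count].guest_notified = guest_notified;
    list->count++;
    return true;
}

/*
 * Migration skips poisoned pages and the destination sees zeros. That is
 * only safe when the guest kernel already knows the page is poisoned;
 * otherwise the guest would later read zeros as if they were valid data.
 */
bool migration_would_hide_poison(const struct poison_list *list)
{
    assert(list != NULL);
    for (size_t i = 0; i < list->count; i++) {
        if (!list->pages[i].guest_notified) {
            return true;
        }
    }
    return false;
}
```

In this model, an ignored AO error corresponds to poison_list_add(&list, addr, false), and migration_would_hide_poison() then returns true until an AR for the same page has been relayed to the guest.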
* [PATCH v4 3/3] i386: Add support for SUCCOR feature 2023-09-12 21:18 [PATCH v4 0/3] Fix MCE handling on AMD hosts John Allen 2023-09-12 21:18 ` [PATCH v4 1/3] i386: Fix MCE support for " John Allen 2023-09-12 21:18 ` [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest John Allen @ 2023-09-12 21:18 ` John Allen 2024-02-07 11:21 ` [PATCH v4 0/3] Fix MCE handling on AMD hosts Joao Martins 3 siblings, 0 replies; 15+ messages in thread From: John Allen @ 2023-09-12 21:18 UTC (permalink / raw) To: qemu-devel Cc: yazen.ghannam, michael.roth, babu.moger, william.roche, joao.m.martins, pbonzini, richard.henderson, eduardo, John Allen Add cpuid bit definition for the SUCCOR feature. This cpuid bit is required to be exposed to guests to allow them to handle machine check exceptions on AMD hosts. Reported-by: William Roche <william.roche@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: John Allen <john.allen@amd.com> ---- v2: - Add "succor" feature word. - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature. 
--- target/i386/cpu.c | 18 +++++++++++++++++- target/i386/cpu.h | 4 ++++ target/i386/kvm/kvm.c | 2 ++ 3 files changed, 23 insertions(+), 1 deletion(-) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 00f913b638..d90d3a9489 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -1029,6 +1029,22 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = { .tcg_features = TCG_APM_FEATURES, .unmigratable_flags = CPUID_APM_INVTSC, }, + [FEAT_8000_0007_EBX] = { + .type = CPUID_FEATURE_WORD, + .feat_names = { + NULL, "succor", NULL, NULL, + NULL, NULL, NULL, NULL, + NULL, NULL, NULL, NULL, + NULL, NULL, NULL, NULL, + NULL, NULL, NULL, NULL, + NULL, NULL, NULL, NULL, + NULL, NULL, NULL, NULL, + NULL, NULL, NULL, NULL, + }, + .cpuid = { .eax = 0x80000007, .reg = R_EBX, }, + .tcg_features = 0, + .unmigratable_flags = 0, + }, [FEAT_8000_0008_EBX] = { .type = CPUID_FEATURE_WORD, .feat_names = { @@ -6554,7 +6570,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count, break; case 0x80000007: *eax = 0; - *ebx = 0; + *ebx = env->features[FEAT_8000_0007_EBX]; *ecx = 0; *edx = env->features[FEAT_8000_0007_EDX]; break; diff --git a/target/i386/cpu.h b/target/i386/cpu.h index a6000e93bd..f5afc5e4fd 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -598,6 +598,7 @@ typedef enum FeatureWord { FEAT_7_1_EAX, /* CPUID[EAX=7,ECX=1].EAX */ FEAT_8000_0001_EDX, /* CPUID[8000_0001].EDX */ FEAT_8000_0001_ECX, /* CPUID[8000_0001].ECX */ + FEAT_8000_0007_EBX, /* CPUID[8000_0007].EBX */ FEAT_8000_0007_EDX, /* CPUID[8000_0007].EDX */ FEAT_8000_0008_EBX, /* CPUID[8000_0008].EBX */ FEAT_8000_0021_EAX, /* CPUID[8000_0021].EAX */ @@ -942,6 +943,9 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w, /* Packets which contain IP payload have LIP values */ #define CPUID_14_0_ECX_LIP (1U << 31) +/* RAS Features */ +#define CPUID_8000_0007_EBX_SUCCOR (1U << 1) + /* CLZERO instruction */ #define CPUID_8000_0008_EBX_CLZERO (1U << 0) /* Always save/restore FP error 
pointers */ diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 7e9fc0cac5..15a642a894 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -477,6 +477,8 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function, */ cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX); ret |= cpuid_1_edx & CPUID_EXT2_AMD_ALIASES; + } else if (function == 0x80000007 && reg == R_EBX) { + ret |= CPUID_8000_0007_EBX_SUCCOR; } else if (function == KVM_CPUID_FEATURES && reg == R_EAX) { /* kvm_pv_unhalt is reported by GET_SUPPORTED_CPUID, but it can't * be enabled without the in-kernel irqchip -- 2.39.3 ^ permalink raw reply related [flat|nested] 15+ messages in thread
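As a quick way to see what the guest would observe once this bit is exposed, the following sketch queries CPUID leaf 0x80000007 and tests EBX bit 1, the same position as CPUID_8000_0007_EBX_SUCCOR in the patch above. It is a userspace illustration, not part of the series: ebx_has_succor and cpu_reports_succor are hypothetical helper names, and __get_cpuid comes from the GCC/Clang <cpuid.h> header, so that part is only built on x86.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit position as CPUID_8000_0007_EBX_SUCCOR in the patch. */
#define SUCCOR_BIT (1U << 1)

/* Pure helper: does this EBX value from leaf 0x80000007 advertise SUCCOR? */
bool ebx_has_succor(uint32_t ebx)
{
    return (ebx & SUCCOR_BIT) != 0;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/*
 * Returns 1 if the CPU (or the vCPU, when run inside a guest) reports
 * SUCCOR, 0 if it does not, and -1 if leaf 0x80000007 is not available.
 */
int cpu_reports_succor(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx)) {
        return -1;
    }
    return ebx_has_succor(ebx) ? 1 : 0;
}
#endif
```

Run inside a guest, cpu_reports_succor() would be expected to return 1 only when the "succor" feature is enabled for the vCPU, assuming the host and KVM also allow it.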
* Re: [PATCH v4 0/3] Fix MCE handling on AMD hosts 2023-09-12 21:18 [PATCH v4 0/3] Fix MCE handling on AMD hosts John Allen ` (2 preceding siblings ...) 2023-09-12 21:18 ` [PATCH v4 3/3] i386: Add support for SUCCOR feature John Allen @ 2024-02-07 11:21 ` Joao Martins 2024-02-20 17:27 ` John Allen 3 siblings, 1 reply; 15+ messages in thread From: Joao Martins @ 2024-02-07 11:21 UTC (permalink / raw) To: John Allen, pbonzini, william.roche Cc: yazen.ghannam, michael.roth, babu.moger, richard.henderson, eduardo, qemu-devel, peterx@redhat.com On 12/09/2023 22:18, John Allen wrote: > In the event that a guest process attempts to access memory that has > been poisoned in response to a deferred uncorrected MCE, an AMD system > will currently generate a SIGBUS error which will result in the entire > guest being shutdown. Ideally, we only want to kill the guest process > that accessed poisoned memory in this case. > > This support has been included in qemu for Intel hosts for a long time, > but there are a couple of changes needed for AMD hosts. First, we will > need to expose the SUCCOR cpuid bit to guests. Second, we need to modify > the MCE injection code to avoid Intel specific behavior when we are > running on an AMD host. > Is there any update with respect to this series? John's series should fix MCE injection on AMD; as today it is just crashing the guest (sadly) when an MCE happens in the hypervisor. William, Paolo, I think the sort-of-dependency(?) of this where we block migration if there was a poisoned page on is already in Peter's migration tree[1] (CC'ed). So perhaps this series just needs John to resend it given that it's been a couple months since v4? [1] https://lore.kernel.org/qemu-devel/20240130190640.139364-2-william.roche@oracle.com/ > v2: > - Add "succor" feature word. > - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature. > > v3: > - Reorder series. Only enable SUCCOR after bugs have been fixed. 
> - Introduce new patch ignoring AO errors. > > v4: > - Remove redundant check for AO errors. > > John Allen (2): > i386: Fix MCE support for AMD hosts > i386: Add support for SUCCOR feature > > William Roche (1): > i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest > > target/i386/cpu.c | 18 +++++++++++++++++- > target/i386/cpu.h | 4 ++++ > target/i386/helper.c | 4 ++++ > target/i386/kvm/kvm.c | 28 ++++++++++++++++++++-------- > 4 files changed, 45 insertions(+), 9 deletions(-) > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 0/3] Fix MCE handling on AMD hosts 2024-02-07 11:21 ` [PATCH v4 0/3] Fix MCE handling on AMD hosts Joao Martins @ 2024-02-20 17:27 ` John Allen 2024-02-21 11:42 ` Joao Martins 0 siblings, 1 reply; 15+ messages in thread From: John Allen @ 2024-02-20 17:27 UTC (permalink / raw) To: Joao Martins Cc: pbonzini, william.roche, yazen.ghannam, michael.roth, babu.moger, richard.henderson, eduardo, qemu-devel, peterx@redhat.com On Wed, Feb 07, 2024 at 11:21:05AM +0000, Joao Martins wrote: > On 12/09/2023 22:18, John Allen wrote: > > In the event that a guest process attempts to access memory that has > > been poisoned in response to a deferred uncorrected MCE, an AMD system > > will currently generate a SIGBUS error which will result in the entire > > guest being shutdown. Ideally, we only want to kill the guest process > > that accessed poisoned memory in this case. > > > > This support has been included in qemu for Intel hosts for a long time, > > but there are a couple of changes needed for AMD hosts. First, we will > > need to expose the SUCCOR cpuid bit to guests. Second, we need to modify > > the MCE injection code to avoid Intel specific behavior when we are > > running on an AMD host. > > > > Is there any update with respect to this series? > > John's series should fix MCE injection on AMD; as today it is just crashing the > guest (sadly) when an MCE happens in the hypervisor. > > William, Paolo, I think the sort-of-dependency(?) of this where we block > migration if there was a poisoned page on is already in Peter's migration > tree[1] (CC'ed). So perhaps this series just needs John to resend it given that > it's been a couple months since v4? It looks like this series still applies cleanly to latest qemu, but I can resend if needed. Thanks, John > > [1] > https://lore.kernel.org/qemu-devel/20240130190640.139364-2-william.roche@oracle.com/ > > > v2: > > - Add "succor" feature word. 
> > - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature. > > > > v3: > > - Reorder series. Only enable SUCCOR after bugs have been fixed. > > - Introduce new patch ignoring AO errors. > > > > v4: > > - Remove redundant check for AO errors. > > > > John Allen (2): > > i386: Fix MCE support for AMD hosts > > i386: Add support for SUCCOR feature > > > > William Roche (1): > > i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest > > > > target/i386/cpu.c | 18 +++++++++++++++++- > > target/i386/cpu.h | 4 ++++ > > target/i386/helper.c | 4 ++++ > > target/i386/kvm/kvm.c | 28 ++++++++++++++++++++-------- > > 4 files changed, 45 insertions(+), 9 deletions(-) > > > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 0/3] Fix MCE handling on AMD hosts 2024-02-20 17:27 ` John Allen @ 2024-02-21 11:42 ` Joao Martins 0 siblings, 0 replies; 15+ messages in thread From: Joao Martins @ 2024-02-21 11:42 UTC (permalink / raw) To: John Allen, pbonzini Cc: william.roche, yazen.ghannam, michael.roth, babu.moger, richard.henderson, eduardo, qemu-devel, peterx@redhat.com On 20/02/2024 17:27, John Allen wrote: > On Wed, Feb 07, 2024 at 11:21:05AM +0000, Joao Martins wrote: >> On 12/09/2023 22:18, John Allen wrote: >>> In the event that a guest process attempts to access memory that has >>> been poisoned in response to a deferred uncorrected MCE, an AMD system >>> will currently generate a SIGBUS error which will result in the entire >>> guest being shutdown. Ideally, we only want to kill the guest process >>> that accessed poisoned memory in this case. >>> >>> This support has been included in qemu for Intel hosts for a long time, >>> but there are a couple of changes needed for AMD hosts. First, we will >>> need to expose the SUCCOR cpuid bit to guests. Second, we need to modify >>> the MCE injection code to avoid Intel specific behavior when we are >>> running on an AMD host. >>> >> >> Is there any update with respect to this series? >> >> John's series should fix MCE injection on AMD; as today it is just crashing the >> guest (sadly) when an MCE happens in the hypervisor. >> >> William, Paolo, I think the sort-of-dependency(?) of this where we block >> migration if there was a poisoned page on is already in Peter's migration >> tree[1] (CC'ed). So perhaps this series just needs John to resend it given that >> it's been a couple months since v4? > > It looks like this series still applies cleanly to latest qemu, but I > can resend if needed. > That's great I suppose. I was hoping Paolo responds, to understand next steps. There's also the other kernel patch that Paolo suggested[0], to declare the SUCCOR bit in the kvm supported CPUID? 
Maybe it's being held up because of that? [0] https://lore.kernel.org/qemu-devel/d4c1bb9b-8438-ed00-c79d-e8ad2a7e4eed@redhat.com/ > Thanks, > John > >> >> [1] >> https://lore.kernel.org/qemu-devel/20240130190640.139364-2-william.roche@oracle.com/ >> >>> v2: >>> - Add "succor" feature word. >>> - Add case to kvm_arch_get_supported_cpuid for the SUCCOR feature. >>> >>> v3: >>> - Reorder series. Only enable SUCCOR after bugs have been fixed. >>> - Introduce new patch ignoring AO errors. >>> >>> v4: >>> - Remove redundant check for AO errors. >>> >>> John Allen (2): >>> i386: Fix MCE support for AMD hosts >>> i386: Add support for SUCCOR feature >>> >>> William Roche (1): >>> i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest >>> >>> target/i386/cpu.c | 18 +++++++++++++++++- >>> target/i386/cpu.h | 4 ++++ >>> target/i386/helper.c | 4 ++++ >>> target/i386/kvm/kvm.c | 28 ++++++++++++++++++++-------- >>> 4 files changed, 45 insertions(+), 9 deletions(-) >>> >> ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2024-02-21 15:26 UTC | newest]

Thread overview: 15+ messages
2023-09-12 21:18 [PATCH v4 0/3] Fix MCE handling on AMD hosts John Allen
2023-09-12 21:18 ` [PATCH v4 1/3] i386: Fix MCE support for " John Allen
2023-09-12 21:18 ` [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest John Allen
2023-09-13 3:22 ` Gupta, Pankaj
2023-09-18 22:00 ` William Roche
2023-09-20 11:13 ` Joao Martins
2023-09-21 17:41 ` Yazen Ghannam
2023-09-22 8:36 ` William Roche
2023-09-22 14:30 ` Yazen Ghannam
2023-09-22 16:18 ` William Roche
2023-10-13 15:41 ` William Roche
2023-09-12 21:18 ` [PATCH v4 3/3] i386: Add support for SUCCOR feature John Allen
2024-02-07 11:21 ` [PATCH v4 0/3] Fix MCE handling on AMD hosts Joao Martins
2024-02-20 17:27 ` John Allen
2024-02-21 11:42 ` Joao Martins