* [PATCH] Ignore SRAO MCE if another MCE is being processed
From: Huang Ying @ 2010-04-27 7:10 UTC
To: Avi Kivity; +Cc: Andi Kleen, kvm@vger.kernel.org
In the common case, a guest SRAO (Software Recoverable Action Optional)
MCE causes the corresponding poisoned page to be unmapped in the host
and a SIGBUS to be sent to QEMU-KVM, which then relays the MCE to the
guest OS.

However, the guest may access the poisoned page after it has been
unmapped in the host but before the MCE has been relayed to the guest
OS. In that case, an SRAR (Software Recoverable Action Required) SIGBUS
is sent to QEMU-KVM before the SRAO SIGBUS. If QEMU-KVM relays them to
the guest OS one by one, the guest system may reset, because the SRAO
MCE may be raised while the SRAR MCE is still being processed (that is,
while MCG_STATUS.MCIP is set). In fact, the SRAO MCE can be ignored in
this situation, which gives the guest system an opportunity to survive.

Signed-off-by: Huang Ying <ying.huang@intel.com>
---
qemu-kvm.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1610,6 +1610,19 @@ static void flush_queued_work(CPUState *
     pthread_cond_broadcast(&qemu_work_cond);
 }
 
+static int kvm_mce_in_exception(CPUState *env)
+{
+    struct kvm_msr_entry msr_mcg_status = {
+        .index = MSR_MCG_STATUS,
+    };
+    int r;
+
+    r = kvm_get_msrs(env, &msr_mcg_status, 1);
+    if (r == -1 || r == 0)
+        return -1;
+    return !!(msr_mcg_status.data & MCG_STATUS_MCIP);
+}
+
 static void kvm_on_sigbus(CPUState *env, siginfo_t *siginfo)
 {
 #if defined(KVM_CAP_MCE) && defined(TARGET_I386)
@@ -1630,6 +1643,15 @@ static void kvm_on_sigbus(CPUState *env,
             mce.misc = (MCM_ADDR_PHYS << 6) | 0xc;
             mce.mcg_status = MCG_STATUS_MCIP | MCG_STATUS_EIPV;
         } else {
+            /*
+             * If there is an MCE exception being processed, ignore
+             * this SRAO MCE
+             */
+            r = kvm_mce_in_exception(env);
+            if (r == -1)
+                fprintf(stderr, "Failed to get MCE status\n");
+            else if (r)
+                return;
             /* Fake an Intel architectural Memory scrubbing UCR */
             mce.status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
                 | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
@@ -2475,6 +2497,12 @@ static void kvm_do_inject_x86_mce(void *
     struct kvm_x86_mce_data *data = _data;
     int r;
 
+    /* If there is an MCE exception being processed, ignore this SRAO MCE */
+    r = kvm_mce_in_exception(data->env);
+    if (r == -1)
+        fprintf(stderr, "Failed to get MCE status\n");
+    else if (r && !(data->mce->status & MCI_STATUS_AR))
+        return;
     r = kvm_set_mce(data->env, data->mce);
     if (r < 0) {
         perror("kvm_set_mce FAILED");
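The filtering rule the patch adds can be exercised outside QEMU. The following is a simplified, hypothetical model: struct vcpu_model, deliver(), inject() and run_race() are illustrative names, not QEMU or KVM APIs, and the model assumes (as the commit message implies) that an uncorrected MCE delivered while MCG_STATUS.MCIP is still set resets the guest. The same rule covers both call sites: kvm_on_sigbus() only reaches the check on its SRAO branch, while kvm_do_inject_x86_mce() tests MCI_STATUS_AR explicitly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural bit positions from the Intel SDM, redefined locally so
 * the sketch does not depend on QEMU headers. */
#define MCG_STATUS_MCIP (1ULL << 2)  /* machine check in progress */
#define MCI_STATUS_UC   (1ULL << 61) /* uncorrected error */
#define MCI_STATUS_S    (1ULL << 56) /* signaled to software via #MC */
#define MCI_STATUS_AR   (1ULL << 55) /* action required */

/* Hypothetical model of one guest vCPU's machine-check state. */
struct vcpu_model {
    uint64_t mcg_status; /* guest MSR_MCG_STATUS */
    int msr_read_ok;     /* whether reading MSR_MCG_STATUS succeeds */
    int injected;        /* MCEs actually delivered to the guest */
    int shut_down;       /* set when a nested MCE resets the guest */
};

/* Mirrors kvm_mce_in_exception(): -1 if the MSR read fails,
 * otherwise 1 if MCG_STATUS.MCIP is set and 0 if it is clear. */
static int mce_in_exception(const struct vcpu_model *v)
{
    if (!v->msr_read_ok)
        return -1;
    return !!(v->mcg_status & MCG_STATUS_MCIP);
}

/* Assumed delivery rule: an uncorrected MCE that arrives while MCIP is
 * still set is fatal; otherwise it sets MCIP and reaches the guest. */
static void deliver(struct vcpu_model *v, uint64_t status)
{
    assert(status & MCI_STATUS_UC); /* only uncorrected errors modeled */
    if (v->mcg_status & MCG_STATUS_MCIP) {
        v->shut_down = 1;
        return;
    }
    v->mcg_status |= MCG_STATUS_MCIP;
    v->injected++;
}

/* The patch's filter: drop the new MCE only when another MCE is in
 * progress and the new one is not action-required (i.e. it is SRAO).
 * A failed MSR read is logged and the MCE is injected anyway. */
static void inject(struct vcpu_model *v, uint64_t status, int filter)
{
    if (filter) {
        int r = mce_in_exception(v);
        if (r == -1)
            fprintf(stderr, "Failed to get MCE status\n");
        else if (r && !(status & MCI_STATUS_AR))
            return; /* SRAO ignored */
    }
    deliver(v, status);
}

/* Replays the race from the commit message: the SRAR SIGBUS is relayed
 * first, then the late SRAO SIGBUS arrives while MCIP is still set. */
static struct vcpu_model run_race(int filter)
{
    struct vcpu_model v = { .msr_read_ok = 1 };
    inject(&v, MCI_STATUS_UC | MCI_STATUS_S | MCI_STATUS_AR, filter);
    inject(&v, MCI_STATUS_UC | MCI_STATUS_S, filter);
    return v;
}
```

Under these assumptions, run_race(0) injects one MCE and then shuts the guest down, while run_race(1) delivers only the SRAR MCE and the guest survives. An SRAR MCE that arrives while MCIP is set is still injected, exactly as in the patch, so that case remains fatal.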
* Re: [PATCH] Ignore SRAO MCE if another MCE is being processed
From: Marcelo Tosatti @ 2010-04-27 16:12 UTC
To: Huang Ying; +Cc: Avi Kivity, Andi Kleen, kvm@vger.kernel.org
On Tue, Apr 27, 2010 at 03:10:49PM +0800, Huang Ying wrote:
> [...]
> +    /* If there is an MCE exception being processed, ignore this SRAO MCE */
> +    r = kvm_mce_in_exception(data->env);
> +    if (r == -1)
> +        fprintf(stderr, "Failed to get MCE status\n");
> +    else if (r && !(data->mce->status & MCI_STATUS_AR))
> +        return;

Don't you need to set the OVER bit in the MCI_STATUS register when
this happens?

Unrelated to this patch, it would be nice if you could share the testing
code.
* Re: [PATCH] Ignore SRAO MCE if another MCE is being processed
From: Huang Ying @ 2010-04-28 6:08 UTC
To: Marcelo Tosatti; +Cc: Avi Kivity, Andi Kleen, kvm@vger.kernel.org
On Wed, 2010-04-28 at 00:12 +0800, Marcelo Tosatti wrote:
> On Tue, Apr 27, 2010 at 03:10:49PM +0800, Huang Ying wrote:
> > [...]
>
> Don't you need to set the OVER bit in the MCI_STATUS register when
> this happens?
The OVER bit is set when an uncorrected error overwrites a corrected
error. There is no specification of the OVER bit for this situation,
and I don't see any benefit in setting it.
> Unrelated to this patch, it would be nice if you can share the testing
> code.
There are test scripts and documentation for this in:

git://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git

The test scripts are in the "kvm" directory; the testing document is
kvm/README.
Best Regards,
Huang Ying
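For reference on the OVER question: per the Intel SDM, MCi_STATUS.OVER (bit 62) flags that an error was detected while the bank's VAL bit was already set; an uncorrected error overwriting a corrected one, as described above, is one instance of that. The sketch below is a simplified, hypothetical single-bank model (bank_log() is an illustrative name), and its overwrite policy, keeping an uncorrected error in preference to a corrected one, is a simplifying assumption rather than a full rendering of the SDM's overwrite rules. It shows what the suggested OVER marking would record; the patch itself returns before kvm_set_mce() and leaves the bank untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MCI_STATUS_VAL  (1ULL << 63) /* register contents valid */
#define MCI_STATUS_OVER (1ULL << 62) /* error overflow */
#define MCI_STATUS_UC   (1ULL << 61) /* uncorrected error */

/* Hypothetical single-bank model: logs new_status into *bank. */
static void bank_log(uint64_t *bank, uint64_t new_status)
{
    assert(bank != NULL);
    new_status |= MCI_STATUS_VAL;
    if (!(*bank & MCI_STATUS_VAL)) {
        *bank = new_status; /* bank empty: just record the error */
        return;
    }
    /* Bank already valid, so this is an overflow. Assumed policy: an
     * uncorrected error replaces a corrected one; otherwise the
     * previously logged error is kept. */
    if ((new_status & MCI_STATUS_UC) && !(*bank & MCI_STATUS_UC))
        *bank = new_status;
    *bank |= MCI_STATUS_OVER;
}
```

In this model, logging a corrected error and then an uncorrected one leaves the uncorrected error in the bank with OVER set, and a later corrected error does not displace it.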
* Re: [PATCH] Ignore SRAO MCE if another MCE is being processed
From: Marcelo Tosatti @ 2010-04-28 16:28 UTC
To: Huang Ying; +Cc: Avi Kivity, Andi Kleen, kvm@vger.kernel.org
On Tue, Apr 27, 2010 at 03:10:49PM +0800, Huang Ying wrote:
> [...]
> Signed-off-by: Huang Ying <ying.huang@intel.com>
Applied, thanks.
* Re: [PATCH] Ignore SRAO MCE if another MCE is being processed
From: Avi Kivity @ 2010-04-28 16:33 UTC
To: Huang Ying
Cc: Marcelo Tosatti, Andi Kleen, kvm@vger.kernel.org,
Lucas Meneghel Rodrigues
On 04/28/2010 09:08 AM, Huang Ying wrote:
>
> There is some test script and document for this in:
>
> git://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git
>
> test script is in "kvm" directory, testing document is kvm/README
>
>
This is a good candidate for kvm-autotest. We'll get little user
coverage for this feature, so it's good to get it machine tested regularly.
--
error compiling committee.c: too many arguments to function
* Re: [PATCH] Ignore SRAO MCE if another MCE is being processed
From: Huang Ying @ 2010-04-29 1:03 UTC
To: Avi Kivity
Cc: Marcelo Tosatti, Andi Kleen, kvm@vger.kernel.org,
Lucas Meneghel Rodrigues
On Thu, 2010-04-29 at 00:33 +0800, Avi Kivity wrote:
> On 04/28/2010 09:08 AM, Huang Ying wrote:
> > [...]
>
> This is a good candidate for kvm-autotest. We'll get little user
> coverage for this feature, so it's good to get it machine tested regularly.
Yes. We will work on this.
Best Regards,
Huang Ying