* [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
@ 2013-04-27 8:38 Liu, Jinsong
2013-04-29 7:08 ` Jan Beulich
0 siblings, 1 reply; 33+ messages in thread
From: Liu, Jinsong @ 2013-04-27 8:38 UTC (permalink / raw)
To: Jan Beulich, xen-devel@lists.xensource.com; +Cc: Christoph Egger
[-- Attachment #1: Type: text/plain, Size: 4976 bytes --]
>From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17 00:00:00 2001
From: Liu Jinsong <jinsong.liu@intel.com>
Date: Sat, 27 Apr 2013 22:37:35 +0800
Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
is_vmce_ready() is problematic:
* For dom0, it checks if virq bind to dom0 mcelog driver. If not, it
results dom0 crash. However, it's problematic and overkilled since
mcelog as a dom0 feature could be enabled/disabled per dom0 option:
(XEN) MCE: This error page is ownded by DOM 0
(XEN) DOM0 not ready for vMCE
(XEN) domain_crash called from mcaction.c:133
(XEN) Domain 0 reported crashed by domain 32767 on cpu#31:
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
* For dom0, if really need check, it should check whether vMCE
injection for dom0 ready (say, exception trap bounce check, which
has been done at inject_vmce()), not check dom0 mcelog ready (which
has been done at mce_softirq() before send global virq to dom0).
* For hvm, it checks whether guest vcpu has different virtual
family/model with that of host pcpu --> that's deprecated, since
vMCE has changed a lot, not bound to host MCE any more.
Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
---
xen/arch/x86/cpu/mcheck/mcaction.c | 6 ---
xen/arch/x86/cpu/mcheck/vmce.c | 68 ------------------------------------
xen/arch/x86/cpu/mcheck/vmce.h | 1 -
3 files changed, 0 insertions(+), 75 deletions(-)
diff --git a/xen/arch/x86/cpu/mcheck/mcaction.c b/xen/arch/x86/cpu/mcheck/mcaction.c
index 0ac5b45..adf2ded 100644
--- a/xen/arch/x86/cpu/mcheck/mcaction.c
+++ b/xen/arch/x86/cpu/mcheck/mcaction.c
@@ -83,12 +83,6 @@ mc_memerr_dhandler(struct mca_binfo *binfo,
ASSERT(d);
gfn = get_gpfn_from_mfn((bank->mc_addr) >> PAGE_SHIFT);
- if ( !is_vmce_ready(bank, d) )
- {
- printk("DOM%d not ready for vMCE\n", d->domain_id);
- goto vmce_failed;
- }
-
if ( unmmap_broken_page(d, _mfn(mfn), gfn) )
{
printk("Unmap broken memory %lx for DOM%d failed\n",
diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c
index 7d3fac7..af3b491 100644
--- a/xen/arch/x86/cpu/mcheck/vmce.c
+++ b/xen/arch/x86/cpu/mcheck/vmce.c
@@ -413,74 +413,6 @@ int fill_vmsr_data(struct mcinfo_bank *mc_bank, struct domain *d,
return 0;
}
-static int is_hvm_vmce_ready(struct mcinfo_bank *bank, struct domain *d)
-{
- struct vcpu *v;
- int no_vmce = 0, i;
-
- if (!is_hvm_domain(d))
- return 0;
-
- /* kill guest if not enabled vMCE */
- for_each_vcpu(d, v)
- {
- if (!(v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_MCE))
- {
- no_vmce = 1;
- break;
- }
-
- if (!mce_broadcast)
- break;
- }
-
- if (no_vmce)
- return 0;
-
- /* Guest has virtualized family/model information */
- for ( i = 0; i < MAX_CPUID_INPUT; i++ )
- {
- if (d->arch.cpuids[i].input[0] == 0x1)
- {
- uint32_t veax = d->arch.cpuids[i].eax, vfam, vmod;
-
- vfam = (veax >> 8) & 15;
- vmod = (veax >> 4) & 15;
-
- if (vfam == 0x6 || vfam == 0xf)
- vmod += ((veax >> 16) & 0xF) << 4;
- if (vfam == 0xf)
- vfam += (veax >> 20) & 0xff;
-
- if ( ( vfam != boot_cpu_data.x86 ) ||
- (vmod != boot_cpu_data.x86_model) )
- {
- dprintk(XENLOG_WARNING,
- "No vmce for different virtual family/model cpuid\n");
- no_vmce = 1;
- }
- break;
- }
- }
-
- if (no_vmce)
- return 0;
-
- return 1;
-}
-
-int is_vmce_ready(struct mcinfo_bank *bank, struct domain *d)
-{
- if ( d == dom0)
- return dom0_vmce_enabled();
-
- /* No vMCE to HVM guest now */
- if ( is_hvm_domain(d) )
- return is_hvm_vmce_ready(bank, d);
-
- return 0;
-}
-
/* It's said some ram is setup as mmio_direct for UC cache attribute */
#define P2M_UNMAP_TYPES (p2m_to_mask(p2m_ram_rw) \
| p2m_to_mask(p2m_ram_logdirty) \
diff --git a/xen/arch/x86/cpu/mcheck/vmce.h b/xen/arch/x86/cpu/mcheck/vmce.h
index 7263deb..6b2c95a 100644
--- a/xen/arch/x86/cpu/mcheck/vmce.h
+++ b/xen/arch/x86/cpu/mcheck/vmce.h
@@ -8,7 +8,6 @@ int vmce_init(struct cpuinfo_x86 *c);
#define dom0_vmce_enabled() (dom0 && dom0->max_vcpus && dom0->vcpu[0] \
&& guest_enabled_event(dom0->vcpu[0], VIRQ_MCA))
-int is_vmce_ready(struct mcinfo_bank *bank, struct domain *d);
int unmmap_broken_page(struct domain *d, mfn_t mfn, unsigned long gfn);
int vmce_intel_rdmsr(const struct vcpu *, uint32_t msr, uint64_t *val);
--
1.7.1
[-- Attachment #2: 0002-Xen-vMCE-bugfix-to-remove-problematic-is_vmce_ready-.patch --]
[-- Type: application/octet-stream, Size: 4834 bytes --]
From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17 00:00:00 2001
From: Liu Jinsong <jinsong.liu@intel.com>
Date: Sat, 27 Apr 2013 22:37:35 +0800
Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
is_vmce_ready() is problematic:
* For dom0, it checks if virq bind to dom0 mcelog driver. If not, it
results dom0 crash. However, it's problematic and overkilled since
mcelog as a dom0 feature could be enabled/disabled per dom0 option:
(XEN) MCE: This error page is ownded by DOM 0
(XEN) DOM0 not ready for vMCE
(XEN) domain_crash called from mcaction.c:133
(XEN) Domain 0 reported crashed by domain 32767 on cpu#31:
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
* For dom0, if really need check, it should check whether vMCE
injection for dom0 ready (say, exception trap bounce check, which
has been done at inject_vmce()), not check dom0 mcelog ready (which
has been done at mce_softirq() before send global virq to dom0).
* For hvm, it checks whether guest vcpu has different virtual
family/model with that of host pcpu --> that's deprecated, since
vMCE has changed a lot, not bound to host MCE any more.
Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
---
xen/arch/x86/cpu/mcheck/mcaction.c | 6 ---
xen/arch/x86/cpu/mcheck/vmce.c | 68 ------------------------------------
xen/arch/x86/cpu/mcheck/vmce.h | 1 -
3 files changed, 0 insertions(+), 75 deletions(-)
diff --git a/xen/arch/x86/cpu/mcheck/mcaction.c b/xen/arch/x86/cpu/mcheck/mcaction.c
index 0ac5b45..adf2ded 100644
--- a/xen/arch/x86/cpu/mcheck/mcaction.c
+++ b/xen/arch/x86/cpu/mcheck/mcaction.c
@@ -83,12 +83,6 @@ mc_memerr_dhandler(struct mca_binfo *binfo,
ASSERT(d);
gfn = get_gpfn_from_mfn((bank->mc_addr) >> PAGE_SHIFT);
- if ( !is_vmce_ready(bank, d) )
- {
- printk("DOM%d not ready for vMCE\n", d->domain_id);
- goto vmce_failed;
- }
-
if ( unmmap_broken_page(d, _mfn(mfn), gfn) )
{
printk("Unmap broken memory %lx for DOM%d failed\n",
diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c
index 7d3fac7..af3b491 100644
--- a/xen/arch/x86/cpu/mcheck/vmce.c
+++ b/xen/arch/x86/cpu/mcheck/vmce.c
@@ -413,74 +413,6 @@ int fill_vmsr_data(struct mcinfo_bank *mc_bank, struct domain *d,
return 0;
}
-static int is_hvm_vmce_ready(struct mcinfo_bank *bank, struct domain *d)
-{
- struct vcpu *v;
- int no_vmce = 0, i;
-
- if (!is_hvm_domain(d))
- return 0;
-
- /* kill guest if not enabled vMCE */
- for_each_vcpu(d, v)
- {
- if (!(v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_MCE))
- {
- no_vmce = 1;
- break;
- }
-
- if (!mce_broadcast)
- break;
- }
-
- if (no_vmce)
- return 0;
-
- /* Guest has virtualized family/model information */
- for ( i = 0; i < MAX_CPUID_INPUT; i++ )
- {
- if (d->arch.cpuids[i].input[0] == 0x1)
- {
- uint32_t veax = d->arch.cpuids[i].eax, vfam, vmod;
-
- vfam = (veax >> 8) & 15;
- vmod = (veax >> 4) & 15;
-
- if (vfam == 0x6 || vfam == 0xf)
- vmod += ((veax >> 16) & 0xF) << 4;
- if (vfam == 0xf)
- vfam += (veax >> 20) & 0xff;
-
- if ( ( vfam != boot_cpu_data.x86 ) ||
- (vmod != boot_cpu_data.x86_model) )
- {
- dprintk(XENLOG_WARNING,
- "No vmce for different virtual family/model cpuid\n");
- no_vmce = 1;
- }
- break;
- }
- }
-
- if (no_vmce)
- return 0;
-
- return 1;
-}
-
-int is_vmce_ready(struct mcinfo_bank *bank, struct domain *d)
-{
- if ( d == dom0)
- return dom0_vmce_enabled();
-
- /* No vMCE to HVM guest now */
- if ( is_hvm_domain(d) )
- return is_hvm_vmce_ready(bank, d);
-
- return 0;
-}
-
/* It's said some ram is setup as mmio_direct for UC cache attribute */
#define P2M_UNMAP_TYPES (p2m_to_mask(p2m_ram_rw) \
| p2m_to_mask(p2m_ram_logdirty) \
diff --git a/xen/arch/x86/cpu/mcheck/vmce.h b/xen/arch/x86/cpu/mcheck/vmce.h
index 7263deb..6b2c95a 100644
--- a/xen/arch/x86/cpu/mcheck/vmce.h
+++ b/xen/arch/x86/cpu/mcheck/vmce.h
@@ -8,7 +8,6 @@ int vmce_init(struct cpuinfo_x86 *c);
#define dom0_vmce_enabled() (dom0 && dom0->max_vcpus && dom0->vcpu[0] \
&& guest_enabled_event(dom0->vcpu[0], VIRQ_MCA))
-int is_vmce_ready(struct mcinfo_bank *bank, struct domain *d);
int unmmap_broken_page(struct domain *d, mfn_t mfn, unsigned long gfn);
int vmce_intel_rdmsr(const struct vcpu *, uint32_t msr, uint64_t *val);
--
1.7.1
[-- Attachment #3: Type: text/plain, Size: 126 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply related [flat|nested] 33+ messages in thread* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-04-27 8:38 [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check Liu, Jinsong
@ 2013-04-29 7:08 ` Jan Beulich
2013-05-03 8:41 ` Liu, Jinsong
0 siblings, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2013-04-29 7:08 UTC (permalink / raw)
To: Jinsong Liu; +Cc: Christoph Egger, xen-devel
>>> On 27.04.13 at 10:38, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17 00:00:00 2001
> From: Liu Jinsong <jinsong.liu@intel.com>
> Date: Sat, 27 Apr 2013 22:37:35 +0800
> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready
> check
>
> is_vmce_ready() is problematic:
> * For dom0, it checks if virq bind to dom0 mcelog driver. If not, it
> results dom0 crash. However, it's problematic and overkilled since
> mcelog as a dom0 feature could be enabled/disabled per dom0 option:
> (XEN) MCE: This error page is ownded by DOM 0
> (XEN) DOM0 not ready for vMCE
> (XEN) domain_crash called from mcaction.c:133
> (XEN) Domain 0 reported crashed by domain 32767 on cpu#31:
> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
>
> * For dom0, if really need check, it should check whether vMCE
> injection for dom0 ready (say, exception trap bounce check, which
> has been done at inject_vmce()), not check dom0 mcelog ready (which
> has been done at mce_softirq() before send global virq to dom0).
Following the argumentation above, I wonder which of the other
"goto vmce_failed" are really appropriate, i.e. whether the patch
shouldn't be extended (at least for the Dom0 case).
Jan
> * For hvm, it checks whether guest vcpu has different virtual
> family/model with that of host pcpu --> that's deprecated, since
> vMCE has changed a lot, not bound to host MCE any more.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-04-29 7:08 ` Jan Beulich
@ 2013-05-03 8:41 ` Liu, Jinsong
2013-05-03 9:32 ` Jan Beulich
0 siblings, 1 reply; 33+ messages in thread
From: Liu, Jinsong @ 2013-05-03 8:41 UTC (permalink / raw)
To: Jan Beulich; +Cc: Christoph Egger, xen-devel
Jan Beulich wrote:
>>>> On 27.04.13 at 10:38, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17 00:00:00
>> 2001 From: Liu Jinsong <jinsong.liu@intel.com>
>> Date: Sat, 27 Apr 2013 22:37:35 +0800
>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic
>> is_vmce_ready check
>>
>> is_vmce_ready() is problematic:
>> * For dom0, it checks if virq bind to dom0 mcelog driver. If not, it
>> results dom0 crash. However, it's problematic and overkilled since
>> mcelog as a dom0 feature could be enabled/disabled per dom0 option:
>> (XEN) MCE: This error page is ownded by DOM 0
>> (XEN) DOM0 not ready for vMCE
>> (XEN) domain_crash called from mcaction.c:133
>> (XEN) Domain 0 reported crashed by domain 32767 on cpu#31:
>> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
>>
>> * For dom0, if really need check, it should check whether vMCE
>> injection for dom0 ready (say, exception trap bounce check, which
>> has been done at inject_vmce()), not check dom0 mcelog ready (which
>> has been done at mce_softirq() before send global virq to dom0).
>
> Following the argumentation above, I wonder which of the other
> "goto vmce_failed" are really appropriate, i.e. whether the patch
> shouldn't be extended (at least for the Dom0 case).
>
> Jan
You mean other 'goto vmce_failed' are also not appropriate (I'm not quite clear your point)?
Would you please point out which point you think not appropriate?
Thanks,
Jinsong
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-03 8:41 ` Liu, Jinsong
@ 2013-05-03 9:32 ` Jan Beulich
2013-05-03 14:16 ` Liu, Jinsong
0 siblings, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2013-05-03 9:32 UTC (permalink / raw)
To: Jinsong Liu; +Cc: Christoph Egger, xen-devel
>>> On 03.05.13 at 10:41, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Jan Beulich wrote:
>>>>> On 27.04.13 at 10:38, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17 00:00:00
>>> 2001 From: Liu Jinsong <jinsong.liu@intel.com>
>>> Date: Sat, 27 Apr 2013 22:37:35 +0800
>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic
>>> is_vmce_ready check
>>>
>>> is_vmce_ready() is problematic:
>>> * For dom0, it checks if virq bind to dom0 mcelog driver. If not, it
>>> results dom0 crash. However, it's problematic and overkilled since
>>> mcelog as a dom0 feature could be enabled/disabled per dom0 option:
>>> (XEN) MCE: This error page is ownded by DOM 0
>>> (XEN) DOM0 not ready for vMCE
>>> (XEN) domain_crash called from mcaction.c:133
>>> (XEN) Domain 0 reported crashed by domain 32767 on cpu#31:
>>> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>>> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
>>>
>>> * For dom0, if really need check, it should check whether vMCE
>>> injection for dom0 ready (say, exception trap bounce check, which
>>> has been done at inject_vmce()), not check dom0 mcelog ready (which
>>> has been done at mce_softirq() before send global virq to dom0).
>>
>> Following the argumentation above, I wonder which of the other
>> "goto vmce_failed" are really appropriate, i.e. whether the patch
>> shouldn't be extended (at least for the Dom0 case).
>
> You mean other 'goto vmce_failed' are also not appropriate (I'm not quite
> clear your point)?
Yes.
> Would you please point out which point you think not appropriate?
I question whether it is correct/necessary to crash the domain in
any of those failure cases. Perhaps when we fail to unmap the
page it is, but failure of fill_vmsr_data() and inject_vmce() don't
appear to be valid reasons once the is_vmce_ready() path is being
dropped.
Jan
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-03 9:32 ` Jan Beulich
@ 2013-05-03 14:16 ` Liu, Jinsong
2013-05-03 14:27 ` Jan Beulich
0 siblings, 1 reply; 33+ messages in thread
From: Liu, Jinsong @ 2013-05-03 14:16 UTC (permalink / raw)
To: Jan Beulich; +Cc: Christoph Egger, xen-devel
Jan Beulich wrote:
>>>> On 03.05.13 at 10:41, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> Jan Beulich wrote:
>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>> wrote:
>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17 00:00:00
>>>> 2001 From: Liu Jinsong <jinsong.liu@intel.com>
>>>> Date: Sat, 27 Apr 2013 22:37:35 +0800
>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic
>>>> is_vmce_ready check
>>>>
>>>> is_vmce_ready() is problematic:
>>>> * For dom0, it checks if virq bind to dom0 mcelog driver. If not,
>>>> it results dom0 crash. However, it's problematic and overkilled
>>>> since mcelog as a dom0 feature could be enabled/disabled per dom0
>>>> option: (XEN) MCE: This error page is ownded by DOM 0
>>>> (XEN) DOM0 not ready for vMCE
>>>> (XEN) domain_crash called from mcaction.c:133
>>>> (XEN) Domain 0 reported crashed by domain 32767 on cpu#31:
>>>> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>>>> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
>>>>
>>>> * For dom0, if really need check, it should check whether vMCE
>>>> injection for dom0 ready (say, exception trap bounce check, which
>>>> has been done at inject_vmce()), not check dom0 mcelog ready (which
>>>> has been done at mce_softirq() before send global virq to dom0).
>>>
>>> Following the argumentation above, I wonder which of the other
>>> "goto vmce_failed" are really appropriate, i.e. whether the patch
>>> shouldn't be extended (at least for the Dom0 case).
>>
>> You mean other 'goto vmce_failed' are also not appropriate (I'm not
>> quite clear your point)?
>
> Yes.
>
>> Would you please point out which point you think not appropriate?
>
> I question whether it is correct/necessary to crash the domain in
> any of those failure cases. Perhaps when we fail to unmap the
> page it is, but failure of fill_vmsr_data() and inject_vmce() don't
> appear to be valid reasons once the is_vmce_ready() path is being
> dropped.
>
> Jan
For fill_vmsr_data(), it failed only when MCG_STATUS_MCIP bit still set when next vMCE# occur, means the 2nd vMCE# occur when the 1st vMCE# not handled yet. Per SDM it should shutdown.
For inject_vmce(), it failed when
1). vcpu is still mce_pending, or
2). pv not register trap callback
Maybe it's some overkilled for dom0 (for other guest, it's ok to kill them), but any graceful way to quit?
or, considering it rarely happens, how about keep current way (kill guest no matter dom0 or not)?
Thanks,
Jinsong
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-03 14:16 ` Liu, Jinsong
@ 2013-05-03 14:27 ` Jan Beulich
2013-05-03 15:51 ` Liu, Jinsong
0 siblings, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2013-05-03 14:27 UTC (permalink / raw)
To: Jinsong Liu; +Cc: Christoph Egger, xen-devel
>>> On 03.05.13 at 16:16, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Jan Beulich wrote:
>>>>> On 03.05.13 at 10:41, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>> Jan Beulich wrote:
>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>> wrote:
>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17 00:00:00
>>>>> 2001 From: Liu Jinsong <jinsong.liu@intel.com>
>>>>> Date: Sat, 27 Apr 2013 22:37:35 +0800
>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic
>>>>> is_vmce_ready check
>>>>>
>>>>> is_vmce_ready() is problematic:
>>>>> * For dom0, it checks if virq bind to dom0 mcelog driver. If not,
>>>>> it results dom0 crash. However, it's problematic and overkilled
>>>>> since mcelog as a dom0 feature could be enabled/disabled per dom0
>>>>> option: (XEN) MCE: This error page is ownded by DOM 0
>>>>> (XEN) DOM0 not ready for vMCE
>>>>> (XEN) domain_crash called from mcaction.c:133
>>>>> (XEN) Domain 0 reported crashed by domain 32767 on cpu#31:
>>>>> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>>>>> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
>>>>>
>>>>> * For dom0, if really need check, it should check whether vMCE
>>>>> injection for dom0 ready (say, exception trap bounce check, which
>>>>> has been done at inject_vmce()), not check dom0 mcelog ready (which
>>>>> has been done at mce_softirq() before send global virq to dom0).
>>>>
>>>> Following the argumentation above, I wonder which of the other
>>>> "goto vmce_failed" are really appropriate, i.e. whether the patch
>>>> shouldn't be extended (at least for the Dom0 case).
>>>
>>> You mean other 'goto vmce_failed' are also not appropriate (I'm not
>>> quite clear your point)?
>>
>> Yes.
>>
>>> Would you please point out which point you think not appropriate?
>>
>> I question whether it is correct/necessary to crash the domain in
>> any of those failure cases. Perhaps when we fail to unmap the
>> page it is, but failure of fill_vmsr_data() and inject_vmce() don't
>> appear to be valid reasons once the is_vmce_ready() path is being
>> dropped.
>
> For fill_vmsr_data(), it failed only when MCG_STATUS_MCIP bit still set when
> next vMCE# occur, means the 2nd vMCE# occur when the 1st vMCE# not handled
> yet. Per SDM it should shutdown.
>
> For inject_vmce(), it failed when
> 1). vcpu is still mce_pending, or
> 2). pv not register trap callback
> Maybe it's some overkilled for dom0 (for other guest, it's ok to kill them),
> but any graceful way to quit?
Just exit and do nothing (except perhaps log a rate limited
message)?
> or, considering it rarely happens, how about keep current way (kill guest no
> matter dom0 or not)?
Possibly - I was merely asking why this one condition was found to
be too strict, while the others are being left as is.
Jan
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-03 14:27 ` Jan Beulich
@ 2013-05-03 15:51 ` Liu, Jinsong
2013-05-06 8:54 ` Christoph Egger
0 siblings, 1 reply; 33+ messages in thread
From: Liu, Jinsong @ 2013-05-03 15:51 UTC (permalink / raw)
To: Jan Beulich; +Cc: Christoph Egger, xen-devel
Jan Beulich wrote:
>>>> On 03.05.13 at 16:16, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> Jan Beulich wrote:
>>>>>> On 03.05.13 at 10:41, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>> wrote:
>>>> Jan Beulich wrote:
>>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>> wrote:
>>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17 00:00:00
>>>>>> 2001 From: Liu Jinsong <jinsong.liu@intel.com>
>>>>>> Date: Sat, 27 Apr 2013 22:37:35 +0800
>>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic
>>>>>> is_vmce_ready check
>>>>>>
>>>>>> is_vmce_ready() is problematic:
>>>>>> * For dom0, it checks if virq bind to dom0 mcelog driver. If not,
>>>>>> it results dom0 crash. However, it's problematic and overkilled
>>>>>> since mcelog as a dom0 feature could be enabled/disabled per dom0
>>>>>> option: (XEN) MCE: This error page is ownded by DOM 0 (XEN) DOM0
>>>>>> not ready for vMCE (XEN) domain_crash called from mcaction.c:133
>>>>>> (XEN) Domain 0 reported crashed by domain 32767 on cpu#31:
>>>>>> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>>>>>> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
>>>>>>
>>>>>> * For dom0, if really need check, it should check whether vMCE
>>>>>> injection for dom0 ready (say, exception trap bounce check, which
>>>>>> has been done at inject_vmce()), not check dom0 mcelog ready
>>>>>> (which has been done at mce_softirq() before send global virq to
>>>>>> dom0).
>>>>>
>>>>> Following the argumentation above, I wonder which of the other
>>>>> "goto vmce_failed" are really appropriate, i.e. whether the patch
>>>>> shouldn't be extended (at least for the Dom0 case).
>>>>
>>>> You mean other 'goto vmce_failed' are also not appropriate (I'm not
>>>> quite clear your point)?
>>>
>>> Yes.
>>>
>>>> Would you please point out which point you think not appropriate?
>>>
>>> I question whether it is correct/necessary to crash the domain in
>>> any of those failure cases. Perhaps when we fail to unmap the
>>> page it is, but failure of fill_vmsr_data() and inject_vmce() don't
>>> appear to be valid reasons once the is_vmce_ready() path is being
>>> dropped.
>>
>> For fill_vmsr_data(), it failed only when MCG_STATUS_MCIP bit still
>> set when next vMCE# occur, means the 2nd vMCE# occur when the 1st
>> vMCE# not handled yet. Per SDM it should shutdown.
>>
>> For inject_vmce(), it failed when
>> 1). vcpu is still mce_pending, or
>> 2). pv not register trap callback
>> Maybe it's some overkilled for dom0 (for other guest, it's ok to
>> kill them), but any graceful way to quit?
>
> Just exit and do nothing (except perhaps log a rate limited
> message)?
>
One concern of quiet exit is, the error will be totally ignored by guest --> it didn't get preperly handled, and may recursively occur to make worse error --> it's better to kill guest under such case.
>> or, considering it rarely happens, how about keep current way (kill
>> guest no matter dom0 or not)?
>
> Possibly - I was merely asking why this one condition was found to
> be too strict, while the others are being left as is.
>
> Jan
Ah, the reason of removing is_vmce_ready check is, it's problematic (check mcelog driver, not vmce tap callback), and overkilled (since defaultly dom0 will not start mcelog driver, under which case system will crash whenever vmce inject to dom0) --> So patch 2/2 is not too strict for dom0.
Thanks,
Jinsong
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-03 15:51 ` Liu, Jinsong
@ 2013-05-06 8:54 ` Christoph Egger
2013-05-06 9:06 ` Jan Beulich
0 siblings, 1 reply; 33+ messages in thread
From: Christoph Egger @ 2013-05-06 8:54 UTC (permalink / raw)
To: Liu, Jinsong; +Cc: Jan Beulich, xen-devel
On 03.05.13 17:51, Liu, Jinsong wrote:
> Jan Beulich wrote:
>>>>> On 03.05.13 at 16:16, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>> Jan Beulich wrote:
>>>>>>> On 03.05.13 at 10:41, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>> wrote:
>>>>> Jan Beulich wrote:
>>>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>>> wrote:
>>>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17 00:00:00
>>>>>>> 2001 From: Liu Jinsong <jinsong.liu@intel.com>
>>>>>>> Date: Sat, 27 Apr 2013 22:37:35 +0800
>>>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic
>>>>>>> is_vmce_ready check
>>>>>>>
>>>>>>> is_vmce_ready() is problematic:
>>>>>>> * For dom0, it checks if virq bind to dom0 mcelog driver. If not,
>>>>>>> it results dom0 crash. However, it's problematic and overkilled
>>>>>>> since mcelog as a dom0 feature could be enabled/disabled per dom0
>>>>>>> option: (XEN) MCE: This error page is ownded by DOM 0 (XEN) DOM0
>>>>>>> not ready for vMCE (XEN) domain_crash called from mcaction.c:133
>>>>>>> (XEN) Domain 0 reported crashed by domain 32767 on cpu#31:
>>>>>>> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>>>>>>> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
>>>>>>>
>>>>>>> * For dom0, if really need check, it should check whether vMCE
>>>>>>> injection for dom0 ready (say, exception trap bounce check, which
>>>>>>> has been done at inject_vmce()), not check dom0 mcelog ready
>>>>>>> (which has been done at mce_softirq() before send global virq to
>>>>>>> dom0).
>>>>>>
>>>>>> Following the argumentation above, I wonder which of the other
>>>>>> "goto vmce_failed" are really appropriate, i.e. whether the patch
>>>>>> shouldn't be extended (at least for the Dom0 case).
>>>>>
>>>>> You mean other 'goto vmce_failed' are also not appropriate (I'm not
>>>>> quite clear your point)?
>>>>
>>>> Yes.
>>>>
>>>>> Would you please point out which point you think not appropriate?
>>>>
>>>> I question whether it is correct/necessary to crash the domain in
>>>> any of those failure cases. Perhaps when we fail to unmap the
>>>> page it is, but failure of fill_vmsr_data() and inject_vmce() don't
>>>> appear to be valid reasons once the is_vmce_ready() path is being
>>>> dropped.
>>>
>>> For fill_vmsr_data(), it failed only when MCG_STATUS_MCIP bit still
>>> set when next vMCE# occur, means the 2nd vMCE# occur when the 1st
>>> vMCE# not handled yet. Per SDM it should shutdown.
>>>
>>> For inject_vmce(), it failed when
>>> 1). vcpu is still mce_pending, or
>>> 2). pv not register trap callback
>>> Maybe it's some overkilled for dom0 (for other guest, it's ok to
>>> kill them), but any graceful way to quit?
>>
>> Just exit and do nothing (except perhaps log a rate limited
>> message)?
>>
>
> One concern of quiet exit is, the error will be totally ignored by guest --> it didn't get preperly handled, and may recursively occur to make worse error --> it's better to kill guest under such case.
>
>>> or, considering it rarely happens, how about keep current way (kill
>>> guest no matter dom0 or not)?
>>
>> Possibly - I was merely asking why this one condition was found to
>> be too strict, while the others are being left as is.
>>
>> Jan
>
> Ah, the reason of removing is_vmce_ready check is, it's
> problematic (check mcelog driver, not vmce tap callback),
> and overkilled (since defaultly dom0 will not start mcelog driver,
> under which case system will crash whenever vmce inject to dom0)
>
> --> So patch 2/2 is not too strict for dom0.
>
Please keep in mind the mcelog userland/kernel interface is not designed
with xen in mind. mcelog cannot report which guest is impacted for
example, although xen reports that to dom0.
I object 'fixing' the hypervisor to come over with mcelog drawbacks.
I prefer fixing Dom0 instead.
>From the design perspective, the virq for Dom0 is for logging purpose
only and the trap handler has equal purpose for both Dom0 and DomU.
BTW: I heard from Andre Pryzwara, he is working with Borislav Petkov
together on a complete new machine check interface in Linux
that supports ARM and x86. We should have an eye on this that
they do not repeat the mcelog design mistakes.
Christoph
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-06 8:54 ` Christoph Egger
@ 2013-05-06 9:06 ` Jan Beulich
2013-05-06 9:24 ` Liu, Jinsong
0 siblings, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2013-05-06 9:06 UTC (permalink / raw)
To: Christoph Egger; +Cc: Jinsong Liu, xen-devel
>>> On 06.05.13 at 10:54, Christoph Egger <chegger@amazon.de> wrote:
> On 03.05.13 17:51, Liu, Jinsong wrote:
>> Jan Beulich wrote:
>>>>>> On 03.05.13 at 16:16, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>>> Jan Beulich wrote:
>>>>>>>> On 03.05.13 at 10:41, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>> wrote:
>>>>>> Jan Beulich wrote:
>>>>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>>>> wrote:
>>>>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17 00:00:00
>>>>>>>> 2001 From: Liu Jinsong <jinsong.liu@intel.com>
>>>>>>>> Date: Sat, 27 Apr 2013 22:37:35 +0800
>>>>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic
>>>>>>>> is_vmce_ready check
>>>>>>>>
>>>>>>>> is_vmce_ready() is problematic:
>>>>>>>> * For dom0, it checks if virq bind to dom0 mcelog driver. If not,
>>>>>>>> it results dom0 crash. However, it's problematic and overkilled
>>>>>>>> since mcelog as a dom0 feature could be enabled/disabled per dom0
>>>>>>>> option: (XEN) MCE: This error page is ownded by DOM 0 (XEN) DOM0
>>>>>>>> not ready for vMCE (XEN) domain_crash called from mcaction.c:133
>>>>>>>> (XEN) Domain 0 reported crashed by domain 32767 on cpu#31:
>>>>>>>> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>>>>>>>> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
>>>>>>>>
>>>>>>>> * For dom0, if really need check, it should check whether vMCE
>>>>>>>> injection for dom0 ready (say, exception trap bounce check, which
>>>>>>>> has been done at inject_vmce()), not check dom0 mcelog ready
>>>>>>>> (which has been done at mce_softirq() before send global virq to
>>>>>>>> dom0).
>>>>>>>
>>>>>>> Following the argumentation above, I wonder which of the other
>>>>>>> "goto vmce_failed" are really appropriate, i.e. whether the patch
>>>>>>> shouldn't be extended (at least for the Dom0 case).
>>>>>>
>>>>>> You mean other 'goto vmce_failed' are also not appropriate (I'm not
>>>>>> quite clear your point)?
>>>>>
>>>>> Yes.
>>>>>
>>>>>> Would you please point out which point you think not appropriate?
>>>>>
>>>>> I question whether it is correct/necessary to crash the domain in
>>>>> any of those failure cases. Perhaps when we fail to unmap the
>>>>> page it is, but failure of fill_vmsr_data() and inject_vmce() don't
>>>>> appear to be valid reasons once the is_vmce_ready() path is being
>>>>> dropped.
>>>>
>>>> For fill_vmsr_data(), it failed only when MCG_STATUS_MCIP bit still
>>>> set when next vMCE# occur, means the 2nd vMCE# occur when the 1st
>>>> vMCE# not handled yet. Per SDM it should shutdown.
>>>>
>>>> For inject_vmce(), it failed when
>>>> 1). vcpu is still mce_pending, or
>>>> 2). pv not register trap callback
>>>> Maybe it's some overkilled for dom0 (for other guest, it's ok to
>>>> kill them), but any graceful way to quit?
>>>
>>> Just exit and do nothing (except perhaps log a rate limited
>>> message)?
>>>
>>
>> One concern of quiet exit is, the error will be totally ignored by guest --> it
> didn't get preperly handled, and may recursively occur to make worse error -->
> it's better to kill guest under such case.
>>
>>>> or, considering it rarely happens, how about keep current way (kill
>>>> guest no matter dom0 or not)?
>>>
>>> Possibly - I was merely asking why this one condition was found to
>>> be too strict, while the others are being left as is.
>>>
>>> Jan
>>
>> Ah, the reason of removing is_vmce_ready check is, it's
>> problematic (check mcelog driver, not vmce tap callback),
>> and overkilled (since defaultly dom0 will not start mcelog driver,
>> under which case system will crash whenever vmce inject to dom0)
>>
>> --> So patch 2/2 is not too strict for dom0.
>>
>
> Please keep in mind the mcelog userland/kernel interface is not designed
> with xen in mind. mcelog cannot report which guest is impacted for
> example, although xen reports that to dom0.
> I object 'fixing' the hypervisor to come over with mcelog drawbacks.
> I prefer fixing Dom0 instead.
>
> From the design perspective, the virq for Dom0 is for logging purpose
> only and the trap handler has equal purpose for both Dom0 and DomU.
So as this doesn't read like "don't care" - is this an ack, nak, or
a request to Jinsong to change something for the patch to be
acceptable?
Jan
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-06 9:06 ` Jan Beulich
@ 2013-05-06 9:24 ` Liu, Jinsong
2013-05-06 9:41 ` Christoph Egger
0 siblings, 1 reply; 33+ messages in thread
From: Liu, Jinsong @ 2013-05-06 9:24 UTC (permalink / raw)
To: Jan Beulich, Christoph Egger; +Cc: xen-devel
Jan Beulich wrote:
>>>> On 06.05.13 at 10:54, Christoph Egger <chegger@amazon.de> wrote:
>> On 03.05.13 17:51, Liu, Jinsong wrote:
>>> Jan Beulich wrote:
>>>>>>> On 03.05.13 at 16:16, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>> wrote:
>>>>> Jan Beulich wrote:
>>>>>>>>> On 03.05.13 at 10:41, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>>> wrote:
>>>>>>> Jan Beulich wrote:
>>>>>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>>>>> wrote:
>>>>>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17
>>>>>>>>> 00:00:00 2001 From: Liu Jinsong <jinsong.liu@intel.com>
>>>>>>>>> Date: Sat, 27 Apr 2013 22:37:35 +0800
>>>>>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic
>>>>>>>>> is_vmce_ready check
>>>>>>>>>
>>>>>>>>> is_vmce_ready() is problematic:
>>>>>>>>> * For dom0, it checks if virq bind to dom0 mcelog driver. If
>>>>>>>>> not, it results dom0 crash. However, it's problematic and
>>>>>>>>> overkilled since mcelog as a dom0 feature could be
>>>>>>>>> enabled/disabled per dom0 option: (XEN) MCE: This error page
>>>>>>>>> is ownded by DOM 0 (XEN) DOM0 not ready for vMCE (XEN)
>>>>>>>>> domain_crash called from mcaction.c:133 (XEN) Domain 0
>>>>>>>>> reported crashed by domain 32767 on cpu#31: (XEN) Domain 0
>>>>>>>>> crashed: rebooting machine in 5 seconds. (XEN) Resetting with
>>>>>>>>> ACPI MEMORY or I/O RESET_REG.
>>>>>>>>>
>>>>>>>>> * For dom0, if really need check, it should check whether vMCE
>>>>>>>>> injection for dom0 ready (say, exception trap bounce check,
>>>>>>>>> which has been done at inject_vmce()), not check dom0 mcelog
>>>>>>>>> ready (which has been done at mce_softirq() before send
>>>>>>>>> global virq to dom0).
>>>>>>>>
>>>>>>>> Following the argumentation above, I wonder which of the other
>>>>>>>> "goto vmce_failed" are really appropriate, i.e. whether the
>>>>>>>> patch shouldn't be extended (at least for the Dom0 case).
>>>>>>>
>>>>>>> You mean other 'goto vmce_failed' are also not appropriate (I'm
>>>>>>> not quite clear your point)?
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>> Would you please point out which point you think not
>>>>>>> appropriate?
>>>>>>
>>>>>> I question whether it is correct/necessary to crash the domain in
>>>>>> any of those failure cases. Perhaps when we fail to unmap the
>>>>>> page it is, but failure of fill_vmsr_data() and inject_vmce()
>>>>>> don't appear to be valid reasons once the is_vmce_ready() path
>>>>>> is being dropped.
>>>>>
>>>>> For fill_vmsr_data(), it failed only when MCG_STATUS_MCIP bit
>>>>> still set when next vMCE# occur, means the 2nd vMCE# occur when
>>>>> the 1st vMCE# not handled yet. Per SDM it should shutdown.
>>>>>
>>>>> For inject_vmce(), it failed when
>>>>> 1). vcpu is still mce_pending, or
>>>>> 2). pv not register trap callback
>>>>> Maybe it's some overkilled for dom0 (for other guest, it's ok to
>>>>> kill them), but any graceful way to quit?
>>>>
>>>> Just exit and do nothing (except perhaps log a rate limited
>>>> message)?
>>>>
>>>
>>> One concern of quiet exit is, the error will be totally ignored by
>>> guest --> it
>> didn't get preperly handled, and may recursively occur to make worse
>> error --> it's better to kill guest under such case.
>>>
>>>>> or, considering it rarely happens, how about keep current way
>>>>> (kill guest no matter dom0 or not)?
>>>>
>>>> Possibly - I was merely asking why this one condition was found to
>>>> be too strict, while the others are being left as is.
>>>>
>>>> Jan
>>>
>>> Ah, the reason of removing is_vmce_ready check is, it's
>>> problematic (check mcelog driver, not vmce tap callback),
>>> and overkilled (since defaultly dom0 will not start mcelog driver,
>>> under which case system will crash whenever vmce inject to dom0)
>>>
>>> --> So patch 2/2 is not too strict for dom0.
>>>
>>
>> Please keep in mind the mcelog userland/kernel interface is not
>> designed
>> with xen in mind. mcelog cannot report which guest is impacted for
>> example, although xen reports that to dom0.
>> I object 'fixing' the hypervisor to come over with mcelog drawbacks.
>> I prefer fixing Dom0 instead.
>>
Sure, xen mcelog driver in linux is implemented by me :-)
This patch does not intend to 'fix' hypervisor but just avoid overkilled system (when xen mcelog driver in dom0 not loaded as default).
>> From the design perspective, the virq for Dom0 is for logging purpose
>> only and the trap handler has equal purpose for both Dom0 and DomU.
>
Sure, that's what I meant 'problematic' check.
Thanks,
Jinsong
> So as this doesn't read like "don't care" - is this an ack, nak, or
> a request to Jinsong to change something for the patch to be
> acceptable?
>
> Jan
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-06 9:24 ` Liu, Jinsong
@ 2013-05-06 9:41 ` Christoph Egger
2013-05-06 9:50 ` Liu, Jinsong
0 siblings, 1 reply; 33+ messages in thread
From: Christoph Egger @ 2013-05-06 9:41 UTC (permalink / raw)
To: Liu, Jinsong; +Cc: Jan Beulich, xen-devel
On 06.05.13 11:24, Liu, Jinsong wrote:
> Jan Beulich wrote:
>>>>> On 06.05.13 at 10:54, Christoph Egger <chegger@amazon.de> wrote:
>>> On 03.05.13 17:51, Liu, Jinsong wrote:
>>>> Jan Beulich wrote:
>>>>>>>> On 03.05.13 at 16:16, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>> wrote:
>>>>>> Jan Beulich wrote:
>>>>>>>>>> On 03.05.13 at 10:41, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>>>> wrote:
>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17
>>>>>>>>>> 00:00:00 2001 From: Liu Jinsong <jinsong.liu@intel.com>
>>>>>>>>>> Date: Sat, 27 Apr 2013 22:37:35 +0800
>>>>>>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic
>>>>>>>>>> is_vmce_ready check
>>>>>>>>>>
>>>>>>>>>> is_vmce_ready() is problematic:
>>>>>>>>>> * For dom0, it checks if virq bind to dom0 mcelog driver. If
>>>>>>>>>> not, it results dom0 crash. However, it's problematic and
>>>>>>>>>> overkilled since mcelog as a dom0 feature could be
>>>>>>>>>> enabled/disabled per dom0 option: (XEN) MCE: This error page
>>>>>>>>>> is ownded by DOM 0 (XEN) DOM0 not ready for vMCE (XEN)
>>>>>>>>>> domain_crash called from mcaction.c:133 (XEN) Domain 0
>>>>>>>>>> reported crashed by domain 32767 on cpu#31: (XEN) Domain 0
>>>>>>>>>> crashed: rebooting machine in 5 seconds. (XEN) Resetting with
>>>>>>>>>> ACPI MEMORY or I/O RESET_REG.
>>>>>>>>>>
>>>>>>>>>> * For dom0, if really need check, it should check whether vMCE
>>>>>>>>>> injection for dom0 ready (say, exception trap bounce check,
>>>>>>>>>> which has been done at inject_vmce()), not check dom0 mcelog
>>>>>>>>>> ready (which has been done at mce_softirq() before send
>>>>>>>>>> global virq to dom0).
>>>>>>>>>
>>>>>>>>> Following the argumentation above, I wonder which of the other
>>>>>>>>> "goto vmce_failed" are really appropriate, i.e. whether the
>>>>>>>>> patch shouldn't be extended (at least for the Dom0 case).
>>>>>>>>
>>>>>>>> You mean other 'goto vmce_failed' are also not appropriate (I'm
>>>>>>>> not quite clear your point)?
>>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>>> Would you please point out which point you think not
>>>>>>>> appropriate?
>>>>>>>
>>>>>>> I question whether it is correct/necessary to crash the domain in
>>>>>>> any of those failure cases. Perhaps when we fail to unmap the
>>>>>>> page it is, but failure of fill_vmsr_data() and inject_vmce()
>>>>>>> don't appear to be valid reasons once the is_vmce_ready() path
>>>>>>> is being dropped.
>>>>>>
>>>>>> For fill_vmsr_data(), it failed only when MCG_STATUS_MCIP bit
>>>>>> still set when next vMCE# occur, means the 2nd vMCE# occur when
>>>>>> the 1st vMCE# not handled yet. Per SDM it should shutdown.
>>>>>>
>>>>>> For inject_vmce(), it failed when
>>>>>> 1). vcpu is still mce_pending, or
>>>>>> 2). pv not register trap callback
>>>>>> Maybe it's some overkilled for dom0 (for other guest, it's ok to
>>>>>> kill them), but any graceful way to quit?
>>>>>
>>>>> Just exit and do nothing (except perhaps log a rate limited
>>>>> message)?
>>>>>
>>>>
>>>> One concern of quiet exit is, the error will be totally ignored by
>>>> guest --> it
>>> didn't get preperly handled, and may recursively occur to make worse
>>> error --> it's better to kill guest under such case.
>>>>
>>>>>> or, considering it rarely happens, how about keep current way
>>>>>> (kill guest no matter dom0 or not)?
>>>>>
>>>>> Possibly - I was merely asking why this one condition was found to
>>>>> be too strict, while the others are being left as is.
>>>>>
>>>>> Jan
>>>>
>>>> Ah, the reason of removing is_vmce_ready check is, it's
>>>> problematic (check mcelog driver, not vmce tap callback),
>>>> and overkilled (since defaultly dom0 will not start mcelog driver,
>>>> under which case system will crash whenever vmce inject to dom0)
>>>>
>>>> --> So patch 2/2 is not too strict for dom0.
>>>>
>>>
>>> Please keep in mind the mcelog userland/kernel interface is not
>>> designed
>>> with xen in mind. mcelog cannot report which guest is impacted for
>>> example, although xen reports that to dom0.
>>> I object 'fixing' the hypervisor to come over with mcelog drawbacks.
>>> I prefer fixing Dom0 instead.
>>>
>
> Sure, xen mcelog driver in linux is implemented by me :-)
> This patch does not intend to 'fix' hypervisor but just avoid
> overkilled system (when xen mcelog driver in dom0 not loaded as default).
I assume dom0 w/o xen mcelog driver active means dom0 is not capable
to deal with machine check errors. Is this correct?
>>> From the design perspective, the virq for Dom0 is for logging purpose
>>> only and the trap handler has equal purpose for both Dom0 and DomU.
>
> Sure, that's what I meant 'problematic' check.
What do you want to do when Dom0 is not capable to deal with
machine check errors and Dom0 is impacted?
Christoph
> Thanks,
> Jinsong
>
>> So as this doesn't read like "don't care" - is this an ack, nak, or
>> a request to Jinsong to change something for the patch to be
>> acceptable?
>>
>> Jan
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-06 9:41 ` Christoph Egger
@ 2013-05-06 9:50 ` Liu, Jinsong
2013-05-06 11:38 ` Christoph Egger
0 siblings, 1 reply; 33+ messages in thread
From: Liu, Jinsong @ 2013-05-06 9:50 UTC (permalink / raw)
To: Christoph Egger; +Cc: Jan Beulich, xen-devel
Christoph Egger wrote:
> On 06.05.13 11:24, Liu, Jinsong wrote:
>> Jan Beulich wrote:
>>>>>> On 06.05.13 at 10:54, Christoph Egger <chegger@amazon.de> wrote:
>>>> On 03.05.13 17:51, Liu, Jinsong wrote:
>>>>> Jan Beulich wrote:
>>>>>>>>> On 03.05.13 at 16:16, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>>> wrote:
>>>>>>> Jan Beulich wrote:
>>>>>>>>>>> On 03.05.13 at 10:41, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>>>>> wrote:
>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong"
>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17
>>>>>>>>>>> 00:00:00 2001 From: Liu Jinsong <jinsong.liu@intel.com>
>>>>>>>>>>> Date: Sat, 27 Apr 2013 22:37:35 +0800
>>>>>>>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic
>>>>>>>>>>> is_vmce_ready check
>>>>>>>>>>>
>>>>>>>>>>> is_vmce_ready() is problematic:
>>>>>>>>>>> * For dom0, it checks if virq bind to dom0 mcelog driver. If
>>>>>>>>>>> not, it results dom0 crash. However, it's problematic and
>>>>>>>>>>> overkilled since mcelog as a dom0 feature could be
>>>>>>>>>>> enabled/disabled per dom0 option: (XEN) MCE: This error page
>>>>>>>>>>> is ownded by DOM 0 (XEN) DOM0 not ready for vMCE (XEN)
>>>>>>>>>>> domain_crash called from mcaction.c:133 (XEN) Domain 0
>>>>>>>>>>> reported crashed by domain 32767 on cpu#31: (XEN) Domain 0
>>>>>>>>>>> crashed: rebooting machine in 5 seconds. (XEN) Resetting
>>>>>>>>>>> with ACPI MEMORY or I/O RESET_REG.
>>>>>>>>>>>
>>>>>>>>>>> * For dom0, if really need check, it should check whether
>>>>>>>>>>> vMCE injection for dom0 ready (say, exception trap bounce
>>>>>>>>>>> check, which has been done at inject_vmce()), not check
>>>>>>>>>>> dom0 mcelog ready (which has been done at mce_softirq()
>>>>>>>>>>> before send global virq to dom0).
>>>>>>>>>>
>>>>>>>>>> Following the argumentation above, I wonder which of the
>>>>>>>>>> other "goto vmce_failed" are really appropriate, i.e.
>>>>>>>>>> whether the patch shouldn't be extended (at least for the
>>>>>>>>>> Dom0 case).
>>>>>>>>>
>>>>>>>>> You mean other 'goto vmce_failed' are also not appropriate
>>>>>>>>> (I'm not quite clear your point)?
>>>>>>>>
>>>>>>>> Yes.
>>>>>>>>
>>>>>>>>> Would you please point out which point you think not
>>>>>>>>> appropriate?
>>>>>>>>
>>>>>>>> I question whether it is correct/necessary to crash the domain
>>>>>>>> in any of those failure cases. Perhaps when we fail to unmap
>>>>>>>> the page it is, but failure of fill_vmsr_data() and
>>>>>>>> inject_vmce() don't appear to be valid reasons once the
>>>>>>>> is_vmce_ready() path is being dropped.
>>>>>>>
>>>>>>> For fill_vmsr_data(), it failed only when MCG_STATUS_MCIP bit
>>>>>>> still set when next vMCE# occur, means the 2nd vMCE# occur when
>>>>>>> the 1st vMCE# not handled yet. Per SDM it should shutdown.
>>>>>>>
>>>>>>> For inject_vmce(), it failed when
>>>>>>> 1). vcpu is still mce_pending, or
>>>>>>> 2). pv not register trap callback
>>>>>>> Maybe it's some overkilled for dom0 (for other guest, it's ok to
>>>>>>> kill them), but any graceful way to quit?
>>>>>>
>>>>>> Just exit and do nothing (except perhaps log a rate limited
>>>>>> message)?
>>>>>>
>>>>>
>>>>> One concern of quiet exit is, the error will be totally ignored by
>>>>> guest --> it
>>>> didn't get preperly handled, and may recursively occur to make
>>>> worse error --> it's better to kill guest under such case.
>>>>>
>>>>>>> or, considering it rarely happens, how about keep current way
>>>>>>> (kill guest no matter dom0 or not)?
>>>>>>
>>>>>> Possibly - I was merely asking why this one condition was found
>>>>>> to be too strict, while the others are being left as is.
>>>>>>
>>>>>> Jan
>>>>>
>>>>> Ah, the reason of removing is_vmce_ready check is, it's
>>>>> problematic (check mcelog driver, not vmce tap callback),
>>>>> and overkilled (since defaultly dom0 will not start mcelog driver,
>>>>> under which case system will crash whenever vmce inject to dom0)
>>>>>
>>>>> --> So patch 2/2 is not too strict for dom0.
>>>>>
>>>>
>>>> Please keep in mind the mcelog userland/kernel interface is not
>>>> designed with xen in mind. mcelog cannot report which guest is
>>>> impacted for example, although xen reports that to dom0.
>>>> I object 'fixing' the hypervisor to come over with mcelog
>>>> drawbacks. I prefer fixing Dom0 instead.
>>>>
>>
>> Sure, xen mcelog driver in linux is implemented by me :-)
>> This patch does not intend to 'fix' hypervisor but just avoid
>> overkilled system (when xen mcelog driver in dom0 not loaded as
>> default).
>
> I assume dom0 w/o xen mcelog driver active means dom0 is not capable
> to deal with machine check errors. Is this correct?
>
No, w/o xen mcelog driver active, only user daemon 'mcelog' was affected.
Dom0 is still capable of handling vmce as long as it registered trap callback (which is checked at hypervisor inject_vmce()).
>>>> From the design perspective, the virq for Dom0 is for logging
>>>> purpose only and the trap handler has equal purpose for both Dom0
>>>> and DomU.
>>
>> Sure, that's what I meant 'problematic' check.
>
> What do you want to do when Dom0 is not capable to deal with
> machine check errors and Dom0 is impacted?
>
> Christoph
As above comments :)
Thanks,
Jinsong
>
>> Thanks,
>> Jinsong
>>
>>> So as this doesn't read like "don't care" - is this an ack, nak, or
>>> a request to Jinsong to change something for the patch to be
>>> acceptable?
>>>
>>> Jan
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-06 9:50 ` Liu, Jinsong
@ 2013-05-06 11:38 ` Christoph Egger
2013-05-06 15:00 ` Liu, Jinsong
0 siblings, 1 reply; 33+ messages in thread
From: Christoph Egger @ 2013-05-06 11:38 UTC (permalink / raw)
To: Liu, Jinsong; +Cc: Jan Beulich, xen-devel
On 06.05.13 11:50, Liu, Jinsong wrote:
> Christoph Egger wrote:
>> On 06.05.13 11:24, Liu, Jinsong wrote:
>>> Jan Beulich wrote:
>>>>>>> On 06.05.13 at 10:54, Christoph Egger <chegger@amazon.de> wrote:
>>>>> On 03.05.13 17:51, Liu, Jinsong wrote:
>>>>>> Jan Beulich wrote:
>>>>>>>>>> On 03.05.13 at 16:16, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>>>> wrote:
>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>> On 03.05.13 at 10:41, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong"
>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17
>>>>>>>>>>>> 00:00:00 2001 From: Liu Jinsong <jinsong.liu@intel.com>
>>>>>>>>>>>> Date: Sat, 27 Apr 2013 22:37:35 +0800
>>>>>>>>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic
>>>>>>>>>>>> is_vmce_ready check
>>>>>>>>>>>>
>>>>>>>>>>>> is_vmce_ready() is problematic:
>>>>>>>>>>>> * For dom0, it checks if virq bind to dom0 mcelog driver. If
>>>>>>>>>>>> not, it results dom0 crash. However, it's problematic and
>>>>>>>>>>>> overkilled since mcelog as a dom0 feature could be
>>>>>>>>>>>> enabled/disabled per dom0 option: (XEN) MCE: This error page
>>>>>>>>>>>> is ownded by DOM 0 (XEN) DOM0 not ready for vMCE (XEN)
>>>>>>>>>>>> domain_crash called from mcaction.c:133 (XEN) Domain 0
>>>>>>>>>>>> reported crashed by domain 32767 on cpu#31: (XEN) Domain 0
>>>>>>>>>>>> crashed: rebooting machine in 5 seconds. (XEN) Resetting
>>>>>>>>>>>> with ACPI MEMORY or I/O RESET_REG.
>>>>>>>>>>>>
>>>>>>>>>>>> * For dom0, if really need check, it should check whether
>>>>>>>>>>>> vMCE injection for dom0 ready (say, exception trap bounce
>>>>>>>>>>>> check, which has been done at inject_vmce()), not check
>>>>>>>>>>>> dom0 mcelog ready (which has been done at mce_softirq()
>>>>>>>>>>>> before send global virq to dom0).
>>>>>>>>>>>
>>>>>>>>>>> Following the argumentation above, I wonder which of the
>>>>>>>>>>> other "goto vmce_failed" are really appropriate, i.e.
>>>>>>>>>>> whether the patch shouldn't be extended (at least for the
>>>>>>>>>>> Dom0 case).
>>>>>>>>>>
>>>>>>>>>> You mean other 'goto vmce_failed' are also not appropriate
>>>>>>>>>> (I'm not quite clear your point)?
>>>>>>>>>
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>>> Would you please point out which point you think not
>>>>>>>>>> appropriate?
>>>>>>>>>
>>>>>>>>> I question whether it is correct/necessary to crash the domain
>>>>>>>>> in any of those failure cases. Perhaps when we fail to unmap
>>>>>>>>> the page it is, but failure of fill_vmsr_data() and
>>>>>>>>> inject_vmce() don't appear to be valid reasons once the
>>>>>>>>> is_vmce_ready() path is being dropped.
>>>>>>>>
>>>>>>>> For fill_vmsr_data(), it failed only when MCG_STATUS_MCIP bit
>>>>>>>> still set when next vMCE# occur, means the 2nd vMCE# occur when
>>>>>>>> the 1st vMCE# not handled yet. Per SDM it should shutdown.
>>>>>>>>
>>>>>>>> For inject_vmce(), it failed when
>>>>>>>> 1). vcpu is still mce_pending, or
>>>>>>>> 2). pv not register trap callback
>>>>>>>> Maybe it's some overkilled for dom0 (for other guest, it's ok to
>>>>>>>> kill them), but any graceful way to quit?
>>>>>>>
>>>>>>> Just exit and do nothing (except perhaps log a rate limited
>>>>>>> message)?
>>>>>>>
>>>>>>
>>>>>> One concern of quiet exit is, the error will be totally ignored by
>>>>>> guest --> it
>>>>> didn't get preperly handled, and may recursively occur to make
>>>>> worse error --> it's better to kill guest under such case.
>>>>>>
>>>>>>>> or, considering it rarely happens, how about keep current way
>>>>>>>> (kill guest no matter dom0 or not)?
>>>>>>>
>>>>>>> Possibly - I was merely asking why this one condition was found
>>>>>>> to be too strict, while the others are being left as is.
>>>>>>>
>>>>>>> Jan
>>>>>>
>>>>>> Ah, the reason of removing is_vmce_ready check is, it's
>>>>>> problematic (check mcelog driver, not vmce tap callback),
>>>>>> and overkilled (since defaultly dom0 will not start mcelog driver,
>>>>>> under which case system will crash whenever vmce inject to dom0)
>>>>>>
>>>>>> --> So patch 2/2 is not too strict for dom0.
>>>>>>
>>>>>
>>>>> Please keep in mind the mcelog userland/kernel interface is not
>>>>> designed with xen in mind. mcelog cannot report which guest is
>>>>> impacted for example, although xen reports that to dom0.
>>>>> I object 'fixing' the hypervisor to come over with mcelog
>>>>> drawbacks. I prefer fixing Dom0 instead.
>>>>>
>>>
>>> Sure, xen mcelog driver in linux is implemented by me :-)
>>> This patch does not intend to 'fix' hypervisor but just avoid
>>> overkilled system (when xen mcelog driver in dom0 not loaded as
>>> default).
>>
>> I assume dom0 w/o xen mcelog driver active means dom0 is not capable
>> to deal with machine check errors. Is this correct?
>>
>
> No, w/o xen mcelog driver active, only user daemon 'mcelog' was affected.
> Dom0 is still capable of handling vmce as long as it registered trap callback
> (which is checked at hypervisor inject_vmce()).
Oh, I thought the xen mcelog *driver* registers the trap callback
and in a Dom0 it additionally registers the logging callback.
What registers the logging callback and what registers
the trap callback?
Christoph
>>>>> From the design perspective, the virq for Dom0 is for logging
>>>>> purpose only and the trap handler has equal purpose for both Dom0
>>>>> and DomU.
>>>
>>> Sure, that's what I meant 'problematic' check.
>>
>> What do you want to do when Dom0 is not capable to deal with
>> machine check errors and Dom0 is impacted?
>>
>> Christoph
>
> As above comments :)
>
> Thanks,
> Jinsong
>
>>
>>> Thanks,
>>> Jinsong
>>>
>>>> So as this doesn't read like "don't care" - is this an ack, nak, or
>>>> a request to Jinsong to change something for the patch to be
>>>> acceptable?
>>>>
>>>> Jan
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-06 11:38 ` Christoph Egger
@ 2013-05-06 15:00 ` Liu, Jinsong
2013-05-06 15:06 ` Egger, Christoph
0 siblings, 1 reply; 33+ messages in thread
From: Liu, Jinsong @ 2013-05-06 15:00 UTC (permalink / raw)
To: Christoph Egger; +Cc: Jan Beulich, xen-devel
Christoph Egger wrote:
> On 06.05.13 11:50, Liu, Jinsong wrote:
>> Christoph Egger wrote:
>>> On 06.05.13 11:24, Liu, Jinsong wrote:
>>>> Jan Beulich wrote:
>>>>>>>> On 06.05.13 at 10:54, Christoph Egger <chegger@amazon.de>
>>>>>>>> wrote:
>>>>>> On 03.05.13 17:51, Liu, Jinsong wrote:
>>>>>>> Jan Beulich wrote:
>>>>>>>>>>> On 03.05.13 at 16:16, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>>>>> wrote:
>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>> On 03.05.13 at 10:41, "Liu, Jinsong"
>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong"
>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17
>>>>>>>>>>>>> 00:00:00 2001 From: Liu Jinsong <jinsong.liu@intel.com>
>>>>>>>>>>>>> Date: Sat, 27 Apr 2013 22:37:35 +0800
>>>>>>>>>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove
>>>>>>>>>>>>> problematic is_vmce_ready check
>>>>>>>>>>>>>
>>>>>>>>>>>>> is_vmce_ready() is problematic:
>>>>>>>>>>>>> * For dom0, it checks if virq bind to dom0 mcelog driver.
>>>>>>>>>>>>> If not, it results dom0 crash. However, it's problematic
>>>>>>>>>>>>> and overkilled since mcelog as a dom0 feature could be
>>>>>>>>>>>>> enabled/disabled per dom0 option: (XEN) MCE: This error
>>>>>>>>>>>>> page is ownded by DOM 0 (XEN) DOM0 not ready for vMCE
>>>>>>>>>>>>> (XEN) domain_crash called from mcaction.c:133 (XEN)
>>>>>>>>>>>>> Domain 0 reported crashed by domain 32767 on cpu#31:
>>>>>>>>>>>>> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>>>>>>>>>>>>> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
>>>>>>>>>>>>>
>>>>>>>>>>>>> * For dom0, if really need check, it should check whether
>>>>>>>>>>>>> vMCE injection for dom0 ready (say, exception trap bounce
>>>>>>>>>>>>> check, which has been done at inject_vmce()), not check
>>>>>>>>>>>>> dom0 mcelog ready (which has been done at mce_softirq()
>>>>>>>>>>>>> before send global virq to dom0).
>>>>>>>>>>>>
>>>>>>>>>>>> Following the argumentation above, I wonder which of the
>>>>>>>>>>>> other "goto vmce_failed" are really appropriate, i.e.
>>>>>>>>>>>> whether the patch shouldn't be extended (at least for the
>>>>>>>>>>>> Dom0 case).
>>>>>>>>>>>
>>>>>>>>>>> You mean other 'goto vmce_failed' are also not appropriate
>>>>>>>>>>> (I'm not quite clear your point)?
>>>>>>>>>>
>>>>>>>>>> Yes.
>>>>>>>>>>
>>>>>>>>>>> Would you please point out which point you think not
>>>>>>>>>>> appropriate?
>>>>>>>>>>
>>>>>>>>>> I question whether it is correct/necessary to crash the
>>>>>>>>>> domain in any of those failure cases. Perhaps when we fail
>>>>>>>>>> to unmap the page it is, but failure of fill_vmsr_data() and
>>>>>>>>>> inject_vmce() don't appear to be valid reasons once the
>>>>>>>>>> is_vmce_ready() path is being dropped.
>>>>>>>>>
>>>>>>>>> For fill_vmsr_data(), it failed only when MCG_STATUS_MCIP bit
>>>>>>>>> still set when next vMCE# occur, means the 2nd vMCE# occur
>>>>>>>>> when the 1st vMCE# not handled yet. Per SDM it should
>>>>>>>>> shutdown.
>>>>>>>>>
>>>>>>>>> For inject_vmce(), it failed when
>>>>>>>>> 1). vcpu is still mce_pending, or
>>>>>>>>> 2). pv not register trap callback
>>>>>>>>> Maybe it's some overkilled for dom0 (for other guest, it's ok
>>>>>>>>> to kill them), but any graceful way to quit?
>>>>>>>>
>>>>>>>> Just exit and do nothing (except perhaps log a rate limited
>>>>>>>> message)?
>>>>>>>>
>>>>>>>
>>>>>>> One concern of quiet exit is, the error will be totally ignored
>>>>>>> by guest --> it
>>>>>> didn't get preperly handled, and may recursively occur to make
>>>>>> worse error --> it's better to kill guest under such case.
>>>>>>>
>>>>>>>>> or, considering it rarely happens, how about keep current way
>>>>>>>>> (kill guest no matter dom0 or not)?
>>>>>>>>
>>>>>>>> Possibly - I was merely asking why this one condition was found
>>>>>>>> to be too strict, while the others are being left as is.
>>>>>>>>
>>>>>>>> Jan
>>>>>>>
>>>>>>> Ah, the reason of removing is_vmce_ready check is, it's
>>>>>>> problematic (check mcelog driver, not vmce tap callback),
>>>>>>> and overkilled (since defaultly dom0 will not start mcelog
>>>>>>> driver, under which case system will crash whenever vmce inject
>>>>>>> to dom0)
>>>>>>>
>>>>>>> --> So patch 2/2 is not too strict for dom0.
>>>>>>>
>>>>>>
>>>>>> Please keep in mind the mcelog userland/kernel interface is not
>>>>>> designed with xen in mind. mcelog cannot report which guest is
>>>>>> impacted for example, although xen reports that to dom0.
>>>>>> I object 'fixing' the hypervisor to come over with mcelog
>>>>>> drawbacks. I prefer fixing Dom0 instead.
>>>>>>
>>>>
>>>> Sure, xen mcelog driver in linux is implemented by me :-)
>>>> This patch does not intend to 'fix' hypervisor but just avoid
>>>> overkilled system (when xen mcelog driver in dom0 not loaded as
>>>> default).
>>>
>>> I assume dom0 w/o xen mcelog driver active means dom0 is not capable
>>> to deal with machine check errors. Is this correct?
>>>
>>
>> No, w/o xen mcelog driver active, only user daemon 'mcelog' was
>> affected. Dom0 is still capable of handling vmce as long as it
>> registered trap callback (which is checked at hypervisor
>> inject_vmce()).
>
> Oh, I thought the xen mcelog *driver* registers the trap callback
> and in a Dom0 it additionally registers the logging callback.
> What registers the logging callback and what registers
> the trap callback?
>
> Christoph
>
They are 2 paths (refer latest kernl):
1. for xen mcelog driver, dom0 registers irq handler at driver init via
drivers/xen/mcelog.c
--> bind_virq_for_mce()
--> bind_virq_to_irqhandler()
which got VIRQ_MCA via eventchannel binding and handle mcelog error logging accordingly;
2. for vmce injection, dom0 registers trap callback (which re-use kernel machine check handler) via normal kernel traps init:
arch/x86/kernel/traps.c
--> set_intr_gate_ist()
--> write_idt_entry()
--> xen_write_idt_entry()
--> cvt_gate_to_trap() & HYPERVISOR_set_trap_table()
hypervisor entry.S use the registerred trap callback to construct bounce back stack for TRAP_machine_check, bounce back to dom0 mce handler.
Thanks,
Jinsong
^ permalink raw reply [flat|nested] 33+ messages in thread* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-06 15:00 ` Liu, Jinsong
@ 2013-05-06 15:06 ` Egger, Christoph
2013-05-06 15:14 ` Liu, Jinsong
0 siblings, 1 reply; 33+ messages in thread
From: Egger, Christoph @ 2013-05-06 15:06 UTC (permalink / raw)
To: Liu, Jinsong; +Cc: Jan Beulich, xen-devel
On 06.05.13 17:00, Liu, Jinsong wrote:
> Christoph Egger wrote:
>> On 06.05.13 11:50, Liu, Jinsong wrote:
>>> Christoph Egger wrote:
>>>> On 06.05.13 11:24, Liu, Jinsong wrote:
>>>>> Jan Beulich wrote:
>>>>>>>>> On 06.05.13 at 10:54, Christoph Egger <chegger@amazon.de>
>>>>>>>>> wrote:
>>>>>>> On 03.05.13 17:51, Liu, Jinsong wrote:
>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>> On 03.05.13 at 16:16, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>> On 03.05.13 at 10:41, "Liu, Jinsong"
>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong"
>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17
>>>>>>>>>>>>>> 00:00:00 2001 From: Liu Jinsong <jinsong.liu@intel.com>
>>>>>>>>>>>>>> Date: Sat, 27 Apr 2013 22:37:35 +0800
>>>>>>>>>>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove
>>>>>>>>>>>>>> problematic is_vmce_ready check
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> is_vmce_ready() is problematic:
>>>>>>>>>>>>>> * For dom0, it checks if virq bind to dom0 mcelog driver.
>>>>>>>>>>>>>> If not, it results dom0 crash. However, it's problematic
>>>>>>>>>>>>>> and overkilled since mcelog as a dom0 feature could be
>>>>>>>>>>>>>> enabled/disabled per dom0 option: (XEN) MCE: This error
>>>>>>>>>>>>>> page is ownded by DOM 0 (XEN) DOM0 not ready for vMCE
>>>>>>>>>>>>>> (XEN) domain_crash called from mcaction.c:133 (XEN)
>>>>>>>>>>>>>> Domain 0 reported crashed by domain 32767 on cpu#31:
>>>>>>>>>>>>>> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>>>>>>>>>>>>>> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * For dom0, if really need check, it should check whether
>>>>>>>>>>>>>> vMCE injection for dom0 ready (say, exception trap bounce
>>>>>>>>>>>>>> check, which has been done at inject_vmce()), not check
>>>>>>>>>>>>>> dom0 mcelog ready (which has been done at mce_softirq()
>>>>>>>>>>>>>> before send global virq to dom0).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Following the argumentation above, I wonder which of the
>>>>>>>>>>>>> other "goto vmce_failed" are really appropriate, i.e.
>>>>>>>>>>>>> whether the patch shouldn't be extended (at least for the
>>>>>>>>>>>>> Dom0 case).
>>>>>>>>>>>>
>>>>>>>>>>>> You mean other 'goto vmce_failed' are also not appropriate
>>>>>>>>>>>> (I'm not quite clear your point)?
>>>>>>>>>>>
>>>>>>>>>>> Yes.
>>>>>>>>>>>
>>>>>>>>>>>> Would you please point out which point you think not
>>>>>>>>>>>> appropriate?
>>>>>>>>>>>
>>>>>>>>>>> I question whether it is correct/necessary to crash the
>>>>>>>>>>> domain in any of those failure cases. Perhaps when we fail
>>>>>>>>>>> to unmap the page it is, but failure of fill_vmsr_data() and
>>>>>>>>>>> inject_vmce() don't appear to be valid reasons once the
>>>>>>>>>>> is_vmce_ready() path is being dropped.
>>>>>>>>>>
>>>>>>>>>> For fill_vmsr_data(), it failed only when MCG_STATUS_MCIP bit
>>>>>>>>>> still set when next vMCE# occur, means the 2nd vMCE# occur
>>>>>>>>>> when the 1st vMCE# not handled yet. Per SDM it should
>>>>>>>>>> shutdown.
>>>>>>>>>>
>>>>>>>>>> For inject_vmce(), it failed when
>>>>>>>>>> 1). vcpu is still mce_pending, or
>>>>>>>>>> 2). pv not register trap callback
>>>>>>>>>> Maybe it's some overkilled for dom0 (for other guest, it's ok
>>>>>>>>>> to kill them), but any graceful way to quit?
>>>>>>>>>
>>>>>>>>> Just exit and do nothing (except perhaps log a rate limited
>>>>>>>>> message)?
>>>>>>>>>
>>>>>>>>
>>>>>>>> One concern of quiet exit is, the error will be totally ignored
>>>>>>>> by guest --> it
>>>>>>> didn't get preperly handled, and may recursively occur to make
>>>>>>> worse error --> it's better to kill guest under such case.
>>>>>>>>
>>>>>>>>>> or, considering it rarely happens, how about keep current way
>>>>>>>>>> (kill guest no matter dom0 or not)?
>>>>>>>>>
>>>>>>>>> Possibly - I was merely asking why this one condition was found
>>>>>>>>> to be too strict, while the others are being left as is.
>>>>>>>>>
>>>>>>>>> Jan
>>>>>>>>
>>>>>>>> Ah, the reason of removing is_vmce_ready check is, it's
>>>>>>>> problematic (check mcelog driver, not vmce tap callback),
>>>>>>>> and overkilled (since defaultly dom0 will not start mcelog
>>>>>>>> driver, under which case system will crash whenever vmce inject
>>>>>>>> to dom0)
>>>>>>>>
>>>>>>>> --> So patch 2/2 is not too strict for dom0.
>>>>>>>>
>>>>>>>
>>>>>>> Please keep in mind the mcelog userland/kernel interface is not
>>>>>>> designed with xen in mind. mcelog cannot report which guest is
>>>>>>> impacted for example, although xen reports that to dom0.
>>>>>>> I object 'fixing' the hypervisor to come over with mcelog
>>>>>>> drawbacks. I prefer fixing Dom0 instead.
>>>>>>>
>>>>>
>>>>> Sure, xen mcelog driver in linux is implemented by me :-)
>>>>> This patch does not intend to 'fix' hypervisor but just avoid
>>>>> overkilled system (when xen mcelog driver in dom0 not loaded as
>>>>> default).
>>>>
>>>> I assume dom0 w/o xen mcelog driver active means dom0 is not capable
>>>> to deal with machine check errors. Is this correct?
>>>>
>>>
>>> No, w/o xen mcelog driver active, only user daemon 'mcelog' was
>>> affected. Dom0 is still capable of handling vmce as long as it
>>> registered trap callback (which is checked at hypervisor
>>> inject_vmce()).
>>
>> Oh, I thought the xen mcelog *driver* registers the trap callback
>> and in a Dom0 it additionally registers the logging callback.
>> What registers the logging callback and what registers
>> the trap callback?
>>
>> Christoph
>>
>
> They are 2 paths (refer latest kernl):
> 1. for xen mcelog driver, dom0 registers irq handler at driver init via
> drivers/xen/mcelog.c
> --> bind_virq_for_mce()
> --> bind_virq_to_irqhandler()
> which got VIRQ_MCA via eventchannel binding and handle mcelog error logging accordingly;
>
> 2. for vmce injection, dom0 registers trap callback (which re-use kernel machine check handler) via normal kernel traps init:
> arch/x86/kernel/traps.c
> --> set_intr_gate_ist()
> --> write_idt_entry()
> --> xen_write_idt_entry()
> --> cvt_gate_to_trap() & HYPERVISOR_set_trap_table()
> hypervisor entry.S use the registerred trap callback to construct bounce back stack for TRAP_machine_check, bounce back to dom0 mce handler.
Ok, that's how I understand the code, too. But you say above that w/o
this driver Dom0 is still capable to handle vmce. That puzzle's me.
Christoph
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-06 15:06 ` Egger, Christoph
@ 2013-05-06 15:14 ` Liu, Jinsong
2013-05-06 15:31 ` Christoph Egger
0 siblings, 1 reply; 33+ messages in thread
From: Liu, Jinsong @ 2013-05-06 15:14 UTC (permalink / raw)
To: Egger, Christoph; +Cc: Jan Beulich, xen-devel
Egger, Christoph wrote:
> On 06.05.13 17:00, Liu, Jinsong wrote:
>> Christoph Egger wrote:
>>> On 06.05.13 11:50, Liu, Jinsong wrote:
>>>> Christoph Egger wrote:
>>>>> On 06.05.13 11:24, Liu, Jinsong wrote:
>>>>>> Jan Beulich wrote:
>>>>>>>>>> On 06.05.13 at 10:54, Christoph Egger <chegger@amazon.de>
>>>>>>>>>> wrote:
>>>>>>>> On 03.05.13 17:51, Liu, Jinsong wrote:
>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>> On 03.05.13 at 16:16, "Liu, Jinsong"
>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>> On 03.05.13 at 10:41, "Liu, Jinsong"
>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong"
>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17
>>>>>>>>>>>>>>> 00:00:00 2001 From: Liu Jinsong <jinsong.liu@intel.com>
>>>>>>>>>>>>>>> Date: Sat, 27 Apr 2013 22:37:35 +0800
>>>>>>>>>>>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove
>>>>>>>>>>>>>>> problematic is_vmce_ready check
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> is_vmce_ready() is problematic:
>>>>>>>>>>>>>>> * For dom0, it checks if virq bind to dom0 mcelog
>>>>>>>>>>>>>>> driver. If not, it results dom0 crash. However, it's
>>>>>>>>>>>>>>> problematic and overkilled since mcelog as a dom0
>>>>>>>>>>>>>>> feature could be enabled/disabled per dom0 option:
>>>>>>>>>>>>>>> (XEN) MCE: This error page is ownded by DOM 0 (XEN)
>>>>>>>>>>>>>>> DOM0 not ready for vMCE (XEN) domain_crash called from
>>>>>>>>>>>>>>> mcaction.c:133 (XEN) Domain 0 reported crashed by
>>>>>>>>>>>>>>> domain 32767 on cpu#31: (XEN) Domain 0 crashed:
>>>>>>>>>>>>>>> rebooting machine in 5 seconds. (XEN) Resetting with
>>>>>>>>>>>>>>> ACPI MEMORY or I/O RESET_REG.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> * For dom0, if really need check, it should check
>>>>>>>>>>>>>>> whether vMCE injection for dom0 ready (say, exception
>>>>>>>>>>>>>>> trap bounce check, which has been done at
>>>>>>>>>>>>>>> inject_vmce()), not check dom0 mcelog ready (which has
>>>>>>>>>>>>>>> been done at mce_softirq() before send global virq to
>>>>>>>>>>>>>>> dom0).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Following the argumentation above, I wonder which of the
>>>>>>>>>>>>>> other "goto vmce_failed" are really appropriate, i.e.
>>>>>>>>>>>>>> whether the patch shouldn't be extended (at least for the
>>>>>>>>>>>>>> Dom0 case).
>>>>>>>>>>>>>
>>>>>>>>>>>>> You mean other 'goto vmce_failed' are also not appropriate
>>>>>>>>>>>>> (I'm not quite clear your point)?
>>>>>>>>>>>>
>>>>>>>>>>>> Yes.
>>>>>>>>>>>>
>>>>>>>>>>>>> Would you please point out which point you think not
>>>>>>>>>>>>> appropriate?
>>>>>>>>>>>>
>>>>>>>>>>>> I question whether it is correct/necessary to crash the
>>>>>>>>>>>> domain in any of those failure cases. Perhaps when we fail
>>>>>>>>>>>> to unmap the page it is, but failure of fill_vmsr_data()
>>>>>>>>>>>> and inject_vmce() don't appear to be valid reasons once the
>>>>>>>>>>>> is_vmce_ready() path is being dropped.
>>>>>>>>>>>
>>>>>>>>>>> For fill_vmsr_data(), it failed only when MCG_STATUS_MCIP
>>>>>>>>>>> bit still set when next vMCE# occur, means the 2nd vMCE#
>>>>>>>>>>> occur when the 1st vMCE# not handled yet. Per SDM it should
>>>>>>>>>>> shutdown.
>>>>>>>>>>>
>>>>>>>>>>> For inject_vmce(), it failed when
>>>>>>>>>>> 1). vcpu is still mce_pending, or
>>>>>>>>>>> 2). pv not register trap callback
>>>>>>>>>>> Maybe it's some overkilled for dom0 (for other guest, it's
>>>>>>>>>>> ok to kill them), but any graceful way to quit?
>>>>>>>>>>
>>>>>>>>>> Just exit and do nothing (except perhaps log a rate limited
>>>>>>>>>> message)?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> One concern of quiet exit is, the error will be totally
>>>>>>>>> ignored by guest --> it
>>>>>>>> didn't get preperly handled, and may recursively occur to make
>>>>>>>> worse error --> it's better to kill guest under such case.
>>>>>>>>>
>>>>>>>>>>> or, considering it rarely happens, how about keep current
>>>>>>>>>>> way (kill guest no matter dom0 or not)?
>>>>>>>>>>
>>>>>>>>>> Possibly - I was merely asking why this one condition was
>>>>>>>>>> found to be too strict, while the others are being left as
>>>>>>>>>> is.
>>>>>>>>>>
>>>>>>>>>> Jan
>>>>>>>>>
>>>>>>>>> Ah, the reason of removing is_vmce_ready check is, it's
>>>>>>>>> problematic (check mcelog driver, not vmce tap callback),
>>>>>>>>> and overkilled (since defaultly dom0 will not start mcelog
>>>>>>>>> driver, under which case system will crash whenever vmce
>>>>>>>>> inject to dom0)
>>>>>>>>>
>>>>>>>>> --> So patch 2/2 is not too strict for dom0.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Please keep in mind the mcelog userland/kernel interface is not
>>>>>>>> designed with xen in mind. mcelog cannot report which guest is
>>>>>>>> impacted for example, although xen reports that to dom0.
>>>>>>>> I object 'fixing' the hypervisor to come over with mcelog
>>>>>>>> drawbacks. I prefer fixing Dom0 instead.
>>>>>>>>
>>>>>>
>>>>>> Sure, xen mcelog driver in linux is implemented by me :-)
>>>>>> This patch does not intend to 'fix' hypervisor but just avoid
>>>>>> overkilled system (when xen mcelog driver in dom0 not loaded as
>>>>>> default).
>>>>>
>>>>> I assume dom0 w/o xen mcelog driver active means dom0 is not
>>>>> capable to deal with machine check errors. Is this correct?
>>>>>
>>>>
>>>> No, w/o xen mcelog driver active, only user daemon 'mcelog' was
>>>> affected. Dom0 is still capable of handling vmce as long as it
>>>> registered trap callback (which is checked at hypervisor
>>>> inject_vmce()).
>>>
>>> Oh, I thought the xen mcelog *driver* registers the trap callback
>>> and in a Dom0 it additionally registers the logging callback.
>>> What registers the logging callback and what registers
>>> the trap callback?
>>>
>>> Christoph
>>>
>>
>> They are 2 paths (refer latest kernl):
>> 1. for xen mcelog driver, dom0 registers irq handler at driver init
>> via drivers/xen/mcelog.c
>> --> bind_virq_for_mce()
>> --> bind_virq_to_irqhandler()
>> which got VIRQ_MCA via eventchannel binding and handle
>> mcelog error logging accordingly;
>>
>> 2. for vmce injection, dom0 registers trap callback (which re-use
>> kernel machine check handler) via normal kernel traps init:
>> arch/x86/kernel/traps.c
>> --> set_intr_gate_ist()
>> --> write_idt_entry()
>> --> xen_write_idt_entry()
>> --> cvt_gate_to_trap() &
>> HYPERVISOR_set_trap_table() hypervisor entry.S use the
>> registerred trap callback to construct bounce back stack for
>> TRAP_machine_check, bounce back to dom0 mce handler.
>
> Ok, that's how I understand the code, too. But you say above that w/o
> this driver Dom0 is still capable to handle vmce. That puzzle's me.
>
> Christoph
Ah, what I meant is, w/o mcelog dirver, kernel still successfully register its mce handler as trap callback to hypervisor, hence the injection of vmce from hypervisor to dom0 is OK, and surely got properly handled at dom0.
mcelog driver is just for error logging, getting error info from hypervisor and provide interface for user daemon tools 'mcelog'.
Thanks,
Jinsong
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-06 15:14 ` Liu, Jinsong
@ 2013-05-06 15:31 ` Christoph Egger
2013-05-06 16:00 ` Liu, Jinsong
0 siblings, 1 reply; 33+ messages in thread
From: Christoph Egger @ 2013-05-06 15:31 UTC (permalink / raw)
To: Liu, Jinsong; +Cc: Jan Beulich, xen-devel
On 06.05.13 17:14, Liu, Jinsong wrote:
> Egger, Christoph wrote:
>> On 06.05.13 17:00, Liu, Jinsong wrote:
>>> Christoph Egger wrote:
>>>> On 06.05.13 11:50, Liu, Jinsong wrote:
>>>>> Christoph Egger wrote:
>>>>>> On 06.05.13 11:24, Liu, Jinsong wrote:
>>>>>>> Jan Beulich wrote:
>>>>>>>>>>> On 06.05.13 at 10:54, Christoph Egger <chegger@amazon.de>
>>>>>>>>>>> wrote:
>>>>>>>>> On 03.05.13 17:51, Liu, Jinsong wrote:
>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>> On 03.05.13 at 16:16, "Liu, Jinsong"
>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>>> On 03.05.13 at 10:41, "Liu, Jinsong"
>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong"
>>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17
>>>>>>>>>>>>>>>> 00:00:00 2001 From: Liu Jinsong <jinsong.liu@intel.com>
>>>>>>>>>>>>>>>> Date: Sat, 27 Apr 2013 22:37:35 +0800
>>>>>>>>>>>>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove
>>>>>>>>>>>>>>>> problematic is_vmce_ready check
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> is_vmce_ready() is problematic:
>>>>>>>>>>>>>>>> * For dom0, it checks if virq bind to dom0 mcelog
>>>>>>>>>>>>>>>> driver. If not, it results dom0 crash. However, it's
>>>>>>>>>>>>>>>> problematic and overkilled since mcelog as a dom0
>>>>>>>>>>>>>>>> feature could be enabled/disabled per dom0 option:
>>>>>>>>>>>>>>>> (XEN) MCE: This error page is ownded by DOM 0 (XEN)
>>>>>>>>>>>>>>>> DOM0 not ready for vMCE (XEN) domain_crash called from
>>>>>>>>>>>>>>>> mcaction.c:133 (XEN) Domain 0 reported crashed by
>>>>>>>>>>>>>>>> domain 32767 on cpu#31: (XEN) Domain 0 crashed:
>>>>>>>>>>>>>>>> rebooting machine in 5 seconds. (XEN) Resetting with
>>>>>>>>>>>>>>>> ACPI MEMORY or I/O RESET_REG.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * For dom0, if really need check, it should check
>>>>>>>>>>>>>>>> whether vMCE injection for dom0 ready (say, exception
>>>>>>>>>>>>>>>> trap bounce check, which has been done at
>>>>>>>>>>>>>>>> inject_vmce()), not check dom0 mcelog ready (which has
>>>>>>>>>>>>>>>> been done at mce_softirq() before send global virq to
>>>>>>>>>>>>>>>> dom0).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Following the argumentation above, I wonder which of the
>>>>>>>>>>>>>>> other "goto vmce_failed" are really appropriate, i.e.
>>>>>>>>>>>>>>> whether the patch shouldn't be extended (at least for the
>>>>>>>>>>>>>>> Dom0 case).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You mean other 'goto vmce_failed' are also not appropriate
>>>>>>>>>>>>>> (I'm not quite clear your point)?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Would you please point out which point you think not
>>>>>>>>>>>>>> appropriate?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I question whether it is correct/necessary to crash the
>>>>>>>>>>>>> domain in any of those failure cases. Perhaps when we fail
>>>>>>>>>>>>> to unmap the page it is, but failure of fill_vmsr_data()
>>>>>>>>>>>>> and inject_vmce() don't appear to be valid reasons once the
>>>>>>>>>>>>> is_vmce_ready() path is being dropped.
>>>>>>>>>>>>
>>>>>>>>>>>> For fill_vmsr_data(), it failed only when MCG_STATUS_MCIP
>>>>>>>>>>>> bit still set when next vMCE# occur, means the 2nd vMCE#
>>>>>>>>>>>> occur when the 1st vMCE# not handled yet. Per SDM it should
>>>>>>>>>>>> shutdown.
>>>>>>>>>>>>
>>>>>>>>>>>> For inject_vmce(), it failed when
>>>>>>>>>>>> 1). vcpu is still mce_pending, or
>>>>>>>>>>>> 2). pv not register trap callback
>>>>>>>>>>>> Maybe it's some overkilled for dom0 (for other guest, it's
>>>>>>>>>>>> ok to kill them), but any graceful way to quit?
>>>>>>>>>>>
>>>>>>>>>>> Just exit and do nothing (except perhaps log a rate limited
>>>>>>>>>>> message)?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> One concern of quiet exit is, the error will be totally
>>>>>>>>>> ignored by guest --> it
>>>>>>>>> didn't get preperly handled, and may recursively occur to make
>>>>>>>>> worse error --> it's better to kill guest under such case.
>>>>>>>>>>
>>>>>>>>>>>> or, considering it rarely happens, how about keep current
>>>>>>>>>>>> way (kill guest no matter dom0 or not)?
>>>>>>>>>>>
>>>>>>>>>>> Possibly - I was merely asking why this one condition was
>>>>>>>>>>> found to be too strict, while the others are being left as
>>>>>>>>>>> is.
>>>>>>>>>>>
>>>>>>>>>>> Jan
>>>>>>>>>>
>>>>>>>>>> Ah, the reason of removing is_vmce_ready check is, it's
>>>>>>>>>> problematic (check mcelog driver, not vmce tap callback),
>>>>>>>>>> and overkilled (since defaultly dom0 will not start mcelog
>>>>>>>>>> driver, under which case system will crash whenever vmce
>>>>>>>>>> inject to dom0)
>>>>>>>>>>
>>>>>>>>>> --> So patch 2/2 is not too strict for dom0.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Please keep in mind the mcelog userland/kernel interface is not
>>>>>>>>> designed with xen in mind. mcelog cannot report which guest is
>>>>>>>>> impacted for example, although xen reports that to dom0.
>>>>>>>>> I object 'fixing' the hypervisor to come over with mcelog
>>>>>>>>> drawbacks. I prefer fixing Dom0 instead.
>>>>>>>>>
>>>>>>>
>>>>>>> Sure, xen mcelog driver in linux is implemented by me :-)
>>>>>>> This patch does not intend to 'fix' hypervisor but just avoid
>>>>>>> overkilled system (when xen mcelog driver in dom0 not loaded as
>>>>>>> default).
>>>>>>
>>>>>> I assume dom0 w/o xen mcelog driver active means dom0 is not
>>>>>> capable to deal with machine check errors. Is this correct?
>>>>>>
>>>>>
>>>>> No, w/o xen mcelog driver active, only user daemon 'mcelog' was
>>>>> affected. Dom0 is still capable of handling vmce as long as it
>>>>> registered trap callback (which is checked at hypervisor
>>>>> inject_vmce()).
>>>>
>>>> Oh, I thought the xen mcelog *driver* registers the trap callback
>>>> and in a Dom0 it additionally registers the logging callback.
>>>> What registers the logging callback and what registers
>>>> the trap callback?
>>>>
>>>> Christoph
>>>>
>>>
[...]
>>
>> Ok, that's how I understand the code, too. But you say above that w/o
>> this driver Dom0 is still capable to handle vmce. That puzzle's me.
>>
>> Christoph
>
> Ah, what I meant is, w/o mcelog dirver, kernel still successfully
> register its mce handler as trap callback to hypervisor, hence the
> injection of vmce from hypervisor to dom0 is OK, and surely got properly
> handled at dom0.
> mcelog driver is just for error logging, getting error info from hypervisor
> and provide interface for user daemon tools 'mcelog'.
Ah, so the trap handler is *always* installed in a Linux Dom0?
NetBSD Dom0/DomU still has no machine check support at all, so in that
case Dom0/DomU is not vmce ready. That means the vmce ready check is
needed.
Christoph
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-06 15:31 ` Christoph Egger
@ 2013-05-06 16:00 ` Liu, Jinsong
2013-05-07 11:43 ` Christoph Egger
0 siblings, 1 reply; 33+ messages in thread
From: Liu, Jinsong @ 2013-05-06 16:00 UTC (permalink / raw)
To: Christoph Egger; +Cc: Jan Beulich, xen-devel
Christoph Egger wrote:
> On 06.05.13 17:14, Liu, Jinsong wrote:
>> Egger, Christoph wrote:
>>> On 06.05.13 17:00, Liu, Jinsong wrote:
>>>> Christoph Egger wrote:
>>>>> On 06.05.13 11:50, Liu, Jinsong wrote:
>>>>>> Christoph Egger wrote:
>>>>>>> On 06.05.13 11:24, Liu, Jinsong wrote:
>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>> On 06.05.13 at 10:54, Christoph Egger <chegger@amazon.de>
>>>>>>>>>>>> wrote:
>>>>>>>>>> On 03.05.13 17:51, Liu, Jinsong wrote:
>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>> On 03.05.13 at 16:16, "Liu, Jinsong"
>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>>>> On 03.05.13 at 10:41, "Liu, Jinsong"
>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong"
>>>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep
>>>>>>>>>>>>>>>>> 17 00:00:00 2001 From: Liu Jinsong
>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> Date: Sat, 27 Apr 2013
>>>>>>>>>>>>>>>>> 22:37:35 +0800
>>>>>>>>>>>>>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove
>>>>>>>>>>>>>>>>> problematic is_vmce_ready check
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> is_vmce_ready() is problematic:
>>>>>>>>>>>>>>>>> * For dom0, it checks if virq bind to dom0 mcelog
>>>>>>>>>>>>>>>>> driver. If not, it results dom0 crash. However, it's
>>>>>>>>>>>>>>>>> problematic and overkilled since mcelog as a dom0
>>>>>>>>>>>>>>>>> feature could be enabled/disabled per dom0 option:
>>>>>>>>>>>>>>>>> (XEN) MCE: This error page is ownded by DOM 0 (XEN)
>>>>>>>>>>>>>>>>> DOM0 not ready for vMCE (XEN) domain_crash called from
>>>>>>>>>>>>>>>>> mcaction.c:133 (XEN) Domain 0 reported crashed by
>>>>>>>>>>>>>>>>> domain 32767 on cpu#31: (XEN) Domain 0 crashed:
>>>>>>>>>>>>>>>>> rebooting machine in 5 seconds. (XEN) Resetting with
>>>>>>>>>>>>>>>>> ACPI MEMORY or I/O RESET_REG.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> * For dom0, if really need check, it should check
>>>>>>>>>>>>>>>>> whether vMCE injection for dom0 ready (say, exception
>>>>>>>>>>>>>>>>> trap bounce check, which has been done at
>>>>>>>>>>>>>>>>> inject_vmce()), not check dom0 mcelog ready (which has
>>>>>>>>>>>>>>>>> been done at mce_softirq() before send global virq to
>>>>>>>>>>>>>>>>> dom0).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Following the argumentation above, I wonder which of
>>>>>>>>>>>>>>>> the other "goto vmce_failed" are really appropriate,
>>>>>>>>>>>>>>>> i.e. whether the patch shouldn't be extended (at least
>>>>>>>>>>>>>>>> for the Dom0 case).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You mean other 'goto vmce_failed' are also not
>>>>>>>>>>>>>>> appropriate (I'm not quite clear your point)?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Would you please point out which point you think not
>>>>>>>>>>>>>>> appropriate?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I question whether it is correct/necessary to crash the
>>>>>>>>>>>>>> domain in any of those failure cases. Perhaps when we
>>>>>>>>>>>>>> fail to unmap the page it is, but failure of
>>>>>>>>>>>>>> fill_vmsr_data() and inject_vmce() don't appear to be
>>>>>>>>>>>>>> valid reasons once the is_vmce_ready() path is being
>>>>>>>>>>>>>> dropped.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For fill_vmsr_data(), it failed only when MCG_STATUS_MCIP
>>>>>>>>>>>>> bit still set when next vMCE# occur, means the 2nd vMCE#
>>>>>>>>>>>>> occur when the 1st vMCE# not handled yet. Per SDM it
>>>>>>>>>>>>> should shutdown.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For inject_vmce(), it failed when
>>>>>>>>>>>>> 1). vcpu is still mce_pending, or
>>>>>>>>>>>>> 2). pv not register trap callback
>>>>>>>>>>>>> Maybe it's some overkilled for dom0 (for other guest, it's
>>>>>>>>>>>>> ok to kill them), but any graceful way to quit?
>>>>>>>>>>>>
>>>>>>>>>>>> Just exit and do nothing (except perhaps log a rate
>>>>>>>>>>>> limited message)?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> One concern of quiet exit is, the error will be totally
>>>>>>>>>>> ignored by guest --> it
>>>>>>>>>> didn't get preperly handled, and may recursively occur to
>>>>>>>>>> make worse error --> it's better to kill guest under such
>>>>>>>>>> case.
>>>>>>>>>>>
>>>>>>>>>>>>> or, considering it rarely happens, how about keep current
>>>>>>>>>>>>> way (kill guest no matter dom0 or not)?
>>>>>>>>>>>>
>>>>>>>>>>>> Possibly - I was merely asking why this one condition was
>>>>>>>>>>>> found to be too strict, while the others are being left as
>>>>>>>>>>>> is.
>>>>>>>>>>>>
>>>>>>>>>>>> Jan
>>>>>>>>>>>
>>>>>>>>>>> Ah, the reason of removing is_vmce_ready check is, it's
>>>>>>>>>>> problematic (check mcelog driver, not vmce tap callback),
>>>>>>>>>>> and overkilled (since defaultly dom0 will not start mcelog
>>>>>>>>>>> driver, under which case system will crash whenever vmce
>>>>>>>>>>> inject to dom0)
>>>>>>>>>>>
>>>>>>>>>>> --> So patch 2/2 is not too strict for dom0.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Please keep in mind the mcelog userland/kernel interface is
>>>>>>>>>> not designed with xen in mind. mcelog cannot report which
>>>>>>>>>> guest is impacted for example, although xen reports that to
>>>>>>>>>> dom0.
>>>>>>>>>> I object 'fixing' the hypervisor to come over with mcelog
>>>>>>>>>> drawbacks. I prefer fixing Dom0 instead.
>>>>>>>>>>
>>>>>>>>
>>>>>>>> Sure, xen mcelog driver in linux is implemented by me :-)
>>>>>>>> This patch does not intend to 'fix' hypervisor but just avoid
>>>>>>>> overkilled system (when xen mcelog driver in dom0 not loaded as
>>>>>>>> default).
>>>>>>>
>>>>>>> I assume dom0 w/o xen mcelog driver active means dom0 is not
>>>>>>> capable to deal with machine check errors. Is this correct?
>>>>>>>
>>>>>>
>>>>>> No, w/o xen mcelog driver active, only user daemon 'mcelog' was
>>>>>> affected. Dom0 is still capable of handling vmce as long as it
>>>>>> registered trap callback (which is checked at hypervisor
>>>>>> inject_vmce()).
>>>>>
>>>>> Oh, I thought the xen mcelog *driver* registers the trap callback
>>>>> and in a Dom0 it additionally registers the logging callback.
>>>>> What registers the logging callback and what registers
>>>>> the trap callback?
>>>>>
>>>>> Christoph
>>>>>
>>>>
> [...]
>>>
>>> Ok, that's how I understand the code, too. But you say above that
>>> w/o this driver Dom0 is still capable to handle vmce. That puzzle's
>>> me.
>>>
>>> Christoph
>>
>> Ah, what I meant is, w/o mcelog dirver, kernel still successfully
>> register its mce handler as trap callback to hypervisor, hence the
>> injection of vmce from hypervisor to dom0 is OK, and surely got
>> properly handled at dom0. mcelog driver is just for error logging,
>> getting error info from hypervisor and provide interface for user
>> daemon tools 'mcelog'.
>
> Ah, so the trap handler is *always* installed in a Linux Dom0?
Hypervisor didn't have such assumption, it's agnostic to guest.
>
> NetBSD Dom0/DomU still has no machine check support at all, so in that
> case Dom0/DomU is not vmce ready. That means the vmce ready check is
> needed.
>
> Christoph
Hypervisor indeed check vmce ready or not at inject_vmce().
Point of patch 2/2 is, is_vmce_ready itself is problematic.
Thanks,
Jinsong
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-06 16:00 ` Liu, Jinsong
@ 2013-05-07 11:43 ` Christoph Egger
2013-05-09 17:05 ` Liu, Jinsong
0 siblings, 1 reply; 33+ messages in thread
From: Christoph Egger @ 2013-05-07 11:43 UTC (permalink / raw)
To: Liu, Jinsong; +Cc: Jan Beulich, xen-devel
On 06.05.13 18:00, Liu, Jinsong wrote:
> Christoph Egger wrote:
>> On 06.05.13 17:14, Liu, Jinsong wrote:
>>> Egger, Christoph wrote:
>>>> On 06.05.13 17:00, Liu, Jinsong wrote:
>>>>> Christoph Egger wrote:
>>>>>> On 06.05.13 11:50, Liu, Jinsong wrote:
>>>>>>> Christoph Egger wrote:
>>>>>>>> On 06.05.13 11:24, Liu, Jinsong wrote:
>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>> On 06.05.13 at 10:54, Christoph Egger <chegger@amazon.de>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>> On 03.05.13 17:51, Liu, Jinsong wrote:
>>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>>> On 03.05.13 at 16:16, "Liu, Jinsong"
>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>>>>> On 03.05.13 at 10:41, "Liu, Jinsong"
>>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong"
>>>>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>>>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep
>>>>>>>>>>>>>>>>>> 17 00:00:00 2001 From: Liu Jinsong
>>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> Date: Sat, 27 Apr 2013
>>>>>>>>>>>>>>>>>> 22:37:35 +0800
>>>>>>>>>>>>>>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove
>>>>>>>>>>>>>>>>>> problematic is_vmce_ready check
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> is_vmce_ready() is problematic:
>>>>>>>>>>>>>>>>>> * For dom0, it checks if virq bind to dom0 mcelog
>>>>>>>>>>>>>>>>>> driver. If not, it results dom0 crash. However, it's
>>>>>>>>>>>>>>>>>> problematic and overkilled since mcelog as a dom0
>>>>>>>>>>>>>>>>>> feature could be enabled/disabled per dom0 option:
>>>>>>>>>>>>>>>>>> (XEN) MCE: This error page is ownded by DOM 0 (XEN)
>>>>>>>>>>>>>>>>>> DOM0 not ready for vMCE (XEN) domain_crash called from
>>>>>>>>>>>>>>>>>> mcaction.c:133 (XEN) Domain 0 reported crashed by
>>>>>>>>>>>>>>>>>> domain 32767 on cpu#31: (XEN) Domain 0 crashed:
>>>>>>>>>>>>>>>>>> rebooting machine in 5 seconds. (XEN) Resetting with
>>>>>>>>>>>>>>>>>> ACPI MEMORY or I/O RESET_REG.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> * For dom0, if really need check, it should check
>>>>>>>>>>>>>>>>>> whether vMCE injection for dom0 ready (say, exception
>>>>>>>>>>>>>>>>>> trap bounce check, which has been done at
>>>>>>>>>>>>>>>>>> inject_vmce()), not check dom0 mcelog ready (which has
>>>>>>>>>>>>>>>>>> been done at mce_softirq() before send global virq to
>>>>>>>>>>>>>>>>>> dom0).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Following the argumentation above, I wonder which of
>>>>>>>>>>>>>>>>> the other "goto vmce_failed" are really appropriate,
>>>>>>>>>>>>>>>>> i.e. whether the patch shouldn't be extended (at least
>>>>>>>>>>>>>>>>> for the Dom0 case).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You mean other 'goto vmce_failed' are also not
>>>>>>>>>>>>>>>> appropriate (I'm not quite clear your point)?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Would you please point out which point you think not
>>>>>>>>>>>>>>>> appropriate?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I question whether it is correct/necessary to crash the
>>>>>>>>>>>>>>> domain in any of those failure cases. Perhaps when we
>>>>>>>>>>>>>>> fail to unmap the page it is, but failure of
>>>>>>>>>>>>>>> fill_vmsr_data() and inject_vmce() don't appear to be
>>>>>>>>>>>>>>> valid reasons once the is_vmce_ready() path is being
>>>>>>>>>>>>>>> dropped.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For fill_vmsr_data(), it failed only when MCG_STATUS_MCIP
>>>>>>>>>>>>>> bit still set when next vMCE# occur, means the 2nd vMCE#
>>>>>>>>>>>>>> occur when the 1st vMCE# not handled yet. Per SDM it
>>>>>>>>>>>>>> should shutdown.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For inject_vmce(), it failed when
>>>>>>>>>>>>>> 1). vcpu is still mce_pending, or
>>>>>>>>>>>>>> 2). pv not register trap callback
>>>>>>>>>>>>>> Maybe it's some overkilled for dom0 (for other guest, it's
>>>>>>>>>>>>>> ok to kill them), but any graceful way to quit?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Just exit and do nothing (except perhaps log a rate
>>>>>>>>>>>>> limited message)?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> One concern of quiet exit is, the error will be totally
>>>>>>>>>>>> ignored by guest --> it
>>>>>>>>>>> didn't get preperly handled, and may recursively occur to
>>>>>>>>>>> make worse error --> it's better to kill guest under such
>>>>>>>>>>> case.
>>>>>>>>>>>>
>>>>>>>>>>>>>> or, considering it rarely happens, how about keep current
>>>>>>>>>>>>>> way (kill guest no matter dom0 or not)?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Possibly - I was merely asking why this one condition was
>>>>>>>>>>>>> found to be too strict, while the others are being left as
>>>>>>>>>>>>> is.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jan
>>>>>>>>>>>>
>>>>>>>>>>>> Ah, the reason of removing is_vmce_ready check is, it's
>>>>>>>>>>>> problematic (check mcelog driver, not vmce tap callback),
>>>>>>>>>>>> and overkilled (since defaultly dom0 will not start mcelog
>>>>>>>>>>>> driver, under which case system will crash whenever vmce
>>>>>>>>>>>> inject to dom0)
>>>>>>>>>>>>
>>>>>>>>>>>> --> So patch 2/2 is not too strict for dom0.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Please keep in mind the mcelog userland/kernel interface is
>>>>>>>>>>> not designed with xen in mind. mcelog cannot report which
>>>>>>>>>>> guest is impacted for example, although xen reports that to
>>>>>>>>>>> dom0.
>>>>>>>>>>> I object 'fixing' the hypervisor to come over with mcelog
>>>>>>>>>>> drawbacks. I prefer fixing Dom0 instead.
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Sure, xen mcelog driver in linux is implemented by me :-)
>>>>>>>>> This patch does not intend to 'fix' hypervisor but just avoid
>>>>>>>>> overkilled system (when xen mcelog driver in dom0 not loaded as
>>>>>>>>> default).
>>>>>>>>
>>>>>>>> I assume dom0 w/o xen mcelog driver active means dom0 is not
>>>>>>>> capable to deal with machine check errors. Is this correct?
>>>>>>>>
>>>>>>>
>>>>>>> No, w/o xen mcelog driver active, only user daemon 'mcelog' was
>>>>>>> affected. Dom0 is still capable of handling vmce as long as it
>>>>>>> registered trap callback (which is checked at hypervisor
>>>>>>> inject_vmce()).
>>>>>>
>>>>>> Oh, I thought the xen mcelog *driver* registers the trap callback
>>>>>> and in a Dom0 it additionally registers the logging callback.
>>>>>> What registers the logging callback and what registers
>>>>>> the trap callback?
>>>>>>
>>>>>> Christoph
>>>>>>
>>>>>
>> [...]
>>>>
>>>> Ok, that's how I understand the code, too. But you say above that
>>>> w/o this driver Dom0 is still capable to handle vmce. That puzzle's
>>>> me.
>>>>
>>>> Christoph
>>>
>>> Ah, what I meant is, w/o mcelog dirver, kernel still successfully
>>> register its mce handler as trap callback to hypervisor, hence the
>>> injection of vmce from hypervisor to dom0 is OK, and surely got
>>> properly handled at dom0. mcelog driver is just for error logging,
>>> getting error info from hypervisor and provide interface for user
>>> daemon tools 'mcelog'.
>>
>> Ah, so the trap handler is *always* installed in a Linux Dom0?
>
> Hypervisor didn't have such assumption, it's agnostic to guest.
>
>>
>> NetBSD Dom0/DomU still has no machine check support at all, so in that
>> case Dom0/DomU is not vmce ready. That means the vmce ready check is
>> needed.
>>
>> Christoph
>
> Hypervisor indeed check vmce ready or not at inject_vmce().
> Point of patch 2/2 is, is_vmce_ready itself is problematic.
ok. If I understand the problem right, xen sends a machine check
error for logging purpose via virq to dom0 regardless whether
it binds the virq or not. In the latter case dom0 crashes. Is this correct?
In this case xen should just log a rate limited message
(which a sysadmin should be able to understand)
and do nothing else. I think, this is what Jan already suggested.
Christoph
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-07 11:43 ` Christoph Egger
@ 2013-05-09 17:05 ` Liu, Jinsong
2013-05-10 13:59 ` Christoph Egger
0 siblings, 1 reply; 33+ messages in thread
From: Liu, Jinsong @ 2013-05-09 17:05 UTC (permalink / raw)
To: Christoph Egger; +Cc: Jan Beulich, xen-devel
Christoph Egger wrote:
> On 06.05.13 18:00, Liu, Jinsong wrote:
>> Christoph Egger wrote:
>>> On 06.05.13 17:14, Liu, Jinsong wrote:
>>>> Egger, Christoph wrote:
>>>>> On 06.05.13 17:00, Liu, Jinsong wrote:
>>>>>> Christoph Egger wrote:
>>>>>>> On 06.05.13 11:50, Liu, Jinsong wrote:
>>>>>>>> Christoph Egger wrote:
>>>>>>>>> On 06.05.13 11:24, Liu, Jinsong wrote:
>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>> On 06.05.13 at 10:54, Christoph Egger <chegger@amazon.de>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>> On 03.05.13 17:51, Liu, Jinsong wrote:
>>>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>>>> On 03.05.13 at 16:16, "Liu, Jinsong"
>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>>>>>> On 03.05.13 at 10:41, "Liu, Jinsong"
>>>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong"
>>>>>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>>>>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon
>>>>>>>>>>>>>>>>>>> Sep 17 00:00:00 2001 From: Liu Jinsong
>>>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> Date: Sat, 27 Apr 2013
>>>>>>>>>>>>>>>>>>> 22:37:35 +0800
>>>>>>>>>>>>>>>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove
>>>>>>>>>>>>>>>>>>> problematic is_vmce_ready check
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> is_vmce_ready() is problematic:
>>>>>>>>>>>>>>>>>>> * For dom0, it checks if virq bind to dom0 mcelog
>>>>>>>>>>>>>>>>>>> driver. If not, it results dom0 crash. However, it's
>>>>>>>>>>>>>>>>>>> problematic and overkilled since mcelog as a dom0
>>>>>>>>>>>>>>>>>>> feature could be enabled/disabled per dom0 option:
>>>>>>>>>>>>>>>>>>> (XEN) MCE: This error page is ownded by DOM 0 (XEN)
>>>>>>>>>>>>>>>>>>> DOM0 not ready for vMCE (XEN) domain_crash called
>>>>>>>>>>>>>>>>>>> from mcaction.c:133 (XEN) Domain 0 reported crashed
>>>>>>>>>>>>>>>>>>> by domain 32767 on cpu#31: (XEN) Domain 0 crashed:
>>>>>>>>>>>>>>>>>>> rebooting machine in 5 seconds. (XEN) Resetting
>>>>>>>>>>>>>>>>>>> with ACPI MEMORY or I/O RESET_REG.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> * For dom0, if really need check, it should check
>>>>>>>>>>>>>>>>>>> whether vMCE injection for dom0 ready (say,
>>>>>>>>>>>>>>>>>>> exception trap bounce check, which has been done at
>>>>>>>>>>>>>>>>>>> inject_vmce()), not check dom0 mcelog ready (which
>>>>>>>>>>>>>>>>>>> has been done at mce_softirq() before send global
>>>>>>>>>>>>>>>>>>> virq to dom0).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Following the argumentation above, I wonder which of
>>>>>>>>>>>>>>>>>> the other "goto vmce_failed" are really appropriate,
>>>>>>>>>>>>>>>>>> i.e. whether the patch shouldn't be extended (at
>>>>>>>>>>>>>>>>>> least for the Dom0 case).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You mean other 'goto vmce_failed' are also not
>>>>>>>>>>>>>>>>> appropriate (I'm not quite clear your point)?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Would you please point out which point you think not
>>>>>>>>>>>>>>>>> appropriate?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I question whether it is correct/necessary to crash the
>>>>>>>>>>>>>>>> domain in any of those failure cases. Perhaps when we
>>>>>>>>>>>>>>>> fail to unmap the page it is, but failure of
>>>>>>>>>>>>>>>> fill_vmsr_data() and inject_vmce() don't appear to be
>>>>>>>>>>>>>>>> valid reasons once the is_vmce_ready() path is being
>>>>>>>>>>>>>>>> dropped.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For fill_vmsr_data(), it failed only when
>>>>>>>>>>>>>>> MCG_STATUS_MCIP bit still set when next vMCE# occur,
>>>>>>>>>>>>>>> means the 2nd vMCE# occur when the 1st vMCE# not
>>>>>>>>>>>>>>> handled yet. Per SDM it should shutdown.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For inject_vmce(), it failed when
>>>>>>>>>>>>>>> 1). vcpu is still mce_pending, or
>>>>>>>>>>>>>>> 2). pv not register trap callback
>>>>>>>>>>>>>>> Maybe it's some overkilled for dom0 (for other guest,
>>>>>>>>>>>>>>> it's ok to kill them), but any graceful way to quit?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Just exit and do nothing (except perhaps log a rate
>>>>>>>>>>>>>> limited message)?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> One concern of quiet exit is, the error will be totally
>>>>>>>>>>>>> ignored by guest --> it
>>>>>>>>>>>> didn't get preperly handled, and may recursively occur to
>>>>>>>>>>>> make worse error --> it's better to kill guest under such
>>>>>>>>>>>> case.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> or, considering it rarely happens, how about keep
>>>>>>>>>>>>>>> current way (kill guest no matter dom0 or not)?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Possibly - I was merely asking why this one condition was
>>>>>>>>>>>>>> found to be too strict, while the others are being left
>>>>>>>>>>>>>> as is.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jan
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ah, the reason of removing is_vmce_ready check is, it's
>>>>>>>>>>>>> problematic (check mcelog driver, not vmce tap callback),
>>>>>>>>>>>>> and overkilled (since defaultly dom0 will not start mcelog
>>>>>>>>>>>>> driver, under which case system will crash whenever vmce
>>>>>>>>>>>>> inject to dom0)
>>>>>>>>>>>>>
>>>>>>>>>>>>> --> So patch 2/2 is not too strict for dom0.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Please keep in mind the mcelog userland/kernel interface is
>>>>>>>>>>>> not designed with xen in mind. mcelog cannot report which
>>>>>>>>>>>> guest is impacted for example, although xen reports that
>>>>>>>>>>>> to dom0. I object 'fixing' the hypervisor to come over
>>>>>>>>>>>> with mcelog drawbacks. I prefer fixing Dom0 instead.
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Sure, xen mcelog driver in linux is implemented by me :-)
>>>>>>>>>> This patch does not intend to 'fix' hypervisor but just avoid
>>>>>>>>>> overkilled system (when xen mcelog driver in dom0 not loaded
>>>>>>>>>> as default).
>>>>>>>>>
>>>>>>>>> I assume dom0 w/o xen mcelog driver active means dom0 is not
>>>>>>>>> capable to deal with machine check errors. Is this correct?
>>>>>>>>>
>>>>>>>>
>>>>>>>> No, w/o xen mcelog driver active, only user daemon 'mcelog' was
>>>>>>>> affected. Dom0 is still capable of handling vmce as long as it
>>>>>>>> registered trap callback (which is checked at hypervisor
>>>>>>>> inject_vmce()).
>>>>>>>
>>>>>>> Oh, I thought the xen mcelog *driver* registers the trap
>>>>>>> callback and in a Dom0 it additionally registers the logging
>>>>>>> callback. What registers the logging callback and what registers
>>>>>>> the trap callback?
>>>>>>>
>>>>>>> Christoph
>>>>>>>
>>>>>>
>>> [...]
>>>>>
>>>>> Ok, that's how I understand the code, too. But you say above that
>>>>> w/o this driver Dom0 is still capable to handle vmce. That
>>>>> puzzle's me.
>>>>>
>>>>> Christoph
>>>>
>>>> Ah, what I meant is, w/o mcelog dirver, kernel still successfully
>>>> register its mce handler as trap callback to hypervisor, hence the
>>>> injection of vmce from hypervisor to dom0 is OK, and surely got
>>>> properly handled at dom0. mcelog driver is just for error logging,
>>>> getting error info from hypervisor and provide interface for user
>>>> daemon tools 'mcelog'.
>>>
>>> Ah, so the trap handler is *always* installed in a Linux Dom0?
>>
>> Hypervisor didn't have such assumption, it's agnostic to guest.
>>
>>>
>>> NetBSD Dom0/DomU still has no machine check support at all, so in
>>> that case Dom0/DomU is not vmce ready. That means the vmce ready
>>> check is needed.
>>>
>>> Christoph
>>
>> Hypervisor indeed check vmce ready or not at inject_vmce().
>> Point of patch 2/2 is, is_vmce_ready itself is problematic.
>
> ok. If I understand the problem right, xen sends a machine check
> error for logging purpose via virq to dom0 regardless whether
> it binds the virq or not. In the latter case dom0 crashes. Is this
> correct?
No, in latter case dom0 is not affected, only fails to receive error log (dom0 itself doesn't care error log info, it's just for user tools 'mcelog').
>
> In this case xen should just log a rate limited message
> (which a sysadmin should be able to understand)
> and do nothing else. I think, this is what Jan already suggested.
>
No, that's another story --> Jan concern why other 'goto vmce_failed' are not overkilled.
As for is_vmce_ready itself, patch 2/2 is necessary.
Thanks,
Jinsong
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-09 17:05 ` Liu, Jinsong
@ 2013-05-10 13:59 ` Christoph Egger
2013-05-10 16:50 ` Liu, Jinsong
0 siblings, 1 reply; 33+ messages in thread
From: Christoph Egger @ 2013-05-10 13:59 UTC (permalink / raw)
To: Liu, Jinsong; +Cc: Jan Beulich, xen-devel
On 09.05.13 19:05, Liu, Jinsong wrote:
> Christoph Egger wrote:
>> On 06.05.13 18:00, Liu, Jinsong wrote:
>>> Christoph Egger wrote:
>>>> On 06.05.13 17:14, Liu, Jinsong wrote:
>>>>> Egger, Christoph wrote:
>>>>>> On 06.05.13 17:00, Liu, Jinsong wrote:
>>>>>>> Christoph Egger wrote:
>>>>>>>> On 06.05.13 11:50, Liu, Jinsong wrote:
>>>>>>>>> Christoph Egger wrote:
>>>>>>>>>> On 06.05.13 11:24, Liu, Jinsong wrote:
>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>> On 06.05.13 at 10:54, Christoph Egger <chegger@amazon.de>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> On 03.05.13 17:51, Liu, Jinsong wrote:
>>>>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>>>>> On 03.05.13 at 16:16, "Liu, Jinsong"
>>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>>>>>>> On 03.05.13 at 10:41, "Liu, Jinsong"
>>>>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>>>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong"
>>>>>>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon
>>>>>>>>>>>>>>>>>>>> Sep 17 00:00:00 2001 From: Liu Jinsong
>>>>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> Date: Sat, 27 Apr 2013
>>>>>>>>>>>>>>>>>>>> 22:37:35 +0800
>>>>>>>>>>>>>>>>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove
>>>>>>>>>>>>>>>>>>>> problematic is_vmce_ready check
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> is_vmce_ready() is problematic:
>>>>>>>>>>>>>>>>>>>> * For dom0, it checks if virq bind to dom0 mcelog
>>>>>>>>>>>>>>>>>>>> driver. If not, it results dom0 crash. However, it's
>>>>>>>>>>>>>>>>>>>> problematic and overkilled since mcelog as a dom0
>>>>>>>>>>>>>>>>>>>> feature could be enabled/disabled per dom0 option:
>>>>>>>>>>>>>>>>>>>> (XEN) MCE: This error page is ownded by DOM 0 (XEN)
>>>>>>>>>>>>>>>>>>>> DOM0 not ready for vMCE (XEN) domain_crash called
>>>>>>>>>>>>>>>>>>>> from mcaction.c:133 (XEN) Domain 0 reported crashed
>>>>>>>>>>>>>>>>>>>> by domain 32767 on cpu#31: (XEN) Domain 0 crashed:
>>>>>>>>>>>>>>>>>>>> rebooting machine in 5 seconds. (XEN) Resetting
>>>>>>>>>>>>>>>>>>>> with ACPI MEMORY or I/O RESET_REG.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> * For dom0, if really need check, it should check
>>>>>>>>>>>>>>>>>>>> whether vMCE injection for dom0 ready (say,
>>>>>>>>>>>>>>>>>>>> exception trap bounce check, which has been done at
>>>>>>>>>>>>>>>>>>>> inject_vmce()), not check dom0 mcelog ready (which
>>>>>>>>>>>>>>>>>>>> has been done at mce_softirq() before send global
>>>>>>>>>>>>>>>>>>>> virq to dom0).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Following the argumentation above, I wonder which of
>>>>>>>>>>>>>>>>>>> the other "goto vmce_failed" are really appropriate,
>>>>>>>>>>>>>>>>>>> i.e. whether the patch shouldn't be extended (at
>>>>>>>>>>>>>>>>>>> least for the Dom0 case).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> You mean other 'goto vmce_failed' are also not
>>>>>>>>>>>>>>>>>> appropriate (I'm not quite clear your point)?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Would you please point out which point you think not
>>>>>>>>>>>>>>>>>> appropriate?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I question whether it is correct/necessary to crash the
>>>>>>>>>>>>>>>>> domain in any of those failure cases. Perhaps when we
>>>>>>>>>>>>>>>>> fail to unmap the page it is, but failure of
>>>>>>>>>>>>>>>>> fill_vmsr_data() and inject_vmce() don't appear to be
>>>>>>>>>>>>>>>>> valid reasons once the is_vmce_ready() path is being
>>>>>>>>>>>>>>>>> dropped.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For fill_vmsr_data(), it failed only when
>>>>>>>>>>>>>>>> MCG_STATUS_MCIP bit still set when next vMCE# occur,
>>>>>>>>>>>>>>>> means the 2nd vMCE# occur when the 1st vMCE# not
>>>>>>>>>>>>>>>> handled yet. Per SDM it should shutdown.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For inject_vmce(), it failed when
>>>>>>>>>>>>>>>> 1). vcpu is still mce_pending, or
>>>>>>>>>>>>>>>> 2). pv not register trap callback
>>>>>>>>>>>>>>>> Maybe it's some overkilled for dom0 (for other guest,
>>>>>>>>>>>>>>>> it's ok to kill them), but any graceful way to quit?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Just exit and do nothing (except perhaps log a rate
>>>>>>>>>>>>>>> limited message)?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One concern of quiet exit is, the error will be totally
>>>>>>>>>>>>>> ignored by guest --> it
>>>>>>>>>>>>> didn't get preperly handled, and may recursively occur to
>>>>>>>>>>>>> make worse error --> it's better to kill guest under such
>>>>>>>>>>>>> case.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> or, considering it rarely happens, how about keep
>>>>>>>>>>>>>>>> current way (kill guest no matter dom0 or not)?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Possibly - I was merely asking why this one condition was
>>>>>>>>>>>>>>> found to be too strict, while the others are being left
>>>>>>>>>>>>>>> as is.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Jan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ah, the reason of removing is_vmce_ready check is, it's
>>>>>>>>>>>>>> problematic (check mcelog driver, not vmce tap callback),
>>>>>>>>>>>>>> and overkilled (since defaultly dom0 will not start mcelog
>>>>>>>>>>>>>> driver, under which case system will crash whenever vmce
>>>>>>>>>>>>>> inject to dom0)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --> So patch 2/2 is not too strict for dom0.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please keep in mind the mcelog userland/kernel interface is
>>>>>>>>>>>>> not designed with xen in mind. mcelog cannot report which
>>>>>>>>>>>>> guest is impacted for example, although xen reports that
>>>>>>>>>>>>> to dom0. I object 'fixing' the hypervisor to come over
>>>>>>>>>>>>> with mcelog drawbacks. I prefer fixing Dom0 instead.
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Sure, xen mcelog driver in linux is implemented by me :-)
>>>>>>>>>>> This patch does not intend to 'fix' hypervisor but just avoid
>>>>>>>>>>> overkilled system (when xen mcelog driver in dom0 not loaded
>>>>>>>>>>> as default).
>>>>>>>>>>
>>>>>>>>>> I assume dom0 w/o xen mcelog driver active means dom0 is not
>>>>>>>>>> capable to deal with machine check errors. Is this correct?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> No, w/o xen mcelog driver active, only user daemon 'mcelog' was
>>>>>>>>> affected. Dom0 is still capable of handling vmce as long as it
>>>>>>>>> registered trap callback (which is checked at hypervisor
>>>>>>>>> inject_vmce()).
>>>>>>>>
>>>>>>>> Oh, I thought the xen mcelog *driver* registers the trap
>>>>>>>> callback and in a Dom0 it additionally registers the logging
>>>>>>>> callback. What registers the logging callback and what registers
>>>>>>>> the trap callback?
>>>>>>>>
>>>>>>>> Christoph
>>>>>>>>
>>>>>>>
>>>> [...]
>>>>>>
>>>>>> Ok, that's how I understand the code, too. But you say above that
>>>>>> w/o this driver Dom0 is still capable to handle vmce. That
>>>>>> puzzle's me.
>>>>>>
>>>>>> Christoph
>>>>>
>>>>> Ah, what I meant is, w/o mcelog dirver, kernel still successfully
>>>>> register its mce handler as trap callback to hypervisor, hence the
>>>>> injection of vmce from hypervisor to dom0 is OK, and surely got
>>>>> properly handled at dom0. mcelog driver is just for error logging,
>>>>> getting error info from hypervisor and provide interface for user
>>>>> daemon tools 'mcelog'.
>>>>
>>>> Ah, so the trap handler is *always* installed in a Linux Dom0?
>>>
>>> Hypervisor didn't have such assumption, it's agnostic to guest.
>>>
>>>>
>>>> NetBSD Dom0/DomU still has no machine check support at all, so in
>>>> that case Dom0/DomU is not vmce ready. That means the vmce ready
>>>> check is needed.
>>>>
>>>> Christoph
>>>
>>> Hypervisor indeed check vmce ready or not at inject_vmce().
>>> Point of patch 2/2 is, is_vmce_ready itself is problematic.
>>
>> ok. If I understand the problem right, xen sends a machine check
>> error for logging purpose via virq to dom0 regardless whether
>> it binds the virq or not. In the latter case dom0 crashes. Is this
>> correct?
>
> No, in latter case dom0 is not affected, only fails to receive
> error log (dom0 itself doesn't care error log info, it's just for user
tools 'mcelog').
Yes, that's the expected behaviour. What does your patch try
to fix then?
Christoph
>
>>
>> In this case xen should just log a rate limited message
>> (which a sysadmin should be able to understand)
>> and do nothing else. I think, this is what Jan already suggested.
>>
>
> No, that's another story --> Jan concern why other 'goto vmce_failed' are not overkilled.
> As for is_vmce_ready itself, patch 2/2 is necessary.
>
> Thanks,
> Jinsong
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-10 13:59 ` Christoph Egger
@ 2013-05-10 16:50 ` Liu, Jinsong
2013-05-13 8:43 ` Christoph Egger
0 siblings, 1 reply; 33+ messages in thread
From: Liu, Jinsong @ 2013-05-10 16:50 UTC (permalink / raw)
To: Christoph Egger; +Cc: Jan Beulich, xen-devel
Christoph Egger wrote:
> On 09.05.13 19:05, Liu, Jinsong wrote:
>> Christoph Egger wrote:
>>> On 06.05.13 18:00, Liu, Jinsong wrote:
>>>> Christoph Egger wrote:
>>>>> On 06.05.13 17:14, Liu, Jinsong wrote:
>>>>>> Egger, Christoph wrote:
>>>>>>> On 06.05.13 17:00, Liu, Jinsong wrote:
>>>>>>>> Christoph Egger wrote:
>>>>>>>>> On 06.05.13 11:50, Liu, Jinsong wrote:
>>>>>>>>>> Christoph Egger wrote:
>>>>>>>>>>> On 06.05.13 11:24, Liu, Jinsong wrote:
>>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>>> On 06.05.13 at 10:54, Christoph Egger
>>>>>>>>>>>>>>>> <chegger@amazon.de> wrote:
>>>>>>>>>>>>>> On 03.05.13 17:51, Liu, Jinsong wrote:
>>>>>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>>>>>> On 03.05.13 at 16:16, "Liu, Jinsong"
>>>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>>>>>>>> On 03.05.13 at 10:41, "Liu, Jinsong"
>>>>>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>>>>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong"
>>>>>>>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon
>>>>>>>>>>>>>>>>>>>>> Sep 17 00:00:00 2001 From: Liu Jinsong
>>>>>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> Date: Sat, 27 Apr 2013
>>>>>>>>>>>>>>>>>>>>> 22:37:35 +0800
>>>>>>>>>>>>>>>>>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove
>>>>>>>>>>>>>>>>>>>>> problematic is_vmce_ready check
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> is_vmce_ready() is problematic:
>>>>>>>>>>>>>>>>>>>>> * For dom0, it checks if virq bind to dom0 mcelog
>>>>>>>>>>>>>>>>>>>>> driver. If not, it results dom0 crash. However,
>>>>>>>>>>>>>>>>>>>>> it's problematic and overkilled since mcelog as a
>>>>>>>>>>>>>>>>>>>>> dom0 feature could be enabled/disabled per dom0
>>>>>>>>>>>>>>>>>>>>> option: (XEN) MCE: This error page is ownded by
>>>>>>>>>>>>>>>>>>>>> DOM 0 (XEN) DOM0 not ready for vMCE (XEN)
>>>>>>>>>>>>>>>>>>>>> domain_crash called from mcaction.c:133 (XEN)
>>>>>>>>>>>>>>>>>>>>> Domain 0 reported crashed by domain 32767 on
>>>>>>>>>>>>>>>>>>>>> cpu#31: (XEN) Domain 0 crashed: rebooting machine
>>>>>>>>>>>>>>>>>>>>> in 5 seconds. (XEN) Resetting with ACPI MEMORY or
>>>>>>>>>>>>>>>>>>>>> I/O RESET_REG.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> * For dom0, if really need check, it should check
>>>>>>>>>>>>>>>>>>>>> whether vMCE injection for dom0 ready (say,
>>>>>>>>>>>>>>>>>>>>> exception trap bounce check, which has been done
>>>>>>>>>>>>>>>>>>>>> at inject_vmce()), not check dom0 mcelog ready
>>>>>>>>>>>>>>>>>>>>> (which has been done at mce_softirq() before send
>>>>>>>>>>>>>>>>>>>>> global virq to dom0).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Following the argumentation above, I wonder which
>>>>>>>>>>>>>>>>>>>> of the other "goto vmce_failed" are really
>>>>>>>>>>>>>>>>>>>> appropriate, i.e. whether the patch shouldn't be
>>>>>>>>>>>>>>>>>>>> extended (at least for the Dom0 case).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> You mean other 'goto vmce_failed' are also not
>>>>>>>>>>>>>>>>>>> appropriate (I'm not quite clear your point)?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yes.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Would you please point out which point you think not
>>>>>>>>>>>>>>>>>>> appropriate?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I question whether it is correct/necessary to crash
>>>>>>>>>>>>>>>>>> the domain in any of those failure cases. Perhaps
>>>>>>>>>>>>>>>>>> when we fail to unmap the page it is, but failure of
>>>>>>>>>>>>>>>>>> fill_vmsr_data() and inject_vmce() don't appear to be
>>>>>>>>>>>>>>>>>> valid reasons once the is_vmce_ready() path is being
>>>>>>>>>>>>>>>>>> dropped.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For fill_vmsr_data(), it failed only when
>>>>>>>>>>>>>>>>> MCG_STATUS_MCIP bit still set when next vMCE# occur,
>>>>>>>>>>>>>>>>> means the 2nd vMCE# occur when the 1st vMCE# not
>>>>>>>>>>>>>>>>> handled yet. Per SDM it should shutdown.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For inject_vmce(), it failed when
>>>>>>>>>>>>>>>>> 1). vcpu is still mce_pending, or
>>>>>>>>>>>>>>>>> 2). pv not register trap callback
>>>>>>>>>>>>>>>>> Maybe it's some overkilled for dom0 (for other guest,
>>>>>>>>>>>>>>>>> it's ok to kill them), but any graceful way to quit?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Just exit and do nothing (except perhaps log a rate
>>>>>>>>>>>>>>>> limited message)?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One concern of quiet exit is, the error will be totally
>>>>>>>>>>>>>>> ignored by guest --> it
>>>>>>>>>>>>>> didn't get preperly handled, and may recursively occur to
>>>>>>>>>>>>>> make worse error --> it's better to kill guest under such
>>>>>>>>>>>>>> case.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> or, considering it rarely happens, how about keep
>>>>>>>>>>>>>>>>> current way (kill guest no matter dom0 or not)?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Possibly - I was merely asking why this one condition
>>>>>>>>>>>>>>>> was found to be too strict, while the others are being
>>>>>>>>>>>>>>>> left as is.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Jan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ah, the reason of removing is_vmce_ready check is, it's
>>>>>>>>>>>>>>> problematic (check mcelog driver, not vmce tap
>>>>>>>>>>>>>>> callback), and overkilled (since defaultly dom0 will
>>>>>>>>>>>>>>> not start mcelog driver, under which case system will
>>>>>>>>>>>>>>> crash whenever vmce inject to dom0)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --> So patch 2/2 is not too strict for dom0.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please keep in mind the mcelog userland/kernel interface
>>>>>>>>>>>>>> is not designed with xen in mind. mcelog cannot report
>>>>>>>>>>>>>> which guest is impacted for example, although xen
>>>>>>>>>>>>>> reports that to dom0. I object 'fixing' the hypervisor
>>>>>>>>>>>>>> to come over with mcelog drawbacks. I prefer fixing Dom0
>>>>>>>>>>>>>> instead.
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Sure, xen mcelog driver in linux is implemented by me :-)
>>>>>>>>>>>> This patch does not intend to 'fix' hypervisor but just
>>>>>>>>>>>> avoid overkilled system (when xen mcelog driver in dom0
>>>>>>>>>>>> not loaded as default).
>>>>>>>>>>>
>>>>>>>>>>> I assume dom0 w/o xen mcelog driver active means dom0 is not
>>>>>>>>>>> capable to deal with machine check errors. Is this correct?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> No, w/o xen mcelog driver active, only user daemon 'mcelog'
>>>>>>>>>> was affected. Dom0 is still capable of handling vmce as long
>>>>>>>>>> as it registered trap callback (which is checked at
>>>>>>>>>> hypervisor inject_vmce()).
>>>>>>>>>
>>>>>>>>> Oh, I thought the xen mcelog *driver* registers the trap
>>>>>>>>> callback and in a Dom0 it additionally registers the logging
>>>>>>>>> callback. What registers the logging callback and what
>>>>>>>>> registers the trap callback?
>>>>>>>>>
>>>>>>>>> Christoph
>>>>>>>>>
>>>>>>>>
>>>>> [...]
>>>>>>>
>>>>>>> Ok, that's how I understand the code, too. But you say above
>>>>>>> that w/o this driver Dom0 is still capable to handle vmce. That
>>>>>>> puzzle's me.
>>>>>>>
>>>>>>> Christoph
>>>>>>
>>>>>> Ah, what I meant is, w/o mcelog dirver, kernel still successfully
>>>>>> register its mce handler as trap callback to hypervisor, hence
>>>>>> the injection of vmce from hypervisor to dom0 is OK, and surely
>>>>>> got properly handled at dom0. mcelog driver is just for error
>>>>>> logging, getting error info from hypervisor and provide
>>>>>> interface for user daemon tools 'mcelog'.
>>>>>
>>>>> Ah, so the trap handler is *always* installed in a Linux Dom0?
>>>>
>>>> Hypervisor didn't have such assumption, it's agnostic to guest.
>>>>
>>>>>
>>>>> NetBSD Dom0/DomU still has no machine check support at all, so in
>>>>> that case Dom0/DomU is not vmce ready. That means the vmce ready
>>>>> check is needed.
>>>>>
>>>>> Christoph
>>>>
>>>> Hypervisor indeed check vmce ready or not at inject_vmce().
>>>> Point of patch 2/2 is, is_vmce_ready itself is problematic.
>>>
>>> ok. If I understand the problem right, xen sends a machine check
>>> error for logging purpose via virq to dom0 regardless whether
>>> it binds the virq or not. In the latter case dom0 crashes. Is this
>>> correct?
>>
>> No, in latter case dom0 is not affected, only fails to receive
>> error log (dom0 itself doesn't care error log info, it's just for
>> user tools 'mcelog').
>
> Yes, that's the expected behaviour. What does your patch try
> to fix then?
>
> Christoph
>
Please refer to the description of patch 2/2, especially
* For dom0, if really need check, it should check whether vMCE
injection for dom0 ready (say, exception trap bounce check, which
has been done at inject_vmce()), not check dom0 mcelog ready (which
has been done at mce_softirq() before send global virq to dom0).
Which means before hypervisor send error log via virq to dom0, current code has checked whether mcelog ready at dom0 or not --> that's the right place for your concern, and it has indeed done check.
Thanks,
Jinsong
>>
>>>
>>> In this case xen should just log a rate limited message
>>> (which a sysadmin should be able to understand)
>>> and do nothing else. I think, this is what Jan already suggested.
>>>
>>
>> No, that's another story --> Jan concern why other 'goto
>> vmce_failed' are not overkilled. As for is_vmce_ready itself, patch
>> 2/2 is necessary.
>>
>> Thanks,
>> Jinsong
^ permalink raw reply [flat|nested] 33+ messages in thread* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-10 16:50 ` Liu, Jinsong
@ 2013-05-13 8:43 ` Christoph Egger
2013-05-13 10:44 ` Liu, Jinsong
0 siblings, 1 reply; 33+ messages in thread
From: Christoph Egger @ 2013-05-13 8:43 UTC (permalink / raw)
To: Liu, Jinsong; +Cc: Jan Beulich, xen-devel
On 10.05.13 18:50, Liu, Jinsong wrote:
> Christoph Egger wrote:
>> On 09.05.13 19:05, Liu, Jinsong wrote:
>>> Christoph Egger wrote:
>>>> On 06.05.13 18:00, Liu, Jinsong wrote:
>>>>> Christoph Egger wrote:
>>>>>> On 06.05.13 17:14, Liu, Jinsong wrote:
>>>>>>> Egger, Christoph wrote:
>>>>>>>> On 06.05.13 17:00, Liu, Jinsong wrote:
>>>>>>>>> Christoph Egger wrote:
>>>>>>>>>> On 06.05.13 11:50, Liu, Jinsong wrote:
>>>>>>>>>>> Christoph Egger wrote:
>>>>>>>>>>>> On 06.05.13 11:24, Liu, Jinsong wrote:
>>>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>>>> On 06.05.13 at 10:54, Christoph Egger
>>>>>>>>>>>>>>>>> <chegger@amazon.de> wrote:
>>>>>>>>>>>>>>> On 03.05.13 17:51, Liu, Jinsong wrote:
>>>>>>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>>>>>>> On 03.05.13 at 16:16, "Liu, Jinsong"
>>>>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>>>>>>>>> On 03.05.13 at 10:41, "Liu, Jinsong"
>>>>>>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>> Jan Beulich wrote:
>>>>>>>>>>>>>>>>>>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong"
>>>>>>>>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> wrote:
>>>>>>>>>>>>>>>>>>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon
>>>>>>>>>>>>>>>>>>>>>> Sep 17 00:00:00 2001 From: Liu Jinsong
>>>>>>>>>>>>>>>>>>>>>> <jinsong.liu@intel.com> Date: Sat, 27 Apr 2013
>>>>>>>>>>>>>>>>>>>>>> 22:37:35 +0800
>>>>>>>>>>>>>>>>>>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove
>>>>>>>>>>>>>>>>>>>>>> problematic is_vmce_ready check
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> is_vmce_ready() is problematic:
>>>>>>>>>>>>>>>>>>>>>> * For dom0, it checks if virq bind to dom0 mcelog
>>>>>>>>>>>>>>>>>>>>>> driver. If not, it results dom0 crash. However,
>>>>>>>>>>>>>>>>>>>>>> it's problematic and overkilled since mcelog as a
>>>>>>>>>>>>>>>>>>>>>> dom0 feature could be enabled/disabled per dom0
>>>>>>>>>>>>>>>>>>>>>> option: (XEN) MCE: This error page is ownded by
>>>>>>>>>>>>>>>>>>>>>> DOM 0 (XEN) DOM0 not ready for vMCE (XEN)
>>>>>>>>>>>>>>>>>>>>>> domain_crash called from mcaction.c:133 (XEN)
>>>>>>>>>>>>>>>>>>>>>> Domain 0 reported crashed by domain 32767 on
>>>>>>>>>>>>>>>>>>>>>> cpu#31: (XEN) Domain 0 crashed: rebooting machine
>>>>>>>>>>>>>>>>>>>>>> in 5 seconds. (XEN) Resetting with ACPI MEMORY or
>>>>>>>>>>>>>>>>>>>>>> I/O RESET_REG.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> * For dom0, if really need check, it should check
>>>>>>>>>>>>>>>>>>>>>> whether vMCE injection for dom0 ready (say,
>>>>>>>>>>>>>>>>>>>>>> exception trap bounce check, which has been done
>>>>>>>>>>>>>>>>>>>>>> at inject_vmce()), not check dom0 mcelog ready
>>>>>>>>>>>>>>>>>>>>>> (which has been done at mce_softirq() before send
>>>>>>>>>>>>>>>>>>>>>> global virq to dom0).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Following the argumentation above, I wonder which
>>>>>>>>>>>>>>>>>>>>> of the other "goto vmce_failed" are really
>>>>>>>>>>>>>>>>>>>>> appropriate, i.e. whether the patch shouldn't be
>>>>>>>>>>>>>>>>>>>>> extended (at least for the Dom0 case).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> You mean other 'goto vmce_failed' are also not
>>>>>>>>>>>>>>>>>>>> appropriate (I'm not quite clear your point)?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Would you please point out which point you think not
>>>>>>>>>>>>>>>>>>>> appropriate?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I question whether it is correct/necessary to crash
>>>>>>>>>>>>>>>>>>> the domain in any of those failure cases. Perhaps
>>>>>>>>>>>>>>>>>>> when we fail to unmap the page it is, but failure of
>>>>>>>>>>>>>>>>>>> fill_vmsr_data() and inject_vmce() don't appear to be
>>>>>>>>>>>>>>>>>>> valid reasons once the is_vmce_ready() path is being
>>>>>>>>>>>>>>>>>>> dropped.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> For fill_vmsr_data(), it failed only when
>>>>>>>>>>>>>>>>>> MCG_STATUS_MCIP bit still set when next vMCE# occur,
>>>>>>>>>>>>>>>>>> means the 2nd vMCE# occur when the 1st vMCE# not
>>>>>>>>>>>>>>>>>> handled yet. Per SDM it should shutdown.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> For inject_vmce(), it failed when
>>>>>>>>>>>>>>>>>> 1). vcpu is still mce_pending, or
>>>>>>>>>>>>>>>>>> 2). pv not register trap callback
>>>>>>>>>>>>>>>>>> Maybe it's some overkilled for dom0 (for other guest,
>>>>>>>>>>>>>>>>>> it's ok to kill them), but any graceful way to quit?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Just exit and do nothing (except perhaps log a rate
>>>>>>>>>>>>>>>>> limited message)?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> One concern of quiet exit is, the error will be totally
>>>>>>>>>>>>>>>> ignored by guest --> it
>>>>>>>>>>>>>>> didn't get preperly handled, and may recursively occur to
>>>>>>>>>>>>>>> make worse error --> it's better to kill guest under such
>>>>>>>>>>>>>>> case.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> or, considering it rarely happens, how about keep
>>>>>>>>>>>>>>>>>> current way (kill guest no matter dom0 or not)?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Possibly - I was merely asking why this one condition
>>>>>>>>>>>>>>>>> was found to be too strict, while the others are being
>>>>>>>>>>>>>>>>> left as is.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Jan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ah, the reason of removing is_vmce_ready check is, it's
>>>>>>>>>>>>>>>> problematic (check mcelog driver, not vmce tap
>>>>>>>>>>>>>>>> callback), and overkilled (since defaultly dom0 will
>>>>>>>>>>>>>>>> not start mcelog driver, under which case system will
>>>>>>>>>>>>>>>> crash whenever vmce inject to dom0)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --> So patch 2/2 is not too strict for dom0.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please keep in mind the mcelog userland/kernel interface
>>>>>>>>>>>>>>> is not designed with xen in mind. mcelog cannot report
>>>>>>>>>>>>>>> which guest is impacted for example, although xen
>>>>>>>>>>>>>>> reports that to dom0. I object 'fixing' the hypervisor
>>>>>>>>>>>>>>> to come over with mcelog drawbacks. I prefer fixing Dom0
>>>>>>>>>>>>>>> instead.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sure, xen mcelog driver in linux is implemented by me :-)
>>>>>>>>>>>>> This patch does not intend to 'fix' hypervisor but just
>>>>>>>>>>>>> avoid overkilled system (when xen mcelog driver in dom0
>>>>>>>>>>>>> not loaded as default).
>>>>>>>>>>>>
>>>>>>>>>>>> I assume dom0 w/o xen mcelog driver active means dom0 is not
>>>>>>>>>>>> capable to deal with machine check errors. Is this correct?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> No, w/o xen mcelog driver active, only user daemon 'mcelog'
>>>>>>>>>>> was affected. Dom0 is still capable of handling vmce as long
>>>>>>>>>>> as it registered trap callback (which is checked at
>>>>>>>>>>> hypervisor inject_vmce()).
>>>>>>>>>>
>>>>>>>>>> Oh, I thought the xen mcelog *driver* registers the trap
>>>>>>>>>> callback and in a Dom0 it additionally registers the logging
>>>>>>>>>> callback. What registers the logging callback and what
>>>>>>>>>> registers the trap callback?
>>>>>>>>>>
>>>>>>>>>> Christoph
>>>>>>>>>>
>>>>>>>>>
>>>>>> [...]
>>>>>>>>
>>>>>>>> Ok, that's how I understand the code, too. But you say above
>>>>>>>> that w/o this driver Dom0 is still capable to handle vmce. That
>>>>>>>> puzzle's me.
>>>>>>>>
>>>>>>>> Christoph
>>>>>>>
>>>>>>> Ah, what I meant is, w/o mcelog dirver, kernel still successfully
>>>>>>> register its mce handler as trap callback to hypervisor, hence
>>>>>>> the injection of vmce from hypervisor to dom0 is OK, and surely
>>>>>>> got properly handled at dom0. mcelog driver is just for error
>>>>>>> logging, getting error info from hypervisor and provide
>>>>>>> interface for user daemon tools 'mcelog'.
>>>>>>
>>>>>> Ah, so the trap handler is *always* installed in a Linux Dom0?
>>>>>
>>>>> Hypervisor didn't have such assumption, it's agnostic to guest.
>>>>>
>>>>>>
>>>>>> NetBSD Dom0/DomU still has no machine check support at all, so in
>>>>>> that case Dom0/DomU is not vmce ready. That means the vmce ready
>>>>>> check is needed.
>>>>>>
>>>>>> Christoph
>>>>>
>>>>> Hypervisor indeed check vmce ready or not at inject_vmce().
>>>>> Point of patch 2/2 is, is_vmce_ready itself is problematic.
>>>>
>>>> ok. If I understand the problem right, xen sends a machine check
>>>> error for logging purpose via virq to dom0 regardless whether
>>>> it binds the virq or not. In the latter case dom0 crashes. Is this
>>>> correct?
>>>
>>> No, in latter case dom0 is not affected, only fails to receive
>>> error log (dom0 itself doesn't care error log info, it's just for
>>> user tools 'mcelog').
>>
>> Yes, that's the expected behaviour. What does your patch try
>> to fix then?
>>
>> Christoph
>>
>
> Please refer to the description of patch 2/2, especially
>
> * For dom0, if really need check, it should check whether vMCE
> injection for dom0 ready (say, exception trap bounce check, which
> has been done at inject_vmce()), not check dom0 mcelog ready (which
> has been done at mce_softirq() before send global virq to dom0).
>
> Which means before hypervisor send error log via virq to dom0, current
> code has checked whether mcelog ready at dom0 or not --> that's the right
> place for your concern, and it has indeed done check.
I think, I do not understand the patch description.
Let me rephrase if I do now due to this discussion:
The mcelog driver in Dom0 registers itself to the virq handler to
provide the machine check logging service.
Xen checks if a virq handler has been registered but does not check
if the dom0 handler is actually ready to take the errors.
This patch fixes this.
Is this correct?
Christoph
>
> Thanks,
> Jinsong
>
>>>
>>>>
>>>> In this case xen should just log a rate limited message
>>>> (which a sysadmin should be able to understand)
>>>> and do nothing else. I think, this is what Jan already suggested.
>>>>
>>>
>>> No, that's another story --> Jan concern why other 'goto
>>> vmce_failed' are not overkilled. As for is_vmce_ready itself, patch
>>> 2/2 is necessary.
>>>
>>> Thanks,
>>> Jinsong
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-13 8:43 ` Christoph Egger
@ 2013-05-13 10:44 ` Liu, Jinsong
2013-05-13 11:09 ` Christoph Egger
0 siblings, 1 reply; 33+ messages in thread
From: Liu, Jinsong @ 2013-05-13 10:44 UTC (permalink / raw)
To: Christoph Egger; +Cc: Jan Beulich, xen-devel
>> Please refer to the description of patch 2/2, especially
>>
>> * For dom0, if really need check, it should check whether vMCE
>> injection for dom0 ready (say, exception trap bounce check, which
>> has been done at inject_vmce()), not check dom0 mcelog ready
>> (which has been done at mce_softirq() before send global virq to
>> dom0).
>>
>> Which means before hypervisor send error log via virq to dom0,
>> current code has checked whether mcelog ready at dom0 or not -->
>> that's the right place for your concern, and it has indeed done
>> check.
>
> I think, I do not understand the patch description.
> Let me rephrase if I do now due to this discussion:
>
> The mcelog driver in Dom0 registers itself to the virq handler to
> provide the machine check logging service.
Yes.
> Xen checks if a virq handler has been registered
Yes.
> but does not check
> if the dom0 handler is actually ready to take the errors.
> This patch fixes this.
>
I'm not clear your question 'does not check if the dom0 handler is actually ready to take the errors'. Could you elaborate more your concern at this point?
Thanks,
Jinsong
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-13 10:44 ` Liu, Jinsong
@ 2013-05-13 11:09 ` Christoph Egger
2013-05-13 13:35 ` Liu, Jinsong
0 siblings, 1 reply; 33+ messages in thread
From: Christoph Egger @ 2013-05-13 11:09 UTC (permalink / raw)
To: Liu, Jinsong; +Cc: Jan Beulich, xen-devel
On 13.05.13 12:44, Liu, Jinsong wrote:
>>> Please refer to the description of patch 2/2, especially
>>>
>>> * For dom0, if really need check, it should check whether vMCE
>>> injection for dom0 ready (say, exception trap bounce check, which
>>> has been done at inject_vmce()), not check dom0 mcelog ready
>>> (which has been done at mce_softirq() before send global virq to
>>> dom0).
>>>
>>> Which means before hypervisor send error log via virq to dom0,
>>> current code has checked whether mcelog ready at dom0 or not -->
>>> that's the right place for your concern, and it has indeed done
>>> check.
>>
>> I think, I do not understand the patch description.
>> Let me rephrase if I do now due to this discussion:
>>
>> The mcelog driver in Dom0 registers itself to the virq handler to
>> provide the machine check logging service.
>
> Yes.
>
>> Xen checks if a virq handler has been registered
>
> Yes.
>
>> but does not check
>> if the dom0 handler is actually ready to take the errors.
>> This patch fixes this.
>>
>
> I'm not clear your question 'does not check if the dom0 handler
> is actually ready to take the errors'. Could you elaborate more your
concern at this point?
Yes, this is exactly my question. You got it.
Christoph
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-13 11:09 ` Christoph Egger
@ 2013-05-13 13:35 ` Liu, Jinsong
2013-05-13 14:24 ` Christoph Egger
0 siblings, 1 reply; 33+ messages in thread
From: Liu, Jinsong @ 2013-05-13 13:35 UTC (permalink / raw)
To: Christoph Egger; +Cc: Jan Beulich, xen-devel
Christoph Egger wrote:
> On 13.05.13 12:44, Liu, Jinsong wrote:
>>>> Please refer to the description of patch 2/2, especially
>>>>
>>>> * For dom0, if really need check, it should check whether vMCE
>>>> injection for dom0 ready (say, exception trap bounce check,
>>>> which has been done at inject_vmce()), not check dom0 mcelog
>>>> ready (which has been done at mce_softirq() before send global
>>>> virq to dom0).
>>>>
>>>> Which means before hypervisor send error log via virq to dom0,
>>>> current code has checked whether mcelog ready at dom0 or not -->
>>>> that's the right place for your concern, and it has indeed done
>>>> check.
>>>
>>> I think, I do not understand the patch description.
>>> Let me rephrase if I do now due to this discussion:
>>>
>>> The mcelog driver in Dom0 registers itself to the virq handler to
>>> provide the machine check logging service.
>>
>> Yes.
>>
>>> Xen checks if a virq handler has been registered
>>
>> Yes.
>>
>>> but does not check
>>> if the dom0 handler is actually ready to take the errors.
>>> This patch fixes this.
>>>
>>
>> I'm not clear your question 'does not check if the dom0 handler
>> is actually ready to take the errors'. Could you elaborate more your
>> concern at this point?
>
> Yes, this is exactly my question. You got it.
>
> Christoph
Hmm, seems you misunderstand my word. What I meant is, I don't know what you are asking by 'does not check if the dom0 handler is actually ready to take the errors'. Could you elaborate more your question?
Thanks,
Jinsong
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-13 13:35 ` Liu, Jinsong
@ 2013-05-13 14:24 ` Christoph Egger
2013-05-13 15:21 ` Liu, Jinsong
0 siblings, 1 reply; 33+ messages in thread
From: Christoph Egger @ 2013-05-13 14:24 UTC (permalink / raw)
To: Liu, Jinsong; +Cc: Jan Beulich, xen-devel
On 13.05.13 15:35, Liu, Jinsong wrote:
> Christoph Egger wrote:
>> On 13.05.13 12:44, Liu, Jinsong wrote:
>>>>> Please refer to the description of patch 2/2, especially
>>>>>
>>>>> * For dom0, if really need check, it should check whether vMCE
>>>>> injection for dom0 ready (say, exception trap bounce check,
>>>>> which has been done at inject_vmce()), not check dom0 mcelog
>>>>> ready (which has been done at mce_softirq() before send global
>>>>> virq to dom0).
>>>>>
>>>>> Which means before hypervisor send error log via virq to dom0,
>>>>> current code has checked whether mcelog ready at dom0 or not -->
>>>>> that's the right place for your concern, and it has indeed done
>>>>> check.
>>>>
>>>> I think, I do not understand the patch description.
>>>> Let me rephrase if I do now due to this discussion:
>>>>
>>>> The mcelog driver in Dom0 registers itself to the virq handler to
>>>> provide the machine check logging service.
>>>
>>> Yes.
>>>
>>>> Xen checks if a virq handler has been registered
>>>
>>> Yes.
>>>
>>>> but does not check
>>>> if the dom0 handler is actually ready to take the errors.
>>>> This patch fixes this.
>>>>
>>>
>>> I'm not clear your question 'does not check if the dom0 handler
>>> is actually ready to take the errors'. Could you elaborate more your
>>> concern at this point?
>>
>> Yes, this is exactly my question. You got it.
>>
>> Christoph
>
> Hmm, seems you misunderstand my word. What I meant is,
> I don't know what you are asking by 'does not check if the dom0 handler is actually ready to take the errors'.
> Could you elaborate more your question?
I reread your patch description:
> * For dom0, if really need check, it should check whether vMCE
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> injection for dom0 ready (say, exception trap bounce check, which
^^^^^^^^^^^^^^^^^^^^^^^^
> has been done at inject_vmce()), not check dom0 mcelog ready (which
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> has been done at mce_softirq() before send global virq to dom0).
My question is: Is it possible when mcelog driver registers
the virq handler that it cannot deal with machine check errors immediately?
Christoph
^ permalink raw reply [flat|nested] 33+ messages in thread* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-13 14:24 ` Christoph Egger
@ 2013-05-13 15:21 ` Liu, Jinsong
2013-05-14 10:06 ` Christoph Egger
0 siblings, 1 reply; 33+ messages in thread
From: Liu, Jinsong @ 2013-05-13 15:21 UTC (permalink / raw)
To: Christoph Egger; +Cc: Jan Beulich, xen-devel
Christoph Egger wrote:
> On 13.05.13 15:35, Liu, Jinsong wrote:
>> Christoph Egger wrote:
>>> On 13.05.13 12:44, Liu, Jinsong wrote:
>>>>>> Please refer to the description of patch 2/2, especially
>>>>>>
>>>>>> * For dom0, if really need check, it should check whether
>>>>>> vMCE injection for dom0 ready (say, exception trap bounce
>>>>>> check, which has been done at inject_vmce()), not check dom0
>>>>>> mcelog ready (which has been done at mce_softirq() before
>>>>>> send global virq to dom0).
>>>>>>
>>>>>> Which means before hypervisor send error log via virq to dom0,
>>>>>> current code has checked whether mcelog ready at dom0 or not -->
>>>>>> that's the right place for your concern, and it has indeed done
>>>>>> check.
>>>>>
>>>>> I think, I do not understand the patch description.
>>>>> Let me rephrase if I do now due to this discussion:
>>>>>
>>>>> The mcelog driver in Dom0 registers itself to the virq handler to
>>>>> provide the machine check logging service.
>>>>
>>>> Yes.
>>>>
>>>>> Xen checks if a virq handler has been registered
>>>>
>>>> Yes.
>>>>
>>>>> but does not check
>>>>> if the dom0 handler is actually ready to take the errors.
>>>>> This patch fixes this.
>>>>>
>>>>
>>>> I'm not clear your question 'does not check if the dom0 handler
>>>> is actually ready to take the errors'. Could you elaborate more
>>>> your concern at this point?
>>>
>>> Yes, this is exactly my question. You got it.
>>>
>>> Christoph
>>
>> Hmm, seems you misunderstand my word. What I meant is,
>> I don't know what you are asking by 'does not check if the dom0
>> handler is actually ready to take the errors'. Could you elaborate
>> more your question?
>
> I reread your patch description:
>
>> * For dom0, if really need check, it should check whether vMCE
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> injection for dom0 ready (say, exception trap bounce check, which
> ^^^^^^^^^^^^^^^^^^^^^^^^
>> has been done at inject_vmce()), not check dom0 mcelog ready (which
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> has been done at mce_softirq() before send global virq to dom0).
>
> My question is: Is it possible when mcelog driver registers
> the virq handler that it cannot deal with machine check errors
> immediately?
>
> Christoph
Yes, there is a small time window when dom0 mcelog driver init:
The last step of xen_late_init_mcelog()
--> bind_virq_for_mce
--> bind_virq_to_irqhandler
irq = bind_virq_to_irq(virq, cpu);
if (irq < 0)
return irq;
retval = request_irq(irq, handler, irqflags, devname, dev_id);
Time window: between bind_virq_to_irq and request_irq
If hypervisor inject virq to notify error log to dom0 exactly during this init time window, dom0 no-ops handler will not fetch error log. However, it's OK since error log in hypervisor is still there, until next time when hypervisor inject virq again, dom0 mcelog driver will fetch them. Considering it occurs rarely (and no harm), I think it's OK.
Patch 2/2 itself is not to fix this issue. Patch 2/2 is to remove is_vmce_ready() check since it's problematic (wrong check), overkilled (kill system unnecessary), deprecated (vMCE is not bound to host MCE any more) and redundant (both vmce trap callback and mcelog virq has been checked in other hypervisor code points).
I agree the description is not clear at some points, so I will re-write the description later.
Thanks,
Jinsong
^ permalink raw reply [flat|nested] 33+ messages in thread* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-13 15:21 ` Liu, Jinsong
@ 2013-05-14 10:06 ` Christoph Egger
2013-05-14 15:29 ` Liu, Jinsong
0 siblings, 1 reply; 33+ messages in thread
From: Christoph Egger @ 2013-05-14 10:06 UTC (permalink / raw)
To: Liu, Jinsong; +Cc: Jan Beulich, xen-devel
On 13.05.13 17:21, Liu, Jinsong wrote:
> Christoph Egger wrote:
>> On 13.05.13 15:35, Liu, Jinsong wrote:
>>> Christoph Egger wrote:
>>>> On 13.05.13 12:44, Liu, Jinsong wrote:
>>>>>>> Please refer to the description of patch 2/2, especially
>>>>>>>
>>>>>>> * For dom0, if really need check, it should check whether
>>>>>>> vMCE injection for dom0 ready (say, exception trap bounce
>>>>>>> check, which has been done at inject_vmce()), not check dom0
>>>>>>> mcelog ready (which has been done at mce_softirq() before
>>>>>>> send global virq to dom0).
>>>>>>>
>>>>>>> Which means before hypervisor send error log via virq to dom0,
>>>>>>> current code has checked whether mcelog ready at dom0 or not -->
>>>>>>> that's the right place for your concern, and it has indeed done
>>>>>>> check.
>>>>>>
>>>>>> I think, I do not understand the patch description.
>>>>>> Let me rephrase if I do now due to this discussion:
>>>>>>
>>>>>> The mcelog driver in Dom0 registers itself to the virq handler to
>>>>>> provide the machine check logging service.
>>>>>
>>>>> Yes.
>>>>>
>>>>>> Xen checks if a virq handler has been registered
>>>>>
>>>>> Yes.
>>>>>
>>>>>> but does not check
>>>>>> if the dom0 handler is actually ready to take the errors.
>>>>>> This patch fixes this.
>>>>>>
>>>>>
>>>>> I'm not clear your question 'does not check if the dom0 handler
>>>>> is actually ready to take the errors'. Could you elaborate more
>>>>> your concern at this point?
>>>>
>>>> Yes, this is exactly my question. You got it.
>>>>
>>>> Christoph
>>>
>>> Hmm, seems you misunderstand my word. What I meant is,
>>> I don't know what you are asking by 'does not check if the dom0
>>> handler is actually ready to take the errors'. Could you elaborate
>>> more your question?
>>
>> I reread your patch description:
>>
>>> * For dom0, if really need check, it should check whether vMCE
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> injection for dom0 ready (say, exception trap bounce check, which
>> ^^^^^^^^^^^^^^^^^^^^^^^^
>>> has been done at inject_vmce()), not check dom0 mcelog ready (which
>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> has been done at mce_softirq() before send global virq to dom0).
>>
>> My question is: Is it possible when mcelog driver registers
>> the virq handler that it cannot deal with machine check errors
>> immediately?
>>
>> Christoph
>
> Yes, there is a small time window when dom0 mcelog driver init:
>
> The last step of xen_late_init_mcelog()
> --> bind_virq_for_mce
> --> bind_virq_to_irqhandler
>
> irq = bind_virq_to_irq(virq, cpu);
> if (irq < 0)
> return irq;
> retval = request_irq(irq, handler, irqflags, devname, dev_id);
>
> Time window: between bind_virq_to_irq and request_irq
>
> If hypervisor inject virq to notify error log to dom0 exactly during
> this init time window, dom0 no-ops handler will not fetch error log.
> However, it's OK since error log in hypervisor is still there, until
> next time when hypervisor inject virq again, dom0 mcelog driver will
> fetch them. Considering it occurs rarely (and no harm), I think it's OK.
>
> Patch 2/2 itself is not to fix this issue. Patch 2/2 is to remove
> is_vmce_ready() check since it's problematic (wrong check), overkilled
> (kill system unnecessary), deprecated (vMCE is not bound to host MCE
any more)
> and redundant (both vmce trap callback and mcelog virq has been checked
> in other hypervisor code points).
>
> I agree the description is not clear at some points, so I will
> re-write the description later.
Thanks. From reading the new description
I think, I got it now why this check is wrong:
Whenever a vmce is injected into dom0 is_vmce_ready() checks if dom0
installed an virq handler for logging and in case dom0 does no logging
dom0 is crashed instead.
Assuming above is right then:
the code path in mcaction.c not only runs when a vmce is injected into
dom0, it also runs when a vmce is injected into a domU.
So that means when dom0 does no logging then it is crashed whenever
a vMCE is injected into any guest. OUCH!
Make Jan happy with his 'goto vmce_failed' concern and you have my ack.
Christoph
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-14 10:06 ` Christoph Egger
@ 2013-05-14 15:29 ` Liu, Jinsong
2013-05-14 15:35 ` Jan Beulich
0 siblings, 1 reply; 33+ messages in thread
From: Liu, Jinsong @ 2013-05-14 15:29 UTC (permalink / raw)
To: Christoph Egger; +Cc: Jan Beulich, xen-devel
Christoph Egger wrote:
> On 13.05.13 17:21, Liu, Jinsong wrote:
>> Christoph Egger wrote:
>>> On 13.05.13 15:35, Liu, Jinsong wrote:
>>>> Christoph Egger wrote:
>>>>> On 13.05.13 12:44, Liu, Jinsong wrote:
>>>>>>>> Please refer to the description of patch 2/2, especially
>>>>>>>>
>>>>>>>> * For dom0, if really need check, it should check whether
>>>>>>>> vMCE injection for dom0 ready (say, exception trap bounce
>>>>>>>> check, which has been done at inject_vmce()), not check
>>>>>>>> dom0 mcelog ready (which has been done at mce_softirq()
>>>>>>>> before send global virq to dom0).
>>>>>>>>
>>>>>>>> Which means before hypervisor send error log via virq to dom0,
>>>>>>>> current code has checked whether mcelog ready at dom0 or not
>>>>>>>> --> that's the right place for your concern, and it has indeed
>>>>>>>> done check.
>>>>>>>
>>>>>>> I think, I do not understand the patch description.
>>>>>>> Let me rephrase if I do now due to this discussion:
>>>>>>>
>>>>>>> The mcelog driver in Dom0 registers itself to the virq handler
>>>>>>> to provide the machine check logging service.
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>> Xen checks if a virq handler has been registered
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>> but does not check
>>>>>>> if the dom0 handler is actually ready to take the errors.
>>>>>>> This patch fixes this.
>>>>>>>
>>>>>>
>>>>>> I'm not clear your question 'does not check if the dom0 handler
>>>>>> is actually ready to take the errors'. Could you elaborate more
>>>>>> your concern at this point?
>>>>>
>>>>> Yes, this is exactly my question. You got it.
>>>>>
>>>>> Christoph
>>>>
>>>> Hmm, seems you misunderstand my word. What I meant is,
>>>> I don't know what you are asking by 'does not check if the dom0
>>>> handler is actually ready to take the errors'. Could you elaborate
>>>> more your question?
>>>
>>> I reread your patch description:
>>>
>>>> * For dom0, if really need check, it should check whether vMCE
>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> injection for dom0 ready (say, exception trap bounce check, which
>>> ^^^^^^^^^^^^^^^^^^^^^^^^
>>>> has been done at inject_vmce()), not check dom0 mcelog ready (which
>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> has been done at mce_softirq() before send global virq to dom0).
>>>
>>> My question is: Is it possible when mcelog driver registers
>>> the virq handler that it cannot deal with machine check errors
>>> immediately?
>>>
>>> Christoph
>>
>> Yes, there is a small time window when dom0 mcelog driver init:
>>
>> The last step of xen_late_init_mcelog()
>> --> bind_virq_for_mce
>> --> bind_virq_to_irqhandler
>>
>> irq = bind_virq_to_irq(virq, cpu);
>> if (irq < 0)
>> return irq;
>> retval = request_irq(irq, handler, irqflags, devname,
>> dev_id);
>>
>> Time window: between bind_virq_to_irq and request_irq
>>
>> If hypervisor inject virq to notify error log to dom0 exactly during
>> this init time window, dom0 no-ops handler will not fetch error log.
>> However, it's OK since error log in hypervisor is still there, until
>> next time when hypervisor inject virq again, dom0 mcelog driver will
>> fetch them. Considering it occurs rarely (and no harm), I think it's
>> OK.
>>
>> Patch 2/2 itself is not to fix this issue. Patch 2/2 is to remove
>> is_vmce_ready() check since it's problematic (wrong check),
>> overkilled (kill system unnecessary), deprecated (vMCE is not bound
>> to host MCE any more) and redundant (both vmce trap callback and
>> mcelog virq has been checked in other hypervisor code points).
>>
>> I agree the description is not clear at some points, so I will
>> re-write the description later.
>
> Thanks. From reading the new description
> I think, I got it now why this check is wrong:
>
> Whenever a vmce is injected into dom0 is_vmce_ready() checks if dom0
> installed an virq handler for logging and in case dom0 does no logging
> dom0 is crashed instead.
>
> Assuming above is right then:
> the code path in mcaction.c not only runs when a vmce is injected into
> dom0, it also runs when a vmce is injected into a domU.
> So that means when dom0 does no logging then it is crashed whenever
> a vMCE is injected into any guest. OUCH!
>
> Make Jan happy with his 'goto vmce_failed' concern and you have my
> ack.
>
> Christoph
Thanks ack!
Jan, any more concern?
Thanks,
Jinsong
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-14 15:29 ` Liu, Jinsong
@ 2013-05-14 15:35 ` Jan Beulich
2013-05-16 5:53 ` Liu, Jinsong
0 siblings, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2013-05-14 15:35 UTC (permalink / raw)
To: Christoph Egger, Jinsong Liu; +Cc: xen-devel
>>> On 14.05.13 at 17:29, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Christoph Egger wrote:
>> On 13.05.13 17:21, Liu, Jinsong wrote:
>>> Christoph Egger wrote:
>>>> On 13.05.13 15:35, Liu, Jinsong wrote:
>>>>> Christoph Egger wrote:
>>>>>> On 13.05.13 12:44, Liu, Jinsong wrote:
>>>>>>>>> Please refer to the description of patch 2/2, especially
>>>>>>>>>
>>>>>>>>> * For dom0, if really need check, it should check whether
>>>>>>>>> vMCE injection for dom0 ready (say, exception trap bounce
>>>>>>>>> check, which has been done at inject_vmce()), not check
>>>>>>>>> dom0 mcelog ready (which has been done at mce_softirq()
>>>>>>>>> before send global virq to dom0).
>>>>>>>>>
>>>>>>>>> Which means before hypervisor send error log via virq to dom0,
>>>>>>>>> current code has checked whether mcelog ready at dom0 or not
>>>>>>>>> --> that's the right place for your concern, and it has indeed
>>>>>>>>> done check.
>>>>>>>>
>>>>>>>> I think, I do not understand the patch description.
>>>>>>>> Let me rephrase if I do now due to this discussion:
>>>>>>>>
>>>>>>>> The mcelog driver in Dom0 registers itself to the virq handler
>>>>>>>> to provide the machine check logging service.
>>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>>> Xen checks if a virq handler has been registered
>>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>>> but does not check
>>>>>>>> if the dom0 handler is actually ready to take the errors.
>>>>>>>> This patch fixes this.
>>>>>>>>
>>>>>>>
>>>>>>> I'm not clear your question 'does not check if the dom0 handler
>>>>>>> is actually ready to take the errors'. Could you elaborate more
>>>>>>> your concern at this point?
>>>>>>
>>>>>> Yes, this is exactly my question. You got it.
>>>>>>
>>>>>> Christoph
>>>>>
>>>>> Hmm, seems you misunderstand my word. What I meant is,
>>>>> I don't know what you are asking by 'does not check if the dom0
>>>>> handler is actually ready to take the errors'. Could you elaborate
>>>>> more your question?
>>>>
>>>> I reread your patch description:
>>>>
>>>>> * For dom0, if really need check, it should check whether vMCE
>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>> injection for dom0 ready (say, exception trap bounce check, which
>>>> ^^^^^^^^^^^^^^^^^^^^^^^^
>>>>> has been done at inject_vmce()), not check dom0 mcelog ready (which
>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>> has been done at mce_softirq() before send global virq to dom0).
>>>>
>>>> My question is: Is it possible when mcelog driver registers
>>>> the virq handler that it cannot deal with machine check errors
>>>> immediately?
>>>>
>>>> Christoph
>>>
>>> Yes, there is a small time window when dom0 mcelog driver init:
>>>
>>> The last step of xen_late_init_mcelog()
>>> --> bind_virq_for_mce
>>> --> bind_virq_to_irqhandler
>>>
>>> irq = bind_virq_to_irq(virq, cpu);
>>> if (irq < 0)
>>> return irq;
>>> retval = request_irq(irq, handler, irqflags, devname,
>>> dev_id);
>>>
>>> Time window: between bind_virq_to_irq and request_irq
>>>
>>> If hypervisor inject virq to notify error log to dom0 exactly during
>>> this init time window, dom0 no-ops handler will not fetch error log.
>>> However, it's OK since error log in hypervisor is still there, until
>>> next time when hypervisor inject virq again, dom0 mcelog driver will
>>> fetch them. Considering it occurs rarely (and no harm), I think it's
>>> OK.
>>>
>>> Patch 2/2 itself is not to fix this issue. Patch 2/2 is to remove
>>> is_vmce_ready() check since it's problematic (wrong check),
>>> overkilled (kill system unnecessary), deprecated (vMCE is not bound
>>> to host MCE any more) and redundant (both vmce trap callback and
>>> mcelog virq has been checked in other hypervisor code points).
>>>
>>> I agree the description is not clear at some points, so I will
>>> re-write the description later.
>>
>> Thanks. From reading the new description
>> I think, I got it now why this check is wrong:
>>
>> Whenever a vmce is injected into dom0 is_vmce_ready() checks if dom0
>> installed an virq handler for logging and in case dom0 does no logging
>> dom0 is crashed instead.
>>
>> Assuming above is right then:
>> the code path in mcaction.c not only runs when a vmce is injected into
>> dom0, it also runs when a vmce is injected into a domU.
>> So that means when dom0 does no logging then it is crashed whenever
>> a vMCE is injected into any guest. OUCH!
>>
>> Make Jan happy with his 'goto vmce_failed' concern and you have my
>> ack.
>>
>> Christoph
>
> Jan, any more concern?
No, I was fine with your earlier explanation.
Christoph - I wasn't sure whether your above statement implied
Jinsong needing to do further changes, or whether you really just
referred to the initial discussion Jinsong and I had which got
already settled. Please clarify.
Jan
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-14 15:35 ` Jan Beulich
@ 2013-05-16 5:53 ` Liu, Jinsong
2013-05-16 8:11 ` Christoph Egger
0 siblings, 1 reply; 33+ messages in thread
From: Liu, Jinsong @ 2013-05-16 5:53 UTC (permalink / raw)
To: Jan Beulich, Christoph Egger; +Cc: xen-devel
Jan Beulich wrote:
>>>> On 14.05.13 at 17:29, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>> wrote: Christoph Egger wrote: On 13.05.13 17:21, Liu, Jinsong
>>>> wrote: Christoph Egger wrote:
>>>>> On 13.05.13 15:35, Liu, Jinsong wrote:
>>>>>> Christoph Egger wrote:
>>>>>>> On 13.05.13 12:44, Liu, Jinsong wrote:
>>>>>>>>>> Please refer to the description of patch 2/2, especially
>>>>>>>>>>
>>>>>>>>>> * For dom0, if really need check, it should check whether
>>>>>>>>>> vMCE injection for dom0 ready (say, exception trap bounce
>>>>>>>>>> check, which has been done at inject_vmce()), not check
>>>>>>>>>> dom0 mcelog ready (which has been done at mce_softirq()
>>>>>>>>>> before send global virq to dom0).
>>>>>>>>>>
>>>>>>>>>> Which means before hypervisor send error log via virq to
>>>>>>>>>> dom0, current code has checked whether mcelog ready at dom0
>>>>>>>>>> or not --> that's the right place for your concern, and it
>>>>>>>>>> has indeed done check.
>>>>>>>>>
>>>>>>>>> I think, I do not understand the patch description.
>>>>>>>>> Let me rephrase if I do now due to this discussion:
>>>>>>>>>
>>>>>>>>> The mcelog driver in Dom0 registers itself to the virq handler
>>>>>>>>> to provide the machine check logging service.
>>>>>>>>
>>>>>>>> Yes.
>>>>>>>>
>>>>>>>>> Xen checks if a virq handler has been registered
>>>>>>>>
>>>>>>>> Yes.
>>>>>>>>
>>>>>>>>> but does not check
>>>>>>>>> if the dom0 handler is actually ready to take the errors.
>>>>>>>>> This patch fixes this.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I'm not clear your question 'does not check if the dom0 handler
>>>>>>>> is actually ready to take the errors'. Could you elaborate more
>>>>>>>> your concern at this point?
>>>>>>>
>>>>>>> Yes, this is exactly my question. You got it.
>>>>>>>
>>>>>>> Christoph
>>>>>>
>>>>>> Hmm, seems you misunderstand my word. What I meant is,
>>>>>> I don't know what you are asking by 'does not check if the dom0
>>>>>> handler is actually ready to take the errors'. Could you
>>>>>> elaborate more your question?
>>>>>
>>>>> I reread your patch description:
>>>>>
>>>>>> * For dom0, if really need check, it should check whether vMCE
>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>> injection for dom0 ready (say, exception trap bounce check, which
>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>> has been done at inject_vmce()), not check dom0 mcelog ready
>>>>>> (which
>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>> has been done at mce_softirq() before send global virq to dom0).
>>>>>
>>>>> My question is: Is it possible when mcelog driver registers
>>>>> the virq handler that it cannot deal with machine check errors
>>>>> immediately?
>>>>>
>>>>> Christoph
>>>>
>>>> Yes, there is a small time window when dom0 mcelog driver init:
>>>>
>>>> The last step of xen_late_init_mcelog()
>>>> --> bind_virq_for_mce
>>>> --> bind_virq_to_irqhandler
>>>>
>>>> irq = bind_virq_to_irq(virq, cpu);
>>>> if (irq < 0)
>>>> return irq;
>>>> retval = request_irq(irq, handler, irqflags, devname,
>>>> dev_id);
>>>>
>>>> Time window: between bind_virq_to_irq and request_irq
>>>>
>>>> If hypervisor inject virq to notify error log to dom0 exactly
>>>> during this init time window, dom0 no-ops handler will not fetch
>>>> error log. However, it's OK since error log in hypervisor is still
>>>> there, until next time when hypervisor inject virq again, dom0
>>>> mcelog driver will fetch them. Considering it occurs rarely (and
>>>> no harm), I think it's OK.
>>>>
>>>> Patch 2/2 itself is not to fix this issue. Patch 2/2 is to remove
>>>> is_vmce_ready() check since it's problematic (wrong check),
>>>> overkilled (kill system unnecessary), deprecated (vMCE is not bound
>>>> to host MCE any more) and redundant (both vmce trap callback and
>>>> mcelog virq has been checked in other hypervisor code points).
>>>>
>>>> I agree the description is not clear at some points, so I will
>>>> re-write the description later.
>>>
>>> Thanks. From reading the new description
>>> I think, I got it now why this check is wrong:
>>>
>>> Whenever a vmce is injected into dom0 is_vmce_ready() checks if dom0
>>> installed an virq handler for logging and in case dom0 does no
>>> logging dom0 is crashed instead.
>>>
>>> Assuming above is right then:
>>> the code path in mcaction.c not only runs when a vmce is injected
>>> into dom0, it also runs when a vmce is injected into a domU.
>>> So that means when dom0 does no logging then it is crashed whenever
>>> a vMCE is injected into any guest. OUCH!
>>>
>>> Make Jan happy with his 'goto vmce_failed' concern and you have my
>>> ack.
>>>
>>> Christoph
>>
>> Jan, any more concern?
>
> No, I was fine with your earlier explanation.
>
> Christoph - I wasn't sure whether your above statement implied
> Jinsong needing to do further changes, or whether you really just
> referred to the initial discussion Jinsong and I had which got
> already settled. Please clarify.
>
> Jan
I think Christoph's statement agrees that we need patch 2/2, so ack patch 2/2 as far as you have no more concern about 'goto vmce_failed'.
Christoph, am I right?
Thanks,
Jinsong
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
2013-05-16 5:53 ` Liu, Jinsong
@ 2013-05-16 8:11 ` Christoph Egger
0 siblings, 0 replies; 33+ messages in thread
From: Christoph Egger @ 2013-05-16 8:11 UTC (permalink / raw)
To: Liu, Jinsong; +Cc: Jan Beulich, xen-devel
On 16.05.13 07:53, Liu, Jinsong wrote:
> Jan Beulich wrote:
>>>>> On 14.05.13 at 17:29, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>> wrote: Christoph Egger wrote: On 13.05.13 17:21, Liu, Jinsong
>>>>> wrote: Christoph Egger wrote:
>>>>>> On 13.05.13 15:35, Liu, Jinsong wrote:
>>>>>>> Christoph Egger wrote:
>>>>>>>> On 13.05.13 12:44, Liu, Jinsong wrote:
>>>>>>>>>>> Please refer to the description of patch 2/2, especially
>>>>>>>>>>>
>>>>>>>>>>> * For dom0, if really need check, it should check whether
>>>>>>>>>>> vMCE injection for dom0 ready (say, exception trap bounce
>>>>>>>>>>> check, which has been done at inject_vmce()), not check
>>>>>>>>>>> dom0 mcelog ready (which has been done at mce_softirq()
>>>>>>>>>>> before send global virq to dom0).
>>>>>>>>>>>
>>>>>>>>>>> Which means before hypervisor send error log via virq to
>>>>>>>>>>> dom0, current code has checked whether mcelog ready at dom0
>>>>>>>>>>> or not --> that's the right place for your concern, and it
>>>>>>>>>>> has indeed done check.
>>>>>>>>>>
>>>>>>>>>> I think, I do not understand the patch description.
>>>>>>>>>> Let me rephrase if I do now due to this discussion:
>>>>>>>>>>
>>>>>>>>>> The mcelog driver in Dom0 registers itself to the virq handler
>>>>>>>>>> to provide the machine check logging service.
>>>>>>>>>
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>>> Xen checks if a virq handler has been registered
>>>>>>>>>
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>>> but does not check
>>>>>>>>>> if the dom0 handler is actually ready to take the errors.
>>>>>>>>>> This patch fixes this.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'm not clear your question 'does not check if the dom0 handler
>>>>>>>>> is actually ready to take the errors'. Could you elaborate more
>>>>>>>>> your concern at this point?
>>>>>>>>
>>>>>>>> Yes, this is exactly my question. You got it.
>>>>>>>>
>>>>>>>> Christoph
>>>>>>>
>>>>>>> Hmm, seems you misunderstand my word. What I meant is,
>>>>>>> I don't know what you are asking by 'does not check if the dom0
>>>>>>> handler is actually ready to take the errors'. Could you
>>>>>>> elaborate more your question?
>>>>>>
>>>>>> I reread your patch description:
>>>>>>
>>>>>>> * For dom0, if really need check, it should check whether vMCE
>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>> injection for dom0 ready (say, exception trap bounce check, which
>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>> has been done at inject_vmce()), not check dom0 mcelog ready
>>>>>>> (which
>>>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>>>> has been done at mce_softirq() before send global virq to dom0).
>>>>>>
>>>>>> My question is: Is it possible when mcelog driver registers
>>>>>> the virq handler that it cannot deal with machine check errors
>>>>>> immediately?
>>>>>>
>>>>>> Christoph
>>>>>
>>>>> Yes, there is a small time window when dom0 mcelog driver init:
>>>>>
>>>>> The last step of xen_late_init_mcelog()
>>>>> --> bind_virq_for_mce
>>>>> --> bind_virq_to_irqhandler
>>>>>
>>>>> irq = bind_virq_to_irq(virq, cpu);
>>>>> if (irq < 0)
>>>>> return irq;
>>>>> retval = request_irq(irq, handler, irqflags, devname,
>>>>> dev_id);
>>>>>
>>>>> Time window: between bind_virq_to_irq and request_irq
>>>>>
>>>>> If hypervisor inject virq to notify error log to dom0 exactly
>>>>> during this init time window, dom0 no-ops handler will not fetch
>>>>> error log. However, it's OK since error log in hypervisor is still
>>>>> there, until next time when hypervisor inject virq again, dom0
>>>>> mcelog driver will fetch them. Considering it occurs rarely (and
>>>>> no harm), I think it's OK.
>>>>>
>>>>> Patch 2/2 itself is not to fix this issue. Patch 2/2 is to remove
>>>>> is_vmce_ready() check since it's problematic (wrong check),
>>>>> overkilled (kill system unnecessary), deprecated (vMCE is not bound
>>>>> to host MCE any more) and redundant (both vmce trap callback and
>>>>> mcelog virq has been checked in other hypervisor code points).
>>>>>
>>>>> I agree the description is not clear at some points, so I will
>>>>> re-write the description later.
>>>>
>>>> Thanks. From reading the new description
>>>> I think, I got it now why this check is wrong:
>>>>
>>>> Whenever a vmce is injected into dom0 is_vmce_ready() checks if dom0
>>>> installed an virq handler for logging and in case dom0 does no
>>>> logging dom0 is crashed instead.
>>>>
>>>> Assuming above is right then:
>>>> the code path in mcaction.c not only runs when a vmce is injected
>>>> into dom0, it also runs when a vmce is injected into a domU.
>>>> So that means when dom0 does no logging then it is crashed whenever
>>>> a vMCE is injected into any guest. OUCH!
>>>>
>>>> Make Jan happy with his 'goto vmce_failed' concern and you have my
>>>> ack.
>>>>
>>>> Christoph
>>>
>>> Jan, any more concern?
>>
>> No, I was fine with your earlier explanation.
>>
>> Christoph - I wasn't sure whether your above statement implied
>> Jinsong needing to do further changes, or whether you really just
>> referred to the initial discussion Jinsong and I had which got
>> already settled. Please clarify.
>>
>> Jan
>
> I think Christoph's statement agrees that we need patch 2/2, so ack patch 2/2 as far as you have no more concern about 'goto vmce_failed'.
> Christoph, am I right?
>
Yes, right.
Christoph
^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2013-05-16 8:11 UTC | newest]
Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-27 8:38 [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check Liu, Jinsong
2013-04-29 7:08 ` Jan Beulich
2013-05-03 8:41 ` Liu, Jinsong
2013-05-03 9:32 ` Jan Beulich
2013-05-03 14:16 ` Liu, Jinsong
2013-05-03 14:27 ` Jan Beulich
2013-05-03 15:51 ` Liu, Jinsong
2013-05-06 8:54 ` Christoph Egger
2013-05-06 9:06 ` Jan Beulich
2013-05-06 9:24 ` Liu, Jinsong
2013-05-06 9:41 ` Christoph Egger
2013-05-06 9:50 ` Liu, Jinsong
2013-05-06 11:38 ` Christoph Egger
2013-05-06 15:00 ` Liu, Jinsong
2013-05-06 15:06 ` Egger, Christoph
2013-05-06 15:14 ` Liu, Jinsong
2013-05-06 15:31 ` Christoph Egger
2013-05-06 16:00 ` Liu, Jinsong
2013-05-07 11:43 ` Christoph Egger
2013-05-09 17:05 ` Liu, Jinsong
2013-05-10 13:59 ` Christoph Egger
2013-05-10 16:50 ` Liu, Jinsong
2013-05-13 8:43 ` Christoph Egger
2013-05-13 10:44 ` Liu, Jinsong
2013-05-13 11:09 ` Christoph Egger
2013-05-13 13:35 ` Liu, Jinsong
2013-05-13 14:24 ` Christoph Egger
2013-05-13 15:21 ` Liu, Jinsong
2013-05-14 10:06 ` Christoph Egger
2013-05-14 15:29 ` Liu, Jinsong
2013-05-14 15:35 ` Jan Beulich
2013-05-16 5:53 ` Liu, Jinsong
2013-05-16 8:11 ` Christoph Egger
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.