All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christoph Egger <chegger@amazon.de>
To: "Liu, Jinsong" <jinsong.liu@intel.com>
Cc: Jan Beulich <JBeulich@suse.com>, xen-devel <xen-devel@lists.xen.org>
Subject: Re: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check
Date: Mon, 6 May 2013 10:54:44 +0200	[thread overview]
Message-ID: <51876FD4.9080006@amazon.de> (raw)
In-Reply-To: <DE8DF0795D48FD4CA783C40EC8292335466071@SHSMSX101.ccr.corp.intel.com>

On 03.05.13 17:51, Liu, Jinsong wrote:
> Jan Beulich wrote:
>>>>> On 03.05.13 at 16:16, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>>> Jan Beulich wrote:
>>>>>>> On 03.05.13 at 10:41, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>> wrote: 
>>>>> Jan Beulich wrote:
>>>>>>>>> On 27.04.13 at 10:38, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>>>>>>> wrote:
>>>>>>> From 9098666db640183f894b9aec09599dd32dddb7fa Mon Sep 17 00:00:00
>>>>>>> 2001 From: Liu Jinsong <jinsong.liu@intel.com>
>>>>>>> Date: Sat, 27 Apr 2013 22:37:35 +0800
>>>>>>> Subject: [PATCH 2/2] Xen/vMCE: bugfix to remove problematic
>>>>>>> is_vmce_ready check 
>>>>>>>
>>>>>>> is_vmce_ready() is problematic:
>>>>>>> * For dom0, it checks if virq bind to dom0 mcelog driver. If not,
>>>>>>> it results dom0 crash. However, it's problematic and overkilled
>>>>>>> since mcelog as a dom0 feature could be enabled/disabled per dom0
>>>>>>> option: (XEN) MCE: This error page is ownded by DOM 0 (XEN) DOM0
>>>>>>> not ready for vMCE (XEN) domain_crash called from mcaction.c:133
>>>>>>> (XEN) Domain 0 reported crashed by domain 32767 on cpu#31:
>>>>>>> (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
>>>>>>> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
>>>>>>>
>>>>>>> * For dom0, if really need check, it should check whether vMCE
>>>>>>> injection for dom0 ready (say, exception trap bounce check, which
>>>>>>> has been done at inject_vmce()), not check dom0 mcelog ready
>>>>>>> (which has been done at mce_softirq() before send global virq to
>>>>>>> dom0). 
>>>>>>
>>>>>> Following the argumentation above, I wonder which of the other
>>>>>> "goto vmce_failed" are really appropriate, i.e. whether the patch
>>>>>> shouldn't be extended (at least for the Dom0 case).
>>>>>
>>>>> You mean other 'goto vmce_failed' are also not appropriate (I'm not
>>>>> quite clear your point)?
>>>>
>>>> Yes.
>>>>
>>>>> Would you please point out which point you think not appropriate?
>>>>
>>>> I question whether it is correct/necessary to crash the domain in
>>>> any of those failure cases. Perhaps when we fail to unmap the
>>>> page it is, but failure of fill_vmsr_data() and inject_vmce() don't
>>>> appear to be valid reasons once the is_vmce_ready() path is being
>>>> dropped.
>>>
>>> For fill_vmsr_data(), it failed only when MCG_STATUS_MCIP bit still
>>> set when next vMCE# occur, means the 2nd vMCE# occur when the 1st
>>> vMCE# not handled yet. Per SDM it should shutdown.
>>>
>>> For inject_vmce(), it failed when
>>> 1). vcpu is still mce_pending, or
>>> 2). pv not register trap callback
>>> Maybe it's some overkilled for dom0 (for other guest, it's ok to
>>> kill them), but any graceful way to quit?
>>
>> Just exit and do nothing (except perhaps log a rate limited
>> message)?
>>
> 
> One concern of quiet exit is, the error will be totally ignored by guest --> it didn't get preperly handled, and may recursively occur to make worse error --> it's better to kill guest under such case.
> 
>>> or, considering it rarely happens, how about keep current way (kill
>>> guest no matter dom0 or not)?
>>
>> Possibly - I was merely asking why this one condition was found to
>> be too strict, while the others are being left as is.
>>
>> Jan
> 
> Ah, the reason of removing is_vmce_ready check is, it's
> problematic (check mcelog driver, not vmce tap callback),
> and overkilled (since defaultly dom0 will not start mcelog driver,
> under which case system will crash whenever vmce inject to dom0)
>
> --> So patch 2/2 is not too strict for dom0.
>

Please keep in mind the mcelog userland/kernel interface is not designed
with xen in mind. mcelog cannot report which guest is impacted for
example, although xen reports that to dom0.
I object 'fixing' the hypervisor to come over with mcelog drawbacks.
I prefer fixing Dom0 instead.

>From the design perspective, the virq for Dom0 is for logging purpose
only and the trap handler has equal purpose for both Dom0 and DomU.

BTW: I heard from Andre Pryzwara, he is working with Borislav Petkov
together on a complete new machine check interface in Linux
that supports ARM and x86. We should have an eye on this that
they do not repeat the mcelog design mistakes.

Christoph

  reply	other threads:[~2013-05-06  8:54 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-04-27  8:38 [PATCH 2/2] Xen/vMCE: bugfix to remove problematic is_vmce_ready check Liu, Jinsong
2013-04-29  7:08 ` Jan Beulich
2013-05-03  8:41   ` Liu, Jinsong
2013-05-03  9:32     ` Jan Beulich
2013-05-03 14:16       ` Liu, Jinsong
2013-05-03 14:27         ` Jan Beulich
2013-05-03 15:51           ` Liu, Jinsong
2013-05-06  8:54             ` Christoph Egger [this message]
2013-05-06  9:06               ` Jan Beulich
2013-05-06  9:24                 ` Liu, Jinsong
2013-05-06  9:41                   ` Christoph Egger
2013-05-06  9:50                     ` Liu, Jinsong
2013-05-06 11:38                       ` Christoph Egger
2013-05-06 15:00                         ` Liu, Jinsong
2013-05-06 15:06                           ` Egger, Christoph
2013-05-06 15:14                             ` Liu, Jinsong
2013-05-06 15:31                               ` Christoph Egger
2013-05-06 16:00                                 ` Liu, Jinsong
2013-05-07 11:43                                   ` Christoph Egger
2013-05-09 17:05                                     ` Liu, Jinsong
2013-05-10 13:59                                       ` Christoph Egger
2013-05-10 16:50                                         ` Liu, Jinsong
2013-05-13  8:43                                           ` Christoph Egger
2013-05-13 10:44                                             ` Liu, Jinsong
2013-05-13 11:09                                               ` Christoph Egger
2013-05-13 13:35                                                 ` Liu, Jinsong
2013-05-13 14:24                                                   ` Christoph Egger
2013-05-13 15:21                                                     ` Liu, Jinsong
2013-05-14 10:06                                                       ` Christoph Egger
2013-05-14 15:29                                                         ` Liu, Jinsong
2013-05-14 15:35                                                           ` Jan Beulich
2013-05-16  5:53                                                             ` Liu, Jinsong
2013-05-16  8:11                                                               ` Christoph Egger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51876FD4.9080006@amazon.de \
    --to=chegger@amazon.de \
    --cc=JBeulich@suse.com \
    --cc=jinsong.liu@intel.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.