Re: [patch uq/master 7/8] MCE: Relay UCR MCE to guest

From: Dean Nelson <dnelson@redhat.com>
To: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	kvm@vger.kernel.org, qemu-devel@nongnu.org,
	Huang Ying <ying.huang@intel.com>
Subject: Re: [patch uq/master 7/8] MCE: Relay UCR MCE to guest
Date: Thu, 07 Oct 2010 10:23:43 -0500
Message-ID: <4CADE5FF.8060605@redhat.com>
In-Reply-To: <4CAD417B.7060808@jp.fujitsu.com>

On 10/06/2010 10:41 PM, Hidetoshi Seto wrote:
> (2010/10/07 3:10), Dean Nelson wrote:
>> On 10/06/2010 11:05 AM, Marcelo Tosatti wrote:
>>> On Wed, Oct 06, 2010 at 10:58:36AM +0900, Hidetoshi Seto wrote:
>>>> I have some more questions:
>>>>
>>>> (2010/10/05 3:54), Marcelo Tosatti wrote:
>>>>> Index: qemu/target-i386/cpu.h
>>>>> ===================================================================
>>>>> --- qemu.orig/target-i386/cpu.h
>>>>> +++ qemu/target-i386/cpu.h
>>>>> @@ -250,16 +250,32 @@
>>>>>    #define PG_ERROR_RSVD_MASK 0x08
>>>>>    #define PG_ERROR_I_D_MASK  0x10
>>>>>
>>>>> -#define MCG_CTL_P    (1UL<<8)   /* MCG_CAP register available */
>>>>> +#define MCG_CTL_P    (1ULL<<8)   /* MCG_CAP register available */
>>>>> +#define MCG_SER_P    (1ULL<<24) /* MCA recovery/new status bits */
>>>>>
>>>>> -#define MCE_CAP_DEF    MCG_CTL_P
>>>>> +#define MCE_CAP_DEF    (MCG_CTL_P|MCG_SER_P)
>>>>>    #define MCE_BANKS_DEF    10
>>>>>
>>>>
>>>> It seems that current kvm doesn't support SER_P, so injecting SRAO
>>>> into the guest means the guest receives a VAL|UC|!PCC and RIPV
>>>> event from a virtual processor that doesn't have SER_P.
>>>
>>> Dean also noted this. I don't think it was a deliberate choice not
>>> to expose SER_P. Huang?
>>
>> In my testing, I found that MCG_SER_P was not being set (and I was
>> running on a Nehalem-EX system). Injecting an MCE resulted in the
>> guest entering into panic() from mce_panic(). If crash_kexec()
>> finds a kexec_crash_image the system ends up rebooting, otherwise,
>> what happens next requires operator intervention.
>
> Good to know.
> What concerns me is that if a memory scrubbing SRAO event is
> injected when !SER_P, a Linux guest with a certain mce tolerant level
> might grade it as "UC" severity and continue running without any
> panicking, killing, or poisoning, because of !PCC and RIPV.
>
> Could you provide the panic message of the guest in your test?
> I think it can tell me why the mce handler decided to panic.

Sure, I'll add the info below at the end of this email.


>> When I applied a patch to the guest's kernel which forces mce_ser to be
>> set, as if MCG_SER_P was set (see __mcheck_cpu_cap_init()), I found
>> that when the memory page was 'owned' by a guest process, the process
>> would be killed (if the page was dirty), and the guest would stay
>> running. The HWPoisoned page would be sidelined and not cause any more
>> issues.
>
> Excellent.
> So as long as the guest kernel knows which page is poisoned, guest
> processes are kept from touching the page.
>
> ... Therefore rebooting the VM and loading a new kernel will lose the
> information about which page is poisoned.

Correct.
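
To make the mce_ser point above concrete: the guest kernel sets that flag from the MCG_SER_P bit of the MCG_CAP value it reads, so a virtual CPU that reports only MCG_CTL_P leaves it clear. The following is a minimal sketch of that decoding, using the bit positions from the quoted cpu.h hunk. The names `struct mcg_cap_info` and `decode_mcg_cap()` are hypothetical and are not kernel or QEMU functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of IA32_MCG_CAP, matching the defines in the quoted patch. */
#define MCG_BANKCNT_MASK 0xffULL        /* bits 7:0: number of MCE banks */
#define MCG_CTL_P        (1ULL << 8)    /* IA32_MCG_CTL is present */
#define MCG_SER_P        (1ULL << 24)   /* software error recovery supported */

/* Hypothetical container for what a guest can learn from MCG_CAP. */
struct mcg_cap_info {
    unsigned banks;     /* number of reporting banks */
    bool has_mcg_ctl;   /* MCG_CTL_P set */
    bool ser;           /* MCG_SER_P set; plays the role of mce_ser */
};

/* Split a raw MCG_CAP value into the fields discussed in this thread. */
static struct mcg_cap_info decode_mcg_cap(uint64_t cap)
{
    struct mcg_cap_info info;

    info.banks = (unsigned)(cap & MCG_BANKCNT_MASK);
    info.has_mcg_ctl = (cap & MCG_CTL_P) != 0;
    info.ser = (cap & MCG_SER_P) != 0;
    return info;
}
```

With the old default (MCG_CTL_P plus 10 banks) `ser` comes out false; with the patched MCE_CAP_DEF (MCG_CTL_P|MCG_SER_P) it comes out true, provided kvm actually passes the bit through to the guest.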


>>>> I think most OSes don't expect to receive an MCE with !PCC
>>>> on a traditional x86 processor without SER_P.
>>>>
>>>> Q1: Is it safe to expect that guests can handle such a !PCC event?
>>
>> This might be best answered by Huang, but as I mentioned above, without
>> MCG_SER_P being set, the result was an orderly system panic on the
>> guest.
>
> Though I'll wait for Huang (I think he is on holiday), I believe that
> a system panic is just one possible option for an AO (Action Optional)
> event, regardless of SER_P.

I think you may be correct, but Huang will know for sure.
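
As a rough illustration of why SER_P matters for Q1, here is a simplified classifier for a bank status value. It follows the architectural taxonomy (UCNA/SRAO/SRAR are only defined when SER_P is set) and is deliberately not the guest kernel's severity table, which also looks at MCG_STATUS (RIPV/EIPV), execution context, and the tolerant level. The enum and the function name `classify_mci_status()` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCi_STATUS bits used by the classification. */
#define MCI_STATUS_VAL (1ULL << 63)   /* register contents are valid */
#define MCI_STATUS_UC  (1ULL << 61)   /* uncorrected error */
#define MCI_STATUS_PCC (1ULL << 57)   /* processor context corrupt */
#define MCI_STATUS_S   (1ULL << 56)   /* signaled via #MC (only with SER_P) */
#define MCI_STATUS_AR  (1ULL << 55)   /* action required (only with SER_P) */

enum mce_class {
    MCE_NONE,        /* nothing logged */
    MCE_CORRECTED,   /* corrected error */
    MCE_FATAL,       /* UC with PCC: context is gone */
    MCE_UC_LEGACY,   /* UC, !PCC on a CPU without SER_P */
    MCE_UCNA,        /* uncorrected, no action required */
    MCE_SRAO,        /* software recoverable, action optional */
    MCE_SRAR         /* software recoverable, action required */
};

/* Simplified sketch: classify one bank status, given whether SER_P is set. */
static enum mce_class classify_mci_status(uint64_t status, bool ser_p)
{
    if (!(status & MCI_STATUS_VAL))
        return MCE_NONE;
    if (!(status & MCI_STATUS_UC))
        return MCE_CORRECTED;
    if (status & MCI_STATUS_PCC)
        return MCE_FATAL;
    if (!ser_p)
        return MCE_UC_LEGACY;   /* S and AR carry no defined meaning here */
    if (!(status & MCI_STATUS_S))
        return MCE_UCNA;
    return (status & MCI_STATUS_AR) ? MCE_SRAR : MCE_SRAO;
}
```

The same VAL|UC|!PCC status lands in different classes depending on `ser_p`, which is the mismatch the questions above are about.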


>>>> Q2: What is the expected behavior on the guest?
>>
>> I think I answered this above.
>
> Yeah, thanks.
>
>>
>>>> Q3: What happens if the guest reboots itself in response to the MCE?
>>
>> That depends...
>>
>> And the following issue also holds for a guest that is rebooted at
>> some point having successfully sidelined the bad page.
>>
>> After the guest has panic'd, a system_reset of the guest or a restart
>> initiated by crash_kexec() (called by panic() on the guest), usually
>> results in the guest hanging because the bad page still belongs
>> to qemu-kvm and is now being referenced by the new guest in some way.
>
> Yes. In other words my concern about reboot is that the new guest
> kernel, including a kdump kernel, might try to read the bad page.  If
> there is no AR-SIGBUS etc., we need some tricks to inhibit such accesses.

Agreed.


>> (It actually may not hang, but successfully reboot and be runnable,
>> with the bad page lurking in the background. It all seems to depend on
>> where the bad page ends up, and whether it's ever referenced.)
>
> I know some tough guys using their PC with buggy DIMMs :-)
>
>>
>> I believe there was an attempt to deal with this in kvm on the host.
>> See kvm_handle_bad_page(). This function was supposed to result in the
>> sending of a BUS_MCEERR_AR flavored SIGBUS by do_sigbus() to qemu-kvm
>> which in theory would result in the right thing happening. But commit
>> 96054569190bdec375fe824e48ca1f4e3b53dd36 prevents the signal from being
>> sent. So this mechanism needs to be re-worked, and the issue remains.
>
> Definitely.
> I guess Huang has some plan or hint for reworking this point.

Yeah, as far as I know Huang is looking into this.


>> I would think that if the bad page can't be sidelined, such that
>> the newly booting guest can't use it, then the new guest shouldn't be
>> allowed to boot. But perhaps there is some merit in letting it try to
>> boot and see if one gets 'lucky'.
>
> When booting a real machine in the real world, hardware and firmware
> usually (or often) run a self-test before passing control to the OS.
> Some platforms can boot the OS with a degraded configuration (for
> example, less memory) if a component has trouble.  Some BIOSes may
> stop booting and show messages like "please reseat [component]" on the
> screen.  So we could implement/request such a mechanism in qemu.
>
> I can understand the merit you mentioned here, to some degree. But I
> think it is hard to say "unlucky" to a customer in business...

I totally agree.


>> I understand that Huang is looking into what should be done. He can
>> give you better information than I in answer to your questions.
>
> Agreed. Thank you very much!

You're welcome.

Dean

> Thanks,
> H.Seto


::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

The test I'm running is the mce-test suite's kvm test. A portion of
the messages it wrote to stdout follows:

> Guest physical address is 0x71220000
> Host virtual address is 7f9dc5020
> Host physical address is 0x1051620000
> Guest physical klog address is 0x71220

And it called mce-inject with the following data file:

> [root@intel-s3e36-02 test]# cat SRAO
> CPU 0 BANK 2
> STATUS UNCORRECTED SRAO 0x17a
> MCGSTATUS MCIP RIPV
> MISC 0x8c
> ADDR 0x1051620000
> [root@intel-s3e36-02 test]#

The following is from the host's /var/log/messages:

> Oct  7 09:42:48 intel-s3e36-02 kernel: Triggering MCE exception on CPU 0
> Oct  7 09:42:48 intel-s3e36-02 kernel: Machine check events logged
> Oct  7 09:42:48 intel-s3e36-02 kernel: MCE exception done on CPU 0
> Oct  7 09:42:48 intel-s3e36-02 kernel: MCE 0x1051620: Killing qemu-system-x86:6867 early due to hardware memory corruption
> Oct  7 09:42:48 intel-s3e36-02 kernel: MCE 0x1051620: dirty LRU page recovery: Recovered
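
The number in the "MCE 0x1051620:" lines appears to be the page frame number of the injected host physical address rather than the address itself. A small sketch of that arithmetic, assuming 4 KiB pages; `addr_to_pfn()` is a hypothetical helper name:

```c
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Convert a physical address to its page frame number. */
static uint64_t addr_to_pfn(uint64_t phys_addr)
{
    return phys_addr >> PAGE_SHIFT;
}
```

Applied to the ADDR from the SRAO data file, 0x1051620000, this gives 0x1051620, matching the host log. The same shift maps the guest physical address 0x71220000 to 0x71220.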

Lastly, the following is a screen grab from the guest's serial console:

> HARDWARE ERROR
> CPU 0: Machine Check Exception:                5 Bank 9: bd000000000000c0
> RIP !INEXACT! 33:<0000000000400428>
> TSC 17a67acd14 ADDR 71220000 MISC 8c
> PROCESSOR 0:6d3 TIME 1286458966 SOCKET 0 APIC 0
> No human readable MCE decoding support on this CPU type.
> Run the message through 'mcelog --ascii' to decode.
> This is not a software problem!
> Machine check: Uncorrected
> Kernel panic - not syncing: Fatal machine check on current CPU
> Pid:1493, comm: simple_process Tainted: B   M        ----------------  2.6.32.dnelson_test #48
>
> Call Trace:
>  <#MC>  [<ffffffff814c7c8d>] panic+0x78/0x137
>  [<ffffffff81027382>] mce_panic+0x1e2/0x210
>  [<ffffffff81028873>] do_machine_check+0x843/0xa70
>  [<ffffffff814cb0cc>] machine_check+0x1c/0x30
>  <<EOE>>
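
For reference, the status word in the console output (bd000000000000c0) can be pulled apart by hand. The sketch below names the architectural flag bits of IA32_MCi_STATUS and returns the low 16 bits (the MCA error code). The table and the function name `decode_mci_status()` are hypothetical; this is not how mcelog or the kernel print it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct mci_flag {
    int bit;
    const char *name;
};

/* Architectural flag bits in IA32_MCi_STATUS, highest first. */
static const struct mci_flag mci_flags[] = {
    { 63, "VAL" },   { 62, "OVER" },  { 61, "UC" },
    { 60, "EN" },    { 59, "MISCV" }, { 58, "ADDRV" },
    { 57, "PCC" },   { 56, "S" },     { 55, "AR" },
};

/*
 * Write the space-separated names of the set flags into buf and
 * return the MCA error code (bits 15:0 of the status word).
 */
static uint16_t decode_mci_status(uint64_t status, char *buf, size_t len)
{
    size_t used = 0;

    if (len > 0)
        buf[0] = '\0';
    for (size_t i = 0; i < sizeof(mci_flags) / sizeof(mci_flags[0]); i++) {
        if (!(status & (1ULL << mci_flags[i].bit)))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? " " : "", mci_flags[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;   /* output would be truncated; stop here */
        used += (size_t)n;
    }
    return (uint16_t)(status & 0xffff);
}
```

For 0xbd000000000000c0 this yields the flags "VAL UC EN MISCV ADDRV S" and error code 0x00c0: uncorrected, PCC clear, S set, AR clear. That is the SRAO pattern discussed earlier in the thread, so the guest saw a !PCC event and still panicked.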
