From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: Huang Ying <ying.huang@intel.com>
Cc: Dean Nelson <dnelson@redhat.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [patch uq/master 7/8] MCE: Relay UCR MCE to guest
Date: Fri, 08 Oct 2010 14:54:07 +0900 [thread overview]
Message-ID: <4CAEB1FF.6070109@jp.fujitsu.com> (raw)
In-Reply-To: <1286507754.7768.66.camel@yhuang-dev>
Hi, Huang-san,
(2010/10/08 12:15), Huang Ying wrote:
> Hi, Seto,
>
> On Thu, 2010-10-07 at 11:41 +0800, Hidetoshi Seto wrote:
>> (2010/10/07 3:10), Dean Nelson wrote:
>>> On 10/06/2010 11:05 AM, Marcelo Tosatti wrote:
>>>> On Wed, Oct 06, 2010 at 10:58:36AM +0900, Hidetoshi Seto wrote:
>>>>> I got some more question:
>>>>>
>>>>> (2010/10/05 3:54), Marcelo Tosatti wrote:
>>>>>> Index: qemu/target-i386/cpu.h
>>>>>> ===================================================================
>>>>>> --- qemu.orig/target-i386/cpu.h
>>>>>> +++ qemu/target-i386/cpu.h
>>>>>> @@ -250,16 +250,32 @@
>>>>>> #define PG_ERROR_RSVD_MASK 0x08
>>>>>> #define PG_ERROR_I_D_MASK 0x10
>>>>>>
>>>>>> -#define MCG_CTL_P (1UL<<8) /* MCG_CAP register available */
>>>>>> +#define MCG_CTL_P (1ULL<<8) /* MCG_CAP register available */
>>>>>> +#define MCG_SER_P (1ULL<<24) /* MCA recovery/new status bits */
>>>>>>
>>>>>> -#define MCE_CAP_DEF MCG_CTL_P
>>>>>> +#define MCE_CAP_DEF (MCG_CTL_P|MCG_SER_P)
>>>>>> #define MCE_BANKS_DEF 10
>>>>>>
>>>>>
>>>>> It seems that current kvm doesn't support SER_P, so injecting SRAO
>>>>> to guest will mean that guest receives VAL|UC|!PCC and RIPV event
>>>>> from virtual processor that doesn't have SER_P.
>>>>
>>>> Dean also noted this. I don't think it was deliberate choice to not
>>>> expose SER_P. Huang?
>>>
>>> In my testing, I found that MCG_SER_P was not being set (and I was
>>> running on a Nehalem-EX system). Injecting a MCE resulted in the
>>> guest entering into panic() from mce_panic(). If crash_kexec()
>>> finds a kexec_crash_image the system ends up rebooting, otherwise,
>>> what happens next requires operator intervention.
>>
>> Good to know.
>> What I'm concerning is that if memory scrubbing SRAO event is
>> injected when !SER_P, linux guest with certain mce tolerant level
>> might grade it as "UC" severity and continue running with none of
>> panicking, killing and poisoning because of !PCC and RIPV.
>>
>> Could you provide the panic message of the guest in your test?
>> I think it can tell me why the mce handler decided to go panic.
>
> That is a bug that the SER_P is not in KVM_MCE_CAP_SUPPORTED in kernel.
> I will fix it as soon as possible. And SRAO MCE should not be sent
> when !SER_P, we should add that condition in qemu-kvm.
That makes sense.
I think it is qemu's responsibility for what follows the AO-SIGBUS,
what action should be taken depends on the KVM's capability.
>>> When I applied a patch to the guest's kernel which forces mce_ser to be
>>> set, as if MCG_SER_P was set (see __mcheck_cpu_cap_init()), I found
>>> that when the memory page was 'owned' by a guest process, the process
>>> would be killed (if the page was dirty), and the guest would stay
>>> running. The HWPoisoned page would be sidelined and not cause any more
>>> issues.
>>
>> Excellent.
>> So while guest kernel knows which page is poisoned, guest processes
>> are controlled not to touch the page.
>>
>> ... Therefore rebooting the vm and renewing kernel will lost the
>> information where is poisoned.
>
> Yes. That is an issue. Dean suggests that make qemu-kvm to refuse reboot
> the guest if there is poisoned page and ask for user to intervention. I
> have another idea to replace the poison pages with good pages when
> reboot, that is, recover without user intervention.
Sounds good.
I think it may be worth something to reserve pages for the replacement
before reboot is requested; at least we really don't want to fail
rebooting with 'no memory'.
>>>>> I think most OSes don't expect that it can receives MCE with !PCC
>>>>> on traditional x86 processor without SER_P.
>>>>>
>>>>> Q1: Is it safe to expect that guests can handle such !PCC event?
>>>
>>> This might be best answered by Huang, but as I mentioned above, without
>>> MCG_SER_P being set, the result was an orderly system panic on the
>>> guest.
>>
>> Though I'll wait Huang (I think he is on holiday), I believe that
>> system panic is just a possible option for AO (Action Optional)
>> event, no matter how the SER_P is.
>
> We should fix this as I said above.
>
>>>>> Q2: What is the expected behavior on the guest?
>>>
>>> I think I answered this above.
>>
>> Yeah, thanks.
>>
>>>
>>>>> Q3: What happen if guest reboots itself in response to the MCE?
>>>
>>> That depends...
>>>
>>> And the following issue also holds for a guest that is rebooted at
>>> some point having successfully sidelined the bad page.
>>>
>>> After the guest has panic'd, a system_reset of the guest or a restart
>>> initiated by crash_kexec() (called by panic() on the guest), usually
>>> results in the guest hanging because the bad page still belongs
>>> to qemu-kvm and is now being referenced by the new guest in some way.
>>
>> Yes. In other words my concern about reboot is that new guest kernel
>> including kdump kernel might try to read the bad page. If there is
>> no AR-SIGBUS etc., we need some tricks to inhibit such accesses.
>>
>>> (It actually may not hang, but successfully reboot and be runnable,
>>> with the bad page lurking in the background. It all seems to depend on
>>> where the bad page ends up, and whether it's ever referenced.)
>>
>> I know some tough guys using their PC with buggy DIMMs :-)
>>
>>>
>>> I believe there was an attempt to deal with this in kvm on the host.
>>> See kvm_handle_bad_page(). This function was suppose to result in the
>>> sending of a BUS_MCEERR_AR flavored SIGBUS by do_sigbus() to qemu-kvm
>>> which in theory would result in the right thing happening. But commit
>>> 96054569190bdec375fe824e48ca1f4e3b53dd36 prevents the signal from being
>>> sent. So this mechanism needs to be re-worked, and the issue remains.
>>
>> Definitely.
>> I guess Huang has some plan or hint for rework this point.
>
> Yes. This should be fixed. The SRAR SIGBUS should be sent directly
> instead of being sent via touching poisoned virtual address.
Good. It should work.
>>> I would think that if the the bad page can't be sidelined, such that
>>> the newly booting guest can't use it, then the new guest shouldn't be
>>> allowed to boot. But perhaps there is some merit in letting it try to
>>> boot and see if one gets 'lucky'.
>>
>> In case of booting a real machine in real world, hardware and firmware
>> usually (or often) do self-test before passing control to OS.
>> Some platform can boot OS with degraded configuration (for example,
>> fewer memory) if it has trouble on its component. Some BIOS may
>> stop booting and show messages like "please reseat [component]" on the
>> screen. So we could implement/request qemu to have such mechanism.
>>
>> I can understand the merit you mentioned here, in some degree. But I
>> think it is hard to say "unlucky" to customer in business...
>
> Because the contents of poisoned pages are not relevant after reboot.
> Qemu can replace the poisoned pages with good pages when reboot guest.
> Do you think that is good.
Sure.
Of course this trick will not needed if user has done migration or
save/restore the guest before a reboot.
Thank you for answering!
Thanks,
H.Seto
next prev parent reply other threads:[~2010-10-08 5:54 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-10-04 18:54 [patch uq/master 0/8] port qemu-kvm's MCE support Marcelo Tosatti
2010-10-04 18:54 ` [patch uq/master 1/8] signalfd compatibility Marcelo Tosatti
2010-10-04 18:54 ` [patch uq/master 2/8] iothread: use signalfd Marcelo Tosatti
2010-10-04 18:54 ` [patch uq/master 3/8] Expose thread_id in info cpus Marcelo Tosatti
2010-10-04 18:54 ` [patch uq/master 4/8] kvm: x86: add mce support Marcelo Tosatti
2010-10-04 18:54 ` [patch uq/master 5/8] Export qemu_ram_addr_from_host Marcelo Tosatti
2010-10-05 12:57 ` Anthony Liguori
2010-10-05 20:13 ` Marcelo Tosatti
2010-10-05 20:48 ` Anthony Liguori
2010-10-04 18:54 ` [patch uq/master 6/8] Add RAM -> physical addr mapping in MCE simulation Marcelo Tosatti
2010-10-04 18:54 ` [patch uq/master 7/8] MCE: Relay UCR MCE to guest Marcelo Tosatti
2010-10-06 1:10 ` Hidetoshi Seto
2010-10-06 16:02 ` Marcelo Tosatti
2010-10-06 1:58 ` Hidetoshi Seto
2010-10-06 16:05 ` Marcelo Tosatti
2010-10-06 18:10 ` Dean Nelson
2010-10-07 3:41 ` Hidetoshi Seto
2010-10-07 15:23 ` Dean Nelson
2010-10-08 3:15 ` Huang Ying
2010-10-08 5:54 ` Hidetoshi Seto [this message]
2010-10-08 12:02 ` Dean Nelson
2010-10-08 2:50 ` Huang Ying
2010-10-04 18:54 ` [patch uq/master 8/8] Add savevm/loadvm support for MCE Marcelo Tosatti
2010-10-05 16:31 ` [Qemu-devel] [patch uq/master 0/8] port qemu-kvm's MCE support Andreas Färber
2010-10-05 18:58 ` Chris Wright
2010-10-05 20:24 ` Marcelo Tosatti
2010-10-06 17:34 ` [patch uq/master 0/8] port qemu-kvm's MCE support (v2) Marcelo Tosatti
2010-10-06 17:34 ` [patch uq/master 1/8] signalfd compatibility Marcelo Tosatti
2010-10-06 17:34 ` [patch uq/master 2/8] iothread: use signalfd Marcelo Tosatti
2010-10-06 17:34 ` [patch uq/master 3/8] Expose thread_id in info cpus Marcelo Tosatti
2010-10-06 17:34 ` [patch uq/master 4/8] kvm: x86: add mce support Marcelo Tosatti
2010-10-06 19:32 ` Anthony Liguori
2010-10-06 17:34 ` [patch uq/master 5/8] Export qemu_ram_addr_from_host Marcelo Tosatti
2010-10-06 17:34 ` [patch uq/master 6/8] Add RAM -> physical addr mapping in MCE simulation Marcelo Tosatti
2010-10-06 17:34 ` [patch uq/master 7/8] MCE: Relay UCR MCE to guest Marcelo Tosatti
2010-10-06 17:34 ` [patch uq/master 8/8] Add savevm/loadvm support for MCE Marcelo Tosatti
2010-10-11 18:31 ` [patch 0/8] port qemu-kvm's MCE support (v3) Marcelo Tosatti
2010-10-11 18:31 ` [patch 1/8] signalfd compatibility Marcelo Tosatti
2010-10-11 18:31 ` [patch 2/8] iothread: use signalfd Marcelo Tosatti
2010-10-11 18:31 ` [patch 3/8] Expose thread_id in info cpus Marcelo Tosatti
2010-10-11 18:31 ` [patch 4/8] kvm: x86: add mce support Marcelo Tosatti
2010-10-11 18:31 ` [patch 5/8] Export qemu_ram_addr_from_host Marcelo Tosatti
2010-10-11 18:31 ` [patch 6/8] Add RAM -> physical addr mapping in MCE simulation Marcelo Tosatti
2010-10-11 18:31 ` [patch 7/8] MCE: Relay UCR MCE to guest Marcelo Tosatti
2010-10-11 18:31 ` [patch 8/8] Add savevm/loadvm support for MCE Marcelo Tosatti
2010-10-14 10:25 ` [patch 0/8] port qemu-kvm's MCE support (v3) Avi Kivity
2010-10-14 16:21 ` Marcelo Tosatti
2010-10-17 9:32 ` [patch 0/8] port qemu-kvm's MCE support (v3 resend) Avi Kivity
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4CAEB1FF.6070109@jp.fujitsu.com \
--to=seto.hidetoshi@jp.fujitsu.com \
--cc=dnelson@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=mtosatti@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=ying.huang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox