qemu-devel.nongnu.org archive mirror
From: Huang Ying <ying.huang@intel.com>
To: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Dean Nelson <dnelson@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: [Qemu-devel] Re: [patch uq/master 7/8] MCE: Relay UCR MCE to guest
Date: Fri, 08 Oct 2010 11:15:54 +0800
Message-ID: <1286507754.7768.66.camel@yhuang-dev>
In-Reply-To: <4CAD417B.7060808@jp.fujitsu.com>

Hi, Seto,

On Thu, 2010-10-07 at 11:41 +0800, Hidetoshi Seto wrote:
> (2010/10/07 3:10), Dean Nelson wrote:
> > On 10/06/2010 11:05 AM, Marcelo Tosatti wrote:
> >> On Wed, Oct 06, 2010 at 10:58:36AM +0900, Hidetoshi Seto wrote:
> >>> I have some more questions:
> >>>
> >>> (2010/10/05 3:54), Marcelo Tosatti wrote:
> >>>> Index: qemu/target-i386/cpu.h
> >>>> ===================================================================
> >>>> --- qemu.orig/target-i386/cpu.h
> >>>> +++ qemu/target-i386/cpu.h
> >>>> @@ -250,16 +250,32 @@
> >>>>   #define PG_ERROR_RSVD_MASK 0x08
> >>>>   #define PG_ERROR_I_D_MASK  0x10
> >>>>
> >>>> -#define MCG_CTL_P    (1UL<<8)   /* MCG_CAP register available */
> >>>> +#define MCG_CTL_P    (1ULL<<8)   /* MCG_CAP register available */
> >>>> +#define MCG_SER_P    (1ULL<<24) /* MCA recovery/new status bits */
> >>>>
> >>>> -#define MCE_CAP_DEF    MCG_CTL_P
> >>>> +#define MCE_CAP_DEF    (MCG_CTL_P|MCG_SER_P)
> >>>>   #define MCE_BANKS_DEF    10
> >>>>
> >>>
> >>> It seems that current kvm doesn't support SER_P, so injecting SRAO
> >>> into the guest means that the guest receives a VAL|UC|!PCC and RIPV
> >>> event from a virtual processor that doesn't have SER_P.
> >>
> >> Dean also noted this. I don't think it was a deliberate choice not to
> >> expose SER_P. Huang?
> > 
> > In my testing, I found that MCG_SER_P was not being set (and I was
> > running on a Nehalem-EX system). Injecting an MCE caused the guest to
> > enter panic() via mce_panic(). If crash_kexec() finds a
> > kexec_crash_image, the system ends up rebooting; otherwise, what
> > happens next requires operator intervention.
> 
> Good to know.
> My concern is that if a memory-scrubbing SRAO event is injected when
> !SER_P, a Linux guest with a certain mce tolerant level might grade it
> as "UC" severity and keep running without panicking, killing, or
> poisoning, because the event is !PCC and RIPV.
> 
> Could you provide the guest's panic message from your test?
> I think it would tell me why the mce handler decided to panic.

It is a bug that SER_P is not in KVM_MCE_CAP_SUPPORTED in the kernel.
I will fix it as soon as possible. Also, an SRAO MCE should not be sent
when !SER_P; we should add that check to qemu-kvm.
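A minimal sketch of the qemu-kvm side check described above, in C. The MCG_SER_P value is taken from the patch; the MCi_STATUS bit positions follow the machine-check chapter of the Intel SDM. The names mce_is_srao() and mce_inject_allowed() are hypothetical, not existing qemu-kvm functions, and only the SRAO case discussed here is gated.

```c
#include <stdbool.h>
#include <stdint.h>

/* MCG_CAP bit, as defined in the patch above. */
#define MCG_SER_P        (1ULL << 24)

/* MCi_STATUS bits (Intel SDM, machine-check architecture). */
#define MCI_STATUS_VAL   (1ULL << 63)
#define MCI_STATUS_UC    (1ULL << 61)
#define MCI_STATUS_EN    (1ULL << 60)
#define MCI_STATUS_PCC   (1ULL << 57)
#define MCI_STATUS_S     (1ULL << 56)
#define MCI_STATUS_AR    (1ULL << 55)

/* SRAO: valid, uncorrected, signalled, action optional, context not corrupt. */
static bool mce_is_srao(uint64_t status)
{
    const uint64_t mask = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_PCC |
                          MCI_STATUS_S | MCI_STATUS_AR;
    return (status & mask) == (MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_S);
}

/*
 * Hypothetical gate: refuse to inject an SRAO event into a vCPU whose
 * MCG_CAP does not advertise SER_P. Other event types are not gated here.
 */
static bool mce_inject_allowed(uint64_t guest_mcg_cap, uint64_t status)
{
    if (mce_is_srao(status) && !(guest_mcg_cap & MCG_SER_P))
        return false;
    return true;
}
```

For example, mce_inject_allowed(10, status) returns false for an SRAO status, because an MCG_CAP of 10 (ten banks, no SER_P bit) does not advertise recovery support.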

> > When I applied a patch to the guest's kernel that forces mce_ser to be
> > set, as if MCG_SER_P were set (see __mcheck_cpu_cap_init()), I found
> > that when the memory page was 'owned' by a guest process, the process
> > would be killed (if the page was dirty), and the guest would stay
> > running. The HWPoisoned page would be sidelined and not cause any more
> > issues.
> 
> Excellent.
> So as long as the guest kernel knows which page is poisoned, guest
> processes are kept from touching the page.
> 
> ... Therefore rebooting the VM and starting a new kernel will lose the
> information about which pages are poisoned.

Yes, that is an issue. Dean suggests making qemu-kvm refuse to reboot
the guest if there is a poisoned page, and ask the user to intervene. I
have another idea: replace the poisoned pages with good pages on reboot,
that is, recover without user intervention.
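A rough sketch of the "replace poisoned pages on reboot" idea, in C, assuming guest RAM is backed by private anonymous host memory and that the host addresses of poisoned pages have already been collected (for instance from the SIGBUS si_addr). struct poison_list, poison_record() and poison_replace_all() are hypothetical names. Mapping a fresh anonymous page over each bad one with MAP_FIXED discards the old mapping for that range, so the rebooted guest sees zero-filled memory there; if a remap fails, the reset could fall back to Dean's suggestion and refuse to reboot.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping: host addresses of guest pages reported poisoned. */
#define MAX_POISONED 64

struct poison_list {
    void  *pages[MAX_POISONED];
    size_t count;
};

/* Record a poisoned host address, rounded down to its page; skip duplicates. */
static int poison_record(struct poison_list *pl, void *host_addr)
{
    uintptr_t pgsz = (uintptr_t)sysconf(_SC_PAGESIZE);
    void *page = (void *)((uintptr_t)host_addr & ~(pgsz - 1));

    for (size_t i = 0; i < pl->count; i++)
        if (pl->pages[i] == page)
            return 0;              /* already recorded */
    if (pl->count == MAX_POISONED)
        return -1;                 /* list full: caller should refuse reset */
    pl->pages[pl->count++] = page;
    return 0;
}

/*
 * On guest reset, map a fresh zero-filled anonymous page over each recorded
 * page. MAP_FIXED discards the existing mapping for that range, so the old
 * (poisoned) frame is no longer referenced through this address.
 */
static int poison_replace_all(struct poison_list *pl)
{
    size_t pgsz = (size_t)sysconf(_SC_PAGESIZE);

    for (size_t i = 0; i < pl->count; i++) {
        void *p = mmap(pl->pages[i], pgsz, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
        if (p == MAP_FAILED)
            return -1;             /* could not recover: refuse the reset */
    }
    pl->count = 0;
    return 0;
}
```

File-backed guest RAM (hugetlbfs, for example) would need a different remap path; this sketch does not handle it.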

> >>> I think most OSes don't expect to receive an MCE with !PCC on a
> >>> traditional x86 processor without SER_P.
> >>>
> >>> Q1: Is it safe to expect that guests can handle such a !PCC event?
> > 
> > This might be best answered by Huang, but as I mentioned above, without
> > MCG_SER_P being set, the result was an orderly system panic on the
> > guest.
> 
> Though I'll wait for Huang (I think he is on holiday), I believe that a
> system panic is just one possible outcome for an AO (Action Optional)
> event, regardless of SER_P.

We should fix this as I said above.

> >>> Q2: What is the expected behavior on the guest?
> > 
> > I think I answered this above.
> 
> Yeah, thanks.
> 
> > 
> >>> Q3: What happen if guest reboots itself in response to the MCE?
> > 
> > That depends...
> > 
> > And the following issue also holds for a guest that is rebooted at
> > some point having successfully sidelined the bad page.
> > 
> > After the guest has panicked, a system_reset of the guest or a restart
> > initiated by crash_kexec() (called by panic() on the guest) usually
> > results in the guest hanging, because the bad page still belongs
> > to qemu-kvm and is now being referenced by the new guest in some way.
> 
> Yes. In other words, my concern about reboot is that the new guest
> kernel, including a kdump kernel, might try to read the bad page. If
> there is no AR SIGBUS or similar, we need some trick to inhibit such
> accesses.
> 
> > (It actually may not hang, but successfully reboot and be runnable,
> > with the bad page lurking in the background. It all seems to depend on
> > where the bad page ends up, and whether it's ever referenced.)
> 
> I know some tough guys using their PC with buggy DIMMs :-)
> 
> > 
> > I believe there was an attempt to deal with this in kvm on the host.
> > See kvm_handle_bad_page(). This function was supposed to result in the
> > sending of a BUS_MCEERR_AR flavored SIGBUS by do_sigbus() to qemu-kvm
> > which in theory would result in the right thing happening. But commit
> > 96054569190bdec375fe824e48ca1f4e3b53dd36 prevents the signal from being
> > sent. So this mechanism needs to be re-worked, and the issue remains.
> 
> Definitely.
> I guess Huang has a plan or some hints for reworking this.

Yes, this should be fixed. The SRAR SIGBUS should be sent directly
instead of being triggered by touching the poisoned virtual address.
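On the receiving side, qemu-kvm has to tell the two memory-error SIGBUS flavours apart by si_code. A minimal sketch in C, assuming a Linux host whose <signal.h> provides BUS_MCEERR_AR and BUS_MCEERR_AO (the fallback values 4 and 5 match the Linux ABI). classify_sigbus(), sigbus_handler() and install_sigbus_handler() are hypothetical names, and the actual injection steps are left as placeholder comments.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <string.h>

#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4   /* hardware memory error: action required */
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5   /* hardware memory error: action optional */
#endif

enum mce_sigbus_kind { MCE_SIGBUS_NONE, MCE_SIGBUS_AR, MCE_SIGBUS_AO };

/* Map a SIGBUS siginfo to the memory-error kind it reports, if any. */
static enum mce_sigbus_kind classify_sigbus(const siginfo_t *si)
{
    if (si->si_signo != SIGBUS)
        return MCE_SIGBUS_NONE;
    switch (si->si_code) {
    case BUS_MCEERR_AR: return MCE_SIGBUS_AR;
    case BUS_MCEERR_AO: return MCE_SIGBUS_AO;
    default:            return MCE_SIGBUS_NONE;
    }
}

/* Hypothetical handler: si->si_addr holds the poisoned host virtual address. */
static void sigbus_handler(int sig, siginfo_t *si, void *ctx)
{
    (void)sig;
    (void)ctx;
    switch (classify_sigbus(si)) {
    case MCE_SIGBUS_AR:
        /* placeholder: translate si_addr to a guest address, inject SRAR */
        break;
    case MCE_SIGBUS_AO:
        /* placeholder: translate si_addr, inject SRAO if guest has SER_P */
        break;
    default:
        /* not a memory error: re-raise with the default action */
        signal(SIGBUS, SIG_DFL);
        raise(SIGBUS);
        break;
    }
}

static int install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```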
 
> > I would think that if the bad page can't be sidelined, such that
> > the newly booting guest can't use it, then the new guest shouldn't be
> > allowed to boot. But perhaps there is some merit in letting it try to
> > boot and see if one gets 'lucky'.
> 
> When a real machine boots, hardware and firmware usually (or often)
> run a self-test before passing control to the OS. Some platforms can
> boot the OS in a degraded configuration (for example, with less
> memory) if one of their components has trouble. Some BIOSes may stop
> booting and show messages like "please reseat [component]" on the
> screen. So we could implement such a mechanism in qemu, or ask for one.
> 
> I can understand the merit you mention here, to some degree. But I
> think it is hard to tell a business customer they were "unlucky"...

Because the contents of poisoned pages are irrelevant after a reboot,
qemu can replace the poisoned pages with good pages when rebooting the
guest. Do you think that is a good approach?

Best Regards,
Huang Ying


Thread overview: 48+ messages
2010-10-04 18:54 [Qemu-devel] [patch uq/master 0/8] port qemu-kvm's MCE support Marcelo Tosatti
2010-10-04 18:54 ` [Qemu-devel] [patch uq/master 1/8] signalfd compatibility Marcelo Tosatti
2010-10-04 18:54 ` [Qemu-devel] [patch uq/master 2/8] iothread: use signalfd Marcelo Tosatti
2010-10-04 18:54 ` [Qemu-devel] [patch uq/master 3/8] Expose thread_id in info cpus Marcelo Tosatti
2010-10-04 18:54 ` [Qemu-devel] [patch uq/master 4/8] kvm: x86: add mce support Marcelo Tosatti
2010-10-04 18:54 ` [Qemu-devel] [patch uq/master 5/8] Export qemu_ram_addr_from_host Marcelo Tosatti
2010-10-05 12:57   ` [Qemu-devel] " Anthony Liguori
2010-10-05 20:13     ` Marcelo Tosatti
2010-10-05 20:48       ` Anthony Liguori
2010-10-04 18:54 ` [Qemu-devel] [patch uq/master 6/8] Add RAM -> physical addr mapping in MCE simulation Marcelo Tosatti
2010-10-04 18:54 ` [Qemu-devel] [patch uq/master 7/8] MCE: Relay UCR MCE to guest Marcelo Tosatti
2010-10-06  1:10   ` [Qemu-devel] " Hidetoshi Seto
2010-10-06 16:02     ` Marcelo Tosatti
2010-10-06  1:58   ` Hidetoshi Seto
2010-10-06 16:05     ` Marcelo Tosatti
2010-10-06 18:10       ` Dean Nelson
2010-10-07  3:41         ` Hidetoshi Seto
2010-10-07 15:23           ` Dean Nelson
2010-10-08  3:15           ` Huang Ying [this message]
2010-10-08  5:54             ` Hidetoshi Seto
2010-10-08 12:02             ` Dean Nelson
2010-10-08  2:50       ` Huang Ying
2010-10-04 18:54 ` [Qemu-devel] [patch uq/master 8/8] Add savevm/loadvm support for MCE Marcelo Tosatti
2010-10-05 16:31 ` [Qemu-devel] [patch uq/master 0/8] port qemu-kvm's MCE support Andreas Färber
2010-10-05 18:58   ` Chris Wright
2010-10-05 20:24     ` Marcelo Tosatti
2010-10-06 17:34 ` [Qemu-devel] [patch uq/master 0/8] port qemu-kvm's MCE support (v2) Marcelo Tosatti
2010-10-06 17:34   ` [Qemu-devel] [patch uq/master 1/8] signalfd compatibility Marcelo Tosatti
2010-10-06 17:34   ` [Qemu-devel] [patch uq/master 2/8] iothread: use signalfd Marcelo Tosatti
2010-10-06 17:34   ` [Qemu-devel] [patch uq/master 3/8] Expose thread_id in info cpus Marcelo Tosatti
2010-10-06 17:34   ` [Qemu-devel] [patch uq/master 4/8] kvm: x86: add mce support Marcelo Tosatti
2010-10-06 19:32     ` [Qemu-devel] " Anthony Liguori
2010-10-06 17:34   ` [Qemu-devel] [patch uq/master 5/8] Export qemu_ram_addr_from_host Marcelo Tosatti
2010-10-06 17:34   ` [Qemu-devel] [patch uq/master 6/8] Add RAM -> physical addr mapping in MCE simulation Marcelo Tosatti
2010-10-06 17:34   ` [Qemu-devel] [patch uq/master 7/8] MCE: Relay UCR MCE to guest Marcelo Tosatti
2010-10-06 17:34   ` [Qemu-devel] [patch uq/master 8/8] Add savevm/loadvm support for MCE Marcelo Tosatti
2010-10-11 18:31   ` [Qemu-devel] [patch 0/8] port qemu-kvm's MCE support (v3) Marcelo Tosatti
2010-10-11 18:31     ` [Qemu-devel] [patch 1/8] signalfd compatibility Marcelo Tosatti
2010-10-11 18:31     ` [Qemu-devel] [patch 2/8] iothread: use signalfd Marcelo Tosatti
2010-10-11 18:31     ` [Qemu-devel] [patch 3/8] Expose thread_id in info cpus Marcelo Tosatti
2010-10-11 18:31     ` [Qemu-devel] [patch 4/8] kvm: x86: add mce support Marcelo Tosatti
2010-10-11 18:31     ` [Qemu-devel] [patch 5/8] Export qemu_ram_addr_from_host Marcelo Tosatti
2010-10-11 18:31     ` [Qemu-devel] [patch 6/8] Add RAM -> physical addr mapping in MCE simulation Marcelo Tosatti
2010-10-11 18:31     ` [Qemu-devel] [patch 7/8] MCE: Relay UCR MCE to guest Marcelo Tosatti
2010-10-11 18:31     ` [Qemu-devel] [patch 8/8] Add savevm/loadvm support for MCE Marcelo Tosatti
2010-10-14 10:25     ` [Qemu-devel] Re: [patch 0/8] port qemu-kvm's MCE support (v3) Avi Kivity
2010-10-14 16:21       ` Marcelo Tosatti
2010-10-17  9:32     ` [Qemu-devel] Re: [patch 0/8] port qemu-kvm's MCE support (v3 resend) Avi Kivity
