Message-ID: <4CACBB94.10200@redhat.com>
Date: Wed, 06 Oct 2010 13:10:28 -0500
From: Dean Nelson
References: <20101004185447.891324545@redhat.com>
 <20101004185715.167557459@redhat.com>
 <4CABD7CC.6030909@jp.fujitsu.com>
 <20101006160531.GB4277@amt.cnet>
In-Reply-To: <20101006160531.GB4277@amt.cnet>
Subject: [Qemu-devel] Re: [patch uq/master 7/8] MCE: Relay UCR MCE to guest
List-Id: qemu-devel.nongnu.org
To: Hidetoshi Seto
Cc: Marcelo Tosatti, qemu-devel@nongnu.org, kvm@vger.kernel.org, Huang Ying

On 10/06/2010 11:05 AM, Marcelo Tosatti wrote:
> On Wed, Oct 06, 2010 at 10:58:36AM +0900, Hidetoshi Seto wrote:
>> I got some more question:
>>
>> (2010/10/05 3:54), Marcelo Tosatti wrote:
>>> Index: qemu/target-i386/cpu.h
>>> ===================================================================
>>> --- qemu.orig/target-i386/cpu.h
>>> +++ qemu/target-i386/cpu.h
>>> @@ -250,16 +250,32 @@
>>>  #define PG_ERROR_RSVD_MASK 0x08
>>>  #define PG_ERROR_I_D_MASK 0x10
>>>
>>> -#define MCG_CTL_P (1UL<<8) /* MCG_CAP register available */
>>> +#define MCG_CTL_P (1ULL<<8) /* MCG_CAP register available */
>>> +#define MCG_SER_P (1ULL<<24) /* MCA recovery/new status bits */
>>>
>>> -#define MCE_CAP_DEF MCG_CTL_P
>>> +#define MCE_CAP_DEF (MCG_CTL_P|MCG_SER_P)
>>>  #define MCE_BANKS_DEF 10
>>>
>>
>> It seems that current kvm doesn't support SER_P, so injecting SRAO
>> to guest will mean that guest receives VAL|UC|!PCC and RIPV event
>> from virtual processor that doesn't have SER_P.
>
> Dean also noted this. I don't think it was deliberate choice to not
> expose SER_P. Huang?

In my testing, I found that MCG_SER_P was not being set (and I was
running on a Nehalem-EX system). Injecting an MCE resulted in the guest
entering panic() from mce_panic(). If crash_kexec() finds a
kexec_crash_image, the system ends up rebooting; otherwise, what happens
next requires operator intervention.

When I applied a patch to the guest's kernel which forces mce_ser to be
set, as if MCG_SER_P was set (see __mcheck_cpu_cap_init()), I found that
when the memory page was 'owned' by a guest process, the process would
be killed (if the page was dirty), and the guest would stay running. The
HWPoisoned page would be sidelined and not cause any more issues.

>> I think most OSes don't expect that it can receives MCE with !PCC
>> on traditional x86 processor without SER_P.
>>
>> Q1: Is it safe to expect that guests can handle such !PCC event?

This might be best answered by Huang, but as I mentioned above, without
MCG_SER_P being set, the result was an orderly system panic on the
guest.

>> Q2: What is the expected behavior on the guest?

I think I answered this above.

>> Q3: What happen if guest reboots itself in response to the MCE?

That depends...

And the following issue also holds for a guest that is rebooted at some
point after having successfully sidelined the bad page.

After the guest has panic'd, a system_reset of the guest or a restart
initiated by crash_kexec() (called by panic() on the guest) usually
results in the guest hanging, because the bad page still belongs to
qemu-kvm and is now being referenced by the new guest in some way.
(It actually may not hang, but successfully reboot and be runnable, with
the bad page lurking in the background. It all seems to depend on where
the bad page ends up, and whether it's ever referenced.)

I believe there was an attempt to deal with this in kvm on the host. See
kvm_handle_bad_page(). This function was supposed to result in
do_sigbus() sending a BUS_MCEERR_AR flavored SIGBUS to qemu-kvm, which in
theory would result in the right thing happening. But commit
96054569190bdec375fe824e48ca1f4e3b53dd36 prevents the signal from being
sent. So this mechanism needs to be re-worked, and the issue remains.

I would think that if the bad page can't be sidelined, such that the
newly booting guest can't use it, then the new guest shouldn't be
allowed to boot. But perhaps there is some merit in letting it try to
boot and see if one gets 'lucky'.

I understand that Huang is looking into what should be done. He can give
you better information than I in answer to your questions.

Dean
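
As a rough illustration of the SER_P behavior discussed above, here is a
minimal sketch of how a guest might pick its reaction to an injected
SRAO event depending on whether MCG_SER_P is visible in MCG_CAP. The
MCG_* values come from the quoted patch and the MCi_STATUS bit positions
from the Intel SDM. The enum and guest_mce_action() are hypothetical
names made up for this example, and the policy is a heavy simplification
rather than the Linux severity table.

```c
#include <stdint.h>

/* MCG_CAP bits, as defined in the quoted patch. */
#define MCG_CTL_P       (1ULL << 8)
#define MCG_SER_P       (1ULL << 24)

/* MCi_STATUS bits (Intel SDM, machine-check architecture). */
#define MCI_STATUS_VAL  (1ULL << 63)    /* register contents valid */
#define MCI_STATUS_UC   (1ULL << 61)    /* uncorrected error */
#define MCI_STATUS_PCC  (1ULL << 57)    /* processor context corrupt */
#define MCI_STATUS_S    (1ULL << 56)    /* signaled (needs SER_P) */
#define MCI_STATUS_AR   (1ULL << 55)    /* action required (needs SER_P) */

/* Hypothetical result type, for illustration only. */
enum mce_action { MCE_IGNORE, MCE_RECOVER, MCE_PANIC };

/*
 * Hypothetical, simplified guest policy (an assumption, not kernel code):
 *  - invalid or corrected events are ignored here (a real OS would log);
 *  - PCC set is always fatal;
 *  - without MCG_SER_P, every uncorrected error is fatal;
 *  - with MCG_SER_P, SRAO (UC, !PCC, S, !AR) is treated as recoverable;
 *  - everything else is conservatively treated as fatal.
 */
static enum mce_action guest_mce_action(uint64_t mcg_cap, uint64_t status)
{
    if (!(status & MCI_STATUS_VAL))
        return MCE_IGNORE;
    if (!(status & MCI_STATUS_UC))
        return MCE_IGNORE;
    if (status & MCI_STATUS_PCC)
        return MCE_PANIC;
    if (!(mcg_cap & MCG_SER_P))
        return MCE_PANIC;
    if ((status & MCI_STATUS_S) && !(status & MCI_STATUS_AR))
        return MCE_RECOVER;
    return MCE_PANIC;
}
```

Under these assumptions, an SRAO status (VAL|UC|S) seen with only
MCG_CTL_P in MCG_CAP yields MCE_PANIC, matching the orderly panic
described above, while the same status with MCG_CTL_P|MCG_SER_P yields
MCE_RECOVER, matching the case where the guest kills the owning process
and keeps running.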