From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752753Ab2GTMSx (ORCPT ); Fri, 20 Jul 2012 08:18:53 -0400
Received: from s15943758.onlinehome-server.info ([217.160.130.188]:41352
	"EHLO mail.x86-64.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751188Ab2GTMSw (ORCPT ); Fri, 20 Jul 2012 08:18:52 -0400
Date: Fri, 20 Jul 2012 14:18:48 +0200
From: Borislav Petkov
To: Tony Luck
Cc: linux-kernel@vger.kernel.org, Ingo Molnar, Chen Gong, "Huang, Ying",
	Hidetoshi Seto
Subject: Re: [PATCH 2/2] x86/mce: Add quirk for instruction recovery on Sandy Bridge processors
Message-ID: <20120720121848.GA29183@aftab.osrc.amd.com>
References: <180a06f3f357cf9f78259ae443a082b14a29535b.1342723082.git.tony.luck@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <180a06f3f357cf9f78259ae443a082b14a29535b.1342723082.git.tony.luck@intel.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jul 19, 2012 at 11:28:46AM -0700, Tony Luck wrote:
> Sandy Bridge processors follow the SDM (Vol 3B, Table 15-20) and set
> both the RIPV and EIPV bits in the MCG_STATUS register to zero for
> machine checks during instruction fetch. This is more than a little
> counter-intuitive and means that Linux cannot recover from these
> errors. Rather than insert special case code at several places in mce.c
> and mce-severity.c, we pretend the EIPV bit was set for just this case
> early in processing the machine check.
>
> Signed-off-by: Tony Luck

Looks ok, just minor nitpick below.

> ---
>  arch/x86/kernel/cpu/mcheck/mce.c | 43 +++++++++++++++++++++++++++++++++++++---
>  1 file changed, 40 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index da27c5d..e65e738 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -102,6 +102,8 @@ DEFINE_PER_CPU(mce_banks_t, mce_poll_banks) = {
>
>  static DEFINE_PER_CPU(struct work_struct, mce_work);
>
> +static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs);
> +
>  /*
>   * CPU/chipset specific EDAC code can register a notifier call here to print
>   * MCE errors in a human-readable form.
> @@ -649,14 +651,18 @@ EXPORT_SYMBOL_GPL(machine_check_poll);
>   * Do a quick check if any of the events requires a panic.
>   * This decides if we keep the events around or clear them.
>   */
> -static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp)
> +static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
> +			  struct pt_regs *regs)
>  {
>  	int i, ret = 0;
>
>  	for (i = 0; i < banks; i++) {
>  		m->status = mce_rdmsrl(MSR_IA32_MCx_STATUS(i));
> -		if (m->status & MCI_STATUS_VAL)
> +		if (m->status & MCI_STATUS_VAL) {
>  			__set_bit(i, validp);
> +			if (quirk_no_way_out)
> +				quirk_no_way_out(i, m, regs);

Maybe define a default empty quirk_no_way_out() on the remaining
families/vendors so that the compiler can optimize it away and we save
ourselves the if-test?

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
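
For illustration only, the suggestion above (a default empty
quirk_no_way_out() instead of the NULL test) could be sketched roughly as
below. This is a standalone userspace sketch, not the patch: struct mce and
struct pt_regs are cut-down stand-ins, quirk_nop(), quirk_set_eipv(),
install_quirk() and scan_banks() are hypothetical names, and the quirk body
is a placeholder that does not reproduce the bank and status checks the real
Sandy Bridge quirk would need.

```c
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-ins for the kernel types; only the fields used here. */
struct mce {
	uint64_t status;	/* copy of MCi_STATUS for the current bank */
	uint64_t mcgstatus;	/* copy of MCG_STATUS */
};
struct pt_regs {
	unsigned long ip;
};

#define MCI_STATUS_VAL	(1ULL << 63)	/* bank holds a valid error */
#define MCG_STATUS_EIPV	(1ULL << 1)	/* error IP valid */

/* Default quirk (hypothetical name): does nothing. */
static void quirk_nop(int bank, struct mce *m, struct pt_regs *regs)
{
	(void)bank;
	(void)m;
	(void)regs;
}

/* Placeholder quirk (hypothetical): pretend EIPV was set. */
static void quirk_set_eipv(int bank, struct mce *m, struct pt_regs *regs)
{
	(void)bank;
	(void)regs;
	m->mcgstatus |= MCG_STATUS_EIPV;
}

/* Never NULL: starts out pointing at the no-op. */
static void (*quirk_no_way_out)(int bank, struct mce *m,
				struct pt_regs *regs) = quirk_nop;

/* Hypothetical init-time hook: select the quirk for affected CPUs. */
static void install_quirk(int needs_quirk)
{
	quirk_no_way_out = needs_quirk ? quirk_set_eipv : quirk_nop;
}

/* Loop shaped like mce_no_way_out(), with no NULL test before the call. */
static int scan_banks(const uint64_t *bank_status, int nbanks,
		      struct mce *m, struct pt_regs *regs)
{
	int i, valid = 0;

	for (i = 0; i < nbanks; i++) {
		m->status = bank_status[i];
		if (m->status & MCI_STATUS_VAL) {
			valid++;
			quirk_no_way_out(i, m, regs);
		}
	}
	return valid;
}
```

One caveat on this sketch: with a function pointer the call stays an
indirect call, so what is saved is the NULL test rather than the call
itself. Having the compiler drop the call entirely would, as far as I can
tell, need an empty static inline stub selected at compile time instead.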