From mboxrd@z Thu Jan 1 00:00:00 1970
Return-path:
Received: from mx1.redhat.com ([209.132.183.28])
	by bombadil.infradead.org with esmtps (Exim 4.87 #1 (Red Hat Linux))
	id 1ceXhj-000189-92
	for kexec@lists.infradead.org; Fri, 17 Feb 2017 01:51:24 +0000
Subject: Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
References: <20170123125157.u2kefedwpvgcdyfo@pd.tnic>
	<588606B9.3070604@redhat.com>
	<20170123145056.fyraeehjfnwmmfb6@pd.tnic>
	<5886AD91.10803@redhat.com>
	<20170124122212.3dpdex5wjallypis@pd.tnic>
	<5889976A.9020802@redhat.com>
	<20170126064400.wfsn5pzxnpi6gcuk@pd.tnic>
	<58A53A65.3000405@redhat.com>
	<20170216101845.vkmnde4v6v72dgzx@pd.tnic>
	<58A59269.3050706@redhat.com>
	<20170216122215.uvrckt25g2msfxhe@pd.tnic>
From: Xunlei Pang
Message-ID: <58A65791.4090600@redhat.com>
Date: Fri, 17 Feb 2017 09:53:21 +0800
MIME-Version: 1.0
In-Reply-To: <20170216122215.uvrckt25g2msfxhe@pd.tnic>
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Reply-To: xlpang@redhat.com
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: "kexec"
Errors-To: kexec-bounces+dwmw2=infradead.org@lists.infradead.org
To: Borislav Petkov , xlpang@redhat.com
Cc: Prarit Bhargava , Kiyoshi Ueda , Tony Luck , Peter Zijlstra ,
	x86@kernel.org, kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
	Ingo Molnar , Junichi Nomura , Naoya Horiguchi , Dave Young ,
	Thomas Gleixner

On 02/16/2017 at 08:22 PM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 07:52:09PM +0800, Xunlei Pang wrote:
>> then mce will be broadcast to the other cpus which are still running
>> in the first kernel(i.e. looping in crash_nmi_callback).
> Simple: the crash code should really mark CPUs as not being online:
>
> void do_machine_check(struct pt_regs *regs, long error_code)
>
> ...
>
> 	/* If this CPU is offline, just bail out. */
> 	if (cpu_is_offline(smp_processor_id())) {
> 		u64 mcgstatus;
>
> 		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
> 		if (mcgstatus & MCG_STATUS_RIPV) {
> 			mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
> 			return;
> 		}
> 	}
>
> because looping in crash_nmi_callback() does not really denote them as
> CPUs being online.
>
> And just so that you don't disturb the machine too much during crashing,
> you could simply clear them from the online masks, i.e., perhaps call
> remove_cpu_from_maps() with the proper locking around it instead of
> doing a full cpu_down().

That changes the value of cpu_online_mask etc., which will confuse
vmcore analysis.

Moreover, for the code (see the inlined comment):

	if (cpu_is_offline(smp_processor_id())) {
		u64 mcgstatus;

		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
		if (mcgstatus & MCG_STATUS_RIPV) {
			// This condition may not be true: an mce triggered on the kdump cpu
			// need not have this bit set on the other cpus that remain in the
			// 1st kernel.
			mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
			return;
		}
	}

Regards,
Xunlei

>
> The machine will be killed anyway after kdump is done writing out
> memory.
>

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec