From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754554AbbCBOdq (ORCPT ); Mon, 2 Mar 2015 09:33:46 -0500
Received: from mail-pd0-f175.google.com ([209.85.192.175]:40759 "EHLO
	mail-pd0-f175.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750966AbbCBOdl (ORCPT ); Mon, 2 Mar 2015 09:33:41 -0500
Message-ID: <54F474BD.1010802@gmail.com>
Date: Mon, 02 Mar 2015 23:33:33 +0900
From: Naoya Horiguchi
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: Borislav Petkov
CC: Naoya Horiguchi , "Luck, Tony" , Prarit Bhargava , Vivek Goyal ,
	"linux-kernel@vger.kernel.org" , Junichi Nomura , Kiyoshi Ueda
Subject: Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec
References: <1425013116-23581-1-git-send-email-n-horiguchi@ah.jp.nec.com>
	<54F05080.9090605@redhat.com>
	<20150227120648.GA3337@pd.tnic>
	<3908561D78D1C84285E8C5FCA982C28F329F18A7@ORSMSX114.amr.corp.intel.com>
	<20150302023118.GB25064@hori1.linux.bs1.fc.nec.co.jp>
	<20150302121701.GA17521@pd.tnic>
In-Reply-To: <20150302121701.GA17521@pd.tnic>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Mar 02, 2015 at 01:17:01PM +0100, Borislav Petkov wrote:
> On Mon, Mar 02, 2015 at 02:31:19AM +0000, Naoya Horiguchi wrote:
> > And please note that the target of this patch is an MCE when the kernel is
> > already running on kdump code (so crashing happened *not* because of the MCE).
> > In that case, we can expect that kdump works fine if the MCE hits the "kdump
> > shotdown" CPU which are just running cpu_relax() loop, because a 2nd kernel's
> > CPU isn't affected by the MCE (even the CPU failure is fatal one.)
>
> Well, why would you even want to disable MCA then?
> If all the CPUs are offlined, it is very very highly unlikely they'd cause
> an MCE.

Yes, CPU offlining is one option to keep the other CPUs quiet.
I'm not sure why the current kexec implementation doesn't offline the other
CPUs and instead just leaves them in a cpu_relax() loop, but my guess is that
in some kernel panic situations (like a soft lockup) we want to keep the CPUs'
state undisturbed to make sure the bug's info is captured in kdump.

> > If a fatal MCE happens on the CPU running kdump code, there's no reason to
> > try harder to get kdump as you pointed out. In such case, what we can do is
> > to print out a message like "kdump failed due to MCE" and reset the system.
>
> Yes, so a primitive kdump-specific MCE handler would be more viable than
> disabling MCA.

OK.

Thanks,
Naoya Horiguchi