Subject: Re: [PATCH v2] x86/mce: Don't participate in rendezvous process once nmi_shootdown_cpus() was made
From: Xunlei Pang
Reply-To: xlpang@redhat.com
To: Borislav Petkov, xlpang@redhat.com
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, kexec@lists.infradead.org,
    Tony Luck, Ingo Molnar, Dave Young, Prarit Bhargava, Junichi Nomura,
    Kiyoshi Ueda, Naoya Horiguchi
Date: Tue, 21 Feb 2017 09:28:20 +0800
Message-ID: <58AB97B4.8040303@redhat.com>
In-Reply-To: <20170220202654.kakjxgmxrlhak6q3@pd.tnic>
References: <1487571037-10821-1-git-send-email-xlpang@redhat.com>
    <20170220110941.vwcm3je3e4kkei6o@pd.tnic>
    <58AAEF34.9000303@redhat.com>
    <20170220202654.kakjxgmxrlhak6q3@pd.tnic>

On 02/21/2017 at 04:26 AM, Borislav Petkov wrote:
> On Mon, Feb 20, 2017 at 09:29:24PM +0800, Xunlei Pang wrote:
>> There is a small window between crash and kdump kernel boot, so
>> if a SRAO comes within this window it will also cause the mce
>> synchronization problem on the crashing cpu if we don't bail out the
>> crashing cpu.
> You mean, in the window between, kdump kernel starts writing out memory
> and the second, kexec-ed kernel?

Not while the kdump kernel is dumping; I mean during
nmi_shootdown_cpus(). If an MCE arrives after crashing_cpu has been set
and we don't skip crashing_cpu, the crashing CPU will enter the MCE
handler and trigger the synchronization issue.

>
> If so, please add that information to the place in do_machine_check()
> where we check crashing_cpu so that we know why we're doing this
> temporary ignore of #MC.

OK, will add. Thanks for the feedback.

Regards,
Xunlei
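
To make the window described above concrete, here is a minimal userspace sketch of the scenario. It is only a model under stated assumptions, not the patch or kernel code: NR_CPUS, model_shootdown_begin(), rendezvous_completes(), model_machine_check() and the mce_outcome enum are hypothetical names invented for illustration, and the rule that the rendezvous needs every CPU to arrive is a simplification. Only crashing_cpu, nmi_shootdown_cpus() and do_machine_check() are names taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4                 /* hypothetical CPU count for the model */
#define NO_CRASHING_CPU (-1)      /* assumption: -1 means no shootdown yet */

/* Stand-in for the kernel's crashing_cpu discussed in the thread. */
int crashing_cpu = NO_CRASHING_CPU;

enum mce_outcome { MCE_HANDLED, MCE_SKIPPED, MCE_SYNC_TIMEOUT };

/* Models the point in nmi_shootdown_cpus() where crashing_cpu is set. */
void model_shootdown_begin(int cpu)
{
    crashing_cpu = cpu;
}

/*
 * Simplified rendezvous: it completes only if every CPU arrives.
 * Assumption: after the shootdown only the crashing CPU still runs,
 * so the other CPUs never arrive.
 */
bool rendezvous_completes(void)
{
    int arrived = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (crashing_cpu == NO_CRASHING_CPU || cpu == crashing_cpu)
            arrived++;
    }
    return arrived == NR_CPUS;
}

/*
 * Models a machine check arriving on a CPU that is still running.
 * With bail_out_enabled, the handler leaves early once crashing_cpu is
 * set; without it, the CPU enters the rendezvous and waits for CPUs
 * that will never show up.
 */
enum mce_outcome model_machine_check(bool bail_out_enabled)
{
    if (bail_out_enabled && crashing_cpu != NO_CRASHING_CPU)
        return MCE_SKIPPED;

    return rendezvous_completes() ? MCE_HANDLED : MCE_SYNC_TIMEOUT;
}
```

In this model, a machine check before model_shootdown_begin() is handled normally. After it, the same machine check yields MCE_SYNC_TIMEOUT unless the early bail-out is enabled, which is the window the comment in do_machine_check() is meant to explain.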