Date: Thu, 9 Apr 2015 09:02:45 +0200
From: Borislav Petkov
To: Naoya Horiguchi, Tony Luck
Cc: Ingo Molnar, Prarit Bhargava, Vivek Goyal, "linux-kernel@vger.kernel.org", Junichi Nomura, Kiyoshi Ueda
Subject: Re: [PATCH v8] x86: mce: kexec: switch MCE handler for kexec/kdump
Message-ID: <20150409070244.GC25434@pd.tnic>
In-Reply-To: <20150409065737.GA9862@hori1.linux.bs1.fc.nec.co.jp>

On Thu, Apr 09, 2015 at 06:57:38AM +0000, Naoya Horiguchi wrote:
> Yes, I did see it at first, so I did two tweaks for the testing:
>
> 1) to fix qemu code. I think that the current MCE injection code of qemu is buggy,
> because when we try to inject MCE in broadcast mode, all injections other than
> the first one are done with MCG_STATUS_MCIP (see cpu_x86_inject_mce()@target-i386/helper.c.)
> It looks like a bug to me because this means that every (broadcast mode) MCE injection
> causes a triple fault, which does not seem to mimic the real HW behavior.
>
> 2) to insert a delay (of a few seconds) into kdump_nmi_callback() before
> disable_local_APIC(). This is because MCE interrupts are delivered to CPUs in
> different manners in qemu and on bare metal. Bare-metal machines do respond to MCE
> interrupts after disable_local_APIC(), but qemu does not.

Lemme take a look at that.

> Unfortunately our testing (~15000 kdump/reboot cycles) with the debug
> kernel on bare metal has not reproduced the problem yet, but I believe that
> the above testing on qemu should hit the target.

If only APEI EINJ could be taught to do delayed injection, regardless of
OS kernel running.

Tony, is something like that even possible at all?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--