Date: Mon, 9 May 2011 22:33:36 +0530
From: "K.Prasad"
Reply-To: prasad@linux.vnet.ibm.com
To: Bouchard Louis
Cc: Andi Kleen, "kexec@lists.infradead.org", Vivek Goyal
Subject: Re: [Bug] Kdump does not work when panic triggered due to MCE
Message-ID: <20110509170336.GC1963@in.ibm.com>
In-Reply-To: <4DC80662.9000703@canonical.com>
References: <20110506165412.GB2719@in.ibm.com> <20110509123902.GA5975@redhat.com> <4DC80662.9000703@canonical.com>

On Mon, May 09, 2011 at 05:21:06PM +0200, Bouchard Louis wrote:
> Hello,
>
> On 09/05/2011 14:39, Vivek Goyal wrote:
> >
> > Prasad,
> >
> > I have never tried taking dump in MCE situation. Does kdump work on this
> > machine with normal panic()?
> >
> > Use --debug and --serial option in kexec-tools to print some debug message
> > and look for "I am in purgatory". This will tell you whether you hanged
> > in first kernel or second kernel.
> >
> > Then put "outb()" messages in the kernel to trace what happened.
> >
> > Thanks
> > Vivek
> >
> > _______________________________________________
> > kexec mailing list
> > kexec@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/kexec
> I have seen numerous occurrences of MCE triggered kernel panics on both
> RHEL & SLES environment used on IA32 architecture. Both in contexts
> where kexec/kdump was being used.
>

That's interesting! Assuming that these are not software-induced MCEs but
panic() calls invoked due to unrecoverable memory errors in a physical
machine, did you experience any situation where the kdump kernel
hung or rebooted due to a second MCE (triggered while reading the faulty
memory location belonging to the first kernel)?

> Matter of fact, MCE triggered panic are part of the reason that pushed
> me to work on crashdc : only one crash command is required to get the
> MCE trace out of the kernel ring buffer. This avoids transfering massive
> amount of vmcore file over the net.
>

What data is contained in the faulty memory location (whose I/O
triggered an MCE in the first place)? Basically, we'd like to understand
what a 'read' operation on the corrupted memory location would result in.

> crashdc does well on those, mcelog can be applied on the data gathered.
>

We're contemplating a solution along similar lines (refer to the
description of 'slim' kdump at https://lkml.org/lkml/2011/5/4/396): create
a coredump, readable by the 'crash' tool, containing a message that
identifies MCE as the cause of the crash (and no data from the old
memory).

I'll take a look at the crashdc code and see if there are ideas that we
can borrow from there.

Thanks,
K.Prasad