From: Vivek Goyal <vgoyal@redhat.com>
To: "K.Prasad" <prasad@linux.vnet.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>,
kexec@lists.infradead.org,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
"Luck, Tony" <tony.luck@intel.com>
Subject: Re: [Bug] Kdump does not work when panic triggered due to MCE
Date: Mon, 9 May 2011 13:05:28 -0400
Message-ID: <20110509170528.GG5975@redhat.com>
In-Reply-To: <20110509165357.GB1963@in.ibm.com>

On Mon, May 09, 2011 at 10:23:57PM +0530, K.Prasad wrote:
> On Mon, May 09, 2011 at 08:39:02AM -0400, Vivek Goyal wrote:
> > On Fri, May 06, 2011 at 10:24:12PM +0530, K.Prasad wrote:
> > > Hi All,
> > > I wanted to test the behaviour of kdump when panic is triggered
> > > due to MCE on x86 and found that kdump is not captured.
> > >
> > > While the kdump service is configured and running and non-MCE panics
> > > (such as those triggered through /proc/sysrq-trigger) successfully
> > > capture a kdump, any fatal MCE error injected through the mce-inject
> > > tool causes a reboot of the machine.
> > >
> > > The code has been traced (using early_serial_putc()) to enter the kexec
> > > path i.e. panic()->crash_kexec()->machine_kexec()->relocate_kernel()
> > > but is untraceable further.
> > >
> > > Kdump works fine when a similar test is carried out inside a
> > > KVM guest.
> > >
> > > Has anybody tested this before? Or has anyone found kdump working when
> > > fatal MCEs have actually occurred?
> >
> > Prasad,
> >
> > I have never tried taking a dump in an MCE situation. Does kdump work on
> > this machine with a normal panic()?
> >
>
> Hi Vivek,
> kdump worked fine on this machine for non-MCE triggered panic
> calls (the /proc/sysrq-trigger initiated crashes got the kdump fine).
>
> > Use the --debug and --serial options in kexec-tools to print some debug
> > messages and look for "I am in purgatory". This will tell you whether you
> > hung in the first kernel or the second kernel.
> >
>
> There were no boot logs from the second kernel while the "Rebooting in X
> seconds..." message had appeared before the system rebooted, suggesting
> that the second kernel did not boot at all.
>
> > Then put "outb()" messages in the kernel to trace what happened.
> >
>
> The outb logs showed that the system entered the machine_kexec function
> (traceable up to relocate_kernel) but then rebooted from inside the panic()
> function.

Ok, that means that we returned from the crash_kexec() function instead of
transitioning into the second kernel. This is strange. machine_kexec() is not
supposed to return unless it finds that there is no crash kernel loaded.
As per your mail, you can trace it to relocate_kernel() being entered. So
the only thing I can suggest is to debug the relocate_kernel() code now to
see why it is returning.
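
To make the expected versus observed flow concrete, here is a minimal
user-space sketch in C. It is only a model and not the kernel code:
model_panic(), model_crash_kexec(), the enum, and both boolean parameters
are hypothetical names introduced for illustration. In particular,
jump_succeeds stands in for whatever goes wrong after relocate_kernel() is
entered, which is still unknown.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical outcomes of the modelled panic path. */
enum panic_outcome {
    BOOTED_CRASH_KERNEL,  /* expected: control moved to the second kernel */
    REBOOTED_FROM_PANIC   /* observed: crash_kexec() returned, panic() rebooted */
};

/*
 * Models crash_kexec(): returns true if control would have left for the
 * second kernel, false if the call came back to panic().
 */
static bool model_crash_kexec(bool image_loaded, bool jump_succeeds)
{
    if (!image_loaded)
        return false;     /* the one legitimate return: no crash kernel loaded */

    /* machine_kexec() -> relocate_kernel() would run at this point. */
    return jump_succeeds; /* false models the unexplained return */
}

/* Models panic(): try the crash kernel first, otherwise fall back to reboot. */
static enum panic_outcome model_panic(bool image_loaded, bool jump_succeeds)
{
    if (model_crash_kexec(image_loaded, jump_succeeds))
        return BOOTED_CRASH_KERNEL;

    puts("Rebooting in X seconds...");
    return REBOOTED_FROM_PANIC;
}
```

Calling model_panic(true, false) reproduces what your logs show: a crash
kernel is loaded, the kexec path is entered, and yet panic() still reaches
the reboot message. That is the case that needs explaining.
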
Thanks
Vivek