* watchdogs and kdump
From: Don Zickus @ 2011-10-27 20:30 UTC
To: linux-watchdog; +Cc: kexec, amwang, vgoyal
Hi,
I was assisting a customer the other day in debugging a kdump[1] problem, when we
noticed that the real problem was the hardware watchdog firing and
rebooting the box.
Of course, this can be inconvenient if the panic happens right before the
watchdog is due to be kicked, leading to a spontaneous reboot before
the second kernel finishes booting and loading the watchdog module.
I was trying to think of a way to solve this, and one way to
minimize the problem is to kick the watchdog right before we jump into the kdump
kernel. Another way is to disable the watchdog entirely, but I believe that doesn't
work on all hardware.
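To make the timing argument concrete, here is a back-of-the-envelope model in Python. It is a sketch, not kernel code: the function name and all the numbers are hypothetical, and it assumes the second kernel can only kick the watchdog once its own driver is loaded.

```python
def watchdog_fires_first(timeout_s: float, since_last_kick_s: float,
                         kdump_ready_s: float, kick_before_jump: bool = False) -> bool:
    """Return True if the hardware watchdog would reset the box before the
    kdump kernel has booted and reloaded a watchdog driver to kick it.

    timeout_s         -- timeout programmed into the watchdog hardware
    since_last_kick_s -- time between the last kick and the panic
    kdump_ready_s     -- time from the jump until the second kernel kicks it
    kick_before_jump  -- model the proposal: kick right before the jump
    """
    # A kick just before the jump restarts the countdown from zero.
    elapsed = 0.0 if kick_before_jump else since_last_kick_s
    remaining = timeout_s - elapsed
    # The box resets if the second kernel is not ready before time runs out.
    return kdump_ready_s > remaining
```

With a 60 s timeout, a panic 55 s after the last kick leaves only 5 s, so a kdump kernel that needs 40 s to load its watchdog driver loses the race; kicking right before the jump restores the full 60 s. Kicking cannot help if the second kernel needs longer than one full timeout.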
Anyway, I am posting on the watchdog mailing list to see if anyone has any
ideas that might help. If my idea above, kicking the watchdog before
jumping into the kdump kernel, seems OK, then an API would need to be
developed.
I am willing to do any coding and testing necessary, but before I do, I
would like help choosing a direction to go in first.
Thoughts?
Cheers,
Don
[1] - I am ignorantly assuming everyone knows what kdump is. Kdump is
the ability to jump into a previously loaded kernel in the case of a
panic. This kdump (second) kernel runs in reserved memory, copies the
first kernel's memory to a file, and saves it to a predetermined location.
There is no system reboot between the first and second kernel, so there is
no chance for the watchdog to disarm itself.
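The copy step described in the footnote can be sketched as follows. In the kdump kernel, the crashed kernel's memory is exposed as an ELF core file at /proc/vmcore; the helper name save_vmcore, the destination root, and the directory naming are illustrative assumptions rather than what any distribution's kdump scripts actually do (real setups often filter the dump with makedumpfile instead of copying it raw).

```python
import os
import shutil
import time


def save_vmcore(src: str = "/proc/vmcore", dump_root: str = "/var/crash",
                chunk_size: int = 1 << 20) -> str:
    """Copy the first kernel's memory image into a timestamped directory.

    HYPOTHETICAL helper: dump_root and the directory layout are assumptions.
    Returns the path of the saved dump.
    """
    dest_dir = os.path.join(dump_root, time.strftime("%Y-%m-%d-%H%M%S"))
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, "vmcore")
    with open(src, "rb") as fin, open(dest, "wb") as fout:
        # Stream in fixed-size chunks: the capture kernel runs in a small
        # reserved memory region, so never read the whole image at once.
        shutil.copyfileobj(fin, fout, chunk_size)
        fout.flush()
        os.fsync(fout.fileno())  # make sure the dump reaches disk before reboot
    return dest
```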
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
* Re: watchdogs and kdump
From: Pádraig Brady @ 2011-10-27 21:43 UTC
To: Don Zickus; +Cc: kexec, linux-watchdog, vgoyal, amwang
On 10/27/2011 09:30 PM, Don Zickus wrote:
> Hi,
>
> I was assisting a customer the other day in debugging a kdump[1] problem, when we
> noticed that the real problem was the hardware watchdog firing and
> rebooting the box.
>
> Of course, this can be inconvenient if the panic happens right before the
> watchdog is due to be kicked, leading to a spontaneous reboot before
> the second kernel finishes booting and loading the watchdog module.
>
> I was trying to think of a way to solve this, and one way to
> minimize the problem is to kick the watchdog right before we jump into the kdump
> kernel. Another way is to disable the watchdog entirely, but I believe that doesn't
> work on all hardware.
>
> Anyway, I am posting on the watchdog mailing list to see if anyone has any
> ideas that might help. If my idea above, kicking the watchdog before
> jumping into the kdump kernel, seems OK, then an API would need to be
> developed.
>
> I am willing to do any coding and testing necessary, but before I do, I
> would like help choosing a direction to go in first.
>
> Thoughts?
Seems like the appropriate thing to do is to call all the
reboot notifiers that each watchdog registers.
Since one is not doing a full SYS_RESTART (SYS_DOWN), though,
i.e. not running through the BIOS code again,
it might be worth having a different SYS_JUMP code in notifier.h
that would allow you to kick rather than stop the watchdogs,
as the reboot notifiers generally do at the moment.
I think it would be important not to stop the watchdog if possible,
given the large amount of logic that's going to be executed
after the jump.
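A userspace model may help show what a SYS_JUMP event would change. The SYS_* and NOTIFY_* values mirror the kernel headers of this era; SYS_JUMP, its value, and the ModelWatchdog class are hypothetical, and the chain is a simplification of the kernel's (no locking, no unregistering).

```python
# Reboot event codes as defined in include/linux/notifier.h at the time.
SYS_DOWN = 0x0001
SYS_RESTART = SYS_DOWN
SYS_HALT = 0x0002
SYS_POWER_OFF = 0x0003
# HYPOTHETICAL: the proposed "jump into kdump" code; not a real kernel constant.
SYS_JUMP = 0x0004

NOTIFY_DONE = 0x0000
NOTIFY_OK = 0x0001
NOTIFY_STOP_MASK = 0x8000


class NotifierChain:
    """Userspace model of a kernel notifier chain: callbacks run in descending
    priority order until one returns a value with NOTIFY_STOP_MASK set."""

    def __init__(self):
        self._blocks = []  # (priority, callback) pairs

    def register(self, callback, priority=0):
        self._blocks.append((priority, callback))
        # sort() stays stable with reverse=True, so equal priorities
        # keep their registration order.
        self._blocks.sort(key=lambda block: block[0], reverse=True)

    def call_chain(self, event):
        ret = NOTIFY_DONE
        for _, callback in self._blocks:
            ret = callback(event)
            if ret & NOTIFY_STOP_MASK:
                break
        return ret


class ModelWatchdog:
    """HYPOTHETICAL watchdog driver state, used only to contrast two policies."""

    def __init__(self):
        self.running = True
        self.kicks = 0

    def reboot_notifier(self, event):
        if event == SYS_JUMP:
            # Proposed behaviour: keep the timer armed but restart its countdown,
            # so the kdump kernel gets a full timeout to load its own driver.
            self.kicks += 1
            return NOTIFY_OK
        if event in (SYS_DOWN, SYS_HALT):
            # Common current behaviour: stop the hardware timer on reboot/halt.
            self.running = False
            return NOTIFY_OK
        return NOTIFY_DONE
```

Registering wd.reboot_notifier and calling chain.call_chain(SYS_JUMP) leaves the timer running with one more kick, while SYS_RESTART (equal to SYS_DOWN) stops it, which is the behaviour that would leave the kdump kernel unprotected.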
cheers,
Pádraig.
* Re: watchdogs and kdump
From: Don Zickus @ 2011-10-28 13:39 UTC
To: Pádraig Brady; +Cc: kexec, linux-watchdog, vgoyal, amwang
On Thu, Oct 27, 2011 at 10:43:58PM +0100, Pádraig Brady wrote:
> Seems like the appropriate thing to do is to call all the
> reboot notifiers that each watchdog registers.
> Since one is not doing a full SYS_RESTART (SYS_DOWN), though,
> i.e. not running through the BIOS code again,
> it might be worth having a different SYS_JUMP code in notifier.h
> that would allow you to kick rather than stop the watchdogs,
> as the reboot notifiers generally do at the moment.
That is an interesting idea. I'm not sure whether calling a blocking notifier in
the kdump path would be acceptable to the kexec folks. Then again, using
the reboot notifier in the panic path may not be a good idea either; it
might lead to false expectations. :-/
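The concern about a blocking notifier can be illustrated with a small model. In the kernel, a blocking notifier chain is protected by a read-write semaphore and its callbacks may sleep, whereas the panic path runs with the other CPUs stopped; if one of them held that lock at the time, waiting on it would never return. Atomic notifier chains, which are RCU-protected and are what panic_notifier_list uses, avoid that lock. The sketch below simplifies the lock to a plain mutex, and the try-lock fallback in call_chain_panic is an assumption for illustration, not something proposed in this thread.

```python
import threading


class ChainWithLock:
    """Model of a notifier chain guarded by a lock (the kernel's blocking
    chains use an rwsem; a plain mutex is used here for simplicity)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.callbacks = []

    def register(self, callback):
        with self.lock:
            self.callbacks.append(callback)

    def call_chain_blocking(self, event):
        # Waits for the lock: hangs forever if its holder will never run again.
        with self.lock:
            return [cb(event) for cb in self.callbacks]

    def call_chain_panic(self, event):
        """HYPOTHETICAL panic-path variant: never wait for the lock. If another
        (now stopped) CPU held it when we crashed, skip the notifiers instead."""
        if not self.lock.acquire(blocking=False):
            return None
        try:
            return [cb(event) for cb in self.callbacks]
        finally:
            self.lock.release()
```

Skipping the notifiers keeps the jump into the kdump kernel from hanging, at the cost of not kicking the watchdog in exactly the case where the lock was held.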
> I think it would be important not to stop the watchdog if possible,
> given the large amount of logic that's going to be executed
> after the jump.
I agree, especially since kdump is still not 100% reliable.
Thanks for the feedback!
Cheers,
Don
* kdump not working on Intel S5520UR motherboards with Xeon processors
From: Prashant Dinkar Kharche @ 2012-01-21 18:21 UTC
To: kexec@lists.infradead.org
Hi,
I have been trying to get kdump working on an Intel S5520 chipset with a Xeon processor on RHEL 5 update 5 or update 6, but when I trigger a crash dump, the system reboots abruptly right after loading the kdump kernel. A key observation during this experiment was that when I disabled ACPI (acpi=off), the crash dump worked on RHEL 5 update 5. I have captured the output of lspci and cpuinfo below for reference.
On RHEL 6, by contrast, kdump works fine on this hardware. Could someone please help me identify the patches the RHEL 5 update 5 kernel might need to fix this issue?
00:00.0 Host bridge: Intel Corporation 5520 I/O Hub to ESI Port (rev 22)
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 22)
00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 22)
00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 5 (rev 22)
00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 22)
00:09.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 22)
00:0a.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 10 (rev 22)
00:10.0 PIC: Intel Corporation 5520/5500/X58 Physical and Link Layer Registers Port 0 (rev 22)
00:10.1 PIC: Intel Corporation 5520/5500/X58 Routing and Protocol Layer Registers Port 0 (rev 22)
00:11.0 PIC: Intel Corporation 5520/5500 Physical and Link Layer Registers Port 1 (rev 22)
00:11.1 PIC: Intel Corporation 5520/5500 Routing & Protocol Layer Register Port 1 (rev 22)
00:13.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub I/OxAPIC Interrupt Controller (rev 22)
00:14.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub System Management Registers (rev 22)
00:14.1 PIC: Intel Corporation 5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 22)
00:14.2 PIC: Intel Corporation 5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 22)
00:14.3 PIC: Intel Corporation 5520/5500/X58 I/O Hub Throttle Registers (rev 22)
00:15.0 PIC: Intel Corporation 5520/5500/X58 Trusted Execution Technology Registers (rev 22)
00:16.0 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.2 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.3 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.4 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.5 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.6 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.7 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:1a.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
00:1a.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
00:1a.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
00:1a.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 1
00:1c.4 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 5
00:1c.5 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 6
00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
00:1f.2 IDE interface: Intel Corporation 82801JI (ICH10 Family) 4 port SATA IDE Controller #1
00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
00:1f.5 IDE interface: Intel Corporation 82801JI (ICH10 Family) 2 port SATA IDE Controller #2
01:00.0 Ethernet controller: Intel Corporation 82575EB Gigabit Network Connection (rev 02)
01:00.1 Ethernet controller: Intel Corporation 82575EB Gigabit Network Connection (rev 02)
03:00.0 PCI bridge: Integrated Device Technology, Inc. PES24T3G2 PCI Express Gen2 Switch (rev 02)
04:02.0 PCI bridge: Integrated Device Technology, Inc. PES24T3G2 PCI Express Gen2 Switch (rev 02)
04:04.0 PCI bridge: Integrated Device Technology, Inc. PES24T3G2 PCI Express Gen2 Switch (rev 02)
05:00.0 PCI bridge: NEC Corporation Unknown device 014f (rev 02)
06:00.0 PCI bridge: NEC Corporation Unknown device 0150 (rev 02)
06:01.0 PCI bridge: NEC Corporation Unknown device 0151 (rev 02)
06:02.0 PCI bridge: NEC Corporation Unknown device 0152 (rev 02)
07:00.0 Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA (rev 03)
07:00.1 Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA (rev 03)
08:00.0 Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA (rev 03)
08:00.1 Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA (rev 03)
0b:00.0 PCI bridge: Integrated Device Technology, Inc. PES24T3G2 PCI Express Gen2 Switch (rev 02)
0c:02.0 PCI bridge: Integrated Device Technology, Inc. PES24T3G2 PCI Express Gen2 Switch (rev 02)
0c:04.0 PCI bridge: Integrated Device Technology, Inc. PES24T3G2 PCI Express Gen2 Switch (rev 02)
0e:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
0e:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
0f:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 08)
12:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02)
Total of 16 logical CPUs (2 sockets x 4 cores x 2 threads); last /proc/cpuinfo entry --
processor : 15
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU E5540 @ 2.53GHz
stepping : 5
cpu MHz : 2527.364
cache size : 8192 KB
physical id : 1
siblings : 8
core id : 3
cpu cores : 4
apicid : 23
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc ida nonstop_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips : 5054.56
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: [8]
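The CPU count can be cross-checked from /proc/cpuinfo: the entry shown has physical id 1, siblings 8, and cpu cores 4, i.e. two threads per core, so 16 logical processors correspond to 8 physical cores across 2 sockets. The helper below (a hypothetical name) does that tally over the whole file.

```python
def cpu_topology(cpuinfo_text: str) -> dict:
    """Summarise /proc/cpuinfo text into logical CPUs, sockets and physical cores."""
    logical = 0
    sockets = set()
    cores = set()  # (physical id, core id) pairs identify a physical core
    phys = None
    for line in cpuinfo_text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            phys = value
            sockets.add(value)
        elif key == "core id":
            # "physical id" precedes "core id" within each processor entry.
            cores.add((phys, value))
    return {
        "logical": logical,
        "sockets": len(sockets),
        "cores": len(cores),
        "threads_per_core": logical // len(cores) if cores else 0,
    }
```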
Thanking you in anticipation.
Regards,
Prashant
* Re: kdump not working on Intel S5520UR motherboards with Xeon processors
From: Don Zickus @ 2012-01-23 15:18 UTC
To: Prashant Dinkar Kharche; +Cc: kexec@lists.infradead.org
On Sat, Jan 21, 2012 at 10:21:17AM -0800, Prashant Dinkar Kharche wrote:
> Hi,
>
> I have been trying to get kdump working on an Intel S5520 chipset with a Xeon processor on RHEL 5 update 5 or update 6, but when I trigger a crash dump, the system reboots abruptly right after loading the kdump kernel. A key observation during this experiment was that when I disabled ACPI (acpi=off), the crash dump worked on RHEL 5 update 5. I have captured the output of lspci and cpuinfo below for reference.
>
> On RHEL 6, by contrast, kdump works fine on this hardware. Could someone please help me identify the patches the RHEL 5 update 5 kernel might need to fix this issue?
This can only be done by Red Hat, as they are the only ones who know what
patches are in their kernel. Please file a bugzilla or work with your TAM
to bring the problem to Red Hat's attention.
Cheers,
Don
Thread overview: 5 messages
2011-10-27 20:30 watchdogs and kdump Don Zickus
2011-10-27 21:43 ` Pádraig Brady
2011-10-28 13:39 ` Don Zickus
2012-01-21 18:21 ` kdump not working on Intel S5520UR motherboards with Xeon processors Prashant Dinkar Kharche
2012-01-23 15:18 ` Don Zickus