* RT bug with 2.6.13-rt4 and 3c905c tornado
@ 2005-09-20 8:46 Serge Noiraud
2005-09-20 8:55 ` Ingo Molnar
0 siblings, 1 reply; 6+ messages in thread
From: Serge Noiraud @ 2005-09-20 8:46 UTC
To: Ingo Molnar, linux-kernel
Hi,
This driver works perfectly if you insert the physical card in a PCI slot. If
you insert the same card in a PCI-X slot, we get the following problem:
when you type "modprobe 3c59x", the system freezes.
Has anyone already tested this?
The card works perfectly in the same PCI-X slot with a non-RT kernel.
Do you need any more info?
* Re: RT bug with 2.6.13-rt4 and 3c905c tornado
From: Ingo Molnar @ 2005-09-20 8:55 UTC
To: Serge Noiraud; +Cc: linux-kernel
* Serge Noiraud <serge.noiraud@bull.net> wrote:
> Hi,
>
> This driver works perfectly if you insert the physical card in a
> PCI slot. If you insert the same card in a PCI-X slot, we get the
> following problem: when you type "modprobe 3c59x", the system freezes.
>
> Has anyone already tested this?
>
> The card works perfectly in the same PCI-X slot with a non-RT kernel.
> Do you need any more info?
use serial logging and the NMI watchdog to debug hard lockups (see the
info below). Use CONFIG_DETECT_SOFTLOCKUP=y to detect soft lockups.
Generally the use of debugging options can help as well. Here's a 'full'
debugging kernel:
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_PREEMPT=y
CONFIG_DEBUG_IRQ_FLAGS=y
CONFIG_WAKEUP_TIMING=y
CONFIG_WAKEUP_LATENCY_HIST=y
CONFIG_PREEMPT_TRACE=y
CONFIG_CRITICAL_PREEMPT_TIMING=y
CONFIG_PREEMPT_OFF_HIST=y
CONFIG_CRITICAL_IRQSOFF_TIMING=y
CONFIG_INTERRUPT_OFF_HIST=y
CONFIG_CRITICAL_TIMING=y
CONFIG_LATENCY_TIMING=y
CONFIG_CRITICAL_LATENCY_HIST=y
CONFIG_LATENCY_HIST=y
CONFIG_LATENCY_TRACE=y
CONFIG_MCOUNT=y
CONFIG_RT_DEADLOCK_DETECT=y
CONFIG_DEBUG_RT_LOCKING_MODE=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_FS=y
CONFIG_FRAME_POINTER=y
CONFIG_EARLY_PRINTK=y
CONFIG_DEBUG_STACKOVERFLOW=y
# CONFIG_KPROBES is not set
CONFIG_DEBUG_STACK_USAGE=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_4KSTACKS=y
Ingo
to set up serial logging:
-------------------------
install a null modem cable (== serial cable) to one of the serial ports
of the server, connect the cable to another box, run a terminal program
on that other box (e.g. "minicom -m" - do Alt-L to switch on logging
after starting it up) and set up the server's kernel to do serial
logging: enable CONFIG_SERIAL_8250_CONSOLE and
CONFIG_SERIAL_CORE_CONSOLE, recompile & reinstall the kernel, add
"console=ttyS0,38400 console=tty0" to your /etc/grub.conf or
/etc/lilo.conf kernel boot line, reboot the server with the new kernel
command line - and configure minicom to run with that speed (Alt-S).
e.g. my /etc/grub.conf has:
title test-2.6 (test-2.6)
root (hd0,0)
kernel /boot/bzImage root=/dev/sda1 console=ttyS0,38400 console=tty0 nmi_watchdog=1 kernel_preempt=1
if everything is set up correctly then you should see kernel messages
showing up in the minicom session when you boot up.
If the messages do not show up then the typical error is a mismatch
between the serial port (or speed) and the device name used - if it's
COM2 then use ttyS1, and don't forget to set up the serial speed option
of minicom, etc. You can test the serial connection by doing:
echo x > /dev/ttyS0
and that should show up in the minicom session on the other box.
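The same test can also be done from a small C program that sets the port speed itself, which rules out a speed mismatch on the sending side. This is only a sketch under assumptions: the helper names com_to_tty() and send_test_line() are made up for illustration, and the 38400 8N1 settings simply mirror the console=ttyS0,38400 example above.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

/* COM ports count from 1, ttyS devices from 0: COM2 -> /dev/ttyS1.
 * (hypothetical helper, not part of any kernel or libc API) */
int com_to_tty(int com, char *buf, size_t len)
{
	int n;

	if (com < 1)
		return -1;
	n = snprintf(buf, len, "/dev/ttyS%d", com - 1);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Send one test message at 38400 8N1 (assumed to match the console=
 * setting); returns 0 on success, -1 on any error. */
int send_test_line(const char *path, const char *msg)
{
	struct termios tio;
	ssize_t len = (ssize_t)strlen(msg);
	int ok;
	int fd = open(path, O_WRONLY | O_NOCTTY | O_NONBLOCK);

	if (fd < 0)
		return -1;
	ok = tcgetattr(fd, &tio) == 0;
	if (ok) {
		/* 8 data bits, no parity, 1 stop bit, ignore modem lines */
		tio.c_cflag &= ~(CSIZE | PARENB | CSTOPB);
		tio.c_cflag |= CS8 | CLOCAL;
		ok = cfsetospeed(&tio, B38400) == 0 &&
		     tcsetattr(fd, TCSANOW, &tio) == 0 &&
		     write(fd, msg, (size_t)len) == len;
	}
	close(fd);
	return ok ? 0 : -1;
}
```

Calling com_to_tty(1, ...) and passing the result to send_test_line() together with "x\n" should make the text appear in the minicom session, provided minicom is set to the same speed.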
to set up early-printk:
-----------------------
occasionally lockups/crashes happen so early in the bootup that nothing
makes it even to the serial log. In that case the 'earlyprintk' feature
is most useful. It is enabled by default on all 2.6 kernels; you only need
to add one more boot parameter to activate it over the serial console:
earlyprintk=serial,ttyS0,38400
to set up the NMI watchdog:
---------------------------
add nmi_watchdog=1 to your boot parameters and reboot - that should be
all to get it active. If the NMI count of every CPU increases in
/proc/interrupts then it's working fine. If the counts do not increase
(or only one CPU's count increases) then try nmi_watchdog=2 - this is
another type of NMI that might work better. (Very rarely there are boxes
that don't have reliable NMI counts with 1 and 2 either - but i don't
think your box is one of those.)
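The check of the NMI counts can be automated. The following is a rough sketch, assuming the NMI line in /proc/interrupts looks like "NMI: <count for CPU0> <count for CPU1> ..." with optional trailing text; the function names parse_nmi_line(), nmi_stuck_cpus() and read_nmi_counts() are invented for this example.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse one "NMI: n0 n1 ..." line into counts[]; returns the number of
 * per-CPU counts found, or -1 if this is not the NMI line. */
int parse_nmi_line(const char *line, unsigned long *counts, int max)
{
	int n = 0;

	while (isspace((unsigned char)*line))
		line++;
	if (strncmp(line, "NMI:", 4) != 0)
		return -1;
	line += 4;
	while (n < max) {
		char *end;

		while (*line == ' ' || *line == '\t')
			line++;
		if (!isdigit((unsigned char)*line))
			break;	/* end of line or trailing description */
		counts[n++] = strtoul(line, &end, 10);
		line = end;
	}
	return n;
}

/* Number of CPUs whose NMI count did not increase between snapshots. */
int nmi_stuck_cpus(const unsigned long *before,
		   const unsigned long *after, int ncpus)
{
	int i, stuck = 0;

	for (i = 0; i < ncpus; i++)
		if (after[i] <= before[i])
			stuck++;
	return stuck;
}

/* Read the NMI counts from a file such as /proc/interrupts; returns the
 * number of CPUs, or -1 if the file or the NMI line is missing. */
int read_nmi_counts(const char *path, unsigned long *counts, int max)
{
	char line[4096];
	int n = -1;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	while (n < 0 && fgets(line, sizeof line, f))
		n = parse_nmi_line(line, counts, max);
	fclose(f);
	return n;
}
```

Taking two snapshots with read_nmi_counts("/proc/interrupts", ...) a few seconds apart and passing them to nmi_stuck_cpus() should return 0 when the watchdog is ticking on every CPU; a non-zero result is the case where nmi_watchdog=2 is worth trying.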
once the NMI watchdog is up and running it should catch all hard lockups
and print backtraces to the serial console - even if you are within X
while the lockup happens. You can test hard lockups by running the
attached 'lockupcli' userspace code as root - it turns off interrupts
and goes into an infinite loop => instant lockup. The NMI watchdog
should notice this condition after a couple of seconds and should abort
the task, printing a kernel trace as well. Your box should be back in
working order after that point.
now for the real lockup your box won't be 'fixed' by the NMI watchdog, it
will likely stay locked up, but you should get messages on the serial
console, giving us an idea where the kernel locked up and why. (Very
rarely it happens that not even the NMI watchdog prints anything for a
hard lockup - this is often the sign of hardware problems.)
Ingo
--- lockupcli.c
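#include <stdio.h>
#include <sys/io.h>	/* iopl() */
int main(void)
{
	/* needs root: iopl(3) lets user space execute cli */
	if (iopl(3)) { perror("iopl"); return 1; }
	/* interrupts off + infinite loop => instant hard lockup */
	for (;;) __asm__ volatile("cli");
}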
* Re: RT bug with 2.6.13-rt4 and 3c905c tornado
From: Serge Noiraud @ 2005-09-21 15:34 UTC
To: Ingo Molnar; +Cc: linux-kernel
On Tuesday 20 September 2005 10:55, Ingo Molnar wrote:
> * Serge Noiraud <serge.noiraud@bull.net> wrote:
> > Hi,
> >
> > This driver works perfectly if you insert the physical card in a
> > PCI slot. If you insert the same card in a PCI-X slot, we get the
> > following problem: when you type "modprobe 3c59x", the system freezes.
...
> use serial logging and the NMI watchdog to debug hard lockups (see the
> info below). Use CONFIG_DETECT_SOFTLOCKUP=y to detect soft lockups.
> Generally the use of debugging options can help as well. Here's a 'full'
> debugging kernel:
...
After several attempts, I must say I am having problems with the NMI watchdog.
I can't boot with the options you specified.
The only difference from your config is that I removed
CONFIG_DEBUG_RT_LOCKING_MODE: with it set, the kernel doesn't compile.
I get the following traces:
one with a crash in udev, another with a crash in hotplug.
I tried nmi_watchdog=1 and 2 without success.
I am sending you the .config too.
The machine is an HP xw8200 workstation.
I hope this mail is not too big.
[-- Attachment #2: 3c59x-pci-pbnmi.3.bz2 --]
[-- Type: application/x-bzip2, Size: 26321 bytes --]
[-- Attachment #3: config-dbg.bz2 --]
[-- Type: application/x-bzip2, Size: 14930 bytes --]
[-- Attachment #4: 3c59x-pci-pbnmi.1.bz2 --]
[-- Type: application/x-bzip2, Size: 1636 bytes --]
* Re: RT bug with 2.6.13-rt4 and 3c905c tornado
From: Serge Noiraud @ 2005-09-21 16:32 UTC
To: Ingo Molnar; +Cc: linux-kernel
On Tuesday 20 September 2005 10:55, Ingo Molnar wrote:
> * Serge Noiraud <serge.noiraud@bull.net> wrote:
> > Hi,
> >
> > This driver works perfectly if you insert the physical card in a
> > PCI slot. If you insert the same card in a PCI-X slot, we get the
> > following problem: when you type "modprobe 3c59x", the system freezes.
...
>
> use serial logging and the NMI watchdog to debug hard lockups (see the
> info below). Use CONFIG_DETECT_SOFTLOCKUP=y to detect soft lockups.
> Generally the use of debugging options can help as well. Here's a 'full'
> debugging kernel:
>
I have another big problem with debugging.
I have:
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
If the following config option is set, I can't get traces. If I unset it, I
can't get mouse or keyboard. Do you have a workaround or a patch for this?
# CONFIG_SERIO_I8042 is not set => the traces stop and the keyboard works.
CONFIG_SERIO_I8042=y => the traces are OK but there is no keyboard or mouse.
If the system is up and you can type something, an echo x >/dev/ttyS0
fixes the problem and you get new traces. The problem remains in the window
between the loading of the i8042 module and the point where the system is up
and usable: if you have a module loading problem or something else during
that time, you can't get a trace.
* Re: RT bug with 2.6.13-rt4 and 3c905c tornado
From: Serge Noiraud @ 2005-09-22 13:54 UTC
To: Ingo Molnar; +Cc: linux-kernel
On Tuesday 20 September 2005 10:55, Ingo Molnar wrote:
> * Serge Noiraud <serge.noiraud@bull.net> wrote:
> > Hi,
> >
> > This driver works perfectly if you insert the physical card in a
> > PCI slot. If you insert the same card in a PCI-X slot, we get the
> > following problem: when you type "modprobe 3c59x", the system freezes.
> >
> > Has anyone already tested this?
> >
> > The card works perfectly in the same PCI-X slot with a non-RT kernel.
> > Do you need any more info?
>
> use serial logging and the NMI watchdog to debug hard lockups (see the
> info below). Use CONFIG_DETECT_SOFTLOCKUP=y to detect soft lockups.
> Generally the use of debugging options can help as well. Here's a 'full'
> debugging kernel:
Big problem!
How can I debug this?
If the kernel has no debug options, modprobe freezes the machine.
If the kernel has debug options, modprobe works correctly and the card works
perfectly. I compiled a kernel and ran recursive directory listings through
NFS: I got 140 million NFS requests without a problem.
I could use kgdb, but it doesn't work. I'm not sure it would help me. I think
it's a timing problem somewhere in the PCI driver.
* Re: RT bug with 2.6.13-rt4 and 3c905c tornado
From: Ingo Molnar @ 2005-09-26 6:42 UTC
To: Serge Noiraud; +Cc: linux-kernel
could you try -rt2, do the lockups still occur? If it locks up then
could you try one more thing: boot -rt2 _without_ the NMI watchdog
enabled. Maybe the NMI watchdog itself got broken. (it happens
occasionally)
Ingo
Thread overview: 6+ messages
2005-09-20  8:46 RT bug with 2.6.13-rt4 and 3c905c tornado Serge Noiraud
2005-09-20  8:55 ` Ingo Molnar
2005-09-21 15:34   ` Serge Noiraud
2005-09-21 16:32   ` Serge Noiraud
2005-09-22 13:54   ` Serge Noiraud
2005-09-26  6:42     ` Ingo Molnar