* CUV4X-D lockup on boot
From: lk @ 2001-06-02 9:21 UTC
To: linux-kernel
I have an ASUS CUV4X-D Dual Processor Mainboard based on a VIA
694XDP chipset. I notice from the archives that someone else
has also reported a lockup with the m/b when using two cpus
and have some info that may be useful to track it down.
Using kernel 2.4.5 the kernel locks up sporadically at boot
time. When I enable the NMI watchdog it occasionally gets
enabled prior to the lockup and perhaps can be useful for
debugging the problem. Here's what happens:
I typed this in, so there may be typos:
..TIMER: vector=49 pin1=2 pin2=0
activating NMI Watchdog ... done.
[locks up here, or before activating NMI watchdog]
[this normally happens next but not in this case
number of MP IRQ sources: 21.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................
]
NMI Watchdog detected LOCKUP on CPU1, registers:
CPU : 1
EIP: 0010:[<c0235cdb>]
EFLAGS: 00000246
eax: 00000000 ebx: 00000000 ecx: 00000001 edx: 00000001
esi: 00000000 edi: 00000000 ebp: 00000000 esp: cfff5fa4
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, stackpage = cfff5000)
Stack: 00000000 00000000 00000000 00000000 c0235e8f 00000001 00000002 c0235eaa
00000000 00000019 00000000 c1442000 00002700 0000b00f 00000000 00000000
0000000d 0000000e 00000000 00000000 c00bcf60 00000000 c0172029
Call Trace: [<c0172029>]
Code: 85 c0 74 bf 00 e0 ff ff 21 e7 31 f6 bd 10 00 00 00 31 db
Console shuts up ...
[ksymoops output]
Warning (compare_maps): ksyms_base symbol __VERSIONED_SYMBOL(shmem_file_setup) not found in System.map. Ignoring ksyms_base entry
activating NMI Watchdog ... done.
[locks up here, or before activating NMI watchdog]
NMI Watchdog detected LOCKUP on CPU1, registers:
EIP: 0010:[<c0235cdb>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00000246
eax: 00000000 ebx: 00000000 ecx: 00000001 edx: 00000001
esi: 00000000 edi: 00000000 ebp: 00000000 esp: cfff5fa4
ds: 0018 es: 0018 ss: 0018
Stack: 00000000 00000000 00000000 00000000 c0235e8f 00000001 00000002 c0235eaa
00000000 00000019 00000000 c1442000 00002700 0000b00f 00000000 00000000
0000000d 0000000e 00000000 00000000 c00bcf60 00000000 c0172029
Call Trace: [<c0172029>]
Code: 85 c0 74 bf 00 e0 ff ff 21 e7 31 f6 bd 10 00 00 00 31 db
>>EIP; c0235cdb <synchronize_tsc_ap+1b/a0> <=====
Trace; c0172029 <set_cursor+69/80>
Code; c0235cdb <synchronize_tsc_ap+1b/a0>
00000000 <_EIP>:
Code; c0235cdb <synchronize_tsc_ap+1b/a0> <=====
0: 85 c0 test %eax,%eax <=====
Code; c0235cdd <synchronize_tsc_ap+1d/a0>
2: 74 bf je ffffffc3 <_EIP+0xffffffc3> c0235c9e <synchronize_tsc_bp+1ee/210>
Code; c0235cdf <synchronize_tsc_ap+1f/a0>
4: 00 e0 add %ah,%al
Code; c0235ce1 <synchronize_tsc_ap+21/a0>
6: ff (bad)
Code; c0235ce2 <synchronize_tsc_ap+22/a0>
7: ff 21 jmp *(%ecx)
Code; c0235ce4 <synchronize_tsc_ap+24/a0>
9: e7 31 out %eax,$0x31
Code; c0235ce6 <synchronize_tsc_ap+26/a0>
b: f6 bd 10 00 00 00 idiv 0x10(%ebp),%al
Code; c0235cec <synchronize_tsc_ap+2c/a0>
11: 31 db xor %ebx,%ebx
2 warnings issued. Results may not be reliable.
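The addresses in the ksymoops output above can be cross-checked by hand. The following is a minimal sketch in user-space C, not kernel code; the function names symbol_base and short_jump_target are made up for illustration. It assumes only what the output shows: a symbol+offset annotation for EIP, and a two-byte short conditional jump (opcode 74, displacement bf) at c0235cdd.

```c
#include <stdint.h>

/* Start address of the function containing eip, given the "+offset"
 * that ksymoops printed next to the symbol name. */
uint32_t symbol_base(uint32_t eip, uint32_t offset)
{
    return eip - offset;
}

/* Target of a two-byte short jump (opcode byte, rel8 byte) at insn_addr.
 * The displacement is a signed 8-bit value, relative to the address of
 * the instruction that follows the jump. */
uint32_t short_jump_target(uint32_t insn_addr, uint8_t rel8)
{
    /* Sign-extend the displacement without relying on narrowing casts. */
    int32_t disp = (rel8 >= 0x80) ? (int32_t)rel8 - 0x100 : (int32_t)rel8;

    /* Converting a negative disp to uint32_t wraps modulo 2^32, which is
     * exactly the address arithmetic wanted here. */
    return insn_addr + 2u + (uint32_t)disp;
}
```

With the values from the output, symbol_base(0xc0235cdb, 0x1b) gives 0xc0235cc0 as the start of synchronize_tsc_ap, and short_jump_target(0xc0235cdd, 0xbf) gives 0xc0235c9e, the address ksymoops prints for the je.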
# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 6
cpu MHz : 937.557
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips : 1867.77
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 6
cpu MHz : 937.557
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips : 1874.32
If this doesn't make someone go "aha!" then I can set up a serial
port for debugging and repeat this a few times.
Thanks,
Paul
* Re: CUV4X-D lockup on boot
From: lk @ 2001-06-02 12:41 UTC
To: linux-kernel
Further information:
I inserted some printk()s in arch/i386/kernel/smpboot.c:
320 static void __init synchronize_tsc_ap (void)
321 {
322 int i;
323
324 /*
325 * smp_num_cpus is not necessarily known at the time
326 * this gets called, so we first wait for the BP to
327 * finish SMP initialization:
328 */
329 printk("ap %d\n", __LINE__);
330 while (!atomic_read(&tsc_start_flag)) mb();
331 printk("ap %d\n", __LINE__);
332
333 for (i = 0; i < NR_LOOPS; i++) {
334 atomic_inc(&tsc_count_start);
335 printk("ap %d\n", __LINE__);
...
When the kernel locks up, it does so after the printk at line 329 has printed.
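The reasoning behind this kind of tracing: if the marker before a wait loop is printed and the marker after it is not, execution is stuck in the wait. The following is a user-space sketch of that pattern, not the kernel code; TRACE, markers_emitted, wait_for_flag and the poll limit are all hypothetical names added for illustration (the kernel loop has no such limit and spins forever).

```c
#include <stdio.h>

/* Hypothetical counter so a caller can check how many markers were emitted. */
int markers_emitted = 0;

/* Print a tag plus the current source line, in the style of
 * printk("ap %d\n", __LINE__). __LINE__ expands at the point of use. */
#define TRACE(tag) \
    do { \
        markers_emitted++; \
        fprintf(stderr, "%s %d\n", (tag), __LINE__); \
    } while (0)

/* Poll *flag until it becomes nonzero or max_polls is exhausted.
 * Returns 1 if the flag was seen set, 0 if the wait was abandoned. */
int wait_for_flag(const volatile int *flag, long max_polls)
{
    TRACE("ap");                /* marker 1: about to wait */
    while (!*flag) {
        if (max_polls-- <= 0)
            return 0;           /* only marker 1 was printed */
    }
    TRACE("ap");                /* marker 2: the wait completed */
    return 1;
}
```

Seeing only the first marker corresponds to the lockup case described here; seeing both corresponds to a boot that gets past the wait.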
However, on a successful boot it behaves as follows:
...
Intel machine check reporting enabled on CPU#1.
CPU: After vendor init, caps: 0383fbff 00000000 00000000 00000000
CPU: After generic, caps: 0383fbff 00000000 00000000 00000000
CPU: Common caps: 0383fbff 00000000 00000000 00000000
ap 329
OK.
CPU1: Intel Pentium III (Coppermine) stepping 06
CPU has booted.
Before bogomips.
Total of 2 processors activated (3742.10 BogoMIPS).
Before bogocount - setting activated=1.
Boot done.
ENABLING IO-APIC IRQs
...changing IO-APIC physical APIC ID to 2 ... ok.
Synchronizing Arb IDs.
init IO_APIC IRQs
IO-APIC (apicid-pin) 2-10, 2-13, 2-20, 2-21, 2-22, 2-23 not connected.
..TIMER: vector=49 pin1=2 pin2=0
activating NMI Watchdog ... done.
number of MP IRQ sources: 21.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................
IO APIC #2......
.... register #00: 02000000
....... : physical APIC id: 02
.... register #01: 00178011
....... : max redirection entries: 0017
....... : IO APIC version: 0011
WARNING: unexpected IO-APIC, please mail
to linux-smp@vger.kernel.org
.... register #02: 00000000
....... : arbitration: 00
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 003 03 0 1 1 1 1 1 1 31
01 003 03 0 0 0 0 0 1 1 39
02 003 03 0 0 0 0 0 1 1 31
03 003 03 0 0 0 0 0 1 1 41
04 003 03 0 0 0 0 0 1 1 49
05 003 03 0 0 0 0 0 1 1 51
06 003 03 0 0 0 0 0 1 1 59
07 003 03 0 0 0 0 0 1 1 61
08 003 03 0 0 0 0 0 1 1 69
09 003 03 0 0 0 0 0 1 1 71
0a 000 00 1 0 0 0 0 0 0 00
0b 003 03 0 0 0 0 0 1 1 79
0c 003 03 0 0 0 0 0 1 1 81
0d 000 00 1 0 0 0 0 0 0 00
0e 003 03 0 0 0 0 0 1 1 89
0f 003 03 0 0 0 0 0 1 1 91
10 003 03 1 1 0 1 0 1 1 99
11 003 03 1 1 0 1 0 1 1 A1
12 003 03 1 1 0 1 0 1 1 A9
13 003 03 1 1 0 1 0 1 1 B1
14 000 00 1 0 0 0 0 0 0 00
15 000 00 1 0 0 0 0 0 0 00
16 000 00 1 0 0 0 0 0 0 00
17 000 00 1 0 0 0 0 0 0 00
IRQ to pin mappings:
IRQ0 -> 0-> 2
IRQ1 -> 1
IRQ3 -> 3
IRQ4 -> 4
IRQ5 -> 5
IRQ6 -> 6
IRQ7 -> 7
IRQ8 -> 8
IRQ9 -> 9
IRQ11 -> 11
IRQ12 -> 12
IRQ14 -> 14
IRQ15 -> 15
IRQ16 -> 16
IRQ17 -> 17
IRQ18 -> 18
IRQ19 -> 19
.................................... done.
calibrating APIC timer ...
..... CPU clock speed is 937.5342 MHz.
..... host bus clock speed is 133.9332 MHz.
cpu: 0, clocks: 1339332, slice: 446444
CPU0<T0:1339328,T1:892800,D:84,S:446444,C:1339332>
cpu: 1, clocks: 1339332, slice: 446444
CPU1<T0:1339328,T1:446432,D:8,S:446444,C:1339332>
checking TSC synchronization across CPUs: ap 331
ap 335
ap 337
ap 340
ap 345
ap 347
ap 335
ap 337
ap 340
ap 345
ap 347
ap 335
ap 337
ap 340
ap 345
ap 347
ap 335
ap 337
ap 340
ap 345
ap 347
ap 335
ap 337
ap 340
ap 345
ap 347
BIOS BUG: CPU#0 improperly initialized, has -16 usecs TSC skew! FIXED.
BIOS BUG: CPU#1 improperly initialized, has 16 usecs TSC skew! FIXED.
Setting commenced=1, go go go
...
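The "register #01: 00178011" line in the log above packs two fields into one 32-bit word. A minimal sketch of the decode follows, assuming the layout the log itself implies (version in bits 0-7, maximum redirection entry in bits 16-23). The function names are made up for illustration and are not kernel identifiers.

```c
#include <stdint.h>

/* IO-APIC version: low byte of register #01. */
unsigned int ioapic_version(uint32_t reg01)
{
    return reg01 & 0xffu;
}

/* Index of the highest redirection table entry: bits 16-23. */
unsigned int ioapic_max_redir(uint32_t reg01)
{
    return (reg01 >> 16) & 0xffu;
}

/* Any bits that fall outside the two fields above. */
uint32_t ioapic_other_bits(uint32_t reg01)
{
    return reg01 & ~UINT32_C(0x00ff00ff);
}
```

For 0x00178011 this yields version 0x11 and maximum entry 0x17, i.e. 24 entries numbered 00 through 17, which matches "number of IO-APIC #2 registers: 24" and the redirection table in the log. The leftover value is 0x00008000 (bit 15). Whether that bit is what triggers the "unexpected IO-APIC" warning is not established in this thread.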
A notable difference: on a successful boot the sequence is:
Synchronizing Arb IDs.
init IO_APIC IRQs
IO-APIC (apicid-pin) 2-10, 2-13, 2-20, 2-21, 2-22, 2-23 not connected.
..TIMER: vector=49 pin1=2 pin2=0
activating NMI Watchdog ... done.
whereas in a lockup the following occurs:
Synchronizing Arb IDs.
..TIMER: vector=49 pin1=2 pin2=0
activating NMI Watchdog ... done.
i.e. the "..TIMER" line appears before "init IO_APIC IRQs" is printed.
Paul
* Re: CUV4X-D lockup on boot
From: Alan Cox @ 2001-06-02 16:15 UTC
To: lk; +Cc: linux-kernel
> I have an ASUS CUV4X-D Dual Processor Mainboard based on a VIA
> 694XDP chipset. I notice from the archives that someone else
> has also reported a lockup with the m/b when using two cpus
> and have some info that may be useful to track it down.
>
> Using kernel 2.4.5 the kernel locks up sporadically at boot
> time. When I enable the NMI watchdog it occasionally gets
> enabled prior to the lockup and perhaps can be useful for
> debugging the problem. Here's what happens:
At minimum you need the 1007 bios and to run noapic. As yet we don't know why
or what the newer BIOS has done to make it boot at all
* Re: CUV4X-D lockup on boot
From: lk @ 2001-06-02 19:02 UTC
To: Alan Cox; +Cc: linux-kernel
Alan Cox <alan@lxorguk.ukuu.org.uk> writes:
> > I have an ASUS CUV4X-D Dual Processor Mainboard based on a VIA
> > 694XDP chipset. I notice from the archives that someone else
> > has also reported a lockup with the m/b when using two cpus
> > and have some info that may be useful to track it down.
> >
> > Using kernel 2.4.5 the kernel locks up sporadically at boot
> > time. When I enable the NMI watchdog it occasionally gets
> > enabled prior to the lockup and perhaps can be useful for
> > debugging the problem. Here's what happens:
>
> At minimum you need the 1007 bios and to run noapic. As yet we don't know why
> or what the newer BIOS has done to make it boot at all
I had already replaced BIOS 1004 with 1007, and it made no
difference. I'd rather solve the problem than work around it:
the machine does boot, it just might take a couple of resets
to do so.
I've done a bit more printk tracing, and it now consistently
hangs following this path in arch/i386/kernel/io_apic.c:
1454 printk(KERN_INFO "..TIMER: vector=%d pin1=%d pin2=%d\n", vector, pin1, pin2);
1462 unmask_IO_APIC_irq(0);
1463 printk("io %d\n", __LINE__);
1464 if (timer_irq_works()) {
->
1076 static int __init timer_irq_works(void)
1077 {
1078 unsigned int t1 = jiffies;
1079
1080 printk("io %d\n", __LINE__);
1081 sti();
1082 printk("io %d\n", __LINE__);
1083 /* Let ten ticks pass... */
1084 printk("io %d\n", __LINE__);
1085 mdelay((10 * 1000) / HZ);
The following line is never reached:
1086 printk("io %d\n", __LINE__);
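For reference, the arithmetic in the delay above can be sketched in user-space C. This is not the kernel function: the helper names are made up, HZ = 100 is the tick rate assumed for i386 2.4 kernels, and the "more than four ticks" pass criterion is an assumption about what the rest of timer_irq_works() checks, since that part is not shown in the listing.

```c
/* Assumed tick rate; i386 kernels of the 2.4 series use HZ = 100. */
#define HZ 100u

/* Milliseconds covering ten timer ticks, as in mdelay((10 * 1000) / HZ). */
unsigned int ten_tick_delay_ms(unsigned int hz)
{
    return (10u * 1000u) / hz;
}

/* Assumed pass criterion: more than four ticks were counted during the
 * delay. Unsigned subtraction stays correct if the counter wraps. */
int timer_ticks_ok(unsigned int jiffies_before, unsigned int jiffies_after)
{
    return (jiffies_after - jiffies_before) > 4u;
}
```

With HZ = 100 the delay is 100 ms. In the hang traced above, execution never returns from the delay, so no such check is reached.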
Further details to follow...
Paul
* Re: CUV4X-D lockup on boot
From: John Cavan @ 2001-06-02 19:17 UTC
To: Alan Cox; +Cc: lk, linux-kernel
Alan Cox wrote:
> At minimum you need the 1007 bios and to run noapic. As yet we don't know why
> or what the newer BIOS has done to make it boot at all
Actually, I'm running this board with MPS 1.1, BIOS version 1007, and
APIC enabled without problem. Current kernel is 2.4.5-ac5, no lockups,
no boot failures, full access to my USB, etc.
With the older BIOS revision, you definitely need to have "noapic" as an
option. For the latest BIOS, just ensure that you set MPS 1.4 support
off.
John
* Re: CUV4X-D lockup on boot
From: lk @ 2001-06-02 20:25 UTC
To: John Cavan; +Cc: Alan Cox, linux-kernel
John Cavan <johnc@damncats.org> writes:
> Alan Cox wrote:
> > At minimum you need the 1007 bios and to run noapic. As yet we don't know why
> > or what the newer BIOS has done to make it boot at all
>
> Actually, I'm running this board with MPS 1.1, BIOS version 1007, and
> APIC enabled without problem. Current kernel is 2.4.5-ac5, no lockups,
> no boot failures, full access to my USB, etc.
>
> With the older BIOS revision, you definitely need to have "noapic" as an
> option. For the latest BIOS, just ensure that you set MPS 1.4 support
> off.
Indeed, disabling MPS 1.4 does appear to solve the problem. Incidentally,
I also had to set legacy USB support to always enabled (instead of Auto)
for my USB camera to work in SMP mode. Is MPS 1.4 worth having, and the
problem worth solving, or should I stick with this setup?
Paul