* Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
@ 2003-12-07 13:12 Ross Dickson
2003-12-09 15:20 ` Maciej W. Rozycki
2003-12-10 3:39 ` Jesse Allen
0 siblings, 2 replies; 35+ messages in thread
From: Ross Dickson @ 2003-12-07 13:12 UTC
To: linux-kernel; +Cc: AMartin, ross, andre, kernel
[-- Attachment #1: Type: text/plain, Size: 5471 bytes --]
Greetings,
I am not subscribed so please cc responses.
I have monitored the list and know my nforce2 experiences have been common.
The attached patches are in a single bzip2 tarball.
I have Albatron KM18G Pro and Epox 8RGA+ motherboards, both using nforce2 chipsets.
I built a kernel as follows:
Get std 2.4.22 src
apply patch-2.4.23
apply 2.4.22-low-latency.patch
apply preempt-kernel-rml-2.4.23-pre5-1.patch
apply vhz-j64-2.4.22.patch
One patch fails on inode.c, in dispose_list(), so I placed conditional_schedule() by hand as follows:
=static void dispose_list(struct list_head *head)
={
= int nr_disposed = 0;
=
= while (!list_empty(head)) {
= struct inode *inode;
= conditional_schedule();
Configured for Athlon with 1000 Hz ticks, preempt and low-latency on.
Compiled and installed the nvnet and nvidia video drivers.
Disclaimer: The following information and code patches are not fully tested and may be
dangerous. These are also the first patches I have made for public consumption, so I hope
that their format works.
Note also that the patches are against 2.4.22 even though they were developed
against the heavily patched 2.4.23 mentioned above. The patch code is the same for both
kernels but at different line numbers.
When I enabled either apic or io-apic in the kernel config, lockups came hard and fast,
and were particularly bad under hard disk load. There were heaps of lost interrupts on
irq7 in both apic and io-apic mode.
The lockups disappeared when I lowered the ide hda udma speed to mode 3 with hdparm, so
I went looking for answers, which now follow.
There are three parts to this email.
a) apic mods.
b) io-apic mods
c) ide driver mods
a) Lockups are due to too fast an apic acknowledge of apic timer int.
Apic hard locked up the system - no nmi debug available.
Fixed it by introducing a delay of at least 500ns into smp_apic_timer_interrupt()
just prior to ack_APIC_irq().
See attached diff file "nforce2-apic.c-2.4.22.patch" for details.
I have guessed at a suitable cpu speed dependent delay.
Perhaps someone with AMD cpu docs (apic timing specs) & analyser tools could refine it.
Maybe nforce2 chipset really is very quick accessing ram in dual dimm mode?
Or AMD 2200XP has a really slow APIC?
--- linux-2.4.22/arch/i386/kernel/apic.c 2003-06-14 00:51:29.000000000 +1000
+++ linux-2.4.22-rd/arch/i386/kernel/apic.c 2003-12-07 18:27:32.000000000 +1000
@@ -1078,6 +1078,15 @@
*/
apic_timer_irqs[cpu]++;
+#if defined(CONFIG_MK7) && defined(CONFIG_BLK_DEV_AMD74XX)
+ /*
+ * on 2200XP & nforce2 chipset we need at least 500ns delay here
+ * to stop lockups with udma100 drive. try to scale delay time
+ * with cpu speed. Ross Dickson.
+ */
+ ndelay((cpu_khz >> 12)+200 ); /* don't ack too soon or hard lockup */
+#endif
+
/*
* NOTE! We'd better ACK the irq immediately,
* because timer handling can be slow.
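To make the scaling in the ndelay() call above concrete, here is a minimal user-space sketch that evaluates the same expression, (cpu_khz >> 12) + 200, for a given clock rate. The helper name nforce2_ack_delay_ns() is hypothetical and is not part of the patch; it only assumes that cpu_khz holds the CPU clock in kHz.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the patch: returns the number of
 * nanoseconds the patch would pass to ndelay() for a CPU clock given
 * in kHz. The shift by 12 divides by 4096, so the delay grows by
 * about 1 ns for every 4 MHz of CPU clock on top of a 200 ns base.
 */
unsigned long nforce2_ack_delay_ns(unsigned long cpu_khz)
{
	return (cpu_khz >> 12) + 200;
}

/*
 * Hypothetical check against the "at least 500ns" figure quoted in
 * the patch comment: returns 1 if the computed delay reaches it.
 */
int nforce2_delay_meets_500ns(unsigned long cpu_khz)
{
	assert(cpu_khz > 0);
	return nforce2_ack_delay_ns(cpu_khz) >= 500;
}
```

For the 1830.076 MHz processor reported later in this thread (cpu_khz = 1830076) the expression gives 646 ns. It only reaches 500 ns once cpu_khz is 1228800 or more, so on a slower CPU the formula as written would fall below the 500 ns quoted in the comment.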
b) I was also disappointed to see I could not have the irq0 timer as IO-APIC-edge,
so I have fixed that too (tested on both my Epox and Albatron boards).
Firstly, I found the 8254 connected directly to pin 0, not pin 2, of the io-apic.
I have modified check_timer() in io_apic.c to trial-connect pin 0 and test it
after the existing test for connection to the io-apic.
See attached diff file nforce2-io-apic.c-2.4.22 for details.
--- linux-2.4.22/arch/i386/kernel/io_apic.c 2003-08-25 21:44:39.000000000 +1000
+++ linux-2.4.22-rd/arch/i386/kernel/io_apic.c 2003-12-07 18:40:40.000000000 +1000
@@ -1614,9 +1614,44 @@
return;
}
clear_IO_APIC_pin(0, pin1);
- printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to IO-APIC\n");
+ printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to IO-APIC pin%d\n",pin1);
}
+#if defined(CONFIG_ACPI_BOOT) && defined(CONFIG_X86_UP_IOAPIC)
+ /* for nforce2 try vector 0 on pin0
+ * Note the io_apic_set_pci_routing call disables the 8259 irq 0
+ * so we must be connected directly to the 8254 timer if this works
+ * Note2: this violates the above comment re Subtle but works!
+ */
+ printk(KERN_INFO "..TIMER: Is timer irq0 connected to IOAPIC Pin0? ...\n");
+ if ( pin1 != -1 && nr_ioapics ) {
+ int saved_timer_ack = timer_ack;
+ /* next call also disables 8259 irq0 */
+ int result = io_apic_set_pci_routing ( 0, 0, 0, 0, 0);
+ /*
+ * Ok, does IRQ0 through the IOAPIC work?
+ */
+ unmask_IO_APIC_irq(0);
+ timer_ack = 0 ;
+ if (timer_irq_works()) {
+ if (nmi_watchdog == NMI_IO_APIC) {
+ disable_8259A_irq(0);
+ setup_nmi();
+ enable_8259A_irq(0);
+ check_nmi_watchdog();
+ }
+ printk(KERN_INFO "..TIMER: works OK on apic pin0 irq0\n" );
+ return;
+ }
+ /* failed */
+ timer_ack = saved_timer_ack;
+ clear_IO_APIC_pin(0, 0);
+ result = io_apic_set_pci_routing ( 0, pin1, 0, 0, 0);
+ printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to IO-APIC Pin 0\n");
+ }
+#endif
+/* end new stuff for nforce2 */
+
printk(KERN_INFO "...trying to set up timer (IRQ0) through the 8259A ... ");
if (pin2 != -1) {
printk("\n..... (found pin %d) ...", pin2);
c) Finally, during my fault finding I merged A. Martin's patches for the nforce2 IDE driver.
I note that the nforce2 address setup timing bits are different from the AMD ones.
I have assumed the nforce2 address timings also apply to the nforce and nforce3 chipsets.
I could be wrong, so could someone with the nvidia docs please check this.
I have also not tested it with anything but a WDC ata100 hard drive.
For info see the attached patch files (I think the pci ids are already in 2.4.23):
nforce2-amd74xx.c-2.4.22.patch, nforce2-amd74xx.h-2.4.22.patch, nforce2-pci_ids.h-2.4.22.patch
Thanks
Ross Dickson
[-- Attachment #2: ross-diffs.tar.bz2 --]
[-- Type: application/x-tbz, Size: 4375 bytes --]
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-07 13:12 Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered Ross Dickson
@ 2003-12-09 15:20 ` Maciej W. Rozycki
2003-12-10 5:43 ` Ross Dickson
2003-12-10 3:39 ` Jesse Allen
1 sibling, 1 reply; 35+ messages in thread
From: Maciej W. Rozycki @ 2003-12-09 15:20 UTC
To: Ross Dickson; +Cc: linux-kernel, AMartin, andre, kernel

On Sun, 7 Dec 2003, Ross Dickson wrote:

> b) I was also disappointed to see I could not have irq0 timer IO-APIC-edge.
> So I have fixed it too (tested on both my epox and albatron MOBOs).
> Firstly I found 8254 connected directly to pin 0 not pin 2 of io-apic.
> I have modified check_timer() in io_apic.c to trial connect pin and test for it
> after the existing test for connection to io-apic.

I'm pretty sure this part is bogus. Have you actually verified it either
by using a hardware probe or at least by investigating documentation you
really have IRQ 0 routed to the I/O APIC interrupt #0 (INTIN 0)? If no,
then you can almost surely see interrupts travelling across the pair of
8259A PICS which are connected to the INTIN 0 input of the first I/O APIC
in every IA32-based PC system providing an I/O APIC seen so far.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: macro@ds2.pg.gda.pl, PGP key available +
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-09 15:20 ` Maciej W. Rozycki
@ 2003-12-10 5:43 ` Ross Dickson
2003-12-10 16:06 ` Maciej W. Rozycki
0 siblings, 1 reply; 35+ messages in thread
From: Ross Dickson @ 2003-12-10 5:43 UTC
To: Maciej W. Rozycki; +Cc: linux-kernel, AMartin, kernel, Ian Kumlien

On Wednesday 10 December 2003 01:20, Maciej W. Rozycki wrote:
> On Sun, 7 Dec 2003, Ross Dickson wrote:
>
> > b) I was also disappointed to see I could not have irq0 timer IO-APIC-edge.
> > So I have fixed it too (tested on both my epox and albatron MOBOs).
> > Firstly I found 8254 connected directly to pin 0 not pin 2 of io-apic.
> > I have modified check_timer() in io_apic.c to trial connect pin and test for it
> > after the existing test for connection to io-apic.
>
> I'm pretty sure this part is bogus. Have you actually verified it either
> by using a hardware probe or at least by investigating documentation you
> really have IRQ 0 routed to the I/O APIC interrupt #0 (INTIN 0)? If no,
> then you can almost surely see interrupts travelling across the pair of
> 8259A PICS which are connected to the INTIN 0 input of the first I/O APIC
> in every IA32-based PC system providing an I/O APIC seen so far.
>
> --
> + Maciej W. Rozycki, Technical University of Gdansk, Poland +
> +--------------------------------------------------------------+
> + e-mail: macro@ds2.pg.gda.pl, PGP key available +
>
>

Thanks Maciej for your response.

If everyone followed published recommendations then I would agree with your comments
however nvidia? et al?. I have no appropriate docs so I cannot confirm via a real
hardware probe so I can only offer a software confirmation.

Background musings:
I was forced to approach the problems using somewhat educated guesses and with the
tools I had at hand. As with most discoveries about black boxes the answer comes about
by a combination of educated guess, luck and checking the unlikely.
The apic delay (a) came about because the lockup problem went away when I put a
debugging outb_p() statement flipping bits at the lpt port while I was trying to catch
the frozen IRQ state info on my CRO. I was pleasantly surprised when the lockups ceased
so I replaced the outb_p with a delay and trimmed it as best I could without docs.
I did not change it within the Ack call as I realised that all the other normal apic ack
paths had considerably more code delay time - although could this be a gotcha depending
on what code path is in the driver. What if we had a really fast cpu or is it restricted
solely to the timer irq??

Back to your query:
I approached the io-apic edge with the same what if? I think I got it right but please
check my following code to confirm. I have since hacked the kernel as follows.

WARNING Following Mods For Debugging Only!

In File i8259.c I needed to get to "cached_irq_mask"

/*
 * This contains the irq mask for both 8259A irq controllers,
 */
//static unsigned int cached_irq_mask = 0xffff; debug ross
unsigned int cached_irq_mask = 0xffff;

In File io_apic.c I have tried to fully mask the 8259.

/*
 * This code may look a bit paranoid, but it's supposed to cooperate with
 * a wide range of boards and BIOS bugs. Fortunately only the timer IRQ
 * is so screwy. Thanks to Brian Perkins for testing/hacking this beast
 * fanatically on his truly buggy board.
 */
// debug ross
extern spinlock_t i8259A_lock;
extern unsigned int cached_irq_mask;

static inline void check_timer(void)
{
....
<snip>
....
#if defined(CONFIG_ACPI_BOOT) && defined(CONFIG_X86_UP_IOAPIC)
	/* for nforce2 try vector 0 on pin0
	 * Note the io_apic_set_pci_routing call disables the 8259 irq 0
	 * so we must be connected directly to the 8254 timer if this works
	 * Note2: this violates the above comment re Subtle but works!
	 */
	printk(KERN_INFO "..TIMER: Is timer irq0 connected to IOAPIC Pin0? ...\n");
	if ( pin1 != -1 && nr_ioapics ) {
		int result, tok;
		unsigned long flags;
		unsigned int saved_cached_irq_mask;
		unsigned char imr1, imr2;
		int saved_timer_ack = timer_ack;

		// disable all of 8259 irq's
		spin_lock_irqsave(&i8259A_lock, flags);
		saved_cached_irq_mask = cached_irq_mask;
		cached_irq_mask = 0xffff; // ensure nothing restores 8259 ints
		outb(0xff, 0x21); /* mask all of 8259A-1 */
		outb(0xff, 0xA1); /* mask all of 8259A-2 */
		spin_unlock_irqrestore(&i8259A_lock, flags);

		/*
		 * Ok, does IRQ0 through the IOAPIC work?
		 */
		/* next call also disables 8259 irq0 */
		result = io_apic_set_pci_routing ( 0, 0, 0, 0, 0);
		unmask_IO_APIC_irq(0);
		timer_ack = 0 ;

		spin_lock_irqsave(&i8259A_lock, flags);
		imr1 = inb(0x21);
		imr2 = inb(0xA1);
		printk("..TIMER check 8259 ints disabled, imr1:%02x, imr2:%02x\n", imr1, imr2);
		tok = timer_irq_works();
		spin_unlock_irqrestore(&i8259A_lock, flags);

		// restore 8259 mask
		spin_lock_irqsave(&i8259A_lock, flags);
		cached_irq_mask = saved_cached_irq_mask;
		outb( cached_irq_mask & 0xff, 0x21 ); /* restore all of 8259A-1 */
		outb( cached_irq_mask >> 8, 0xA1 ); /* restore all of 8259A-2 */
		spin_unlock_irqrestore(&i8259A_lock, flags);

		/*
		 * Ok, does IRQ0 through the IOAPIC work?
		 */
		// unmask_IO_APIC_irq(0);
		// timer_ack = 0 ;
		// if (timer_irq_works()) {
		if (tok) {
			if (nmi_watchdog == NMI_IO_APIC) {
				disable_8259A_irq(0);
				setup_nmi();
				enable_8259A_irq(0);
				check_nmi_watchdog();
			}
			printk(KERN_INFO "..TIMER: works OK on apic pin0 irq0\n" );
			return;
		}
		/* failed */
		timer_ack = saved_timer_ack;
		clear_IO_APIC_pin(0, 0);
		result = io_apic_set_pci_routing ( 0, pin1, 0, 0, 0);
		printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to IO-APIC Pin 0\n");
	}
#endif
/* end new stuff for nforce2 */

The inner spinlock around timer_irq_works() I think is redundant but I put it there
for good measure.

Relevant dmesg output from Albatron KM18G Pro ( this is different MOBO (same type) but
this time has a barton core 2500 XP cpu).

enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
ENABLING IO-APIC IRQs
init IO_APIC IRQs
IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
..TIMER: vector=0x31 pin1=2 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC pin2
..TIMER: Is timer irq0 connected to IOAPIC Pin0? ...
IOAPIC[0]: Set PCI routing entry (2-0 -> 0x31 -> IRQ 0 Mode:0 Active:0)
..TIMER check 8259 ints disabled, imr1:ff, imr2:ff
..TIMER: works OK on apic pin0 irq0
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 1829.0708 MHz.
..... host bus clock speed is 332.0674 MHz.
cpu: 0, clocks: 332674, slice: 166337
CPU0<T0:332672,T1:166320,D:15,S:166337,C:332674>

Please advise if anyone knows of extra registers which may have been added to the
nforce2 8259 core which could allow the interrupts through the masked chip core?
I note that they may exist after reading your email March 21 2002 (irq FosterP4)
http://www.ussg.iu.edu/hypermail/linux/kernel/0203.2/1213.html

Note that I think it is safe to leave the 8259 irq(0) implicitly disabled on failure
exit as the code paths following my code patch do it anyway.

Regards
Ross.
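
A note on reading the two clock lines in the dmesg output above, since values such as "332.0674 MHz" look odd. The sketch below reproduces them under stated assumptions: that the kernel prints a per-tick count as quotient and remainder of 1000000/HZ using a "%ld.%04ld" format, that HZ is 1000 (the 1000 Hz ticks mentioned in the first message), and that the slice is the clock count divided by the number of CPUs plus one. None of this is confirmed in the thread, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ASSUMED_HZ 1000L	/* assumption: kernel built with 1000 Hz ticks */

/*
 * Assumed formatting of the "clock speed is X MHz" lines: the integer
 * part is per_tick / (1000000 / HZ), the fraction is the remainder
 * printed with four digits. With HZ = 1000 the remainder has only
 * three significant digits, so it gains a leading zero.
 */
void format_clock_mhz(long per_tick, char *buf, size_t len)
{
	long div = 1000000L / ASSUMED_HZ;

	snprintf(buf, len, "%ld.%04ld", per_tick / div, per_tick % div);
}

/* Assumed slice computation: clocks shared over (ncpus + 1) slots. */
long apic_slice(long clocks, int ncpus)
{
	return clocks / (ncpus + 1);
}
```

Under these assumptions "332.0674 MHz" stands for 332.674 MHz and "1829.0708 MHz" for 1829.708 MHz, which is close to the 1830.076 MHz the kernel detects at boot in the later log; the slice of 166337 is simply 332674 / 2 on this single-CPU machine.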
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-10 5:43 ` Ross Dickson
@ 2003-12-10 16:06 ` Maciej W. Rozycki
2003-12-11 6:55 ` Ross Dickson
0 siblings, 1 reply; 35+ messages in thread
From: Maciej W. Rozycki @ 2003-12-10 16:06 UTC
To: Ross Dickson; +Cc: linux-kernel, AMartin, kernel, Ian Kumlien

On Wed, 10 Dec 2003, Ross Dickson wrote:

> Relevant dmesg output from Albatron KM18G Pro ( this is different MOBO (same type) but
> this time has a barton core 2500 XP cpu).
>
> enabled ExtINT on CPU#0
> ESR value before enabling vector: 00000000
> ESR value after enabling vector: 00000000
> ENABLING IO-APIC IRQs
> init IO_APIC IRQs
> IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
> ..TIMER: vector=0x31 pin1=2 pin2=-1
> ..MP-BIOS bug: 8254 timer not connected to IO-APIC pin2
> ..TIMER: Is timer irq0 connected to IOAPIC Pin0? ...
> IOAPIC[0]: Set PCI routing entry (2-0 -> 0x31 -> IRQ 0 Mode:0 Active:0)
> ..TIMER check 8259 ints disabled, imr1:ff, imr2:ff
> ..TIMER: works OK on apic pin0 irq0
> Using local APIC timer interrupts.
> calibrating APIC timer ...
> ..... CPU clock speed is 1829.0708 MHz.
> ..... host bus clock speed is 332.0674 MHz.
> cpu: 0, clocks: 332674, slice: 166337
> CPU0<T0:332672,T1:166320,D:15,S:166337,C:332674>

Hmm, while this is different from what is documented in the MP Spec, it
looks like the 8254 IRQ is connected to INTIN0 indeed. We can handle such
a setup if the BIOS reports routing correctly. Since you invoke
io_apic_set_pci_routing() I assume you use ACPI for IRQ routing
information. Can you please rebuild the kernel with APIC_DEBUG set to 1
in include/asm-i386/apic.h and send me the bootstrap log? Can you please
send me the output of a tool called `mptable' as well, so that I can
compare the results?

Maciej

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: macro@ds2.pg.gda.pl, PGP key available +
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-10 16:06 ` Maciej W. Rozycki
@ 2003-12-11 6:55 ` Ross Dickson
2003-12-11 11:47 ` Ian Kumlien
2003-12-11 15:15 ` Maciej W. Rozycki
0 siblings, 2 replies; 35+ messages in thread
From: Ross Dickson @ 2003-12-11 6:55 UTC
To: Maciej W. Rozycki; +Cc: linux-kernel, AMartin, kernel, Ian Kumlien

On Thursday 11 December 2003 02:06, Maciej W. Rozycki wrote:
> On Wed, 10 Dec 2003, Ross Dickson wrote:
>
> > Relevant dmesg output from Albatron KM18G Pro ( this is different MOBO (same type) but
> > this time has a barton core 2500 XP cpu).
> >
> > enabled ExtINT on CPU#0
> > ESR value before enabling vector: 00000000
> > ESR value after enabling vector: 00000000
> > ENABLING IO-APIC IRQs
> > init IO_APIC IRQs
> > IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
> > ..TIMER: vector=0x31 pin1=2 pin2=-1
> > ..MP-BIOS bug: 8254 timer not connected to IO-APIC pin2
> > ..TIMER: Is timer irq0 connected to IOAPIC Pin0? ...
> > IOAPIC[0]: Set PCI routing entry (2-0 -> 0x31 -> IRQ 0 Mode:0 Active:0)
> > ..TIMER check 8259 ints disabled, imr1:ff, imr2:ff
> > ..TIMER: works OK on apic pin0 irq0
> > Using local APIC timer interrupts.
> > calibrating APIC timer ...
> > ..... CPU clock speed is 1829.0708 MHz.
> > ..... host bus clock speed is 332.0674 MHz.
> > cpu: 0, clocks: 332674, slice: 166337
> > CPU0<T0:332672,T1:166320,D:15,S:166337,C:332674>
>
> Hmm, while this is different from what is documented in the MP Spec, it
> looks like the 8254 IRQ is connected to INTIN0 indeed. We can handle such
> a setup if the BIOS reports routing correctly. Since you invoke
> io_apic_set_pci_routing() I assume you use ACPI for IRQ routing
> information. Can you please rebuild the kernel with APIC_DEBUG set to 1
> in include/asm-i386/apic.h and send me the bootstrap log? Can you please
> send me the output of a tool called `mptable' as well, so that I can
> compare the results?
>
> Maciej
>
> --
> + Maciej W. Rozycki, Technical University of Gdansk, Poland +
> +--------------------------------------------------------------+
> + e-mail: macro@ds2.pg.gda.pl, PGP key available +
>
>

Thanks Maciej, bootstrap log follows

ACPI: RSDT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x1dff3000
ACPI: FADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x1dff3040
ACPI: MADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x1dff7980
ACPI: DSDT (v001 NVIDIA AWRDACPI 0x00001000 MSFT 0x0100000e) @ 0x00000000
ACPI: Local APIC address 0xfee00000
Boot CPU = 0
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 Pentium(tm) Pro APIC version 16
    Floating point unit present.
    Machine Exception supported.
    64 bit compare & exchange supported.
    Internal APIC present.
    SEP present.
    MTRR present.
    PGE present.
    MCA present.
    CMOV present.
    PAT present.
    PSE present.
    MMX present.
    FXSR present.
    XMM present.
    Bootup CPU
ACPI: LAPIC_NMI (acpi_id[0x00] polarity[0x1] trigger[0x1] lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] global_irq_base[0x0])
IOAPIC[0]: Assigned apic_id 2
IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, IRQ 0-23
Bus #0 is ISA
Int: type 3, pol 0, trig 0, bus 0, irq 0, 2-0
Int: type 0, pol 0, trig 0, bus 0, irq 1, 2-1
Int: type 0, pol 0, trig 0, bus 0, irq 3, 2-3
Int: type 0, pol 0, trig 0, bus 0, irq 4, 2-4
Int: type 0, pol 0, trig 0, bus 0, irq 5, 2-5
Int: type 0, pol 0, trig 0, bus 0, irq 6, 2-6
Int: type 0, pol 0, trig 0, bus 0, irq 7, 2-7
Int: type 0, pol 0, trig 0, bus 0, irq 8, 2-8
Int: type 0, pol 0, trig 0, bus 0, irq 9, 2-9
Int: type 0, pol 0, trig 0, bus 0, irq 10, 2-10
Int: type 0, pol 0, trig 0, bus 0, irq 11, 2-11
Int: type 0, pol 0, trig 0, bus 0, irq 12, 2-12
Int: type 0, pol 0, trig 0, bus 0, irq 13, 2-13
Int: type 0, pol 0, trig 0, bus 0, irq 14, 2-14
Int: type 0, pol 0, trig 0, bus 0, irq 15, 2-15
ACPI: INT_SRC_OVR (bus[0] irq[0x0] global_irq[0x2] polarity[0x0] trigger[0x0])
Int: type 0, pol 0, trig 0, bus 0, irq 0, 2-2
ACPI: INT_SRC_OVR (bus[0] irq[0x9] global_irq[0x9] polarity[0x1] trigger[0x3])
Int: type 0, pol 1, trig 3, bus 0, irq 9, 2-9
ACPI BALANCE SET
Using ACPI (MADT) for SMP configuration information
Kernel command line: splash=silent root=/dev/hda2 hdc=ide-scsi hdclun=0
ide_setup: hdc=ide-scsi
ide_setup: hdclun=0
mapped APIC to ffffe000 (fee00000)
mapped IOAPIC to ffffd000 (fec00000)
Initializing CPU#0
Detected 1830.076 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 3620.86 BogoMIPS
Memory: 482980k/491456k available (1800k kernel code, 8088k reserved, 622k data, 112k init, 0k highmem)
Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
Inode cache hash table entries: 32768 (order: 6, 262144 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer cache hash table entries: 32768 (order: 5, 131072 bytes)
Page-cache hash table entries: 131072 (order: 7, 524288 bytes)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: After generic, caps: 0383fbff c1c3fbff 00000000 00000000
CPU: Common caps: 0383fbff c1c3fbff 00000000 00000000
CPU: AMD Athlon(tm) XP 2500+ stepping 00
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
Getting VERSION: 40010
Getting VERSION: 40010
Getting ID: 0
Getting ID: f000000
Getting LVT0: 700
Getting LVT1: 400
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
ENABLING IO-APIC IRQs
Synchronizing Arb IDs.
init IO_APIC IRQs
IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
..TIMER: vector=0x31 pin1=2 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC pin2
..TIMER: Is timer irq0 connected to IOAPIC Pin0? ...
IOAPIC[0]: Set PCI routing entry (2-0 -> 0x31 -> IRQ 0 Mode:0 Active:0)
..TIMER check 8259 ints disabled, imr1:ff, imr2:ff
..TIMER: works OK on apic pin0 irq0
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 1829.0813 MHz.
..... host bus clock speed is 332.0693 MHz.
cpu: 0, clocks: 332693, slice: 166346
CPU0<T0:332688,T1:166336,D:6,S:166346,C:332693>
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
ACPI: Subsystem revision 20031002
PCI: PCI BIOS revision 2.10 entry at 0xfb4e0, last bus=2
PCI: Using configuration type 1
IOAPIC[0]: Set PCI routing entry (2-9 -> 0x71 -> IRQ 9 Mode:1 Active:0)
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: System [ACPI] (supports S0 S1 S4 S5)
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.HUB0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.AGPB._PRT]
ACPI: PCI Interrupt Link [LNK1] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNK2] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNK3] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNK4] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNK5] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LUBA] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LUBB] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LMAC] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LAPU] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LACI] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LMCI] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LSMB] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LUB2] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LFIR] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [L3CM] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LIDE] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [APC1] (IRQs 16)
ACPI: PCI Interrupt Link [APC2] (IRQs 17)
ACPI: PCI Interrupt Link [APC3] (IRQs 18)
ACPI: PCI Interrupt Link [APC4] (IRQs 19)
ACPI: PCI Interrupt Link [APC5] (IRQs *16)
ACPI: PCI Interrupt Link [APCF] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCG] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCH] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCI] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCJ] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCK] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCS] (IRQs *23)
ACPI: PCI Interrupt Link [APCL] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCM] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [AP3C] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCZ] (IRQs 20 21 22)
PCI: Probing PCI hardware
ACPI: PCI Interrupt Link [APCS] enabled at IRQ 23
IOAPIC[0]: Set PCI routing entry (2-23 -> 0xa9 -> IRQ 23 Mode:1 Active:0)
00:00:01[A] -> 2-23 -> IRQ 23
Pin 2-23 already programmed
ACPI: PCI Interrupt Link [APCF] enabled at IRQ 20
IOAPIC[0]: Set PCI routing entry (2-20 -> 0xb1 -> IRQ 20 Mode:1 Active:0)
00:00:02[A] -> 2-20 -> IRQ 20
ACPI: PCI Interrupt Link [APCG] enabled at IRQ 22
IOAPIC[0]: Set PCI routing entry (2-22 -> 0xb9 -> IRQ 22 Mode:1 Active:0)
00:00:02[B] -> 2-22 -> IRQ 22
ACPI: PCI Interrupt Link [APCL] enabled at IRQ 21
IOAPIC[0]: Set PCI routing entry (2-21 -> 0xc1 -> IRQ 21 Mode:1 Active:0)
00:00:02[C] -> 2-21 -> IRQ 21
ACPI: PCI Interrupt Link [APCH] enabled at IRQ 20
Pin 2-20 already programmed
ACPI: PCI Interrupt Link [APCI] enabled at IRQ 22
Pin 2-22 already programmed
ACPI: PCI Interrupt Link [APCJ] enabled at IRQ 21
Pin 2-21 already programmed
ACPI: PCI Interrupt Link [APCK] enabled at IRQ 20
Pin 2-20 already programmed
ACPI: PCI Interrupt Link [APCM] enabled at IRQ 22
Pin 2-22 already programmed
ACPI: PCI Interrupt Link [APCZ] enabled at IRQ 21
Pin 2-21 already programmed
ACPI: PCI Interrupt Link [APC3] enabled at IRQ 18
IOAPIC[0]: Set PCI routing entry (2-18 -> 0xc9 -> IRQ 18 Mode:1 Active:0)
00:01:06[A] -> 2-18 -> IRQ 18
ACPI: PCI Interrupt Link [APC4] enabled at IRQ 19
IOAPIC[0]: Set PCI routing entry (2-19 -> 0xd1 -> IRQ 19 Mode:1 Active:0)
00:01:06[B] -> 2-19 -> IRQ 19
ACPI: PCI Interrupt Link [APC1] enabled at IRQ 16
IOAPIC[0]: Set PCI routing entry (2-16 -> 0xd9 -> IRQ 16 Mode:1 Active:0)
00:01:06[C] -> 2-16 -> IRQ 16
ACPI: PCI Interrupt Link [APC2] enabled at IRQ 17
IOAPIC[0]: Set PCI routing entry (2-17 -> 0xe1 -> IRQ 17 Mode:1 Active:0)
00:01:06[D] -> 2-17 -> IRQ 17
Pin 2-19 already programmed
Pin 2-16 already programmed
Pin 2-17 already programmed
Pin 2-18 already programmed
Pin 2-16 already programmed
Pin 2-17 already programmed
Pin 2-18 already programmed
Pin 2-19 already programmed
ACPI: PCI Interrupt Link [APC5] enabled at IRQ 16
Pin 2-16 already programmed
number of MP IRQ sources: 15.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................
IO APIC #2......
.... register #00: 02000000
....... : physical APIC id: 02
....... : Delivery Type: 0
....... : LTS : 0
.... register #01: 00170011
....... : max redirection entries: 0017
....... : PRQ implemented: 0
....... : IO APIC version: 0011
.... register #02: 00000000
....... : arbitration: 00
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
 00 001 01  0    0    0   0   0    1    1    31
 01 001 01  0    0    0   0   0    1    1    39
 02 000 00  0    0    0   0   0    0    0    00
 03 001 01  0    0    0   0   0    1    1    41
 04 001 01  0    0    0   0   0    1    1    49
 05 001 01  0    0    0   0   0    1    1    51
 06 001 01  0    0    0   0   0    1    1    59
 07 001 01  0    0    0   0   0    1    1    61
 08 001 01  0    0    0   0   0    1    1    69
 09 001 01  0    1    0   0   0    1    1    71
 0a 001 01  0    0    0   0   0    1    1    79
 0b 001 01  0    0    0   0   0    1    1    81
 0c 001 01  0    0    0   0   0    1    1    89
 0d 001 01  0    0    0   0   0    1    1    91
 0e 001 01  0    0    0   0   0    1    1    99
 0f 001 01  0    0    0   0   0    1    1    A1
 10 001 01  1    1    0   0   0    1    1    D9
 11 001 01  1    1    0   0   0    1    1    E1
 12 001 01  1    1    0   0   0    1    1    C9
 13 001 01  1    1    0   0   0    1    1    D1
 14 001 01  1    1    0   0   0    1    1    B1
 15 001 01  1    1    0   0   0    1    1    C1
 16 001 01  1    1    0   0   0    1    1    B9
 17 001 01  1    1    0   0   0    1    1    A9
IRQ to pin mappings:
IRQ0 -> 0:2-> 0:0
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ5 -> 0:5
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ9 -> 0:9-> 0:9
IRQ10 -> 0:10
IRQ11 -> 0:11
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
IRQ16 -> 0:16
IRQ17 -> 0:17
IRQ18 -> 0:18
IRQ19 -> 0:19
IRQ20 -> 0:20
IRQ21 -> 0:21
IRQ22 -> 0:22
IRQ23 -> 0:23
.................................... done.
PCI: Using ACPI for IRQ routing
PCI: if you experience problems, try using option 'pci=noacpi' or even 'acpi=off'

mptable doesn't like my bios. I tried setting bios mp versions to both 1.1 and 1.4.

albatron:/usr/src/mptable-2.0.15a # ./mptable -verbose

===============================================================================

MPTable, version 2.0.15 Linux

looking for EBDA pointer @ 0x040e, found, searching EBDA @ 0x0009fc00
searching CMOS 'top of mem' @ 0x0009f800 (638K)
searching default 'top of mem' @ 0x0009fc00 (639K)
searching BIOS @ 0x000f0000

MP FPS found in BIOS @ physical addr: 0x000f50b0

-------------------------------------------------------------------------------

MP Floating Pointer Structure:

location: BIOS
physical address: 0x000f50b0
signature: '_MP_'
length: 16 bytes
version: 1.1
checksum: 0x00
mode: Virtual Wire

-------------------------------------------------------------------------------

MP Config Table Header:

physical address: 0x0xf0c00
signature: '$ml$'
base table length: 0
version: 1.6
checksum: 0x00
OEM ID: 'Ä
¸§'
°öProduct ID: '(
m'P
OEM table pointer: 0x12d90e22
OEM table size: 7964
entry count: 7964
local APIC address: 0x1f1c1f1c
extended table length: 65284
extended table checksum: 255

-------------------------------------------------------------------------------

MP Config Base Table Entries:

--
MPTABLE HOSED! record type = 55
albatron:/usr/src/mptable-2.0.15a #

Finally, others working with kernel 2.6 earlier trialled the following patch, which may
provide some more clues:
retrieved from:
http://www.kernel.org/pub/linux/kernel/people/bart/2.6.0-test11-bart1/broken-out/nforce2-apic.patch

[x86] do not wrongly override mp_ExtINT IRQ

From: Mathieu <cheuche+lkml@free.fr>.

With this patch timer IRQ0 is correctly set to IO-APIC-edge (not XT-PIC)
on nForce2 boards when using APIC and ACPI.

 arch/i386/kernel/mpparse.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

diff -puN arch/i386/kernel/mpparse.c~nforce2-apic arch/i386/kernel/mpparse.c
--- linux-2.6.0-test11/arch/i386/kernel/mpparse.c~nforce2-apic 2003-12-08 00:12:25.782597272 +0100
+++ linux-2.6.0-test11-root/arch/i386/kernel/mpparse.c 2003-12-08 00:12:25.786596664 +0100
@@ -962,7 +962,8 @@ void __init mp_override_legacy_irq (
 	 */
 	for (i = 0; i < mp_irq_entries; i++) {
 		if ((mp_irqs[i].mpc_dstapic == intsrc.mpc_dstapic)
-			&& (mp_irqs[i].mpc_srcbusirq == intsrc.mpc_srcbusirq)) {
+			&& (mp_irqs[i].mpc_srcbusirq == intsrc.mpc_srcbusirq)
+			&& (mp_irqs[i].mpc_irqtype == intsrc.mpc_irqtype)) {
 			mp_irqs[i] = intsrc;
 			found = 1;
 			break;

_

however the results were not completely successful as this posting shows it routing
through the 8259?
http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/1303.html

dmesg differences:

1. after:
..TIMER: vector=0x31 pin1=2 pin2=0
before:
..TIMER: vector=0x31 pin1=2 pin2=-1

2. after:
...trying to set up timer (IRQ0) through the 8259A ...
..... (found pin 0) ...works.
number of MP IRQ sources: 16.
before:
...trying to set up timer (IRQ0) through the 8259A ... failed.
...trying to set up timer as Virtual Wire IRQ... failed.
...trying to set up timer as ExtINT IRQ... works.
number of MP IRQ sources: 15.

Perhaps someone else could get mptable to run on their machine and send you
the result.

Regards
Ross
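
The effect of the extra mpc_irqtype comparison in the mpparse.c patch above can be sketched in user space. The following is a minimal model, not the kernel code: struct irq_src is a hypothetical, trimmed-down stand-in for the kernel's interrupt source entry, and the "append when no match is found" step is an assumption about the part of mp_override_legacy_irq() that the hunk does not show. Type 3 is ExtINT and type 0 is INT, as in the "Int: type ..." lines of the bootstrap log.

```c
#include <assert.h>

enum { MP_INT = 0, MP_EXTINT = 3 };	/* interrupt types as seen in the log */
#define MAX_SOURCES 32

struct irq_src {	/* hypothetical stand-in for the kernel structure */
	int irqtype;	/* MP_INT or MP_EXTINT */
	int srcbusirq;	/* ISA IRQ number */
	int dstapic;	/* I/O APIC id */
	int dstirq;	/* I/O APIC pin */
};

/*
 * Apply one ACPI interrupt source override to the table and return the
 * new entry count. With match_type == 0 this models the unpatched
 * behaviour, with match_type == 1 the patched one.
 */
int apply_override(struct irq_src *tab, int n, struct irq_src ovr,
		   int match_type)
{
	int i;

	for (i = 0; i < n; i++) {
		if (tab[i].dstapic == ovr.dstapic &&
		    tab[i].srcbusirq == ovr.srcbusirq &&
		    (!match_type || tab[i].irqtype == ovr.irqtype)) {
			tab[i] = ovr;	/* replace the matching entry */
			return n;
		}
	}
	assert(n < MAX_SOURCES);
	tab[n] = ovr;	/* assumed: no match means a new entry is added */
	return n + 1;
}

/* Returns 1 if the table still holds an ExtINT entry. */
int has_extint(const struct irq_src *tab, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (tab[i].irqtype == MP_EXTINT)
			return 1;
	return 0;
}
```

Fed with the ExtINT entry for irq 0 on pin 0 and the override that moves irq 0 to pin 2, the unpatched match replaces the ExtINT entry and the count is unchanged, while the patched match keeps it and adds one entry. That would be consistent with the "number of MP IRQ sources" going from 15 to 16 in the dmesg differences above.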
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered 2003-12-11 6:55 ` Ross Dickson @ 2003-12-11 11:47 ` Ian Kumlien 2003-12-11 9:12 ` Ross Dickson 2003-12-11 14:58 ` Jesse Allen 2003-12-11 15:15 ` Maciej W. Rozycki 1 sibling, 2 replies; 35+ messages in thread From: Ian Kumlien @ 2003-12-11 11:47 UTC (permalink / raw) To: ross; +Cc: Maciej W. Rozycki, linux-kernel, AMartin, kernel [-- Attachment #1: Type: text/plain, Size: 4023 bytes --] On Thu, 2003-12-11 at 07:55, Ross Dickson wrote: > albatron:/usr/src/mptable-2.0.15a # ./mptable -verbose > > =============================================================================== > > MPTable, version 2.0.15 Linux > > looking for EBDA pointer @ 0x040e, found, searching EBDA @ 0x0009fc00 > searching CMOS 'top of mem' @ 0x0009f800 (638K) > searching default 'top of mem' @ 0x0009fc00 (639K) > searching BIOS @ 0x000f0000 > > MP FPS found in BIOS @ physical addr: 0x000f50b0 > > ------------------------------------------------------------------------------- > > MP Floating Pointer Structure: > > location: BIOS > physical address: 0x000f50b0 > signature: '_MP_' > length: 16 bytes > version: 1.1 > checksum: 0x00 > mode: Virtual Wire > > ------------------------------------------------------------------------------- > > MP Config Table Header: > > physical address: 0x0xf0c00 > signature: '$ml$' > base table length: 0 > version: 1.6 > checksum: 0x00 > OEM ID: 'Ä > ¸§' > °öProduct ID: '( > m'P > OEM table pointer: 0x12d90e22 > OEM table size: 7964 > entry count: 7964 > local APIC address: 0x1f1c1f1c > extended table length: 65284 > extended table checksum: 255 > > ------------------------------------------------------------------------------- > > MP Config Base Table Entries: > > -- > MPTABLE HOSED! record type = 55 > albatron:/usr/src/mptable-2.0.15a # > > Perhaps someone else could get mptable to run on their machine and send you > the result. 
mptable doesn't seem to accept its own options; anyway, here's the
output.

mptable -extra -verbose -pirq

===============================================================================

MPTable, version 2.0.15 Linux

looking for EBDA pointer @ 0x040e, found, searching EBDA @ 0x0009fc00
searching CMOS 'top of mem' @ 0x0009f800 (638K)
searching default 'top of mem' @ 0x0009fc00 (639K)
searching BIOS @ 0x000f0000

MP FPS found in BIOS @ physical addr: 0x000f5ce0

-------------------------------------------------------------------------------

MP Floating Pointer Structure:

location: BIOS
physical address: 0x000f5ce0
signature: '_MP_'
length: 16 bytes
version: 1.1
checksum: 0x00
mode: Virtual Wire

-------------------------------------------------------------------------------

MP Config Table Header:

physical address: 0x0xf0c00
signature: ''
base table length: 1280
version: 1.7
checksum: 0x00
OEM ID: ''
Product ID: ''
OEM table pointer: 0x0000ffff
OEM table size: 0
entry count: 65535
local APIC address: 0x000000c4
extended table length: 1
extended table checksum: 0

-------------------------------------------------------------------------------

MP Config Base Table Entries:

--
Processors:  APIC ID  Version  State         Family  Model  Step  Flags
             0        0x 7     BSP, usable   15      15     15    0x1a00c035
             0        0x 0     AP, unusable  0       0      10    0x78ffff0a
--
MPTABLE HOSED! record type = 15

I couldn't find the source so I used an old RedHat rpm...
(Asus A7N8X-X bios 1007)

--
Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
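A note on why both dumps so far end in "MPTABLE HOSED": per the Intel MP specification, the config table header is 44 bytes, starts with the signature "PCMP", carries a spec revision byte of 1 or 4, and the bytes of the base table sum to zero modulo 256. Neither header above has that signature. A minimal Python sketch of those checks follows; check_mp_config_header and header_like_ross_output are hypothetical names, and the sample bytes are built from the printed fields on the assumption that mptable's "version: 1.6" is the raw revision byte 6.

```python
import struct

MP_HEADER_LEN = 44  # size of the MP config table header in the MP spec


def check_mp_config_header(raw):
    """Return a list of problems found in an MP config table header."""
    if len(raw) < MP_HEADER_LEN:
        return ["fewer than 44 bytes available"]
    # signature (4 bytes), base table length (2), spec revision (1), checksum (1)
    signature, length, spec_rev, _checksum = struct.unpack_from("<4sHBB", raw, 0)
    problems = []
    if signature != b"PCMP":
        problems.append("signature is %r, expected b'PCMP'" % signature)
    if spec_rev not in (1, 4):
        problems.append("spec revision byte is %d, expected 1 or 4" % spec_rev)
    if length < MP_HEADER_LEN or length > len(raw):
        problems.append("base table length %d is implausible" % length)
    elif sum(raw[:length]) % 256 != 0:
        problems.append("base table bytes do not sum to zero")
    return problems


def header_like_ross_output():
    """Bytes resembling the first dump: '$ml$', length 0, revision 6 (assumed)."""
    return struct.pack("<4sHBB", b"$ml$", 0, 6, 0) + bytes(36)


for problem in check_mp_config_header(header_like_ross_output()):
    print(problem)
```

If the header fails these checks, everything mptable prints after it (OEM ID, entry count, processor records) is read from whatever happens to sit at that address and should not be trusted.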
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered 2003-12-11 11:47 ` Ian Kumlien @ 2003-12-11 9:12 ` Ross Dickson 2003-12-11 17:52 ` Ian Kumlien 2003-12-11 14:58 ` Jesse Allen 1 sibling, 1 reply; 35+ messages in thread From: Ross Dickson @ 2003-12-11 9:12 UTC (permalink / raw) To: Ian Kumlien; +Cc: Maciej W. Rozycki, linux-kernel, AMartin, kernel On Thursday 11 December 2003 21:47, Ian Kumlien wrote: > On Thu, 2003-12-11 at 07:55, Ross Dickson wrote: > > albatron:/usr/src/mptable-2.0.15a # ./mptable -verbose > > > > =============================================================================== > > > > MPTable, version 2.0.15 Linux > > > > looking for EBDA pointer @ 0x040e, found, searching EBDA @ 0x0009fc00 > > searching CMOS 'top of mem' @ 0x0009f800 (638K) > > searching default 'top of mem' @ 0x0009fc00 (639K) > > searching BIOS @ 0x000f0000 > > > > MP FPS found in BIOS @ physical addr: 0x000f50b0 > > > > ------------------------------------------------------------------------------- > > > > MP Floating Pointer Structure: > > > > location: BIOS > > physical address: 0x000f50b0 > > signature: '_MP_' > > length: 16 bytes > > version: 1.1 > > checksum: 0x00 > > mode: Virtual Wire > > > > ------------------------------------------------------------------------------- > > > > MP Config Table Header: > > > > physical address: 0x0xf0c00 > > signature: '$ml$' > > base table length: 0 > > version: 1.6 > > checksum: 0x00 > > OEM ID: 'Ä > > ¸§' > > °öProduct ID: '( > > m'P > > OEM table pointer: 0x12d90e22 > > OEM table size: 7964 > > entry count: 7964 > > local APIC address: 0x1f1c1f1c > > extended table length: 65284 > > extended table checksum: 255 > > > > ------------------------------------------------------------------------------- > > > > MP Config Base Table Entries: > > > > -- > > MPTABLE HOSED! 
record type = 55
> > albatron:/usr/src/mptable-2.0.15a #
> >
> > > Perhaps someone else could get mptable to run on their machine and send you
> > the result.
>
> mptable dosn't seem to accept it's own options, anyways, heres the
> output.
>
> mptable -extra -verbose -pirq
>
> ===============================================================================
>
> MPTable, version 2.0.15 Linux
>
> looking for EBDA pointer @ 0x040e, found, searching EBDA @ 0x0009fc00
> searching CMOS 'top of mem' @ 0x0009f800 (638K)
> searching default 'top of mem' @ 0x0009fc00 (639K)
> searching BIOS @ 0x000f0000
>
> MP FPS found in BIOS @ physical addr: 0x000f5ce0
>
> -------------------------------------------------------------------------------
>
> MP Floating Pointer Structure:
>
> location: BIOS
> physical address: 0x000f5ce0
> signature: '_MP_'
> length: 16 bytes
> version: 1.1
> checksum: 0x00
> mode: Virtual Wire
>
> -------------------------------------------------------------------------------
>
> MP Config Table Header:
>
> physical address: 0x0xf0c00
> signature: ''
> base table length: 1280
> version: 1.7
> checksum: 0x00
> OEM ID: ''
> Product ID: ''
> OEM table pointer: 0x0000ffff
> OEM table size: 0
> entry count: 65535
> local APIC address: 0x000000c4
> extended table length: 1
> extended table checksum: 0
>
> -------------------------------------------------------------------------------
>
> MP Config Base Table Entries:
>
> --
> Processors: APIC ID Version State Family Model Step Flags
> 0 0x 7 BSP, usable 15 15 15 0x1a00c035
> 0 0x 0 AP, unusable 0 0 10 0x78ffff0a
> --
> MPTABLE HOSED! record type = 15
>
> I couldn't find the source so i used a old RedHat rpm...
> (Asus A7N8X-X bios 1007)
>
> --
> Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net
>

Thanks Ian

Also many thanks for pointing out the relevant section to look in with the AMD
cpu link that you sent - Credit where credit is due (assuming we are both on the
right track).
I had a read and refined your surmisings. I think the
problem appears synchronous with the apic timer for two reasons.
1) any apic irq can cause re-connection of the system bus after disconnect.
2) the apic timer irq in my examinations has the shortest path to an ack.

I also had a look back through the athlon cooler and power management
postings and web site articles. I was blissfully ignorant of these issues when I
started and now I wonder what I have stepped into... Yuk

I submitted a support request to AMD. Apologies for not cc'ing you; I kept
the cc's down to just nvidia and the mailing list. If you have not seen it yet
then it is here

http://lkml.org/lkml/2003/12/11/17

We hope....

Regards
Ross
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-11 9:12 ` Ross Dickson
@ 2003-12-11 17:52 ` Ian Kumlien
2003-12-11 18:21 ` Jesse Allen
0 siblings, 1 reply; 35+ messages in thread
From: Ian Kumlien @ 2003-12-11 17:52 UTC (permalink / raw)
To: ross; +Cc: macro, linux-kernel, AMartin, kernel
[-- Attachment #1: Type: text/plain, Size: 1456 bytes --]

On Thu, 2003-12-11 at 10:12, Ross Dickson wrote:
> On Thursday 11 December 2003 21:47, Ian Kumlien wrote:
> Thanks Ian
>
> Also many thanks for pointing out the relevant section to look in with the AMD
> cpu link that you sent - Credit where credit is due (assuming we are both on the
> right track).

Heh, thanks, feels nice to have someone who agrees with you =).

> I had a read and refined your surmisings. I think the
> problem appears synchronous with the apic timer because of two reasons.
> 1) any apic irq can cause re-connection of the system bus after disconnect.
> 2) the apic timer irq in my examinations has the shortest path to an ack.

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24416.pdf

Pages 42 and 94 might help as well. I haven't grasped it all or had any
food yet but I hope I'm right =)

> I also had a look back through the athlon cooler and power management
> postings and web site articles. I was blissfully ignorant of these issues when I
> started and now I wonder what I have stepped into... Yuk

Heh, yeah, the need for disconnect is somewhat dodgy, I haven't read up
on the rest.

> I submitted a support request to AMD, apologies for not cc'ing you, I kept
> the cc's down to just nvidia and the mailing list. If you have not seen it yet
> then it is here

Thanks

> We hope....

Yup...

--
Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-11 17:52 ` Ian Kumlien
@ 2003-12-11 18:21 ` Jesse Allen
2003-12-12 9:27 ` Bob
0 siblings, 1 reply; 35+ messages in thread
From: Jesse Allen @ 2003-12-11 18:21 UTC (permalink / raw)
To: Ian Kumlien; +Cc: linux-kernel

On Thu, Dec 11, 2003 at 06:52:41PM +0100, Ian Kumlien wrote:
> Heh, yeah, the need for disconnect is somewhat dodgy, i haven't read up
> on th rest.
>

Hmm, weird. I went to look at the Shuttle motherboard maker's site,
maybe so that I could bug them for a BIOS disconnect option, but I checked
for a BIOS update first. And sure enough, as if they had read my mind, an
update was posted online just today. Here are the details of the fixes:

" Checksum: 8B00H Date Code: 12/05/03
1.Support 0.18 micron AMD Duron (Palomino) CPU.
2.Add C1 disconnect item."

It's almost as if they're reading this list. This disconnect problem was
discovered on the 5th (well, the 5th in my timezone). Perhaps they're
aware of this issue... I'm gonna talk to them.

Jesse
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-11 18:21 ` Jesse Allen
@ 2003-12-12 9:27 ` Bob
2003-12-12 16:59 ` Working nforce2, was " Jesse Allen
0 siblings, 1 reply; 35+ messages in thread
From: Bob @ 2003-12-12 9:27 UTC (permalink / raw)
To: linux-kernel

Jesse Allen wrote:

>On Thu, Dec 11, 2003 at 06:52:41PM +0100, Ian Kumlien wrote:
>
>
>>Heh, yeah, the need for disconnect is somewhat dodgy, i haven't read up
>>on th rest.
>>
>Hmm, weird. I went to go look at the Shuttle motherboard maker's site - maybe so that I can bug them for a bios disconnect option - but I checked for a bios update first. And sure enough like they read my mind, just posted online today, an update. Here are the details of fixes:
>
>" Checksum: 8B00H Date Code: 12/05/03
>1.Support 0.18 micron AMD Duron (Palomino) CPU.
>2.Add C1 disconnect item."
>
>It's almost as they're reading this list. This disconnect problem was discovered on the 5th (well the 5th in my timezone). Perhaps they're aware of this issue... I'm gonna talk to them.
>
>Jesse
>

A bios update for MSI K7N2 MCP2-T nforce2 board
fixed the crashing BEFORE these patches were developed,
but there was no documentation that would relate or explain.

http://www.msi.com.tw/program/support/bios/bos/spt_bos_detail.php?UID=436&kind=1
http://download.msi.com.tw/support/bos_exe/6570v76.exe

Award 7.6 at the top of the list. Maybe somebody can figure
out what they're doing.

Nvidia X driver for ti4200 agp8 still locks up linux though,
but X nv works fine. agp8 3d may expose the timer issue.

-Bob
* Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-12 9:27 ` Bob
@ 2003-12-12 16:59 ` Jesse Allen
2003-12-12 17:18 ` Jesse Allen
` (2 more replies)
0 siblings, 3 replies; 35+ messages in thread
From: Jesse Allen @ 2003-12-12 16:59 UTC (permalink / raw)
To: linux-kernel

On Fri, Dec 12, 2003 at 04:27:59AM -0500, Bob wrote:
> Jesse Allen wrote:
>
> >On Thu, Dec 11, 2003 at 06:52:41PM +0100, Ian Kumlien wrote:
> >
> >
> >>Heh, yeah, the need for disconnect is somewhat dodgy, i haven't read up
> >>on th rest.
> >>
> >Hmm, weird. I went to go look at the Shuttle motherboard maker's site -
> >maybe so that I can bug them for a bios disconnect option - but I checked
> >for a bios update first. And sure enough like they read my mind, just
> >posted online today, an update. Here are the details of fixes:
> >
> >" Checksum: 8B00H Date Code: 12/05/03
> >1.Support 0.18 micron AMD Duron (Palomino) CPU.
> >2.Add C1 disconnect item."
> >
> >It's almost as they're reading this list. This disconnect problem was
> >discovered on the 5th (well the 5th in my timezone). Perhaps they're
> >aware of this issue... I'm gonna talk to them.
> >
> >Jesse
> >
> A bios update for MSI K7N2 MCP2-T nforce2 board
> fixed the crashing BEFORE these patches were developed,
> but there was no documentation that would relate or explain.

Last night, I updated the bios to the 12-5-03 released yesterday (see above).
I looked at the new option under Advanced Chipset Features, "C1 Disconnect".
It has three selections: Auto, Enabled, Disabled. There seems to be no default.
The item help says:
"Force En/Disabled
 or Auto mode:
 C17 IGP/SPP NB A03
 C18D SPP NM A01 (C01)
 enabled C1 disconnect
 otherwise disabled it"

Auto sounded nice, so I selected that first. I compiled a new kernel without
the disconnect off patch, or the ack delay. These are the exact patches I used
on 2.6.0-test11:
patch-2.6.0-test11-bk8.bz2
acpi-2.6.0t11.patch                 acpi bugfixes from Maciej.
nforce-ioapic-timer-2.6t11.patch    from Ross Dickson. Timer patch.
forcedeth.patch                     Patch stolen from -test10-mm1? Unused.
forcedeth-update-2.patch            Same.

Sure enough, under this kernel, no lockups. Athcool reported Disconnect was "on".

I decided to wait till this morning to try the BIOS "C1 Disconnect" set to
enabled. Still no lockups under this kernel. Tried a vanilla kernel, no
lockups (but timer and watchdog messed up still). Now that I read your message
Bob, I understand what you are saying. Luckily, the updated BIOS changelog
states "Add C1 disconnect item." And this exact version seems to have fixed it,
and now we have an exact fix (another one?) to refer to.

So the fix was absolutely a BIOS fix. It seems a lot of people have buggy
BIOSes on nforce2 boards. Even some that have the option. I guess I haven't
proved that it was the BIOS fix, because I haven't stressed it for a long
period of time. But I don't believe I have to, because I can do greps and
kernel compiles with disconnect on now, where before I couldn't (it has always
been very easy to reproduce the lockup).

>
> http://www.msi.com.tw/program/support/bios/bos/spt_bos_detail.php?UID=436&kind=1
> http://download.msi.com.tw/support/bos_exe/6570v76.exe
>
> Award 7.6 at the top of the list. Maybe somebody can figure
> out what they're doing.

I think I'll continue contacting Shuttle and ask them why they added the
option, and how they added it. Maybe that will give us the right information.

>
> Nvidia X driver for ti4200 agp8 still locks up linux though,
> but X nv works fine. agp8 3d may expose the timer issue.
>

That's either an nvidia driver problem, or an agpgart-nforce problem. I'd try
4x agp, and/or NVAGP (or agpgart, if already using NVAGP). If you think it's
the timer, try the timer patch, or boot with nolapic noapic.

Jesse
* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-12 16:59 ` Working nforce2, was " Jesse Allen
@ 2003-12-12 17:18 ` Jesse Allen
2003-12-12 18:18 ` Josh McKinney
2003-12-13 6:34 ` Bob
2 siblings, 0 replies; 35+ messages in thread
From: Jesse Allen @ 2003-12-12 17:18 UTC (permalink / raw)
To: linux-kernel

Oops, typo: NM is supposed to be NB

On Fri, Dec 12, 2003 at 09:59:29AM -0700, Jesse Allen wrote:
> The item help says:
> "Force En/Disabled
> or Auto mode:
> C17 IGP/SPP NB A03
> C18D SPP NM A01 (C01)
  C18D SPP /NB/ A01 (C01)
> enabled C1 disconnect
> otherwise disabled it"
>

Maybe NB means northbridge?
* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered 2003-12-12 16:59 ` Working nforce2, was " Jesse Allen 2003-12-12 17:18 ` Jesse Allen @ 2003-12-12 18:18 ` Josh McKinney 2003-12-12 19:29 ` Jesse Allen ` (2 more replies) 2003-12-13 6:34 ` Bob 2 siblings, 3 replies; 35+ messages in thread From: Josh McKinney @ 2003-12-12 18:18 UTC (permalink / raw) To: linux-kernel On approximately Fri, Dec 12, 2003 at 09:59:29AM -0700, Jesse Allen wrote: > On Fri, Dec 12, 2003 at 04:27:59AM -0500, Bob wrote: > > Jesse Allen wrote: > > > > >On Thu, Dec 11, 2003 at 06:52:41PM +0100, Ian Kumlien wrote: > > > > > > > > >>Heh, yeah, the need for disconnect is somewhat dodgy, i haven't read up > > >>on th rest. > > >> > > >Hmm, weird. I went to go look at the Shuttle motherboard maker's site - > > >maybe so that I can bug them for a bios disconnect option - but I checked > > >for a bios update first. And sure enough like they read my mind, just > > >posted online today, an update. Here are the details of fixes: > > > > > >" Checksum: 8B00H Date Code: 12/05/03 > > >1.Support 0.18 micron AMD Duron (Palomino) CPU. > > >2.Add C1 disconnect item." > > > > > >It's almost as they're reading this list. This disconnect problem was > > >discovered on the 5th (well the 5th in my timezone). Perhaps they're > > >aware of this issue... I'm gonna talk to them. > > > > > >Jesse > > > > > A bios update for MSI K7N2 MCP2-T nforce2 board > > fixed the crashing BEFORE these patches were developed, > > but there was no documentation that would relate or explain. > > Last night, I updated the bios to the 12-5-03 released yesterday (see above). I looked at the new option under Advanced Chipset Features, "C1 Disconnect". It has three selections: Auto, Enabled, Disabled. There seems to be no default. 
The item help says:
> "Force En/Disabled
> or Auto mode:
> C17 IGP/SPP NB A03
> C18D SPP NM A01 (C01)
> enabled C1 disconnect
> otherwise disabled it"
>
> Auto sounded nice, so I selected that first. I compiled a new kernel without the disconnect off patch, or the ack delay. These are the exact patches I used on 2.6.0-test11:
> patch-2.6.0-test11-bk8.bz2
> acpi-2.6.0t11.patch acpi bugfixes from Maciej.
> nforce-ioapic-timer-2.6t11.patch from Ross Dickson. Timer patch.
> forcedeth.patch Patch stolen from -test10-mm1? Unused.
> forcedeth-update-2.patch Same.
>
> Sure enough, under this kernel, no lockups. Athcool reported Disconnect was "on".
>
<snip>
> So the fix was absolutely a BIOS fix. It seems a lot of people have buggy BIOSes on nforce2 boards. Even some that have the option. I guess I haven't proved that it was the BIOS fix, because I haven't stressed it for a long period of time. But I don't believe I have to because I can do grep's and kernel compiles with disconnect on now, where before I couldn't (always been very easy to reproduce lockup).
<snip>

The thing that strikes me funny is that you get no crashes with the
updated BIOS and Disconnect on, but without the updated BIOS we have
to turn disconnect off with athcool or the patch? This makes me think
that there is some voodoo going on in the BIOS update that they aren't
saying, surprise surprise, or something is just slowing down the time
it takes for it to crash. I say this because I have gone 5+ days
without any of the patches from these threads, acpi apic lapic
enabled, and CPU disconnect on as stated by athcool. This was with
much stress testing, idle time, etc. One day I just ran a grep that I
have done probably 30 times and boom, hang.

Good luck, hope the BIOS is the trick, now off to see how I can get
ASUS to put the C1 Disconnect in the next revision.
--
Josh McKinney           |       Webmaster: http://joshandangie.org
--------------------------------------------------------------------------
                        | They that can give up essential liberty
Linux, the choice  -o)  | to obtain a little temporary safety deserve
of the GNU generation /\ | neither liberty or safety.
                  _\_v  |                       -Benjamin Franklin
* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-12 18:18 ` Josh McKinney
@ 2003-12-12 19:29 ` Jesse Allen
2003-12-12 21:42 ` Craig Bradney
2003-12-13 4:18 ` Bob
2 siblings, 0 replies; 35+ messages in thread
From: Jesse Allen @ 2003-12-12 19:29 UTC (permalink / raw)
To: Josh McKinney; +Cc: linux-kernel

On Fri, Dec 12, 2003 at 01:18:27PM -0500, Josh McKinney wrote:
>
> The thing that strikes me funny is that you get no crashes with the
> updated BIOS and Disconnect on, but without the updated BIOS we have
> to turn disconnect off with athcool or the patch? This makes me think
> that there is some voodoo going on in the BIOS update that they aren't
> saying, surprise surprise,

Yes, it is weird. I've now asked shuttle for more information.

> or something is just slowing down the time
> it takes for it to crash. I say this because I have gone 5+ days
> without any of the patches from these threads, acpi apic lapic
> enabled, and CPU disconnect on as stated by athcool. This was with
> much stress testing, idle time, etc. One day I just ran a grep that I
> have done probably 30 times and boom, hang.

I hope this is not the case! The one/two grep test worked flawlessly, but now
if it's delayed, then I can't do that anymore. (but at least I have the bios
option now! heh)

I suggest you reference the Shuttle AN35 12-05-2003 BIOS, and maybe Bob's MSI,
when you talk to Asus. If they can do it, then Asus should be able as well.

Jesse
* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered 2003-12-12 18:18 ` Josh McKinney 2003-12-12 19:29 ` Jesse Allen @ 2003-12-12 21:42 ` Craig Bradney 2003-12-13 4:18 ` Bob 2 siblings, 0 replies; 35+ messages in thread From: Craig Bradney @ 2003-12-12 21:42 UTC (permalink / raw) To: Josh McKinney; +Cc: linux-kernel On Fri, 2003-12-12 at 19:18, Josh McKinney wrote: > On approximately Fri, Dec 12, 2003 at 09:59:29AM -0700, Jesse Allen wrote: > > On Fri, Dec 12, 2003 at 04:27:59AM -0500, Bob wrote: > > > Jesse Allen wrote: > > > > > > >On Thu, Dec 11, 2003 at 06:52:41PM +0100, Ian Kumlien wrote: > > > > > > > > > > > >>Heh, yeah, the need for disconnect is somewhat dodgy, i haven't read up > > > >>on th rest. > > > >> > > > >Hmm, weird. I went to go look at the Shuttle motherboard maker's site - > > > >maybe so that I can bug them for a bios disconnect option - but I checked > > > >for a bios update first. And sure enough like they read my mind, just > > > >posted online today, an update. Here are the details of fixes: > > > > > > > >" Checksum: 8B00H Date Code: 12/05/03 > > > >1.Support 0.18 micron AMD Duron (Palomino) CPU. > > > >2.Add C1 disconnect item." > > > > > > > >It's almost as they're reading this list. This disconnect problem was > > > >discovered on the 5th (well the 5th in my timezone). Perhaps they're > > > >aware of this issue... I'm gonna talk to them. > > > > > > > >Jesse > > > > > > > A bios update for MSI K7N2 MCP2-T nforce2 board > > > fixed the crashing BEFORE these patches were developed, > > > but there was no documentation that would relate or explain. > > > > Last night, I updated the bios to the 12-5-03 released yesterday (see above). I looked at the new option under Advanced Chipset Features, "C1 Disconnect". It has three selections: Auto, Enabled, Disabled. There seems to be no default. 
The item help says: > > "Force En/Disabled > > or Auto mode: > > C17 IGP/SPP NB A03 > > C18D SPP NM A01 (C01) > > enabled C1 disconnect > > otherwise disabled it" > > > > Auto sounded nice, so I selected that first. I compiled a new kernel without the disconnect off patch, or the ack delay. These are the exact patches I used on 2.6.0-test11: > > patch-2.6.0-test11-bk8.bz2 > > acpi-2.6.0t11.patch acpi bugfixes from Maciej. > > nforce-ioapic-timer-2.6t11.patch from Ross Dickson. Timer patch. > > forcedeth.patch Patch stolen from -test10-mm1? Unused. > > forcedeth-update-2.patch Same. > > > > Sure enough, under this kernel, no lockups. Athcool reported Disconnect was "on". > > > <snip> > > So the fix was absolutely a BIOS fix. It seems a lot of people have buggy BIOSes on nforce2 boards. Even some that have the option. I guess I haven't proved that it was the BIOS fix, because I haven't stressed it for a long period of time. But I don't believe I have to because I can do grep's and kernel compiles with disconnect on now, where before I couldn't (always been very easy to reproduce lockup). > <snip> > > The thing that strikes me funny is that you get no crashes with the > updated BIOS and Disconnect on, but without the updated BIOS we have > to turn disconnect off with athcool or the patch? This makes me think > that there is some voodoo going on in the BIOS update that they aren't > saying, surprise surprise, or something is just slowing down the time > it takes for it to crash. I say this because I have gone 5+ days > without any of the patches from these threads, acpi apic lapic > enabled, and CPU disconnect on as stated by athcool. This was with > much stress testing, idle time, etc. One day I just ran a grep that I > have done probably 30 times and boom, hang. > > Good luck, hope the BIOS is the trick, now off to see how I can get > ASUS to put the C1 Disconnect in the next revision. Yes, thats how it was for me.. 
I was the only one here saying "no problems, la la la", then at about
5.25 days.. boom. Then the next day it crashed twice.

Hopefully you make some progress with ASUS.. (for the A7N8X Deluxe as
well as your mobo please :) ).

I've been playing with hardware in the past few days (new quieter Zalman
PSU, and Zalman 7000 Cu fan etc) so no uptime to speak of here now. I
did compile KDE 3.2 beta 2 last night though.. 6 hours of solid
compilation.. no hassles. I have never turned off Disconnect either.

Thanks to all you guys who are working on this one. Seems to be getting
somewhere.

Craig
* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered 2003-12-12 18:18 ` Josh McKinney 2003-12-12 19:29 ` Jesse Allen 2003-12-12 21:42 ` Craig Bradney @ 2003-12-13 4:18 ` Bob 2 siblings, 0 replies; 35+ messages in thread From: Bob @ 2003-12-13 4:18 UTC (permalink / raw) To: linux-kernel Re: two instances of good but undocumented bios voodoo Josh McKinney wrote: >On approximately Fri, Dec 12, 2003 at 09:59:29AM -0700, Jesse Allen wrote: > > >>On Fri, Dec 12, 2003 at 04:27:59AM -0500, Bob wrote: >> >> >>>Jesse Allen wrote: >>> >>> >>> >>>>On Thu, Dec 11, 2003 at 06:52:41PM +0100, Ian Kumlien wrote: >>>> >>>> >>>> >>>> >>>> ............ >>>> >>>>but I checked >>>>for a bios update first. And sure enough like they read my mind, just >>>>posted online today, an update. Here are the details of fixes: >>>> >>>>" Checksum: 8B00H Date Code: 12/05/03 >>>>1.Support 0.18 micron AMD Duron (Palomino) CPU. >>>>2.Add C1 disconnect item."..........Jesse >>>> >>>> -Jesse got a bios update that gives him a cpu disconnect option now in setup >>>> >>>> >>>A bios update for MSI K7N2 MCP2-T nforce2 board >>>fixed the crashing BEFORE these patches were developed, >>>but there was no documentation that would relate or explain. >>> >>> -Bob said that about his bios update fixing the lockup problem entirely, but no doc, needing no patch except to turn on ioapic edge timer(another clue--without ioapic edge timer working bios update fixed this nforce2 situation!), no clue as to whether bios update sets cpu disconnect one way or the other, no opt to choose cpu disconnect in new or old setup. Jesse continues-- >>Last night, I updated the bios to the 12-5-03 released yesterday (see above). I looked at the new option under Advanced Chipset Features, "C1 Disconnect". It has three selections: Auto, Enabled, Disabled. There seems to be no default. 
The item help says:
>>"Force En/Disabled
>> or Auto mode:
>> C17 IGP/SPP NB A03
>> C18D SPP NM A01 (C01)
>> enabled C1 disconnect
>> otherwise disabled it"
>>
>>Auto sounded nice, so I selected that first. I compiled a new kernel without the disconnect off patch, or the ack delay. These are the exact patches I used on 2.6.0-test11:
>>patch-2.6.0-test11-bk8.bz2
>>acpi-2.6.0t11.patch acpi bugfixes from Maciej.
>>nforce-ioapic-timer-2.6t11.patch from Ross Dickson. Timer patch.
>>forcedeth.patch Patch stolen from -test10-mm1? Unused.
>>forcedeth-update-2.patch Same.
>>
>>Sure enough, under this kernel, no lockups. Athcool reported Disconnect was "on".
>>
>>

Disconnect was ON!!!

> <snip>

...in one case the bios update fixed the problem without needing cpu
disconnect off; in the other case we don't know how or whether cpu
disconnect is on or off now, but the bios update fixed nforce2 without
turning the ioapic edge timer on. I guess these two cases prove that
neither cpu disconnect = on nor ioapic timer = off is causing the
problem directly.

>The thing that strikes me funny is that you get no crashes with the
>updated BIOS and Disconnect on, but without the updated BIOS we have
>to turn disconnect off with athcool or the patch? This makes me think
>that there is some voodoo going on in the BIOS update that they aren't
>saying, surprise surprise, or something is just slowing down the time
>it takes for it to crash. I say this because I have gone 5+ days
>without any of the patches from these threads, acpi apic lapic
>enabled, and CPU disconnect on as stated by athcool. This was with
>much stress testing, idle time, etc. One day I just ran a grep that I
>have done probably 30 times and boom, hang.
>
>Good luck, hope the BIOS is the trick, now off to see how I can get
>ASUS to put the C1 Disconnect in the next revision.
>

...and at least two motherboard makers have voodoo to fix the problem.
* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-12 16:59 ` Working nforce2, was " Jesse Allen
2003-12-12 17:18 ` Jesse Allen
2003-12-12 18:18 ` Josh McKinney
@ 2003-12-13 6:34 ` Bob
2 siblings, 0 replies; 35+ messages in thread
From: Bob @ 2003-12-13 6:34 UTC (permalink / raw)
To: linux-kernel

hackers be clever--

"system temperature was getting -- above 40 deg C. CPU was getting up to
49 deg C...how poorly it's thermal management was operating then. Now with
the new patches, and ultimately, BIOS update, system temperature is about
35 deg C -JesseAllen"

Maybe that tells me that my bios update fixed my lockup problems without
turning on cpu disconnect, or even by turning it off, with no doc as a
face-saver and without allowing me to see a choice in setup. Like yours
before cpu disconnect was working, my temp is 41C most of the time and 48C
under a heavy load, possibly 49C: the exact range you were looking at before
you had cpu disconnect working. Or they turned cpu disconnect off without
saying anything, buying time, saving embarrassment. Anyway, it's probably
off here since I have exactly the same heat profile.

I have 120mm fans, one in, one out, blowing air across Zalman cpu and gpu
heatsinks, no 80mm extra Zalman fan.
amd xp 3000+ 333mhz 1:1
arctic silver compound on heatsinks

Thermal 1: ok, 41.0 degrees C 105.8 degrees F
- 41C in X, running realplayer
- 48C compile a fat kernel or several heavy tasks

-Bob

Jesse Allen wrote:

> ....I compiled a new kernel without the disconnect off patch, or the
> ack delay. These are the exact patches I used on 2.6.0-test11:
>
>patch-2.6.0-test11-bk8.bz2
>acpi-2.6.0t11.patch acpi bugfixes from Maciej.
>nforce-ioapic-timer-2.6t11.patch from Ross Dickson. Timer patch.
>forcedeth.patch Patch stolen from -test10-mm1? Unused.
>forcedeth-update-2.patch Same.
>
>Sure enough, under this kernel, no lockups. Athcool reported Disconnect was "on".
> >I decided to wait till this morning, to try the BIOS "C1 Disconnect" set to enabled. Still no lockups under this kernel. Tried a vanilla kernel, no lockups (but timer and watchdog messed up still). Now that I read your message Bob, I understand what you are saying. Luckily, the updated BIOS changelog states "Add C1 disconnect item." And this exact version seems to have fixed it, and now we have an exact fix (another one?) to refer to. > >So the fix was absolutely a BIOS fix. > ...but we're stuck looking at smoke and mirrors, when the kernel might be able to work around bioses that have not been "updated". Or to put it another way, "voodoo" may be done by kernel if not done by bios. Whatever is being tweaked may be accessible to kernel code. I can't read anything useful in my bios flash file w6570nms.760 which is contained in-- >>http://download.msi.com.tw/support/bos_exe/6570v76.exe >> >>Nvidia X driver for ti4200 agp8 still locks up linux though, >>but X nv works fine. agp8 3d may expose the timer issue. >> >> >> > >That's either an nvidia driver problem, or agpgart-nforce problem. I'd try 4x agp, and or NVAGP (or agpgart, if already using NVAGP). If you think it's the timer, try the timer patch, or with nolapic noapic. > >Jesse > Thanks, I've tried all of those except passing agp4 or agp2 to the nvidia X "nvidia" driver. Another clue that it's related to interrupts or timing of access to interrupts is that before I put another card on the pci bus I could get into X for a few seconds with the nvidia driver before linux locked up, now with an elan pcmcia 32-bit cardbus pci card that claims it needs its own interrupt(can't give it one yet!) X just locks up linux on load. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-11 11:47 ` Ian Kumlien
2003-12-11 9:12 ` Ross Dickson
@ 2003-12-11 14:58 ` Jesse Allen
2003-12-11 15:20 ` Craig Bradney
1 sibling, 1 reply; 35+ messages in thread
From: Jesse Allen @ 2003-12-11 14:58 UTC (permalink / raw)
To: Ian Kumlien; +Cc: linux-kernel, ross, macro

My mptable output looks pretty weird. (Product ID "ny Key "?)
It doesn't even compare to the other two. I have a shuttle AN35N.


===============================================================================

MPTable, version 2.0.15 Linux

-------------------------------------------------------------------------------

MP Floating Pointer Structure:

  location:                     BIOS
  physical address:             0x000f5650
  signature:                    '_MP_'
  length:                       16 bytes
  version:                      1.1
  checksum:                     0x00
  mode:                         Virtual Wire

-------------------------------------------------------------------------------

MP Config Table Header:

  physical address:             0x0xf0c00
  signature:                    'N '
  base table length:            8224
  version:                      1.32
  checksum:                     0x20
  OEM ID:                       ' : '
  Product ID:                   'ny Key '
  OEM table pointer:            0x2031462d
  OEM table size:               17152
  entry count:                  29300
  local APIC address:           0x32462d6c
  extended table length:        32
  extended table checksum:      67

-------------------------------------------------------------------------------

MP Config Base Table Entries:

--
MPTABLE HOSED! record type = 114

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-11 14:58 ` Jesse Allen
@ 2003-12-11 15:20 ` Craig Bradney
2003-12-11 16:05 ` Jesse Allen
0 siblings, 1 reply; 35+ messages in thread
From: Craig Bradney @ 2003-12-11 15:20 UTC (permalink / raw)
To: Jesse Allen; +Cc: Ian Kumlien, linux-kernel, ross, macro

Not really sure what I'm looking at here but as you guys are showing
this information I thought it might be helpful for those that can use it
to have the information run on a Asus A7N8X Deluxe (v2.0 bios 1007) with
Athlon XP 2600+.

===============================================================================

MPTable, version 2.0.15 Linux

-------------------------------------------------------------------------------

MP Floating Pointer Structure:

  location:                     BIOS
  physical address:             0x000f5ce0
  signature:                    '_MP_'
  length:                       16 bytes
  version:                      1.1
  checksum:                     0x00
  mode:                         Virtual Wire

-------------------------------------------------------------------------------

MP Config Table Header:

  physical address:             0x0xf0c00
  signature:                    ' '
  base table length:            65287
  version:                      1.255
  checksum:                     0x04
  OEM ID:                       ''
  Product ID:                   ''
  OEM table pointer:            0x00000704
  OEM table size:               15
  entry count:                  3896
  local APIC address:           0x00070500
  extended table length:        3584
  extended table checksum:      0

-------------------------------------------------------------------------------

MP Config Base Table Entries:

--
Processors:     APIC ID  Version  State         Family  Model  Step  Flags
                13       0x 0     AP, usable    3       0      0     0xff070600
                 0       0xff     BSP, usable   0       12     4     0x0001
--
MPTABLE HOSED! record type = 53

Craig

On Thu, 2003-12-11 at 15:58, Jesse Allen wrote:
> My mptable output looks pretty weird. (Product ID "ny Key "?)
> It doesn't even compare to the other two. I have a shuttle AN35N.
>
>
> ===============================================================================
>
> MPTable, version 2.0.15 Linux
>
> -------------------------------------------------------------------------------
>
> MP Floating Pointer Structure:
>
>   location:                     BIOS
>   physical address:             0x000f5650
>   signature:                    '_MP_'
>   length:                       16 bytes
>   version:                      1.1
>   checksum:                     0x00
>   mode:                         Virtual Wire
>
> -------------------------------------------------------------------------------
>
> MP Config Table Header:
>
>   physical address:             0x0xf0c00
>   signature:                    'N '
>   base table length:            8224
>   version:                      1.32
>   checksum:                     0x20
>   OEM ID:                       ' : '
>   Product ID:                   'ny Key '
>   OEM table pointer:            0x2031462d
>   OEM table size:               17152
>   entry count:                  29300
>   local APIC address:           0x32462d6c
>   extended table length:        32
>   extended table checksum:      67
>
> -------------------------------------------------------------------------------
>
> MP Config Base Table Entries:
>
> --
> MPTABLE HOSED! record type = 114
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-11 15:20 ` Craig Bradney
@ 2003-12-11 16:05 ` Jesse Allen
0 siblings, 0 replies; 35+ messages in thread
From: Jesse Allen @ 2003-12-11 16:05 UTC (permalink / raw)
To: Craig Bradney; +Cc: linux-kernel

On Thu, Dec 11, 2003 at 04:20:58PM +0100, Craig Bradney wrote:
> Not really sure what I'm looking at here but as you guys are showing
> this information I thought it might be helpful for those that can use it
> to have the information run on a Asus A7N8X Deluxe (v2.0 bios 1007) with
> Athlon XP 2600+.
>

Unfortunately, it looks as if all our MP tables are invalid. So I don't
think we can use them. I thought mine was especially weird because the
Product ID seems to be pointing to a "Press Any Key" string, which proves
that.

Jesse

^ permalink raw reply [flat|nested] 35+ messages in thread
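The "invalid MP table" conclusion above can be checked mechanically. The following is a minimal sketch in C, assuming the layout from the Intel MultiProcessor Specification: the configuration table starts with the signature "PCMP", carries its base table length as a little-endian 16-bit value at offset 4, and all bytes of the base table sum to zero modulo 256. The function names and the 44-byte header constant are illustrative choices for this sketch, not taken from mptable or the kernel. Tables such as the ones posted in this thread (signature 'N ', Product ID 'ny Key ') would fail the very first test.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MP_HEADER_SIZE 44u	/* fixed part of the MP config table header */

/* Sum all bytes modulo 256; a well-formed base table sums to zero. */
uint8_t mp_byte_sum(const uint8_t *p, size_t len)
{
	uint8_t sum = 0;

	while (len--)
		sum = (uint8_t)(sum + *p++);
	return sum;
}

/* Return 1 if the buffer plausibly holds an MP config table, else 0. */
int mp_table_plausible(const uint8_t *table, size_t avail)
{
	size_t base_len;

	if (avail < MP_HEADER_SIZE)
		return 0;
	if (memcmp(table, "PCMP", 4) != 0)	/* signature check */
		return 0;

	/* base table length: little-endian 16-bit value at offset 4 */
	base_len = (size_t)table[4] | ((size_t)table[5] << 8);
	if (base_len < MP_HEADER_SIZE || base_len > avail)
		return 0;

	return mp_byte_sum(table, base_len) == 0;
}
```

A caller would hand it the bytes found at the physical address named by the floating pointer structure; how those bytes are obtained (for example via /dev/mem) is outside this sketch.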
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-11 6:55 ` Ross Dickson
2003-12-11 11:47 ` Ian Kumlien
@ 2003-12-11 15:15 ` Maciej W. Rozycki
2003-12-11 16:23 ` Josh McKinney
1 sibling, 1 reply; 35+ messages in thread
From: Maciej W. Rozycki @ 2003-12-11 15:15 UTC (permalink / raw)
To: Ross Dickson, len.brown; +Cc: linux-kernel, AMartin, kernel, Ian Kumlien

On Thu, 11 Dec 2003, Ross Dickson wrote:

> ACPI: IOAPIC (id[0x02] address[0xfec00000] global_irq_base[0x0])
> IOAPIC[0]: Assigned apic_id 2
> IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, IRQ 0-23
> Bus #0 is ISA
> Int: type 3, pol 0, trig 0, bus 0, irq 0, 2-0

 I've browsed the relevant part of the ACPI spec and the above entry is
incorrect.  It looks like INTIN0 is now the preferred line for the 8254
timer; at least it is the default one when using ACPI tables.  This is a
bug in Linux.

> ACPI: INT_SRC_OVR (bus[0] irq[0x0] global_irq[0x2] polarity[0x0] trigger[0x0])
> Int: type 0, pol 0, trig 0, bus 0, irq 0, 2-2

 Now this is an explicit entry stating the 8254 timer is connected to
INTIN2.  If this is not the case, the BIOS is buggy and the solution is to
fix it.  I don't consider it possible to be worked around in Linux except
maybe with a command line option added manually.

> ACPI: INT_SRC_OVR (bus[0] irq[0x9] global_irq[0x9] polarity[0x1] trigger[0x3])
> Int: type 0, pol 1, trig 3, bus 0, irq 9, 2-9

 And yet another explicit entry which has an effect on configuration as
reported below.

> init IO_APIC IRQs
> IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
> ..TIMER: vector=0x31 pin1=2 pin2=-1
> ..MP-BIOS bug: 8254 timer not connected to IO-APIC pin2

 As reported above, the BIOS explicitly reports the timer is there.

> ..TIMER: Is timer irq0 connected to IOAPIC Pin0? ...
> IOAPIC[0]: Set PCI routing entry (2-0 -> 0x31 -> IRQ 0 Mode:0 Active:0)
> ..TIMER check 8259 ints disabled, imr1:ff, imr2:ff
> ..TIMER: works OK on apic pin0 irq0

 And this may be correct if the default ACPI settings reflect the actual
wiring of this board (but the BIOS says otherwise).

> IRQ to pin mappings:
> IRQ0 -> 0:2-> 0:0
[...]
> IRQ9 -> 0:9-> 0:9

 These two entries are wrong -- the interrupts are set up as if they were
connected to multiple I/O APIC inputs.  The first entry is a result of
your hack, but the second one suggests a bug somewhere.

> Finally others working with kern 2.6 earlier trialled the following patch which may provide some
> more clues:
> retrieved from:
>
> http://www.kernel.org/pub/linux/kernel/people/bart/2.6.0-test11-bart1/broken-out/nforce2-apic.patch
>
> [x86] do not wrongly override mp_ExtINT IRQ

 That's a workaround to the bug in Linux I've mentioned earlier.  The bug
should be fixed instead.  The ACPI spec doesn't support mixed
configurations, so ExtINT is irrelevant.

> Perhaps someone else could get mptable to run on their machine and send you
> the result.

 I wanted it to compare with the ACPI table and possibly to treat as a
reference for a workaround.  Since you have no valid MP-table, there's
nothing to do.

 Here's a patch that fixes a few bugs I've spotted browsing through our
ACPI code.  Please try it and report the result.  I don't have a system
with ACPI available, so I cannot verify the changes at all.

 The same bugs are present in 2.4 and I have a corresponding patch
available if someone wants to test the changes with that version.

  Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

patch-mips-2.6.0-test11-20031209-acpi-irq0-1
diff -up --recursive --new-file linux-mips-2.6.0-test11-20031209.macro/arch/i386/kernel/mpparse.c linux-mips-2.6.0-test11-20031209/arch/i386/kernel/mpparse.c
--- linux-mips-2.6.0-test11-20031209.macro/arch/i386/kernel/mpparse.c	2003-11-25 04:57:01.000000000 +0000
+++ linux-mips-2.6.0-test11-20031209/arch/i386/kernel/mpparse.c	2003-12-11 09:43:26.000000000 +0000
@@ -940,7 +940,7 @@ void __init mp_override_legacy_irq (
 	 * erroneously sets the trigger to level, resulting in a HUGE
 	 * increase of timer interrupts!
 	 */
-	if ((bus_irq == 0) && (global_irq == 2) && (trigger == 3))
+	if ((bus_irq == 0) && (trigger == 3))
 		trigger = 1;
 
 	intsrc.mpc_type = MP_INTSRC;
@@ -961,7 +961,7 @@ void __init mp_override_legacy_irq (
 	 * Otherwise create a new entry (e.g. global_irq == 2).
 	 */
 	for (i = 0; i < mp_irq_entries; i++) {
-		if ((mp_irqs[i].mpc_dstapic == intsrc.mpc_dstapic)
+		if ((mp_irqs[i].mpc_srcbus == intsrc.mpc_srcbus)
 			&& (mp_irqs[i].mpc_srcbusirq == intsrc.mpc_srcbusirq)) {
 			mp_irqs[i] = intsrc;
 			found = 1;
@@ -1008,9 +1008,10 @@ void __init mp_config_acpi_legacy_irqs (
 	 */
 	for (i = 0; i < 16; i++) {
 
-		if (i == 2) continue;			/* Don't connect IRQ2 */
+		if (i == 2)
+			continue;			/* Don't connect IRQ2 */
 
-		intsrc.mpc_irqtype = i ? mp_INT : mp_ExtINT;	/* 8259A to #0 */
+		intsrc.mpc_irqtype = mp_INT;
 		intsrc.mpc_srcbusirq = i;		/* Identity mapped */
 		intsrc.mpc_dstirq = i;
 

^ permalink raw reply [flat|nested] 35+ messages in thread
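For readers following the second hunk of the patch above, here is a minimal user-space sketch in C of the find-or-append step, keyed on source bus and source IRQ as in the patched condition. The struct, field and function names are simplified stand-ins invented for this sketch, not the kernel's mpc_config_intsrc or mp_override_legacy_irq(), and the trigger encoding (1 = edge, 3 = level) simply follows the values visible in the quoted boot log. It shows one thing only: an override for ISA IRQ0 replaces the identity-mapped entry, so IRQ0 ends up with a single I/O APIC input.

```c
#define MAX_ENTRIES 32

/* Simplified stand-in for one interrupt source entry. */
struct irq_src {
	int srcbus;	/* source bus id (0 = ISA in this sketch) */
	int srcbusirq;	/* IRQ number on that bus */
	int dstirq;	/* I/O APIC input pin (INTIN) */
	int trigger;	/* 1 = edge, 3 = level */
};

/*
 * Apply one override: replace the entry with the same source bus and
 * source IRQ, or append a new one.  Returns the new entry count, or -1
 * if the table is full.
 */
int apply_override(struct irq_src *tab, int count, struct irq_src ov)
{
	int i;

	/* As in the first hunk: IRQ0 is forced to edge whatever the pin. */
	if (ov.srcbusirq == 0 && ov.trigger == 3)
		ov.trigger = 1;

	for (i = 0; i < count; i++) {
		if (tab[i].srcbus == ov.srcbus &&
		    tab[i].srcbusirq == ov.srcbusirq) {
			tab[i] = ov;	/* replace, do not duplicate */
			return count;
		}
	}
	if (count >= MAX_ENTRIES)
		return -1;
	tab[count] = ov;
	return count + 1;
}

/* Count how many entries route a given source IRQ. */
int count_routes(const struct irq_src *tab, int count, int bus, int irq)
{
	int i, n = 0;

	for (i = 0; i < count; i++)
		if (tab[i].srcbus == bus && tab[i].srcbusirq == irq)
			n++;
	return n;
}
```

Filling the table with the fifteen identity-mapped legacy entries (IRQ2 skipped) and then applying an "IRQ0 to pin 2" override leaves fifteen entries, with IRQ0 routed once.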
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-11 15:15 ` Maciej W. Rozycki
@ 2003-12-11 16:23 ` Josh McKinney
2003-12-11 17:04 ` Maciej W. Rozycki
0 siblings, 1 reply; 35+ messages in thread
From: Josh McKinney @ 2003-12-11 16:23 UTC (permalink / raw)
To: linux-kernel

Trying to get a grasp on all the fixes floating around.

I have been running the first "timer" patch, the two liner to mpparse.c,
for about five days until I made it crash by catting 4 drives to
/dev/null.  It crashed after I turned on disconnect with athcool, so that
may be related, because I could crash it with disconnect off.  Now I am
running both of Ross's patches for 2.6 for just 10 hours, but disconnect
is still enabled, so far so good.

So the consensus seems to be that Ross's timer patch and the disconnect
OR delay ACK patch is the mostly *correct* fix?

As of right now I am compiling kernels with the disconnect patch and
Ross's timer patch, and one with those fixes and Maciej's acpi fixes
below.

Should I try it with just the acpi fixes sent by Maciej or are these
just general fixes?

I also tried running mptable, but the output is "hosed".

Thanks

On approximately Thu, Dec 11, 2003 at 04:15:28PM +0100, Maciej W. Rozycki wrote:
> On Thu, 11 Dec 2003, Ross Dickson wrote:
>
> > ACPI: IOAPIC (id[0x02] address[0xfec00000] global_irq_base[0x0])
> > IOAPIC[0]: Assigned apic_id 2
> > IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, IRQ 0-23
> > Bus #0 is ISA
> > Int: type 3, pol 0, trig 0, bus 0, irq 0, 2-0
>
>  I've browsed the relevant part of the ACPI spec and the above entry is
> incorrect.  It looks like INTIN0 is now the preferred line for the 8254
> timer; at least it is the default one when using ACPI tables.  This is a
> bug in Linux.
>
> > ACPI: INT_SRC_OVR (bus[0] irq[0x0] global_irq[0x2] polarity[0x0] trigger[0x0])
> > Int: type 0, pol 0, trig 0, bus 0, irq 0, 2-2
>
>  Now this is an explicit entry stating the 8254 timer is connected to
> INTIN2.  If this is not the case, the BIOS is buggy and the solution is to
> fix it.  I don't consider it possible to be worked around in Linux except
> maybe with a command line option added manually.
>
> > ACPI: INT_SRC_OVR (bus[0] irq[0x9] global_irq[0x9] polarity[0x1] trigger[0x3])
> > Int: type 0, pol 1, trig 3, bus 0, irq 9, 2-9
>
>  And yet another explicit entry which has an effect on configuration as
> reported below.
>
> > init IO_APIC IRQs
> > IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
> > ..TIMER: vector=0x31 pin1=2 pin2=-1
> > ..MP-BIOS bug: 8254 timer not connected to IO-APIC pin2
>
>  As reported above, the BIOS explicitly reports the timer is there.
>
> > ..TIMER: Is timer irq0 connected to IOAPIC Pin0? ...
> > IOAPIC[0]: Set PCI routing entry (2-0 -> 0x31 -> IRQ 0 Mode:0 Active:0)
> > ..TIMER check 8259 ints disabled, imr1:ff, imr2:ff
> > ..TIMER: works OK on apic pin0 irq0
>
>  And this may be correct if the default ACPI settings reflect the actual
> wiring of this board (but the BIOS says otherwise).
>
> > IRQ to pin mappings:
> > IRQ0 -> 0:2-> 0:0
> [...]
> > IRQ9 -> 0:9-> 0:9
>
>  These two entries are wrong -- the interrupts are set up as if they were
> connected to multiple I/O APIC inputs.  The first entry is a result of
> your hack, but the second one suggests a bug somewhere.
>
> > Finally others working with kern 2.6 earlier trialled the following patch which may provide some
> > more clues:
> > retrieved from:
> >
> > http://www.kernel.org/pub/linux/kernel/people/bart/2.6.0-test11-bart1/broken-out/nforce2-apic.patch
> >
> > [x86] do not wrongly override mp_ExtINT IRQ
>
>  That's a workaround to the bug in Linux I've mentioned earlier.  The bug
> should be fixed instead.  The ACPI spec doesn't support mixed
> configurations, so ExtINT is irrelevant.
>
> > Perhaps someone else could get mptable to run on their machine and send you
> > the result.
>
>  I wanted it to compare with the ACPI table and possibly to treat as a
> reference for a workaround.  Since you have no valid MP-table, there's
> nothing to do.
>
>  Here's a patch that fixes a few bugs I've spotted browsing through our
> ACPI code.  Please try it and report the result.  I don't have a system
> with ACPI available, so I cannot verify the changes at all.
>
>  The same bugs are present in 2.4 and I have a corresponding patch
> available if someone wants to test the changes with that version.
>
>   Maciej
>
> --
> +  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
> +--------------------------------------------------------------+
> +        e-mail: macro@ds2.pg.gda.pl, PGP key available        +
>
> patch-mips-2.6.0-test11-20031209-acpi-irq0-1
> diff -up --recursive --new-file linux-mips-2.6.0-test11-20031209.macro/arch/i386/kernel/mpparse.c linux-mips-2.6.0-test11-20031209/arch/i386/kernel/mpparse.c
> --- linux-mips-2.6.0-test11-20031209.macro/arch/i386/kernel/mpparse.c	2003-11-25 04:57:01.000000000 +0000
> +++ linux-mips-2.6.0-test11-20031209/arch/i386/kernel/mpparse.c	2003-12-11 09:43:26.000000000 +0000
> @@ -940,7 +940,7 @@ void __init mp_override_legacy_irq (
>  	 * erroneously sets the trigger to level, resulting in a HUGE
>  	 * increase of timer interrupts!
>  	 */
> -	if ((bus_irq == 0) && (global_irq == 2) && (trigger == 3))
> +	if ((bus_irq == 0) && (trigger == 3))
>  		trigger = 1;
>
>  	intsrc.mpc_type = MP_INTSRC;
> @@ -961,7 +961,7 @@ void __init mp_override_legacy_irq (
>  	 * Otherwise create a new entry (e.g. global_irq == 2).
>  	 */
>  	for (i = 0; i < mp_irq_entries; i++) {
> -		if ((mp_irqs[i].mpc_dstapic == intsrc.mpc_dstapic)
> +		if ((mp_irqs[i].mpc_srcbus == intsrc.mpc_srcbus)
>  			&& (mp_irqs[i].mpc_srcbusirq == intsrc.mpc_srcbusirq)) {
>  			mp_irqs[i] = intsrc;
>  			found = 1;
> @@ -1008,9 +1008,10 @@ void __init mp_config_acpi_legacy_irqs (
>  	 */
>  	for (i = 0; i < 16; i++) {
>
> -		if (i == 2) continue;			/* Don't connect IRQ2 */
> +		if (i == 2)
> +			continue;			/* Don't connect IRQ2 */
>
> -		intsrc.mpc_irqtype = i ? mp_INT : mp_ExtINT;	/* 8259A to #0 */
> +		intsrc.mpc_irqtype = mp_INT;
>  		intsrc.mpc_srcbusirq = i;		/* Identity mapped */
>  		intsrc.mpc_dstirq = i;
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

-- 
Josh McKinney                |  Webmaster: http://joshandangie.org
--------------------------------------------------------------------------
                             | They that can give up essential liberty
Linux, the choice       -o)  | to obtain a little temporary safety deserve
of the GNU generation    /\  | neither liberty or safety.
                        _\_v |                          -Benjamin Franklin

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-11 16:23 ` Josh McKinney
@ 2003-12-11 17:04 ` Maciej W. Rozycki
2003-12-11 17:25 ` Jesse Allen
0 siblings, 1 reply; 35+ messages in thread
From: Maciej W. Rozycki @ 2003-12-11 17:04 UTC (permalink / raw)
To: Josh McKinney; +Cc: linux-kernel

On Thu, 11 Dec 2003, Josh McKinney wrote:

> Should I try it with just the acpi fixes sent by Maciej or are these
> just general fixes?

 They should make (at least some of) the reported problems go away,
superseding the respective workarounds.

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +

^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-11 17:04 ` Maciej W. Rozycki
@ 2003-12-11 17:25 ` Jesse Allen
0 siblings, 0 replies; 35+ messages in thread
From: Jesse Allen @ 2003-12-11 17:25 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: linux-kernel

On Thu, Dec 11, 2003 at 06:04:54PM +0100, Maciej W. Rozycki wrote:
> On Thu, 11 Dec 2003, Josh McKinney wrote:
>
> > Should I try it with just the acpi fixes sent by Maciej or are these
> > just general fixes?
>
>  They should make (at least some of) the reported problems go away,
> superseding the respective workarounds.
>

As far as I can tell, your patch _alone_ doesn't prevent the lockup, fix
the timer, or nmi_watchdog.

I have attached a dmesg of my current running kernel that includes Ross'
io_apic patch, the disconnect quirk patch, your acpi patch, and other
minor patches. ACPI and APIC debugging are on.

Linux version 2.6.0-test11 (jesse@tesore) (gcc version 3.3.2) #2 Thu Dec 11 09:45:15 MST 2003
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000000fff0000 (usable)
 BIOS-e820: 000000000fff0000 - 000000000fff3000 (ACPI NVS)
 BIOS-e820: 000000000fff3000 - 0000000010000000 (ACPI data)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
255MB LOWMEM available.
On node 0 totalpages: 65520
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 61424 pages, LIFO batch:14
  HighMem zone: 0 pages, LIFO batch:1
DMI 2.2 present.
ACPI: RSDP (v000 Nvidia ) @ 0x000f6f60 ACPI: RSDT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x0fff3000 ACPI: FADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x0fff3040 ACPI: MADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x0fff7880 ACPI: DSDT (v001 NVIDIA AWRDACPI 0x00001000 MSFT 0x0100000e) @ 0x00000000 ACPI: Local APIC address 0xfee00000 ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) Processor #0 6:10 APIC version 16 ACPI: LAPIC_NMI (acpi_id[0x00] polarity[0x1] trigger[0x1] lint[0x1]) ACPI: IOAPIC (id[0x02] address[0xfec00000] global_irq_base[0x0]) IOAPIC[0]: Assigned apic_id 2 IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, IRQ 0-23 ACPI: INT_SRC_OVR (bus[0] irq[0x0] global_irq[0x2] polarity[0x0] trigger[0x0]) ACPI: INT_SRC_OVR (bus[0] irq[0x9] global_irq[0x9] polarity[0x1] trigger[0x3]) ACPI: INT_SRC_OVR (bus[0] irq[0xe] global_irq[0xe] polarity[0x1] trigger[0x1]) ACPI: INT_SRC_OVR (bus[0] irq[0xf] global_irq[0xf] polarity[0x1] trigger[0x1]) Enabling APIC mode: Flat. Using 1 I/O APICs Using ACPI (MADT) for SMP configuration information Building zonelist for node : 0 Kernel command line: BOOT_IMAGE=Linux-2.6 ro root=301 Initializing CPU#0 PID hash table entries: 1024 (order 10: 8192 bytes) Detected 1913.621 MHz processor. Console: colour VGA+ 80x25 Memory: 256144k/262080k available (1611k kernel code, 5212k reserved, 693k data, 128k init, 0k highmem) Calibrating delay loop... 3784.70 BogoMIPS Dentry cache hash table entries: 32768 (order: 5, 131072 bytes) Inode-cache hash table entries: 16384 (order: 4, 65536 bytes) Mount-cache hash table entries: 512 (order: 0, 4096 bytes) CPU: After generic identify, caps: 0383fbff c1c3fbff 00000000 00000000 CPU: After vendor identify, caps: 0383fbff c1c3fbff 00000000 00000000 CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) CPU: L2 Cache: 512K (64 bytes/line) CPU: After all inits, caps: 0383fbff c1c3fbff 00000000 00000020 Intel machine check architecture supported. 
Intel machine check reporting enabled on CPU#0. CPU: AMD Athlon(tm) XP 2600+ stepping 00 Enabling fast FPU save and restore... done. Enabling unmasked SIMD FPU exception support... done. Checking 'hlt' instruction... OK. POSIX conformance testing by UNIFIX enabled ExtINT on CPU#0 ESR value before enabling vector: 00000000 ESR value after enabling vector: 00000000 ENABLING IO-APIC IRQs init IO_APIC IRQs IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected. ..TIMER: vector=0x31 pin1=2 pin2=-1 ..MP-BIOS bug: 8254 timer not connected to IO-APIC ..TIMER: Is timer irq0 connected to IOAPIC Pin0? ... IOAPIC[0]: Set PCI routing entry (2-0 -> 0x31 -> IRQ 0 Mode:0 Active:0) ..TIMER: works OK on apic pin0 irq0 number of MP IRQ sources: 15. number of IO-APIC #2 registers: 24. testing the IO APIC....................... IO APIC #2...... .... register #00: 02000000 ....... : physical APIC id: 02 ....... : Delivery Type: 0 ....... : LTS : 0 .... register #01: 00170011 ....... : max redirection entries: 0017 ....... : PRQ implemented: 0 ....... : IO APIC version: 0011 .... register #02: 00000000 ....... : arbitration: 00 .... 
IRQ redirection table: NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect: 00 001 01 0 0 0 0 0 1 1 31 01 001 01 0 0 0 0 0 1 1 39 02 000 00 0 0 0 0 0 0 0 00 03 001 01 0 0 0 0 0 1 1 41 04 001 01 0 0 0 0 0 1 1 49 05 001 01 0 0 0 0 0 1 1 51 06 001 01 0 0 0 0 0 1 1 59 07 001 01 0 0 0 0 0 1 1 61 08 001 01 0 0 0 0 0 1 1 69 09 001 01 1 1 0 0 0 1 1 71 0a 001 01 0 0 0 0 0 1 1 79 0b 001 01 0 0 0 0 0 1 1 81 0c 001 01 0 0 0 0 0 1 1 89 0d 001 01 0 0 0 0 0 1 1 91 0e 001 01 0 0 0 0 0 1 1 99 0f 001 01 0 0 0 0 0 1 1 A1 10 000 00 1 0 0 0 0 0 0 00 11 000 00 1 0 0 0 0 0 0 00 12 000 00 1 0 0 0 0 0 0 00 13 000 00 1 0 0 0 0 0 0 00 14 000 00 1 0 0 0 0 0 0 00 15 000 00 1 0 0 0 0 0 0 00 16 000 00 1 0 0 0 0 0 0 00 17 000 00 1 0 0 0 0 0 0 00 IRQ to pin mappings: IRQ0 -> 0:2-> 0:0 IRQ1 -> 0:1 IRQ3 -> 0:3 IRQ4 -> 0:4 IRQ5 -> 0:5 IRQ6 -> 0:6 IRQ7 -> 0:7 IRQ8 -> 0:8 IRQ9 -> 0:9 IRQ10 -> 0:10 IRQ11 -> 0:11 IRQ12 -> 0:12 IRQ13 -> 0:13 IRQ14 -> 0:14 IRQ15 -> 0:15 .................................... done. Using local APIC timer interrupts. calibrating APIC timer ... ..... CPU clock speed is 1912.0861 MHz. ..... host bus clock speed is 332.0671 MHz. NET: Registered protocol family 16 PCI: PCI BIOS revision 2.10 entry at 0xfb590, last bus=2 PCI: Using configuration type 1 mtrr: v2.0 (20020519) ACPI: Subsystem revision 20031002 tbxface-0117 [03] acpi_load_tables : ACPI Tables successfully acquired Parsing all Control Methods:........................................................................................................................................................................................................................................................................................ 
Table [DSDT](id F004) - 761 Objects with 78 Devices 280 Methods 30 Regions ACPI Namespace successfully loaded at root c0378d3c IOAPIC[0]: Set PCI routing entry (2-9 -> 0x71 -> IRQ 9 Mode:1 Active:0) evxfevnt-0093 [04] acpi_enable : Transition to ACPI mode successful evgpeblk-0748 [06] ev_create_gpe_block : GPE 00 to 31 [_GPE] 4 regs at 0000000000004020 on int 9 evgpeblk-0748 [06] ev_create_gpe_block : GPE 32 to 95 [_GPE] 8 regs at 00000000000044A0 on int 9 Completing Region/Field/Buffer/Package initialization:................................................................................................. Initialized 30/30 Regions 9/9 Fields 31/31 Buffers 27/27 Packages (769 nodes) Executing all Device _STA and_INI methods:............................................................................... 79 Devices found containing: 79 _STA, 2 _INI methods ACPI: Interpreter enabled ACPI: Using IOAPIC for interrupt routing ACPI: PCI Root Bridge [PCI0] (00:00) PCI: Probing PCI hardware (bus 00) ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.HUB0._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.AGPB._PRT] ACPI: PCI Interrupt Link [LNK1] (IRQs 3 4 5 6 7 10 11 12 14 15) ACPI: PCI Interrupt Link [LNK2] (IRQs 3 4 5 6 7 10 11 12 14 15) ACPI: PCI Interrupt Link [LNK3] (IRQs 3 4 5 6 7 10 11 *12 14 15) ACPI: PCI Interrupt Link [LNK4] (IRQs 3 4 *5 6 7 10 11 12 14 15) ACPI: PCI Interrupt Link [LNK5] (IRQs 3 4 5 6 7 10 11 12 14 15) ACPI: PCI Interrupt Link [LUBA] (IRQs 3 4 5 6 7 10 *11 12 14 15) ACPI: PCI Interrupt Link [LUBB] (IRQs 3 4 *5 6 7 10 11 12 14 15) ACPI: PCI Interrupt Link [LMAC] (IRQs 3 4 5 6 7 10 *11 12 14 15) ACPI: PCI Interrupt Link [LAPU] (IRQs 3 4 5 6 7 10 11 12 14 15) ACPI: PCI Interrupt Link [LACI] (IRQs 3 4 5 6 7 10 *11 12 14 15) ACPI: PCI Interrupt Link [LMCI] (IRQs 3 4 5 6 7 10 11 12 14 15) ACPI: PCI Interrupt Link [LSMB] (IRQs 3 4 5 6 7 10 11 *12 14 15) ACPI: PCI Interrupt Link [LUB2] (IRQs 3 4 5 6 7 10 11 
*12 14 15) ACPI: PCI Interrupt Link [LFIR] (IRQs 3 4 5 6 7 10 11 12 14 15) ACPI: PCI Interrupt Link [L3CM] (IRQs 3 4 5 6 7 10 11 12 14 15) ACPI: PCI Interrupt Link [LIDE] (IRQs 3 4 5 6 7 10 11 12 14 15) ACPI: PCI Interrupt Link [APC1] (IRQs 16) ACPI: PCI Interrupt Link [APC2] (IRQs 17) ACPI: PCI Interrupt Link [APC3] (IRQs *18) ACPI: PCI Interrupt Link [APC4] (IRQs *19) ACPI: PCI Interrupt Link [APC5] (IRQs 16) pci_link-0262 [40] acpi_pci_link_get_curr: No IRQ resource found ACPI: PCI Interrupt Link [APCF] (IRQs 20 21 22) pci_link-0262 [42] acpi_pci_link_get_curr: No IRQ resource found ACPI: PCI Interrupt Link [APCG] (IRQs 20 21 22) pci_link-0262 [44] acpi_pci_link_get_curr: No IRQ resource found ACPI: PCI Interrupt Link [APCH] (IRQs 20 21 22) ACPI: PCI Interrupt Link [APCI] (IRQs 20 21 22) pci_link-0262 [47] acpi_pci_link_get_curr: No IRQ resource found ACPI: PCI Interrupt Link [APCJ] (IRQs 20 21 22) ACPI: PCI Interrupt Link [APCK] (IRQs 20 21 22) ACPI: PCI Interrupt Link [APCS] (IRQs *23) pci_link-0262 [52] acpi_pci_link_get_curr: No IRQ resource found ACPI: PCI Interrupt Link [APCL] (IRQs 20 21 22) ACPI: PCI Interrupt Link [APCM] (IRQs 20 21 22) ACPI: PCI Interrupt Link [AP3C] (IRQs 20 21 22) ACPI: PCI Interrupt Link [APCZ] (IRQs 20 21 22) ACPI: PCI Interrupt Link [APCS] enabled at IRQ 23 IOAPIC[0]: Set PCI routing entry (2-23 -> 0xa9 -> IRQ 23 Mode:1 Active:0) 00:00:01[A] -> 2-23 -> IRQ 23 Pin 2-23 already programmed ACPI: PCI Interrupt Link [APCF] enabled at IRQ 20 IOAPIC[0]: Set PCI routing entry (2-20 -> 0xb1 -> IRQ 20 Mode:1 Active:0) 00:00:02[A] -> 2-20 -> IRQ 20 ACPI: PCI Interrupt Link [APCG] enabled at IRQ 22 IOAPIC[0]: Set PCI routing entry (2-22 -> 0xb9 -> IRQ 22 Mode:1 Active:0) 00:00:02[B] -> 2-22 -> IRQ 22 ACPI: PCI Interrupt Link [APCL] enabled at IRQ 21 IOAPIC[0]: Set PCI routing entry (2-21 -> 0xc1 -> IRQ 21 Mode:1 Active:0) 00:00:02[C] -> 2-21 -> IRQ 21 ACPI: PCI Interrupt Link [APCH] enabled at IRQ 20 Pin 2-20 already programmed ACPI: PCI 
Interrupt Link [APCI] enabled at IRQ 22 Pin 2-22 already programmed ACPI: PCI Interrupt Link [APCJ] enabled at IRQ 21 Pin 2-21 already programmed ACPI: PCI Interrupt Link [APCK] enabled at IRQ 20 Pin 2-20 already programmed ACPI: PCI Interrupt Link [APCM] enabled at IRQ 22 Pin 2-22 already programmed ACPI: PCI Interrupt Link [APCZ] enabled at IRQ 21 Pin 2-21 already programmed ACPI: PCI Interrupt Link [APC1] enabled at IRQ 16 IOAPIC[0]: Set PCI routing entry (2-16 -> 0xc9 -> IRQ 16 Mode:1 Active:0) 00:01:08[A] -> 2-16 -> IRQ 16 ACPI: PCI Interrupt Link [APC2] enabled at IRQ 17 IOAPIC[0]: Set PCI routing entry (2-17 -> 0xd1 -> IRQ 17 Mode:1 Active:0) 00:01:08[B] -> 2-17 -> IRQ 17 ACPI: PCI Interrupt Link [APC3] enabled at IRQ 18 IOAPIC[0]: Set PCI routing entry (2-18 -> 0xd9 -> IRQ 18 Mode:1 Active:0) 00:01:08[C] -> 2-18 -> IRQ 18 ACPI: PCI Interrupt Link [APC4] enabled at IRQ 19 IOAPIC[0]: Set PCI routing entry (2-19 -> 0xe1 -> IRQ 19 Mode:1 Active:0) 00:01:08[D] -> 2-19 -> IRQ 19 Pin 2-17 already programmed Pin 2-18 already programmed Pin 2-19 already programmed Pin 2-16 already programmed Pin 2-18 already programmed Pin 2-19 already programmed Pin 2-16 already programmed Pin 2-17 already programmed Pin 2-19 already programmed Pin 2-16 already programmed Pin 2-17 already programmed Pin 2-18 already programmed Pin 2-16 already programmed Pin 2-17 already programmed Pin 2-18 already programmed Pin 2-19 already programmed Pin 2-19 already programmed PCI: Using ACPI for IRQ routing PCI: if you experience problems, try using option 'pci=noacpi' or even 'acpi=off' Machine check exception polling timer started. 
ACPI: Power Button (FF) [PWRF]
ACPI: Sleep Button (CM) [SLPB]
ACPI: Fan [FAN] (on)
ACPI: Processor [CPU0] (supports C1)
ACPI: Thermal Zone [THRM] (38 C)
pty: 256 Unix98 ptys configured
Real Time Clock Driver v1.12
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
Using anticipatory io scheduler
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
loop: loaded (max 8 devices)
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
NFORCE2: IDE controller at PCI slot 0000:00:09.0
NFORCE2: chipset revision 162
NFORCE2: not 100% native mode: will probe irqs later
NFORCE2: BIOS didn't set cable bits correctly. Enabling workaround.
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
NFORCE2: 0000:00:09.0 (rev a2) UDMA133 controller
    ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:DMA
hda: WDC WD200BB-00DEA0, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdc: MATSHITA CR-585, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 39102336 sectors (20020 MB) w/2048KiB Cache, CHS=38792/16/63, UDMA(100)
 hda: hda1 hda2 hda3
hdc: ATAPI 24X CD-ROM drive, 128kB Cache, DMA
Uniform CD-ROM driver Revision: 3.12
mice: PS/2 mouse device common for all mice
input: PC Speaker
serio: i8042 AUX port at 0x60,0x64 irq 12
input: AT Translated Set 2 keyboard on isa0060/serio0
serio: i8042 KBD port at 0x60,0x64 irq 1
NET: Registered protocol family 2
IP: routing cache hash table of 2048 buckets, 16Kbytes
TCP: Hash tables configured (established 16384 bind 32768)
NET: Registered protocol family 1
NET: Registered protocol family 17
found reiserfs format "3.6" with standard journal
Reiserfs journal params: device hda1, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
reiserfs: checking transaction log (hda1) for (hda1)
Using r5 hash to sort names
VFS: Mounted root (reiserfs filesystem) readonly.
Freeing unused kernel memory: 128k freed
Adding 377516k swap on /dev/hda2. Priority:-1 extents:1
found reiserfs format "3.6" with standard journal
Reiserfs journal params: device hda3, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
reiserfs: checking transaction log (hda3) for (hda3)
Using r5 hash to sort names
Linux agpgart interface v0.100 (c) Dave Jones
8139too Fast Ethernet driver 0.9.27
eth0: RealTek RTL8139 at 0xd09a1000, 00:40:c7:77:0a:d5, IRQ 18
eth0: Identified 8139 chip type 'RTL-8139 rev K'
eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
agpgart: Detected NVIDIA nForce2 chipset
agpgart: Maximum main memory to use for agp memory: 203M
agpgart: AGP aperture is 64M @ 0xe8000000
PCI: Setting latency timer of device 0000:00:06.0 to 64
intel8x0: clocking to 47451
/home/jesse/linux/drivers/usb/core/usb.c: registered new driver hub
ohci_hcd: 2003 Oct 13 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
ohci_hcd: block sizes: ed 64 td 64
ohci_hcd 0000:00:02.0: OHCI Host Controller
PCI: Setting latency timer of device 0000:00:02.0 to 64
ohci_hcd 0000:00:02.0: irq 20, pci mem d0a17000
ohci_hcd 0000:00:02.0: new USB bus registered, assigned bus number 1
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 3 ports detected
ohci_hcd 0000:00:02.1: OHCI Host Controller
PCI: Setting latency timer of device 0000:00:02.1 to 64
ohci_hcd 0000:00:02.1: irq 22, pci mem d0a20000
ohci_hcd 0000:00:02.1: new USB bus registered, assigned bus number 2
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
ehci_hcd 0000:00:02.2: EHCI Host Controller
PCI: Setting latency timer of device 0000:00:02.2 to 64
ehci_hcd 0000:00:02.2: irq 21, pci mem d0a2e000
ehci_hcd 0000:00:02.2: new USB bus registered, assigned bus number 3
PCI: cache line size of 64 is not supported by device 0000:00:02.2
ehci_hcd 0000:00:02.2: USB 2.0 enabled, EHCI 1.00, driver 2003-Jun-13
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 6 ports detected
parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
parport0: cpp_daisy: aa5500ff(38)
parport0: assign_addrs: aa5500ff(38)
parport0: cpp_daisy: aa5500ff(38)
parport0: assign_addrs: aa5500ff(38)
i2c_adapter i2c-0: nForce2 SMBus adapter at 0x5000
i2c_adapter i2c-1: nForce2 SMBus adapter at 0x5100
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-07 13:12 Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered Ross Dickson
2003-12-09 15:20 ` Maciej W. Rozycki
@ 2003-12-10 3:39 ` Jesse Allen
2003-12-10 9:22 ` Ross Dickson
2003-12-10 10:00 ` Mikael Pettersson
1 sibling, 2 replies; 35+ messages in thread
From: Jesse Allen @ 2003-12-10 3:39 UTC (permalink / raw)
To: Ross Dickson; +Cc: linux-kernel, AMartin
[-- Attachment #1: Type: text/plain, Size: 957 bytes --]
Hi Ross,

I have rediffed your two patches for vanilla 2.6.0-test11. Briefly, I tried
the apic patch first, and found that there are no lockups so far; well it
passed my grep tests and even a kernel compile =). Then I tried your io_apic
patch + apic patch. With nmi_watchdog=1 "NMI:" in /proc/interrupts
increments a lot compared to nmi_watchdog=2 before (as much as the timer).
So I believe your two patches are more correct than the other two.
Especially the fact I can run with CPU Disconnect and not lock up just like
windows ... for people that have windows (I don't have windows =) plus a
probably working nmi_watchdog.

And for comparison, my setup:
Shuttle AN35N Ultra v 1.1 (Nforce2 400 ultra), bios updated
Athlon Barton 2600+ (1.9 GHz)
256 MB PC3200, single stick.

The patches are included in this mail. I suppose the next thing to do is get
the corresponding information out of nvidia. And then clean up the patch for
inclusion.

Jesse
[-- Attachment #2: nforce2-apic-delay-2.6t11.patch --]
[-- Type: text/plain, Size: 611 bytes --]
--- linux/arch/i386/kernel/apic.c 2003-10-25 11:44:59.000000000 -0700
+++ linux-jla/arch/i386/kernel/apic.c 2003-12-09 19:07:19.000000000 -0700
@@ -1089,6 +1089,16 @@
 	 */
 	irq_stat[cpu].apic_timer_irqs++;
 
+#ifdef CONFIG_MK7 && CONFIG_BLK_DEV_AMD74XX
+
+	/*
+	 * on 2200XP & nforce2 chipset we need at least 500ns delay here
+	 * to stop lockups with udma100 drive. try to scale delay time
+	 * with cpu speed. Ross Dickson.
+	 */
+	ndelay((cpu_khz >> 12)+200 ); /* don't ack too soon or hard lockup */
+#endif
+
 	/*
 	 * NOTE! We'd better ACK the irq immediately,
 	 * because timer handling can be slow.
[-- Attachment #3: nforce2-ioapic-timer-2.6t11.patch --]
[-- Type: text/plain, Size: 1564 bytes --]
--- linux/arch/i386/kernel/io_apic.c 2003-10-25 11:43:20.000000000 -0700
+++ linux-jla/arch/i386/kernel/io_apic.c 2003-12-09 19:56:07.000000000 -0700
@@ -2128,6 +2128,41 @@
 		printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to IO-APIC\n");
 	}
 
+#ifdef CONFIG_ACPI_BOOT && CONFIG_X86_UP_IOAPIC
+	/* for nforce2 try vector 0 on pin0
+	 * Note the io_apic_set_pci_routing call disables the 8259 irq 0
+	 * so we must be connected directly to the 8254 timer if this works
+	 * Note2: this violates the above comment re Subtle but works!
+	 */
+	printk(KERN_INFO "..TIMER: Is timer irq0 connected to IOAPIC Pin0? ...\n");
+	if ( pin1 != -1 && nr_ioapics ) {
+		int saved_timer_ack = timer_ack;
+		/* next call also disables 8259 irq0 */
+		int result = io_apic_set_pci_routing ( 0, 0, 0, 0, 0);
+		/*
+		 * Ok, does IRQ0 through the IOAPIC work?
+		 */
+		unmask_IO_APIC_irq(0);
+		timer_ack = 0 ;
+		if (timer_irq_works()) {
+			if (nmi_watchdog == NMI_IO_APIC) {
+				disable_8259A_irq(0);
+				setup_nmi();
+				enable_8259A_irq(0);
+				check_nmi_watchdog();
+			}
+			printk(KERN_INFO "..TIMER: works OK on apic pin0 irq0\n" );
+			return;
+		}
+		/* failed */
+		timer_ack = saved_timer_ack;
+		clear_IO_APIC_pin(0, 0);
+		result = io_apic_set_pci_routing ( 0, pin1, 0, 0, 0);
+		printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to IO-APIC Pin 0\n");
+	}
+#endif
+/* end new stuff for nforce2 */
+
 	printk(KERN_INFO "...trying to set up timer (IRQ0) through the 8259A ... ");
 	if (pin2 != -1) {
 		printk("\n..... (found pin %d) ...", pin2);
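To make the delay expression in the apic.c hunk concrete: `(cpu_khz >> 12) + 200` divides the CPU clock in kHz by 4096 and adds a fixed 200 ns. The sketch below is a standalone userspace calculation, not kernel code. The helper names `ack_delay_ns` and `print_ack_delay` are made up for illustration, and the 1800000 kHz sample value assumes the nominal 1.8 GHz clock of an Athlon XP 2200+.

```c
#include <stdio.h>

/* Hypothetical helper mirroring the patch expression (cpu_khz >> 12) + 200.
 * cpu_khz is the CPU clock in kHz; the result is the delay in nanoseconds. */
unsigned long ack_delay_ns(unsigned long cpu_khz)
{
    return (cpu_khz >> 12) + 200;
}

/* Print the computed delay for one clock speed given in kHz. */
void print_ack_delay(unsigned long cpu_khz)
{
    printf("%lu kHz -> %lu ns\n", cpu_khz, ack_delay_ns(cpu_khz));
}
```

For example, `ack_delay_ns(1800000)` evaluates to 639, which is above the 500 ns floor named in the patch comment. Note that the expression makes the delay grow slightly as the clock speed rises.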
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-10 3:39 ` Jesse Allen
@ 2003-12-10 9:22 ` Ross Dickson
2003-12-10 10:00 ` Mikael Pettersson
1 sibling, 0 replies; 35+ messages in thread
From: Ross Dickson @ 2003-12-10 9:22 UTC (permalink / raw)
To: Jesse Allen; +Cc: linux-kernel, AMartin
On Wednesday 10 December 2003 13:39, Jesse Allen wrote:
> Hi Ross,
>
> I have rediffed your two patches for vanilla 2.6.0-test11. Briefly, I tried
> the apic patch first, and found that there are no lockups so far; well it
> passed my grep tests and even a kernel compile =). Then I tried your io_apic
> patch + apic patch. With nmi_watchdog=1 "NMI:" in /proc/interrupts
> increments a lot compared to nmi_watchdog=2 before (as much as the timer).
> So I believe your two patches are more correct than the other two.
> Especially the fact I can run with CPU Disconnect and not lock up just like
> windows ... for people that have windows (I don't have windows =) plus a
> probably working nmi_watchdog.
>
> And for comparison, my setup:
> Shuttle AN35N Ultra v 1.1 (Nforce2 400 ultra), bios updated
> Athlon Barton 2600+ (1.9 GHz)
> 256 MB PC3200, single stick.
>
> The patches are included in this mail. I suppose the next thing to do is get
> the corresponding information out of nvidia. And then clean up the patch for
> inclusion.
>
> Jesse
>
>
>

Thanks Jesse,

It is interesting that the lockup problems also occur with a single memory
stick; I have only tried dual sticks.

Regards
Ross.
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-10 3:39 ` Jesse Allen
2003-12-10 9:22 ` Ross Dickson
@ 2003-12-10 10:00 ` Mikael Pettersson
2003-12-10 8:40 ` Ross Dickson
2003-12-11 14:32 ` Jesse Allen
1 sibling, 2 replies; 35+ messages in thread
From: Mikael Pettersson @ 2003-12-10 10:00 UTC (permalink / raw)
To: Jesse Allen; +Cc: Ross Dickson, linux-kernel, AMartin
Jesse Allen writes:
> --- linux/arch/i386/kernel/apic.c 2003-10-25 11:44:59.000000000 -0700
> +++ linux-jla/arch/i386/kernel/apic.c 2003-12-09 19:07:19.000000000 -0700
> @@ -1089,6 +1089,16 @@
>  	 */
>  	irq_stat[cpu].apic_timer_irqs++;
>  
> +#ifdef CONFIG_MK7 && CONFIG_BLK_DEV_AMD74XX
> +
> +	/*
> +	 * on 2200XP & nforce2 chipset we need at least 500ns delay here
> +	 * to stop lockups with udma100 drive. try to scale delay time
> +	 * with cpu speed. Ross Dickson.
> +	 */
> +	ndelay((cpu_khz >> 12)+200 ); /* don't ack too soon or hard lockup */
> +#endif
> +
>  	/*
>  	 * NOTE! We'd better ACK the irq immediately,
>  	 * because timer handling can be slow.

This is too much of a kludge. APIC timer ACKing is supposed to be fast.
Please try without this delay but with the disconnect PCI quirk.

If the delay is still needed even when disconnect is disabled, _then_
we can discuss how to do the delay properly.

/Mikael
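A side note on the guard line quoted above: `#ifdef` takes exactly one identifier, so in `#ifdef CONFIG_MK7 && CONFIG_BLK_DEV_AMD74XX` only `CONFIG_MK7` is tested and the rest of the line is treated as extra tokens (GCC warns about them). A two-symbol condition is normally written as `#if defined(...) && defined(...)`. The sketch below shows the difference outside the kernel; `FEATURE_A`, `FEATURE_B` and both function names are made up for illustration.

```c
#define FEATURE_A 1
/* FEATURE_B is deliberately left undefined. */

/* Returns 1 only when both symbols are defined at compile time. */
int both_features_enabled(void)
{
#if defined(FEATURE_A) && defined(FEATURE_B)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 when FEATURE_A alone is defined, which is all that
 * "#ifdef FEATURE_A && FEATURE_B" would actually check. */
int first_feature_enabled(void)
{
#ifdef FEATURE_A
    return 1;
#else
    return 0;
#endif
}
```

With only `FEATURE_A` defined, the first function returns 0 and the second returns 1, so a guard written with `#ifdef` and `&&` would enable the code on configurations where the second option is off.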
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-10 10:00 ` Mikael Pettersson
@ 2003-12-10 8:40 ` Ross Dickson
2003-12-11 14:32 ` Jesse Allen
1 sibling, 0 replies; 35+ messages in thread
From: Ross Dickson @ 2003-12-10 8:40 UTC (permalink / raw)
To: Mikael Pettersson, Jesse Allen; +Cc: linux-kernel, AMartin, Ian Kumlien
On Wednesday 10 December 2003 20:00, Mikael Pettersson wrote:
> Jesse Allen writes:
> > --- linux/arch/i386/kernel/apic.c 2003-10-25 11:44:59.000000000 -0700
> > +++ linux-jla/arch/i386/kernel/apic.c 2003-12-09 19:07:19.000000000 -0700
> > @@ -1089,6 +1089,16 @@
> >  	 */
> >  	irq_stat[cpu].apic_timer_irqs++;
> >  
> > +#ifdef CONFIG_MK7 && CONFIG_BLK_DEV_AMD74XX
> > +
> > +	/*
> > +	 * on 2200XP & nforce2 chipset we need at least 500ns delay here
> > +	 * to stop lockups with udma100 drive. try to scale delay time
> > +	 * with cpu speed. Ross Dickson.
> > +	 */
> > +	ndelay((cpu_khz >> 12)+200 ); /* don't ack too soon or hard lockup */
> > +#endif
> > +
> >  	/*
> >  	 * NOTE! We'd better ACK the irq immediately,
> >  	 * because timer handling can be slow.
>
> This is too much of a kludge. APIC timer ACKing is supposed to be fast.
> Please try without this delay but with the disconnect PCI quirk.
>
> If the delay is still needed even when disconnect is disabled, _then_
> we can discuss how to do the delay properly.
>
> /Mikael
>
>
>

Thanks Mikael,

I think the more heads on this problem the better.

I don't like timing kludges either, such as this existing one in ide-iops.c
in kernel 2.4.23:

	hwif->OUTBSYNC(drive, cmd, IDE_COMMAND_REG);
	/* Drive takes 400nS to respond, we must avoid the IRQ being
	   serviced before that.
	   FIXME: we could skip this delay with care on non shared
	   devices

	   For DMA transfers highpoint have a neat trick we could
	   use. When they take an IRQ they check STS but also that
	   the DMA count is not zero (see hpt's own driver)
	*/
	ndelay(400);
	spin_unlock_irqrestore(&io_request_lock, flags);
}

But does anyone know exactly what nvidia and the bios writer are doing - why
the cpu-disconnect is an issue for the nforce2 boards? Is it technically
correct in their view to turn off features that some pci or other device they
have made may expect?

I wonder about their ram devices because I note after some more testing that
without any lockup fixes the lockups were spaced a lot further apart in time
when I used a pair of KINGMAX 256MB DDR-333 than when I used a pair of SEITEC
256MB DDR-400 memory. The cpu used, an XP2500, has a 333 fsb. Is the ram
driver chip core enforcing the disconnect for a reason? When using the ack
delay, lockups with both memory types ceased - as they may also cease with
the disconnect patch. So the disconnect cycles seem related to the nforce2
ram driver circuitry. (See Ian's take towards the end of this email.)

The reason why I put the ack delay in only the apic timer servicing path is
that I think it is the only commonly traversed path which acks the apic so
quickly. If we end up stuck with a delay then we could probably make it more
accurate by reading the apic timer within the delay and using the counts down
from the reload value, because if our irq was already pre-delayed then no
additional delay would be required. I am sure many clever programmers can
improve on it - not that we want it at all.

I note the following comments in 2.2.23 io_apic.c

/*
 * Level triggered interrupts can just be masked,
 * and shutting down and starting up the interrupt
 * is the same as enabling and disabling them -- except
 * with a startup need to return a "was pending" value.
 *
 * Level triggered interrupts are special because we
 * do not touch any IO-APIC register while handling
 * them. We ack the APIC in the end-IRQ handler, not
 * in the start-IRQ-handler. Protection against reentrance
 * from the same interrupt is still provided, both by the
 * generic IRQ layer and by the fact that an unacked local
 * APIC does not accept IRQs.
 */

If I am reading this correctly then PCI interrupts which are level triggered
are processed with the equivalent of a global (maskable) hardware interrupt
disable (on a uniprocessor machine) if all hardware interrupts are routed via
the APIC. Chances are that we have more than 500ns irq off times occurring
with these servicing routines, especially if several handlers are chained on
the one pirq.

Another clue may have just come to light: does the ack in this routine
(io_apic.c) usually get done within 500ns or so of its activation? If it
does then either the mask_IO_APIC_irq() has a positive effect on the lockups
or alternately the problem is synchronous with, or inherent to, the apic
timer.

/*
 * Once we have recorded IRQ_PENDING already, we can mask the
 * interrupt for real. This prevents IRQ storms from unhandled
 * devices.
 */
static void ack_edge_ioapic_irq(unsigned int irq)
{
	if ((irq_desc[irq].status & (IRQ_PENDING | IRQ_DISABLED))
					== (IRQ_PENDING | IRQ_DISABLED))
		mask_IO_APIC_irq(irq);
	ack_APIC_irq();
}

How slow can timer handling be? When I was debugging with my CRO on the LPT
port, turning a bit on going into the smp_apic_timer_interrupt() routine and
turning the bit off when exiting, I saw times of greater than 0.5ms for the
routine to complete. That's milliseconds! I certainly agree with the comment
regarding the ack immediately and think it means before 0.5ms instead of
after 0.5ms, because 0.5ms is an eternity to have interrupts disabled in a
hardware interrupt context.

	/*
	 * NOTE! We'd better ACK the irq immediately,
	 * because timer handling can be slow.

I am not too crazy about having them off for 500ns to 1000ns either, but
until I know for certain that the cpu disconnect issue is a non issue I
will choose to suffer a time hit, and let the hardware run as the maker
intended.

BIG HINT TO THOSE IN THE KNOW.
If we had the docs from nvidia regarding the unknown pci devices

00:00.0 Host bridge: nVidia Corporation: Unknown device 01e0 (rev a2)
00:00.1 RAM memory: nVidia Corporation: Unknown device 01eb (rev a2)
00:00.2 RAM memory: nVidia Corporation: Unknown device 01ee (rev a2)
00:00.3 RAM memory: nVidia Corporation: Unknown device 01ed (rev a2)
00:00.4 RAM memory: nVidia Corporation: Unknown device 01ec (rev a2)
00:00.5 RAM memory: nVidia Corporation: Unknown device 01ef (rev a2)

then perhaps the underlying cause would present itself. Then we could
properly deal with the issue because we would know why we should do whatever
it is we should do. If the disconnect should be left on then hopefully we
could test a register somewhere to know when it is safe to ack or not -
something like that.

I think Ian is heading in the right direction with his comments:

On Wednesday 10 December 2003 11:20, Ian Kumlien wrote:
> Hi, again.
>
> I did some reading on amd's site, and if the disconnect + apic fixed the
> same problem as the ~500ns delay, then it could be as i suspect...
>
> I suspect that something goes wrong with apic ack when the cpu is
> disconnected and according to the amd docs we could check the
> Northbridge's CLKFWDRST or isn't that avail on the outside?
> (It would be interesting to see if that fixes the problem as well.)
>
> http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26237.PDF
>
> I don't really have the knowledge but it would sure be nicer to fix this
> by checking this than to just disable it. I dunno if there is something
> we could do from within the kernel as well with the sending of HLT but i
> doubt it.
>
> Anyways, we need a generalized patch that does better checking on the
> NMI bit (like Ross' patch).
>
> PS. Anyone that can point me to northbridge tech docs? and CC
>
> --
> Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net
>

Regards
Ross.
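Ross's idea of measuring how long the timer interrupt has already been pending, and padding only the remainder, can be sketched as plain arithmetic. This is a userspace illustration under stated assumptions, not anything from the posted patches: it assumes a timer that counts down from its reload value, so `reload - current` approximates the ticks elapsed since it fired. The function name and the `min_ticks` parameter (the required gap before the ack) are hypothetical.

```c
/* Ticks still to wait before acking, given a down-counting timer.
 * reload    - value the timer restarts from when it fires
 * current   - value read now (counts down towards 0)
 * min_ticks - assumed minimum gap between the interrupt and the ack */
unsigned int remaining_ack_ticks(unsigned int reload, unsigned int current,
                                 unsigned int min_ticks)
{
    unsigned int elapsed = reload - current;   /* ticks since the timer fired */

    if (elapsed >= min_ticks)
        return 0;                    /* already delayed enough, ack at once */
    return min_ticks - elapsed;      /* pad only the missing part */
}
```

With a reload value of 1000, a reading of 995 and a 13-tick minimum, the function returns 8; once 13 or more ticks have passed it returns 0, which is the "no additional delay" case described in the message.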
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-10 10:00 ` Mikael Pettersson
2003-12-10 8:40 ` Ross Dickson
@ 2003-12-11 14:32 ` Jesse Allen
1 sibling, 0 replies; 35+ messages in thread
From: Jesse Allen @ 2003-12-11 14:32 UTC (permalink / raw)
To: Mikael Pettersson; +Cc: Ross Dickson, linux-kernel, AMartin
On Wed, Dec 10, 2003 at 11:00:39AM +0100, Mikael Pettersson wrote:
> Please try without this delay but with the disconnect PCI quirk.
>

OK, I have tried it without the delay, and with Ross' timer patch. It will
obviously lock up, and nmi_watchdog doesn't work. Added the disconnect quirk
patch, and lockups are gone and nmi_watchdog works. So there is no
difference between the disconnect patch and the ACK delay patch. Though I
found nmi_watchdog does depend on having either the disconnect patch or the
delay patch (not an io_apic patch).

You think the disconnect patch is better? In any event, they both indicate a
behavior, and there may be a better solution to all of it.

Jesse
* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
@ 2003-12-13 5:16 Ross Dickson
2003-12-13 6:04 ` Jesse Allen
0 siblings, 1 reply; 35+ messages in thread
From: Ross Dickson @ 2003-12-13 5:16 UTC (permalink / raw)
To: cbradney; +Cc: linux-kernel, AMartin, Ian Kumlien
<snip>

> > The thing that strikes me funny is that you get no crashes with the
> > updated BIOS and Disconnect on, but without the updated BIOS we have
> > to turn disconnect off with athcool or the patch? This makes me think
> > that there is some voodoo going on in the BIOS update that they aren't
> > saying, surprise surprise, or something is just slowing down the time
> > it takes for it to crash. I say this because I have gone 5+ days
> > without any of the patches from these threads, acpi apic lapic
> > enabled, and CPU disconnect on as stated by athcool. This was with
> > much stress testing, idle time, etc. One day I just ran a grep that I
> > have done probably 30 times and boom, hang.

> > Good luck, hope the BIOS is the trick, now off to see how I can get
> > ASUS to put the C1 Disconnect in the next revision.

> Yes, that's how it was for me.. I was the only one here saying "no
> problems, la la la", then at about 5.25 days.. boom. Then the next day
> it crashed twice. Hopefully you make some progress with ASUS.. (for the
> A7N8X Deluxe as well as your mobo please :) ).

> I've been playing with hardware in the past few days (new quieter Zalman
> PSU, and Zalman 7000 Cu fan etc) so no uptime to speak of here now. I
> did compile KDE 3.2 beta 2 last night though.. 6 hours of solid
> compilation.. no hassles. I have never turned off Disconnect either.

> Thanks to all you guys who are working on this one. Seems to be getting
> somewhere.

> Craig

I wonder about the "voodoo" because my apic ack delay patch was developed
without knowledge of the C1 disconnect bit, and reports I have received so far
are that the hard lockups go away when using it independent of the state of
the disconnect bit. Apparently the bit was on in my test systems.

Ian Kumlien pointed out the linkage with the northbridge timing signals
to the CPU to do with the connect disconnect handshake, so I now wonder just
how programmable the nforce2 northbridge is? Is it a bit fpga'ish in that they
may be using the bios boot to alter the handshake timing enough to accomplish
what the ack delay does but like it should be - transparent to the OS?

Of course they - the makers - have access to knowledge we don't, so it could
be something completely different that they are doing!

In short I agree with the suggestion that the new bios options do more behind
the scenes than what the athcool and disconnect patches do.

I am pretty sure that I read somewhere that when the epox boards
were first released the epox 8rda bios started out with it (the disconnect
bit) off, then the 8rga+ came out with it on by default? So back then people
were wanting to turn it on in the 8rda to lower their CPU temperature - now
some want it off in search of stability? Back then under win.... some
experienced lockups depending on which IDE driver was used and which state
the bit was in!

Out of interest, has anyone seen new disconnect bit options in the Phoenix
bios or only in the Award bios?

Finally I have done some more work and found that the ack delay patch on my
system is about 13 apic timer counts, about half that required to write a byte
directly outb(0x00, 0x378) to the printer port at 28 apic timer counts.
So the ack delay is about twice as quick as writing a single EOI to the 8259
in XTPIC mode, provided the 8259 accesses are not souped up under the hood.
In other words, whilst it is a timing hit it is not much of one and it won't
be needed once this is all fixed by the respective manufacturers - let's hope
they can do it on the hardware we have already bought.

Regards
Ross Dickson
* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-13 5:16 Working nforce2, was " Ross Dickson
@ 2003-12-13 6:04 ` Jesse Allen
0 siblings, 0 replies; 35+ messages in thread
From: Jesse Allen @ 2003-12-13 6:04 UTC (permalink / raw)
To: Ross Dickson; +Cc: AMartin, linux-kernel
On Sat, Dec 13, 2003 at 03:16:51PM +1000, Ross Dickson wrote:
> I wonder about the "voodoo" because my apic ack delay patch was developed
> without knowledge of the C1 disconnect bit, and reports I have received so far
> are that the hard lockups go away when using it independent of the state of
> the disconnect bit. Apparently the bit was on in my test systems.
>
> Ian Kumlien pointed out the linkage with the northbridge timing signals
> to the CPU to do with the connect disconnect handshake

This is what the item help for C1 Disconnect in my BIOS said:
"Force En/Disabled or Auto mode: C17 IGP/SPP NB A03 C18D SPP NB A01 (C01)
enabled C1 disconnect otherwise disabled it"

I was thinking NB referred to northbridge. SPP is the type of NForce chip.
IGP would be a graphics chip(?), though this board doesn't have that.

So yes, we do have at least some relationship with the northbridge and
disconnect. This BIOS update probably addressed that, and the BIOS changelog
is just a summary.

> so I now wonder just how
> programmable the nforce2 northbridge is? Is it a bit fpga'ish in that they
> may be using the bios boot to alter the handshake timing enough to accomplish
> what the ack delay does but like it should be - transparent to the OS?

Probably. That's what I'm thinking too now.

>
> Of course they - the makers - have access to knowledge we don't, so it could
> be something completely different that they are doing!
>
> In short I agree with the suggestion that the new bios options do more behind
> the scenes than what the athcool and disconnect patches do.

That's why I'm trying to contact shuttle.

>
> I am pretty sure that I read somewhere that when the epox boards
> were first released the epox 8rda bios started out with it (the disconnect
> bit) off, then the 8rga+ came out with it on by default? So back then people
> were wanting to turn it on in the 8rda to lower their CPU temperature - now
> some want it off in search of stability?

Ah, that reminds me. The very first day I ran this board last week, I was
very worried about how high the system temperature was getting -- above 40
deg C. CPU was getting up to 49 deg C. Not that it was locking up because of
temperature - it would on a cold-boot - but that I was experiencing lockups
and higher than normal temperatures, which indicates to me now how poorly
its thermal management was operating then. Now with the new patches, and
ultimately, BIOS update, system temperature is about 35 deg C, which ain't
too bad =)

> Back then under win.... some experienced lockups depending
> on which IDE driver was used and which state the bit was in!

Good point! I was reading some message boards discussing nforce2s yesterday.
And they pretty much unanimously said, don't use NForce IDE driver, use
windows provided IDE driver, because the NForce IDE _locks up_. So windows
does have the same problem after all. I wouldn't know because I don't have
windows... but you can find this same issue everywhere then.

>
> Out of interest, has anyone seen new disconnect bit options in the Phoenix
> bios or only in the Award bios?

I have an Award bios.

>
> Finally I have done some more work and found that the ack delay patch on my
> system is about 13 apic timer counts, about half that required to write a byte
> directly outb(0x00, 0x378) to the printer port at 28 apic timer counts.
> So the ack delay is about twice as quick as writing a single EOI to the 8259
> in XTPIC mode, provided the 8259 accesses are not souped up under the hood.
> In other words, whilst it is a timing hit it is not much of one and it won't
> be needed once this is all fixed by the respective manufacturers - let's hope
> they can do it on the hardware we have already bought.
>
> Regards
> Ross Dickson
>
>

Good work. Let's hope the hardware manufacturers come through.

Jesse
* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
@ 2003-12-13 9:20 Ross Dickson
2003-12-13 9:51 ` Bob
0 siblings, 1 reply; 35+ messages in thread
From: Ross Dickson @ 2003-12-13 9:20 UTC (permalink / raw)
To: linux-kernel
<snip>
>> I decided to wait till this morning, to try the BIOS "C1 Disconnect" set to
>> enabled. Still no lockups under this kernel. Tried a vanilla kernel, no
>> lockups (but timer and watchdog messed up still). Now that I read
>> your message Bob, I understand what you are saying. Luckily, the
>> updated BIOS changelog states "Add C1 disconnect item." And this exact
>> version seems to have fixed it, and now we have an exact fix (another one?)
>> to refer to.
> >
> >
> >So the fix was absolutely a BIOS fix.
> >
<snip>

==That's why I'm trying to contact shuttle.
Jesse

Good Work Jesse, I hope shuttle give up some info - especially as I have
Phoenix bioses and they are doing ?? about it?

> ...but we're stuck looking at smoke and mirrors,
> when the kernel might be able to work around
> bioses that have not been "updated". Or to put
> it another way, "voodoo" may be done by
> kernel if not done by bios. Whatever is being
> tweaked may be accessible to kernel code.
<snip>
Bob

Please ignore the following if you are already up to speed on SMM. Some
readers may not know why we cannot do all that the bios can do, aside from
a lack of information.

Agreed, but the keywords are might and may. I remember doing dos based data
acquisition with 486SX laptops, and then Intel brought out the 486SL and our
pulse counting went bad because of the power saving core. I got the data book
from Intel and was very dismayed to see that bios code was being executed
when I thought our code was running, and there was not a darn thing I could
do about it and keep the laptop warranty intact.

Its offspring, as you may already know, is SMM. It is a privileged mode that
we can do pretty much squat about. It can pop up anywhere in the middle of
our code and the only thing we will know about it, aside from missing time,
is when it has stuffed something up - like setting registers back to the
wrong values. Think of it like a kernel within our kernel with permissions
set so it can hack us but we cannot hack it.

Maciej recently wrote about its continuing effect on NMI debug here.

http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/2940.html

Regards
Ross.
* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-13 9:20 Ross Dickson
@ 2003-12-13 9:51 ` Bob
0 siblings, 0 replies; 35+ messages in thread
From: Bob @ 2003-12-13 9:51 UTC (permalink / raw)
To: linux-kernel
Ross Dickson wrote:

>>>So the fix was absolutely a BIOS fix.
>>>
>>>
>>>
><snip>
>
>==That's why I'm trying to contact shuttle.
>Jesse
>
>

R> Good Work Jesse, I hope shuttle give up some info - especially as I have
Phoenix bioses and they are doing ?? about it?
-Ross

B> I was expecting to hear that. I have an Award bios on MSI nforce2
mboard. Their bios flash file begins with "w" for Award

w6570nms.760 (W6570 v760 bios flash file)
"p" phoenix
"a" ami
also appears at boot but goes by in a flash
and appears on first cmos setup page

So Award bios has a fix for the nforce2. How about Jesse's bios that can fix
the problem without a kernel patch, as my Award bios is doing? What kind of
bios is that you have, Jesse?

My Award bios does not give me any way to have ioapic edge timer turn on,
though. I need a patch to get that on. Also I don't have a cpu disconnect
choice in setup, and by running temp range 41C to 48C I guess cpu disconnect
is not on. 48C once in a while does not hurt anything though.
-Bob

>>...but we're stuck looking at smoke and mirrors,
>>when the kernel might be able to work around
>>bioses that have not been "updated". Or to put
>>it another way, "voodoo" may be done by
>>kernel if not done by bios. Whatever is being
>>tweaked may be accessible to kernel code.
>>
>>
><snip>
>Bob
>
>Please ignore the following if you are already up to speed on SMM. Some
>readers may not know why we cannot do all that the bios can do, aside from
>a lack of information.
>
>Agreed, but the keywords are might and may. I remember doing dos based data
>acquisition with 486SX laptops, and then Intel brought out the 486SL and our
>pulse counting went bad because of the power saving core. I got the data book
>from Intel and was very dismayed to see that bios code was being executed
>when I thought our code was running, and there was not a darn thing I could
>do about it and keep the laptop warranty intact.
>
>Its offspring, as you may already know, is SMM. It is a privileged mode that
>we can do pretty much squat about. It can pop up anywhere in the middle of
>our code and the only thing we will know about it, aside from missing time,
>is when it has stuffed something up - like setting registers back to the
>wrong values. Think of it like a kernel within our kernel with permissions
>set so it can hack us but we cannot hack it.
>
>Maciej recently wrote about its continuing effect on NMI debug here.
>
>http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/2940.html
>
>Regards
>Ross
>
>
>

Thanks for explaining. We got some new functionality just by turning
nmi_watchdog on, but I don't know if anybody has learned anything from the
extra debug, have they, as far as this nforce2 timing thing?

-Bob
* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
@ 2003-12-15 14:30 Ross Dickson
2003-12-15 15:02 ` Craig Bradney
0 siblings, 1 reply; 35+ messages in thread
From: Ross Dickson @ 2003-12-15 14:30 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: recbo, linux-kernel
> > APIC error on CPU0: 02(02)
> > what?? no crash though.
> [...]
> > bob@where cat /proc/interrupts
> >            CPU0
> >   0:    3350153    IO-APIC-edge  timer
> >   1:       5775    IO-APIC-edge  i8042
> >   2:          0          XT-PIC  cascade
> >   8:          1    IO-APIC-edge  rtc
> >   9:          0   IO-APIC-level  acpi
> >  12:       5385    IO-APIC-edge  i8042
> >  14:         10    IO-APIC-edge  ide0
> >  15:         10    IO-APIC-edge  ide1
> >  16:    1717957   IO-APIC-level  ide2, ide3, eth0
> >  19:     472929   IO-APIC-level  ide4, ide5
> >  21:          0   IO-APIC-level  NVidia nForce2
> > NMI:        822
> > LOC:    3350073
> > ERR:         35
> > MIS:      15818

> It looks like the infamous APIC delivery bug -- the "MIS" counter shows
> how many level-triggered interrupts have been erroneously delivered as
> edge-triggered ones. No wonder the system shows instability -- you have
> noise problems at the APIC bus.

Thanks Maciej,

I was wondering about those. I had seen the workaround code and would not
have thought it need apply to recent athlon chipsets?

For comparison here is my /proc/interrupts

           CPU0
  0:   50462204    IO-APIC-edge  timer
  1:      49153    IO-APIC-edge  keyboard
  2:          0          XT-PIC  cascade
  9:          0   IO-APIC-level  acpi
 12:     395912    IO-APIC-edge  PS/2 Mouse
 14:     995872    IO-APIC-edge  ide0
 15:        283    IO-APIC-edge  ide1
 16:    3921102   IO-APIC-level  nvidia
 18:          2   IO-APIC-level  bttv
 20:     136325   IO-APIC-level  eth0, usb-ohci
 21:     146903   IO-APIC-level  ehci_hcd, NVIDIA nForce Audio
 22:          0   IO-APIC-level  usb-ohci
NMI:          0
LOC:   50457798
ERR:          0
MIS:          0

Albatron KM18G-Pro, nforce2, Phoenix bios, 2200XP, 255fsb, ddr400,
ide0 is hard drive, ide1 is cdrom, nmi watchdog off.

The report seems OK, but this machine locks up hard without the apic delay
patch.

I am currently trying the simpler v1 (always add a delay) patch, but on all
apic acks, as per this posting

http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/3291.html

which is a reply to an earlier posting of the same name, but I accidentally
omitted the Re in the subject.

Regards,
Ross.
* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-15 14:30 Fwd: " Ross Dickson
@ 2003-12-15 15:02 ` Craig Bradney
2003-12-15 16:54 ` Ross Dickson
0 siblings, 1 reply; 35+ messages in thread
From: Craig Bradney @ 2003-12-15 15:02 UTC (permalink / raw)
To: ross; +Cc: Maciej W. Rozycki, recbo, linux-kernel

Just to give the status here ...
I'm still running the original 2.6 test 11 patches for apic and ioapic.
Uptime is now 2d 20h with lots of idle time and hard work too.

/proc/interrupts as follows:

           CPU0
  0:  245382420    IO-APIC-edge  timer
  1:     139577    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          3    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:    1478615    IO-APIC-edge  i8042
 14:    1055548    IO-APIC-edge  ide0
 15:     737664    IO-APIC-edge  ide1
 19:   18405692   IO-APIC-level  radeon@PCI:3:0:0
 21:    5257090   IO-APIC-level  ehci_hcd, NVidia nForce2, eth0
 22:          3   IO-APIC-level  ohci1394
NMI:      14944
LOC:  245087891
ERR:          0
MIS:          6

As for NMI, I actually forget which nmi_watchdog setting I booted with. I think =1,
but NMI is a small number now. Would it have wrapped?

Craig
A7N8X Deluxe V2 BIOS 1007

On Mon, 2003-12-15 at 15:30, Ross Dickson wrote:
> > > APIC error on CPU0: 02(02)
> > > what?? no crash though.
> > [...]
> > > bob@where cat /proc/interrupts
> > >            CPU0
> > >   0:    3350153    IO-APIC-edge  timer
> > >   1:       5775    IO-APIC-edge  i8042
> > >   2:          0          XT-PIC  cascade
> > >   8:          1    IO-APIC-edge  rtc
> > >   9:          0   IO-APIC-level  acpi
> > >  12:       5385    IO-APIC-edge  i8042
> > >  14:         10    IO-APIC-edge  ide0
> > >  15:         10    IO-APIC-edge  ide1
> > >  16:    1717957   IO-APIC-level  ide2, ide3, eth0
> > >  19:     472929   IO-APIC-level  ide4, ide5
> > >  21:          0   IO-APIC-level  NVidia nForce2
> > > NMI:        822
> > > LOC:    3350073
> > > ERR:         35
> > > MIS:      15818
>
> >It looks like the infamous APIC delivery bug -- the "MIS" counter shows
> >how many level-triggered interrupts have been erroneously delivered as
> >edge-triggered ones. No wonder the system shows instability -- you have
> >noise problems at the APIC bus.
>
> Thanks Maciej
> I was wondering about those. I had seen the workaround code and would not
> have thought it need apply to recent athlon chipsets?
>
>
> For comparison here is my proc/interrupts
>            CPU0
>   0:   50462204    IO-APIC-edge  timer
>   1:      49153    IO-APIC-edge  keyboard
>   2:          0          XT-PIC  cascade
>   9:          0   IO-APIC-level  acpi
>  12:     395912    IO-APIC-edge  PS/2 Mouse
>  14:     995872    IO-APIC-edge  ide0
>  15:        283    IO-APIC-edge  ide1
>  16:    3921102   IO-APIC-level  nvidia
>  18:          2   IO-APIC-level  bttv
>  20:     136325   IO-APIC-level  eth0, usb-ohci
>  21:     146903   IO-APIC-level  ehci_hcd, NVIDIA nForce Audio
>  22:          0   IO-APIC-level  usb-ohci
> NMI:          0
> LOC:   50457798
> ERR:          0
> MIS:          0
>
> Albatron KM18G-Pro, nforce2, phoenix bios, 2200XP, 255fsb, ddr400,
> ide0 is hard drive, ide1 is cdrom, nmi watchdog off
>
> Report seems OK but this machine locks up hard without the apic delay patch.
>
> I am currently trying the simpler v1 (always add a delay) patch but on all apic
> acks as per this posting
>
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/3291.html
>
> which is a reply to an earlier posting of the same name but I accidentally
> omitted the Re in the subject.
>
> Regards,
> Ross.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-15 15:02 ` Craig Bradney
@ 2003-12-15 16:54 ` Ross Dickson
2003-12-16 6:07 ` Bob
0 siblings, 1 reply; 35+ messages in thread
From: Ross Dickson @ 2003-12-15 16:54 UTC (permalink / raw)
To: Craig Bradney; +Cc: recbo, linux-kernel, Ian Kumlien

On Tuesday 16 December 2003 01:02, you wrote:
> Just to give the status here ...
> Im still running the original 2.6 test 11 patches for apic and ioapic.
> Uptime is now 2d 20h with lots of idle time and hard work too..
>
> /proc/interrupts as follows:
>
>            CPU0
>   0:  245382420    IO-APIC-edge  timer
>   1:     139577    IO-APIC-edge  i8042
>   2:          0          XT-PIC  cascade
>   8:          3    IO-APIC-edge  rtc
>   9:          0   IO-APIC-level  acpi
>  12:    1478615    IO-APIC-edge  i8042
>  14:    1055548    IO-APIC-edge  ide0
>  15:     737664    IO-APIC-edge  ide1
>  19:   18405692   IO-APIC-level  radeon@PCI:3:0:0
>  21:    5257090   IO-APIC-level  ehci_hcd, NVidia nForce2, eth0
>  22:          3   IO-APIC-level  ohci1394
> NMI:      14944
> LOC:  245087891
> ERR:          0
> MIS:          6

Uptime sounds good so far. I am not convinced my v2 apic patch is a great
overall improvement; I am thinking v1 apic is safer for now. Having said that,
Ian Kumlien currently has an uptime of 1 day, 15 hours + on v2 patches but with
the apic delay timeout increased from 600UL to 800UL. He has a Barton core - see
below.

>
> Craig
> A7N8X Deluxe V2 BIOS 1007
>
> <snip>
> > I am currently trying the simpler v1 (always add a delay) patch but on all apic
> > acks as per this posting
> >
> > http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/3291.html
> >
> > which is a reply to an earlier posting of the same name but I accidently
> > omitted the Re in the subject.
> >

I don't think it is necessary to put the delay in all apic acks - I just tried
it to see if it worked and have not yet put my code back the way it was.
My hard lockups went away with the original v1 apic timer delay patch anyway.

Please note that in the posting linked above I write that I stuffed up the
#ifdefs in my v1 and v2 patches; please adjust the code accordingly.
The patches worked but were only testing the first config item after #ifdef.

apic code should have had
#if defined(CONFIG_MK7) && defined(CONFIG_BLK_DEV_AMD74XX)

ioapic code should have had
#if defined(CONFIG_ACPI_BOOT) && defined(CONFIG_X86_UP_IOAPIC)

Brief summary at this point

1) 2? reports are in that the latest award bios with "C1 disconnect" set to "auto?"
may remove the need for the apic ack delay patch and still keep cpu thermo managed

2) apic ack delay v1 patch seems safe for all cpu cores but introduces a small
delay of about half the time of an XTPIC access on each apic timer interrupt

3) apic ack delay v2 patch seems safe only on barton cores, gives more
debugging info, and wastes less time than the apic v1 patch

4) io-apic v2 patch gives more debugging info but functions the same as the
io-apic v1 patch

Regards
Ross
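The #ifdef point in the message above can be illustrated with a small user-space sketch. This is not code from the patches: CONFIG_DEMO_A and CONFIG_DEMO_B are hypothetical stand-ins for the real config symbols, and the second one is left undefined on purpose. As far as I know, #ifdef reads a single identifier, and GCC only warns about extra tokens when "&& SOMETHING" follows it, so the second condition is never tested. The sketch emulates that behaviour with a plain #ifdef rather than writing the malformed directive itself.

```c
#include <stdio.h>

/* Hypothetical stand-ins for kernel config symbols (not real CONFIG_ names).
 * Only the first one is defined; the second is deliberately left undefined. */
#define CONFIG_DEMO_A 1

/* Intended test: both symbols must be defined. */
#if defined(CONFIG_DEMO_A) && defined(CONFIG_DEMO_B)
#define BOTH_CONDITIONS_MET 1
#else
#define BOTH_CONDITIONS_MET 0
#endif

/* What "#ifdef CONFIG_DEMO_A && CONFIG_DEMO_B" effectively tests:
 * only the first identifier is looked at. */
#ifdef CONFIG_DEMO_A
#define FIRST_SYMBOL_ONLY 1
#else
#define FIRST_SYMBOL_ONLY 0
#endif

/* Small accessors so the result of each directive can be inspected at run time. */
static int both_conditions_met(void)
{
	return BOTH_CONDITIONS_MET;
}

static int first_symbol_only(void)
{
	return FIRST_SYMBOL_ONLY;
}
```

With only the first symbol defined, the "#if defined(...) && defined(...)" form leaves the guarded code out, while the single-identifier test would still compile it in. That matches the description of patches that "worked but were only testing on the first config item".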
* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-15 16:54 ` Ross Dickson
@ 2003-12-16 6:07 ` Bob
0 siblings, 0 replies; 35+ messages in thread
From: Bob @ 2003-12-16 6:07 UTC (permalink / raw)
To: linux-kernel

Ross,

my_make_script nf2-800UL 2>&1 | tee /tmp/make.err

#/tmp/make.err
<snip>
  CC      arch/i386/kernel/apic.o
arch/i386/kernel/apic.c: In function `smp_apic_timer_interrupt':
arch/i386/kernel/apic.c:1105: warning: unsigned int format, long unsigned int arg (arg 2)

...which is around the printk line here--

printk("..APIC TIMER ack delay, reload:%u, safe:%u\n",

+		if(!passno) { /* calculate timing */
+			safecnt = apic_read(APIC_TMICT) -
+			( (800UL * apic_read(APIC_TMICT) ) /
+			(1000000000UL/HZ) );
+			printk("..APIC TIMER ack delay, reload:%u, safe:%u\n",
+				apic_read(APIC_TMICT), safecnt);
+			passno++;

Here are the two patches with "#ifdef N" to "#if defined(N)" change
but not the unsigned int change --

diff -urN linux-2.6.0-test11/arch/i386/kernel/apic.c linux-2.6.0-test11-nf2/arch/i386/kernel/apic.c
--- linux-2.6.0-test11/arch/i386/kernel/apic.c	2003-11-26 15:46:07.000000000 -0500
+++ linux-2.6.0-test11-nf2/arch/i386/kernel/apic.c	2003-12-13 23:48:30.000000000 -0500
@@ -1089,6 +1089,37 @@
 	 */
 	irq_stat[cpu].apic_timer_irqs++;

+#if defined(CONFIG_MK7) && defined(CONFIG_BLK_DEV_AMD74XX)
+	/*
+	 * on 2200XP & nforce2 chipset we need 600ns? 800? 1000? 1100?
+	 * from timer irq start to apic irq ack to prevent
+	 * hard lockups, use apic timer itself.
+	 * C1 disconnect bit related. Ross Dickson.
+	 */
+	{
+		static unsigned int passno, safecnt;
+		if(!passno) { /* calculate timing */
+			safecnt = apic_read(APIC_TMICT) -
+			( (800UL * apic_read(APIC_TMICT) ) /
+			(1000000000UL/HZ) );
+			printk("..APIC TIMER ack delay, reload:%u, safe:%u\n",
+				apic_read(APIC_TMICT), safecnt);
+			passno++;
+		}
+#if APIC_DEBUG
+		if(passno<12) {
+			unsigned int at1 = apic_read(APIC_TMCCT);
+			if( passno > 1 )
+				Dprintk("..APIC TIMER ack delay, predelay count:%u \n", at1 );
+			passno++;
+		}
+#endif
+		/* delay only if required */
+		while( apic_read(APIC_TMCCT) > safecnt )
+			ndelay(100);
+	}
+#endif
+
 	/*
 	 * NOTE! We'd better ACK the irq immediately,
 	 * because timer handling can be slow.*/
diff -urN linux-2.6.0-test11/arch/i386/kernel/io_apic.c linux-2.6.0-test11-nf2/arch/i386/kernel/io_apic.c
--- linux-2.6.0-test11/arch/i386/kernel/io_apic.c	2003-11-26 15:43:32.000000000 -0500
+++ linux-2.6.0-test11-nf2/arch/i386/kernel/io_apic.c	2003-12-13 15:14:25.000000000 -0500
@@ -2128,6 +2128,54 @@
 		printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to IO-APIC\n");
 	}

+#if defined(CONFIG_ACPI_BOOT) && defined(CONFIG_X86_UP_IOAPIC)
+	/* for nforce2 try vector 0 on pin0
+	 * Note 8259a is already masked, also by default
+	 * the io_apic_set_pci_routing call disables the 8259 irq 0
+	 * so we must be connected directly to the 8254 timer if this works
+	 * Note2: this violates the above comment re Subtle but works!
+	 */
+	printk(KERN_INFO "..TIMER: Is timer irq0 connected to IOAPIC Pin0? ...\n");
+	if (pin1 != -1) {
+		extern spinlock_t i8259A_lock;
+		unsigned long flags;
+		int tok, saved_timer_ack = timer_ack;
+		/*
+		 * Ok, does IRQ0 through the IOAPIC work?
+		 */
+		io_apic_set_pci_routing ( 0, 0, 0, 0, 0); /* connect pin */
+		unmask_IO_APIC_irq(0);
+		timer_ack = 0;
+
+		/*
+
+
+
+		 * Ok, does IRQ0 through the IOAPIC work?
+		 */
+		spin_lock_irqsave(&i8259A_lock, flags);
+		Dprintk("..TIMER check 8259 ints disabled, imr1:%02x, imr2:%02x\n", inb(0x21), inb(0xA1));
+		tok = timer_irq_works();
+		spin_unlock_irqrestore(&i8259A_lock, flags);
+		if (tok) {
+			if (nmi_watchdog == NMI_IO_APIC) {
+				disable_8259A_irq(0);
+				setup_nmi();
+				enable_8259A_irq(0);
+				check_nmi_watchdog();
+			}
+			printk(KERN_INFO "..TIMER: works OK on apic pin0 irq0\n" );
+			return;
+		}
+		/* failed */
+		timer_ack = saved_timer_ack;
+		clear_IO_APIC_pin(0, 0);
+		io_apic_set_pci_routing ( 0, pin1, 0, 0, 0);
+		printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to IO-APIC Pin 0\n");
+	}
+/* end new stuff for nforce2 */
+#endif
+
 	printk(KERN_INFO "...trying to set up timer (IRQ0) through the 8259A ... ");
 	if (pin2 != -1) {
 		printk("\n..... (found pin %d) ...", pin2);
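The safe-count arithmetic in the apic.c hunk above can be checked in user space. The sketch below is only an illustration under stated assumptions: apic_safe_count() and format_ack_delay() are hypothetical helper names, the reload values are invented rather than read from a real APIC_TMICT register, and apic_read() is assumed to return unsigned long (which is what the compiler warning quoted in the message implies). The idea in the patch appears to be that the timer counts down from the reload value over one tick of 1000000000/HZ nanoseconds, so the count reached 800 ns after the interrupt is reload minus 800/tick_ns of the reload.

```c
#include <stdio.h>

/*
 * Hypothetical user-space mirror of the patch arithmetic:
 *   safecnt = reload - (delay_ns * reload) / (1000000000 / HZ)
 * The APIC timer counts down from "reload" once per tick, so the result
 * is the count value expected delay_ns nanoseconds into the tick.
 * Note: delay_ns * reload can overflow a 32-bit unsigned long when
 * reload is in the millions; the values used here stay well below that.
 */
static unsigned long apic_safe_count(unsigned long reload,
				     unsigned long delay_ns,
				     unsigned long hz)
{
	unsigned long tick_ns = 1000000000UL / hz;	/* length of one timer tick */

	return reload - (delay_ns * reload) / tick_ns;
}

/*
 * Formats the message the way the later fix suggests: the reload value is
 * unsigned long, so it needs %lu; safecnt is unsigned int, so %u is fine.
 */
static int format_ack_delay(char *buf, size_t len,
			    unsigned long reload, unsigned int safecnt)
{
	return snprintf(buf, len,
			"..APIC TIMER ack delay, reload:%lu, safe:%u\n",
			reload, safecnt);
}
```

For an invented reload value of 100000 at HZ=1000, the 800 ns window is 80 counts, so the ack is held off until the current count has dropped to 99920 or below. With the original 600UL value the window would be 60 counts.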
[parent not found: <200312132040.00875.ross@datscreative.com.au>]
* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
[not found] <200312132040.00875.ross@datscreative.com.au>
@ 2003-12-13 12:00 ` Bob
2003-12-15 13:11 ` Maciej W. Rozycki
0 siblings, 1 reply; 35+ messages in thread
From: Bob @ 2003-12-13 12:00 UTC (permalink / raw)
To: linux-kernel

udma133 with Award bios update and nforce2

APIC error on CPU0: 02(02)
what?? no crash though.

Ross Dickson wrote:

>Hi Bob
>
>Jesse has award bios, see attached
>Ross.
>

Months ago I thought using a 3ware card might help with
nforce2 crashes, so I gave up on promise and sii hd cards
after a lot of experiments (hdparm, no lapic, no acpi, apic
off in bios) and put in a 3ware card. But I flashed the bios
at the same time, so I didn't know if the 3ware card helped
with the nforce2 crashing or not, since the bios flash did
the job. With 3ware I couldn't use hdparm to see what udma
settings the drives were set to.

Now I can report. Just now I took the 3ware card out and
went back to promise cards (using 4 hd's either method, 2
cd's on mboard amd74xx, onboard sata disabled).

bob@where cat /proc/interrupts
           CPU0
  0:    3350153    IO-APIC-edge  timer
  1:       5775    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          1    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:       5385    IO-APIC-edge  i8042
 14:         10    IO-APIC-edge  ide0
 15:         10    IO-APIC-edge  ide1
 16:    1717957   IO-APIC-level  ide2, ide3, eth0
 19:     472929   IO-APIC-level  ide4, ide5
 21:          0   IO-APIC-level  NVidia nForce2
NMI:        822
LOC:    3350073
ERR:         35
MIS:      15818

cd's on amd74xx onboard, amd74xx onboard is always solid,
4 ide hd's on two promise cards. not many nmi ticks
without the better patch there.

bonnie++ smooth, then hdparm up the settings, udma6,
bonnie++ again, saw a few "APIC error on CPU0: 02(02)"
but no lockup. not sure if data lost since it was a test.
APIC error might be fixed by changing hdparm settings.
This second test was with unmasked irq and udma6.

I have to patch to get ioapic edge timer on.

This 11/7/2003 updated award bios does not have a cpu
disconnect option but it does eliminate the crashes with
no patch, and it is no longer impossible to use promise
ide udma133 controller cards.

MSI K7N2 Delta MCP2-T mboard

I don't have the promise patch in yet, either, so the
APIC error might be from that, or hdparm unmasked irq.

-Bob
* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-13 12:00 ` Fwd: " Bob
@ 2003-12-15 13:11 ` Maciej W. Rozycki
2003-12-16 7:18 ` Bob
0 siblings, 1 reply; 35+ messages in thread
From: Maciej W. Rozycki @ 2003-12-15 13:11 UTC (permalink / raw)
To: Bob; +Cc: linux-kernel

On Sat, 13 Dec 2003, Bob wrote:

> APIC error on CPU0: 02(02)
> what?? no crash though.
[...]
> bob@where cat /proc/interrupts
>            CPU0
>   0:    3350153    IO-APIC-edge  timer
>   1:       5775    IO-APIC-edge  i8042
>   2:          0          XT-PIC  cascade
>   8:          1    IO-APIC-edge  rtc
>   9:          0   IO-APIC-level  acpi
>  12:       5385    IO-APIC-edge  i8042
>  14:         10    IO-APIC-edge  ide0
>  15:         10    IO-APIC-edge  ide1
>  16:    1717957   IO-APIC-level  ide2, ide3, eth0
>  19:     472929   IO-APIC-level  ide4, ide5
>  21:          0   IO-APIC-level  NVidia nForce2
> NMI:        822
> LOC:    3350073
> ERR:         35
> MIS:      15818

It looks like the infamous APIC delivery bug -- the "MIS" counter shows
how many level-triggered interrupts have been erroneously delivered as
edge-triggered ones. No wonder the system shows instability -- you have
noise problems at the APIC bus.

--
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +
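For anyone wanting to track the ERR and MIS counters discussed above between test runs, a small helper along the following lines might do. This is a sketch only: read_counter() is a hypothetical name, and it works on text in the /proc/interrupts layout shown in this thread (a label such as "MIS:" at the start of a line, followed by a count for a single CPU). Reading the file into a buffer is left to the caller.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan /proc/interrupts-style text line by line and
 * return the first number after a line-leading label such as "ERR:" or
 * "MIS:". Returns -1 if no line starts with the label.
 */
static long read_counter(const char *text, const char *label)
{
	size_t len = strlen(label);
	const char *line = text;

	while (line != NULL && *line != '\0') {
		/* skip leading blanks so " 16:" style rows are handled too */
		const char *p = line + strspn(line, " \t");
		long value;

		if (strncmp(p, label, len) == 0 &&
		    sscanf(p + len, "%ld", &value) == 1)
			return value;

		/* advance to the character after the next newline, if any */
		line = strchr(line, '\n');
		if (line != NULL)
			line++;
	}
	return -1;
}
```

Fed the counter lines from the listing quoted above, the helper would return 35 for "ERR:" and 15818 for "MIS:". Comparing the values before and after a disk benchmark is one way to see whether a patch or a BIOS setting changes the rate of misdelivered interrupts.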
* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-15 13:11 ` Maciej W. Rozycki
@ 2003-12-16 7:18 ` Bob
0 siblings, 0 replies; 35+ messages in thread
From: Bob @ 2003-12-16 7:18 UTC (permalink / raw)
To: linux-kernel

apic.c patch needs reload:%lu instead of %u ---------->
printk("..APIC TIMER ack delay, reload:%lu, safe:%u\n",

amd xp3000+, 1:1 333mhz fsb to ram, 166mhz cpu bus clock x dual channel
2-512mb pc3200 tested cas2 sticks, 1:1 fsb to ram for 333mhz,
Award bios with update that works for non-crashing but not
for edge timer without patch.
MSI K7N2 Delta MCP2-T mbo
linux-2.6.0-test11

This was with 3ware controller and unpatched 2.6.0-test11
Note low MIS score but PIC timer and no nmi--

           CPU0
  0:  244393560          XT-PIC  timer
  1:      31963    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          1    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:     251884    IO-APIC-edge  i8042
 14:         22    IO-APIC-edge  ide0
 15:         24    IO-APIC-edge  ide1
 16:    4290216   IO-APIC-level  3ware Storage Controller, yenta, yenta
 17:    5929405   IO-APIC-level  eth0
 21:          0   IO-APIC-level  NVidia nForce2
NMI:          0
LOC:  244378698
ERR:          0
MIS:          6

Next is with the first edge timer patch, nmi_watchdog=2
works but =1 does not, MIS really high ("noisy bus"),
replacing 3ware with promise cards and hdparm udma133
causes apic error logged to console during bonnie++ test--

>>APIC error on CPU0: 02(02)
>>what?? no crash though.
>>
>>
>>bob@where cat /proc/interrupts
>>           CPU0
>>  0:    3350153    IO-APIC-edge  timer
>>  1:       5775    IO-APIC-edge  i8042
>>  2:          0          XT-PIC  cascade
>>  8:          1    IO-APIC-edge  rtc
>>  9:          0   IO-APIC-level  acpi
>> 12:       5385    IO-APIC-edge  i8042
>> 14:         10    IO-APIC-edge  ide0
>> 15:         10    IO-APIC-edge  ide1
>> 16:    1717957   IO-APIC-level  ide2, ide3, eth0
>> 19:     472929   IO-APIC-level  ide4, ide5
>> 21:          0   IO-APIC-level  NVidia nForce2
>>NMI:        822
>>LOC:    3350073
>>ERR:         35
>>MIS:      15818
>>
>>

now with promise controllers again, new edge timer patch
permits nmi_watchdog=1 not =2, lots of nmi ticks, MIS
count is only half of what it was with the first timer patch,
NMI ticks = LOC?

bob@where cat /proc/interrupts
           CPU0
  0:   46188571    IO-APIC-edge  timer
  1:      12396    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          1    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:     147429    IO-APIC-edge  i8042
 14:         10    IO-APIC-edge  ide0
 15:         10    IO-APIC-edge  ide1
 16:    1413705   IO-APIC-level  ide2, ide3, eth0
 17:          0   IO-APIC-level  yenta, yenta
 19:     258804   IO-APIC-level  ide4, ide5
 21:          0   IO-APIC-level  NVidia nForce2
NMI:   46188592
LOC:   46188482
ERR:         36
MIS:       6877

Now I'll try 800UL/100ndelay to see if it helps with
MIS count (pseudo-sci masochism), be back in a while.

Oh, by the way, I set debug 1 in apic.h but I don't see
anything, and I thought I saw a compile error flash by,
so now I'll compile > logfile 2>&1 and might see why I
don't see--
"..APIC TIMER ack delay, predelay count: 20769"
I don't see any of that debug stuff. Maybe the compile
errors I found were it, see my previous message about
"unsigned int format", maybe printk needs %lu (I don't
know hardly nuffing yet). I'm going to boot
800UL/100ndelay now.

it needs reload:%lu instead of %u ---------->
printk("..APIC TIMER ack delay, reload:%lu, safe:%u\n",

Ross: "Can you also advise if your bios setting of the "C1 disconnect" is set"

I can only guess, from my 41C low load 48C high load temps
being exactly equal to the range for "2.1Ghz 333mhz" of Ian Kumlien
(his?), which is the same speed as mine, that cpu
disconnect is probably not on. I have no visible choice in setup for
cpu disconnect. I'll try athcool to see how disconnect is set.

Ross: "I have heard lockups are not supposed to happen at all if the fsb
(host bus clock speed) matches the ddr speed. One of my systems went
about 4 hours (xp2500 333fsb, DDR333) without the apic delay patch on a
phoenix bios before lockup"

A couple of months ago, before the bios update, I was overly
optimistic a couple of times. It seemed to work to use 1:1 and
only the amd74xx onboard hd controller, no hd cards, and
pre-emptive, anticipatory sched not deadline, apic off in
setup but on in linux, lapic off, acpi on. It was almost
stable if using only one drive, but I really can't go without
hd cards for software raid, so with an hd card the first fsck
on boot would crash.

I could finesse stability by using options but never quite
reach reliability without a bios update, and certain
functions need patching, and I still have "MIS count, noisy
bus" and agp8 crash (I can use the X nv driver and agpgart
no problem, but not nvidia drivers for X and agp8).
end of thread, other threads:[~2003-12-16 7:18 UTC | newest]
Thread overview: 35+ messages
2003-12-07 13:12 Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered Ross Dickson
2003-12-09 15:20 ` Maciej W. Rozycki
2003-12-10 5:43 ` Ross Dickson
2003-12-10 16:06 ` Maciej W. Rozycki
2003-12-11 6:55 ` Ross Dickson
2003-12-11 11:47 ` Ian Kumlien
2003-12-11 9:12 ` Ross Dickson
2003-12-11 17:52 ` Ian Kumlien
2003-12-11 18:21 ` Jesse Allen
2003-12-12 9:27 ` Bob
2003-12-12 16:59 ` Working nforce2, was " Jesse Allen
2003-12-12 17:18 ` Jesse Allen
2003-12-12 18:18 ` Josh McKinney
2003-12-12 19:29 ` Jesse Allen
2003-12-12 21:42 ` Craig Bradney
2003-12-13 4:18 ` Bob
2003-12-13 6:34 ` Bob
2003-12-11 14:58 ` Jesse Allen
2003-12-11 15:20 ` Craig Bradney
2003-12-11 16:05 ` Jesse Allen
2003-12-11 15:15 ` Maciej W. Rozycki
2003-12-11 16:23 ` Josh McKinney
2003-12-11 17:04 ` Maciej W. Rozycki
2003-12-11 17:25 ` Jesse Allen
2003-12-10 3:39 ` Jesse Allen
2003-12-10 9:22 ` Ross Dickson
2003-12-10 10:00 ` Mikael Pettersson
2003-12-10 8:40 ` Ross Dickson
2003-12-11 14:32 ` Jesse Allen
-- strict thread matches above, loose matches on Subject: below --
2003-12-13 5:16 Working nforce2, was " Ross Dickson
2003-12-13 6:04 ` Jesse Allen
2003-12-13 9:20 Ross Dickson
2003-12-13 9:51 ` Bob
2003-12-15 14:30 Fwd: " Ross Dickson
2003-12-15 15:02 ` Craig Bradney
2003-12-15 16:54 ` Ross Dickson
2003-12-16 6:07 ` Bob
[not found] <200312132040.00875.ross@datscreative.com.au>
2003-12-13 12:00 ` Fwd: " Bob
2003-12-15 13:11 ` Maciej W. Rozycki
2003-12-16 7:18 ` Bob