* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
@ 2003-12-15 14:30 Ross Dickson
  2003-12-15 15:02 ` Craig Bradney
  0 siblings, 1 reply; 7+ messages in thread
From: Ross Dickson @ 2003-12-15 14:30 UTC
  To: Maciej W. Rozycki; +Cc: recbo, linux-kernel

>> APIC error on CPU0: 02(02)
> > what?? no crash though.
> [...]
> > bob@where cat /proc/interrupts
> >            CPU0
> >   0:    3350153    IO-APIC-edge  timer
> >   1:       5775    IO-APIC-edge  i8042
> >   2:          0          XT-PIC  cascade
> >   8:          1    IO-APIC-edge  rtc
> >   9:          0   IO-APIC-level  acpi
> >  12:       5385    IO-APIC-edge  i8042
> >  14:         10    IO-APIC-edge  ide0
> >  15:         10    IO-APIC-edge  ide1
> >  16:    1717957   IO-APIC-level  ide2, ide3, eth0
> >  19:     472929   IO-APIC-level  ide4, ide5
> >  21:          0   IO-APIC-level  NVidia nForce2
> > NMI:        822
> > LOC:    3350073
> > ERR:         35
> > MIS:      15818

>It looks like the infamous APIC delivery bug -- the "MIS" counter shows
>how many level-triggered interrupts have been erroneously delivered as
>edge-triggered ones.  No wonder the system shows instability -- you have
>noise problems at the APIC bus.

Thanks Maciej
I was wondering about those, I had seen the work around code and would not
have thought it need apply to recent athlon chipsets?

For comparison here is my proc/interrupts

           CPU0
  0:   50462204    IO-APIC-edge  timer
  1:      49153    IO-APIC-edge  keyboard
  2:          0          XT-PIC  cascade
  9:          0   IO-APIC-level  acpi
 12:     395912    IO-APIC-edge  PS/2 Mouse
 14:     995872    IO-APIC-edge  ide0
 15:        283    IO-APIC-edge  ide1
 16:    3921102   IO-APIC-level  nvidia
 18:          2   IO-APIC-level  bttv
 20:     136325   IO-APIC-level  eth0, usb-ohci
 21:     146903   IO-APIC-level  ehci_hcd, NVIDIA nForce Audio
 22:          0   IO-APIC-level  usb-ohci
NMI:          0
LOC:   50457798
ERR:          0
MIS:          0

Albatron KM18G-Pro, nforce2, Phoenix bios, 2200XP, 255fsb, ddr400,
ide0 is hard drive, ide1 is cdrom, nmi watchdog off

Report seems OK but this machine locks up hard without the apic delay patch.

I am currently trying the simpler v1 (always add a delay) patch but on all apic
acks as per this posting

http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/3291.html

which is a reply to an earlier posting of the same name but I accidentally
omitted the Re in the subject.

Regards,
Ross.

* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
  2003-12-15 14:30 Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered Ross Dickson
@ 2003-12-15 15:02 ` Craig Bradney
  2003-12-15 15:56   ` Maciej W. Rozycki
  2003-12-15 16:54   ` Ross Dickson
  0 siblings, 2 replies; 7+ messages in thread
From: Craig Bradney @ 2003-12-15 15:02 UTC
  To: ross; +Cc: Maciej W. Rozycki, recbo, linux-kernel

Just to give the status here ...
I'm still running the original 2.6 test 11 patches for apic and ioapic.
Uptime is now 2d 20h with lots of idle time and hard work too..

/proc/interrupts as follows:

           CPU0
  0:  245382420    IO-APIC-edge  timer
  1:     139577    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          3    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:    1478615    IO-APIC-edge  i8042
 14:    1055548    IO-APIC-edge  ide0
 15:     737664    IO-APIC-edge  ide1
 19:   18405692   IO-APIC-level  radeon@PCI:3:0:0
 21:    5257090   IO-APIC-level  ehci_hcd, NVidia nForce2, eth0
 22:          3   IO-APIC-level  ohci1394
NMI:      14944
LOC:  245087891
ERR:          0
MIS:          6

As for NMI.. I actually forget which I booted from... I think =1, but NMI
is a small number now.. would it have wrapped?

Craig
A7N8X Deluxe V2 BIOS 1007

On Mon, 2003-12-15 at 15:30, Ross Dickson wrote:
> >> APIC error on CPU0: 02(02)
> > > what?? no crash though.
> > [...]
> > > bob@where cat /proc/interrupts
> > >            CPU0
> > >   0:    3350153    IO-APIC-edge  timer
> > >   1:       5775    IO-APIC-edge  i8042
> > >   2:          0          XT-PIC  cascade
> > >   8:          1    IO-APIC-edge  rtc
> > >   9:          0   IO-APIC-level  acpi
> > >  12:       5385    IO-APIC-edge  i8042
> > >  14:         10    IO-APIC-edge  ide0
> > >  15:         10    IO-APIC-edge  ide1
> > >  16:    1717957   IO-APIC-level  ide2, ide3, eth0
> > >  19:     472929   IO-APIC-level  ide4, ide5
> > >  21:          0   IO-APIC-level  NVidia nForce2
> > > NMI:        822
> > > LOC:    3350073
> > > ERR:         35
> > > MIS:      15818
>
> >It looks like the infamous APIC delivery bug -- the "MIS" counter shows
> >how many level-triggered interrupts have been erroneously delivered as
> >edge-triggered ones.  No wonder the system shows instability -- you have
> >noise problems at the APIC bus.
>
> Thanks Maciej
> I was wondering about those, I had seen the work around code and would not
> have thought it need apply to recent athlon chipsets?
>
>
> For comparison here is my proc/interrupts
>            CPU0
>   0:   50462204    IO-APIC-edge  timer
>   1:      49153    IO-APIC-edge  keyboard
>   2:          0          XT-PIC  cascade
>   9:          0   IO-APIC-level  acpi
>  12:     395912    IO-APIC-edge  PS/2 Mouse
>  14:     995872    IO-APIC-edge  ide0
>  15:        283    IO-APIC-edge  ide1
>  16:    3921102   IO-APIC-level  nvidia
>  18:          2   IO-APIC-level  bttv
>  20:     136325   IO-APIC-level  eth0, usb-ohci
>  21:     146903   IO-APIC-level  ehci_hcd, NVIDIA nForce Audio
>  22:          0   IO-APIC-level  usb-ohci
> NMI:          0
> LOC:   50457798
> ERR:          0
> MIS:          0
>
> Albatron KM18G-Pro, nforce2, Phoenix bios, 2200XP, 255fsb, ddr400,
> ide0 is hard drive, ide1 is cdrom, nmi watchdog off
>
> Report seems OK but this machine locks up hard without the apic delay patch.
>
> I am currently trying the simpler v1 (always add a delay) patch but on all apic
> acks as per this posting
>
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/3291.html
>
> which is a reply to an earlier posting of the same name but I accidentally
> omitted the Re in the subject.
>
> Regards,
> Ross.

* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
  2003-12-15 15:02 ` Craig Bradney
@ 2003-12-15 15:56   ` Maciej W. Rozycki
  2003-12-15 16:54   ` Ross Dickson
  1 sibling, 0 replies; 7+ messages in thread
From: Maciej W. Rozycki @ 2003-12-15 15:56 UTC
  To: Craig Bradney; +Cc: ross, recbo, linux-kernel

On Mon, 15 Dec 2003, Craig Bradney wrote:

>            CPU0
>   0:  245382420    IO-APIC-edge  timer
>   1:     139577    IO-APIC-edge  i8042
>   2:          0          XT-PIC  cascade
>   8:          3    IO-APIC-edge  rtc
>   9:          0   IO-APIC-level  acpi
>  12:    1478615    IO-APIC-edge  i8042
>  14:    1055548    IO-APIC-edge  ide0
>  15:     737664    IO-APIC-edge  ide1
>  19:   18405692   IO-APIC-level  radeon@PCI:3:0:0
>  21:    5257090   IO-APIC-level  ehci_hcd, NVidia nForce2, eth0
>  22:          3   IO-APIC-level  ohci1394
> NMI:      14944
> LOC:  245087891
> ERR:          0
> MIS:          6
>
> As for NMI.. I actually forget which I booted from... I think =1, but NMI
> is a small number now.. would it have wrapped?

 That's "=2" -- otherwise the NMI count would be roughly the same as the
sum of counts for IRQ 0 for all processors.  And you can actually get your
kernel's command line from /proc/cmdline.

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +
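
The /proc/cmdline check suggested above can be scripted. What follows is only a minimal sketch, assuming the option appears on the command line as nmi_watchdog=N; the name parse_nmi_watchdog is made up for illustration and is not a kernel interface.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): return the value given to
 * "nmi_watchdog=" in a kernel command line string, or -1 if the option
 * is absent or has no numeric value.
 */
int parse_nmi_watchdog(const char *cmdline)
{
	static const char key[] = "nmi_watchdog=";
	const size_t keylen = sizeof(key) - 1;
	const char *p = cmdline;

	assert(cmdline != NULL);

	while ((p = strstr(p, key)) != NULL) {
		/* Accept the match only at the start of a word. */
		if (p == cmdline || p[-1] == ' ') {
			char *end;
			long val = strtol(p + keylen, &end, 10);

			/* end == start of value means no digits followed '=' */
			if (end != p + keylen)
				return (int)val;
			return -1;
		}
		p += keylen;
	}
	return -1;
}
```

To use it, read the single line in /proc/cmdline into a buffer (for example with fgets) and pass the buffer in. A result of 1 would mean the IO-APIC watchdog, where the NMI count tracks the IRQ 0 count as described above; 2 would mean the local APIC watchdog.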

* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
  2003-12-15 15:02 ` Craig Bradney
  2003-12-15 15:56   ` Maciej W. Rozycki
@ 2003-12-15 16:54   ` Ross Dickson
  2003-12-16  6:07     ` Bob
  1 sibling, 1 reply; 7+ messages in thread
From: Ross Dickson @ 2003-12-15 16:54 UTC
  To: Craig Bradney; +Cc: recbo, linux-kernel, Ian Kumlien

On Tuesday 16 December 2003 01:02, you wrote:
> Just to give the status here ...
> I'm still running the original 2.6 test 11 patches for apic and ioapic.
> Uptime is now 2d 20h with lots of idle time and hard work too..
>
> /proc/interrupts as follows:
>
>            CPU0
>   0:  245382420    IO-APIC-edge  timer
>   1:     139577    IO-APIC-edge  i8042
>   2:          0          XT-PIC  cascade
>   8:          3    IO-APIC-edge  rtc
>   9:          0   IO-APIC-level  acpi
>  12:    1478615    IO-APIC-edge  i8042
>  14:    1055548    IO-APIC-edge  ide0
>  15:     737664    IO-APIC-edge  ide1
>  19:   18405692   IO-APIC-level  radeon@PCI:3:0:0
>  21:    5257090   IO-APIC-level  ehci_hcd, NVidia nForce2, eth0
>  22:          3   IO-APIC-level  ohci1394
> NMI:      14944
> LOC:  245087891
> ERR:          0
> MIS:          6

Uptime sounds good so far. I am not convinced my v2 apic patch is a great
overall improvement, I am thinking v1 apic is safer for now. Having said that
Ian Kumlien currently has an uptime of 1 day, 15 hours + on v2 patches but
with the apic delay timeout increased from 600UL to 800UL. He has a Barton
core - see below.

>
> Craig
> A7N8X Deluxe V2 BIOS 1007
>

<snip>

> > I am currently trying the simpler v1 (always add a delay) patch but on all apic
> > acks as per this posting
> >
> > http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/3291.html
> >
> > which is a reply to an earlier posting of the same name but I accidentally
> > omitted the Re in the subject.
> >

I don't think it is necessary to put the delay in all apic acks - I just tried
it to see if it worked and have not yet put my code back the way it was. My
hard lockups went away with the original v1 apic timer delay patch anyway.

Please note that in the posting above I wrote that I stuffed up the #ifdefs in
my v1 and v2 patches; adjust the code accordingly. The patches worked but were
only testing the first config item after #ifdef.

apic code should have had
#if defined(CONFIG_MK7) && defined(CONFIG_BLK_DEV_AMD74XX)

ioapic code should have had
#if defined(CONFIG_ACPI_BOOT) && defined(CONFIG_X86_UP_IOAPIC)

Brief summary at this point

1) 2? reports are in that latest award bios with "C1 disconnect" set to
"auto?" may remove need for apic ack delay patch and still keep cpu thermo
managed

2) apic ack delay v1 patch seems safe for all cpu cores but introduces a
small delay of about half the time of an XTPIC access on each apic timer
interrupt

3) apic ack delay v2 patch seems safe only on barton cores and gives more
debugging info and wastes less time than apic v1 patch

4) io-apic v2 patch gives more debugging info but functions same as io-apic
v1 patch

Regards
Ross
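
The #ifdef correction described above can be shown in isolation. This is a small sketch using made-up DEMO_ symbols in place of the real Kconfig options: a guard that only looks at the first symbol stays enabled even when the second symbol is missing, while the `#if defined(A) && defined(B)` form does not.

```c
#include <assert.h>

/* Hypothetical stand-ins for Kconfig symbols: only the first is set. */
#define DEMO_CONFIG_MK7 1
/* DEMO_CONFIG_BLK_DEV_AMD74XX is deliberately left undefined. */

/* Corrected form: the guarded code is built only if both are defined. */
int guard_both(void)
{
#if defined(DEMO_CONFIG_MK7) && defined(DEMO_CONFIG_BLK_DEV_AMD74XX)
	return 1;
#else
	return 0;
#endif
}

/* What a guard that looks only at the first symbol amounts to. */
int guard_first_only(void)
{
#ifdef DEMO_CONFIG_MK7
	return 1;
#else
	return 0;
#endif
}

/* The two guards disagree when only the first symbol is defined. */
void check_guards(void)
{
	assert(guard_first_only() == 1);
	assert(guard_both() == 0);
}
```

With only the first symbol defined, guard_first_only() returns 1 and guard_both() returns 0, which matches the report that the original patches "worked" while ignoring the second option.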

* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
  2003-12-15 16:54   ` Ross Dickson
@ 2003-12-16  6:07     ` Bob
  0 siblings, 0 replies; 7+ messages in thread
From: Bob @ 2003-12-16 6:07 UTC
  To: linux-kernel

Ross,

my_make_script nf2-800UL 2>&1 | tee /tmp/make.err

#/tmp/make.err
<snip>
  CC      arch/i386/kernel/apic.o
arch/i386/kernel/apic.c: In function `smp_apic_timer_interrupt':
arch/i386/kernel/apic.c:1105: warning: unsigned int format, long unsigned int arg (arg 2)

...which is around the printk line here--
	printk("..APIC TIMER ack delay, reload:%u, safe:%u\n",

+		if(!passno) { /* calculate timing */
+			safecnt = apic_read(APIC_TMICT) -
+				( (800UL * apic_read(APIC_TMICT) ) /
+				(1000000000UL/HZ) );
+			printk("..APIC TIMER ack delay, reload:%u, safe:%u\n",
+				apic_read(APIC_TMICT), safecnt);
+			passno++;

Here are the two patches with "#ifdef N" to "#if defined(N)" change
but not the unsigned int change --

diff -urN linux-2.6.0-test11/arch/i386/kernel/apic.c linux-2.6.0-test11-nf2/arch/i386/kernel/apic.c
--- linux-2.6.0-test11/arch/i386/kernel/apic.c	2003-11-26 15:46:07.000000000 -0500
+++ linux-2.6.0-test11-nf2/arch/i386/kernel/apic.c	2003-12-13 23:48:30.000000000 -0500
@@ -1089,6 +1089,37 @@
 	 */
 	irq_stat[cpu].apic_timer_irqs++;
 
+#if defined(CONFIG_MK7) && defined(CONFIG_BLK_DEV_AMD74XX)
+	/*
+	 * on 2200XP & nforce2 chipset we need 600ns? 800? 1000? 1100?
+	 * from timer irq start to apic irq ack to prevent
+	 * hard lockups, use apic timer itself.
+	 * C1 disconnect bit related. Ross Dickson.
+	 */
+	{
+		static unsigned int passno, safecnt;
+		if(!passno) { /* calculate timing */
+			safecnt = apic_read(APIC_TMICT) -
+				( (800UL * apic_read(APIC_TMICT) ) /
+				(1000000000UL/HZ) );
+			printk("..APIC TIMER ack delay, reload:%u, safe:%u\n",
+				apic_read(APIC_TMICT), safecnt);
+			passno++;
+		}
+#if APIC_DEBUG
+		if(passno<12) {
+			unsigned int at1 = apic_read(APIC_TMCCT);
+			if( passno > 1 )
+				Dprintk("..APIC TIMER ack delay, predelay count:%u \n", at1 );
+			passno++;
+		}
+#endif
+		/* delay only if required */
+		while( apic_read(APIC_TMCCT) > safecnt )
+			ndelay(100);
+	}
+#endif
+
 	/*
 	 * NOTE! We'd better ACK the irq immediately,
 	 * because timer handling can be slow.*/
diff -urN linux-2.6.0-test11/arch/i386/kernel/io_apic.c linux-2.6.0-test11-nf2/arch/i386/kernel/io_apic.c
--- linux-2.6.0-test11/arch/i386/kernel/io_apic.c	2003-11-26 15:43:32.000000000 -0500
+++ linux-2.6.0-test11-nf2/arch/i386/kernel/io_apic.c	2003-12-13 15:14:25.000000000 -0500
@@ -2128,6 +2128,54 @@
 		printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to IO-APIC\n");
 	}
 
+#if defined (CONFIG_ACPI_BOOT) && (CONFIG_X86_UP_IOAPIC)
+	/* for nforce2 try vector 0 on pin0
+	 * Note 8259a is already masked, also by default
+	 * the io_apic_set_pci_routing call disables the 8259 irq 0
+	 * so we must be connected directly to the 8254 timer if this works
+	 * Note2: this violates the above comment re Subtle but works!
+	 */
+	printk(KERN_INFO "..TIMER: Is timer irq0 connected to IOAPIC Pin0? ...\n");
+	if (pin1 != -1) {
+		extern spinlock_t i8259A_lock;
+		unsigned long flags;
+		int tok, saved_timer_ack = timer_ack;
+		/*
+		 * Ok, does IRQ0 through the IOAPIC work?
+		 */
+		io_apic_set_pci_routing ( 0, 0, 0, 0, 0); /* connect pin */
+		unmask_IO_APIC_irq(0);
+		timer_ack = 0;
+
+		/*
+
+
+
+		 * Ok, does IRQ0 through the IOAPIC work?
+		 */
+		spin_lock_irqsave(&i8259A_lock, flags);
+		Dprintk("..TIMER check 8259 ints disabled, imr1:%02x, imr2:%02x\n", inb(0x21), inb(0xA1));
+		tok = timer_irq_works();
+		spin_unlock_irqrestore(&i8259A_lock, flags);
+		if (tok) {
+			if (nmi_watchdog == NMI_IO_APIC) {
+				disable_8259A_irq(0);
+				setup_nmi();
+				enable_8259A_irq(0);
+				check_nmi_watchdog();
+			}
+			printk(KERN_INFO "..TIMER: works OK on apic pin0 irq0\n" );
+			return;
+		}
+		/* failed */
+		timer_ack = saved_timer_ack;
+		clear_IO_APIC_pin(0, 0);
+		io_apic_set_pci_routing ( 0, pin1, 0, 0, 0);
+		printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to IO-APIC Pin 0\n");
+	}
+/* end new stuff for nforce2 */
+#endif
+
 	printk(KERN_INFO "...trying to set up timer (IRQ0) through the 8259A ... ");
 	if (pin2 != -1) {
 		printk("\n..... (found pin %d) ...", pin2);
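
The safecnt arithmetic in the apic.c hunk can be checked outside the kernel. The sketch below is a user-space restatement under stated assumptions, not the patch itself: apic_safe_count is a made-up name, the reload value stands in for what the patch reads from APIC_TMICT, and the math is done in 64 bits so the multiplication cannot wrap. With a 32-bit unsigned long, the product 800UL * reload would wrap once reload exceeds about 5.3 million.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the "safecnt" arithmetic from the patch, in 64-bit math.
 *   reload   : APIC timer initial count (the patch reads APIC_TMICT)
 *   delay_ns : minimum time wanted between timer irq and ack (600, 800, ...)
 *   hz       : timer interrupts per second
 * The timer counts down from reload over one tick of 1e9/hz ns, so the
 * current count has fallen to the returned value once delay_ns has passed.
 */
uint32_t apic_safe_count(uint32_t reload, uint32_t delay_ns, uint32_t hz)
{
	uint64_t tick_ns;

	assert(hz > 0 && hz <= 1000000000u);
	tick_ns = 1000000000u / hz;
	assert(delay_ns <= tick_ns);	/* a delay longer than a tick makes no sense */

	return reload - (uint32_t)(((uint64_t)delay_ns * reload) / tick_ns);
}
```

As a worked example with an illustrative reload of 100000 and HZ of 1000: one tick is 1000000 ns, 800 ns is 800/1000000 of a tick, so the count must drop by 80 and the safe count is 99920. The compiler warning quoted above points at argument 2 of the printk, the apic_read(APIC_TMICT) value, which is presumably an unsigned long printed with %u; a %lu format or a cast would be the usual way to silence it.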
[parent not found: <200312132040.00875.ross@datscreative.com.au>]

* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
       [not found] <200312132040.00875.ross@datscreative.com.au>
@ 2003-12-13 12:00 ` Bob
  2003-12-15 13:11   ` Maciej W. Rozycki
  0 siblings, 1 reply; 7+ messages in thread
From: Bob @ 2003-12-13 12:00 UTC
  To: linux-kernel

udma133 with Award bios update and nforce2

APIC error on CPU0: 02(02)
what?? no crash though.

Ross Dickson wrote:

>Hi Bob
>
>Jesse has award bios, see attached
>Ross.
>

Months ago I thought using a 3ware card might help
with nforce2 crashes so I gave up on promise and sii
hd cards after a lot of experiments(hdparm, no lapic,
no acpi, apic off in bios) and put in a 3ware card but
I flashed the bios at the same time so didn't know if the
3ware card helped with the nforce2 crashing or not,
since the bios flash did the job. With 3ware I couldn't
use hdparm to see what udma settings the drives were
set to.

Now I can report. Just now I took the 3ware card out
and went back to promise cards(using 4 hd's either
method, 2 cd's on mboard amd74xx, onboard sata
disabled).

bob@where cat /proc/interrupts
           CPU0
  0:    3350153    IO-APIC-edge  timer
  1:       5775    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          1    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:       5385    IO-APIC-edge  i8042
 14:         10    IO-APIC-edge  ide0
 15:         10    IO-APIC-edge  ide1
 16:    1717957   IO-APIC-level  ide2, ide3, eth0
 19:     472929   IO-APIC-level  ide4, ide5
 21:          0   IO-APIC-level  NVidia nForce2
NMI:        822
LOC:    3350073
ERR:         35
MIS:      15818

cd's on amd74xx onboard, amd74xx onboard is always
solid, 4 ide hd's on two promise cards. not many nmi
ticks without the better patch there.

bonnie++ smooth, then hdparm up the settings, udma6,
bonnie++ again, saw a few "APIC error on CPU0: 02(02)"
but no lockup. not sure if data lost since it was a test.
APIC error might be fixed by changing hdparm settings.
This second test was with unmasked irq and udma6.

I have to patch to get ioapic edge timer on.

This 11/7/2003 updated award bios does not have a
cpu disconnect option but it does eliminate the crashes
with no patch and it is no longer impossible to use
promise ide udma133 controller cards.

MSI K7N2 Delta MCP2-T mboard

I don't have the promise patch in yet, either, so the
APIC error might be from that, or hdparm unmasked
irq.

-Bob
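
The "02" in "APIC error on CPU0: 02(02)" is a value of the local APIC Error Status Register. The sketch below decodes it using the bit meanings listed in the comment above smp_error_interrupt() in arch/i386/kernel/apic.c; the helper names are made up for illustration. Bit 1 is "Receive CS error", a checksum error on a received APIC message.

```c
#include <assert.h>

/*
 * Local APIC Error Status Register bits, as listed in the comment above
 * smp_error_interrupt() in arch/i386/kernel/apic.c.
 */
static const char *const esr_bit_names[8] = {
	"Send CS error",
	"Receive CS error",
	"Send accept error",
	"Receive accept error",
	"Reserved",
	"Send illegal vector",
	"Received illegal vector",
	"Illegal register address",
};

/* Hypothetical helper: name of one ESR bit (0..7). */
const char *apic_esr_bit_name(unsigned int bit)
{
	assert(bit < 8);
	return esr_bit_names[bit];
}

/* Hypothetical helper: non-zero if either checksum error bit is set. */
int apic_esr_has_checksum_error(unsigned int esr)
{
	return (esr & 0x03u) != 0;
}
```

For the value reported here, apic_esr_has_checksum_error(0x02) is true and apic_esr_bit_name(1) gives "Receive CS error".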

* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
  2003-12-13 12:00 ` Fwd: " Bob
@ 2003-12-15 13:11   ` Maciej W. Rozycki
  0 siblings, 0 replies; 7+ messages in thread
From: Maciej W. Rozycki @ 2003-12-15 13:11 UTC
  To: Bob; +Cc: linux-kernel

On Sat, 13 Dec 2003, Bob wrote:

> APIC error on CPU0: 02(02)
> what?? no crash though.
[...]
> bob@where cat /proc/interrupts
>            CPU0
>   0:    3350153    IO-APIC-edge  timer
>   1:       5775    IO-APIC-edge  i8042
>   2:          0          XT-PIC  cascade
>   8:          1    IO-APIC-edge  rtc
>   9:          0   IO-APIC-level  acpi
>  12:       5385    IO-APIC-edge  i8042
>  14:         10    IO-APIC-edge  ide0
>  15:         10    IO-APIC-edge  ide1
>  16:    1717957   IO-APIC-level  ide2, ide3, eth0
>  19:     472929   IO-APIC-level  ide4, ide5
>  21:          0   IO-APIC-level  NVidia nForce2
> NMI:        822
> LOC:    3350073
> ERR:         35
> MIS:      15818

 It looks like the infamous APIC delivery bug -- the "MIS" counter shows
how many level-triggered interrupts have been erroneously delivered as
edge-triggered ones.  No wonder the system shows instability -- you have
noise problems at the APIC bus.

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +
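
The ERR and MIS counters discussed in this thread sit on their own lines at the end of /proc/interrupts, so they are easy to pull out for comparison between machines. A minimal sketch, assuming the text has already been read into a string; interrupts_counter is a made-up name and the parsing is deliberately naive (first number after the label).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: find a line starting with label (e.g. "MIS:") in
 * /proc/interrupts-style text and return the first number after it,
 * or -1 if there is no such line.
 */
long interrupts_counter(const char *text, const char *label)
{
	const char *line = text;
	size_t len;

	assert(text != NULL && label != NULL);
	len = strlen(label);
	assert(len > 0);

	while (line != NULL && *line != '\0') {
		const char *p = line;

		while (*p == ' ' || *p == '\t')	/* skip leading blanks */
			p++;
		if (strncmp(p, label, len) == 0) {
			char *end;
			long val = strtol(p + len, &end, 10);

			return (end != p + len) ? val : -1;
		}
		line = strchr(line, '\n');
		if (line != NULL)
			line++;		/* step past the newline */
	}
	return -1;
}
```

On Bob's output above this would give 35 for "ERR:" and 15818 for "MIS:", against 0 and 0 on Ross's machine and 0 and 6 on Craig's.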