* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
       [not found] <200312132040.00875.ross@datscreative.com.au>
From: Bob @ 2003-12-13 12:00 UTC
To: linux-kernel

udma133 with Award bios update and nforce2

APIC error on CPU0: 02(02)
what?? no crash though.

Ross Dickson wrote:

>Hi Bob
>
>Jesse has award bios, see attached
>Ross.
>

Months ago I thought using a 3ware card might help with nforce2 crashes,
so I gave up on promise and sii hd cards after a lot of experiments
(hdparm, no lapic, no acpi, apic off in bios) and put in a 3ware card.
But I flashed the bios at the same time, so I didn't know if the 3ware
card helped with the nforce2 crashing or not, since the bios flash did
the job. With 3ware I couldn't use hdparm to see what udma settings the
drives were set to. Now I can report.

Just now I took the 3ware card out and went back to promise cards
(using 4 hd's either method, 2 cd's on mboard amd74xx, onboard sata
disabled).

bob@where cat /proc/interrupts
           CPU0
  0:    3350153    IO-APIC-edge  timer
  1:       5775    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          1    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:       5385    IO-APIC-edge  i8042
 14:         10    IO-APIC-edge  ide0
 15:         10    IO-APIC-edge  ide1
 16:    1717957   IO-APIC-level  ide2, ide3, eth0
 19:     472929   IO-APIC-level  ide4, ide5
 21:          0   IO-APIC-level  NVidia nForce2
NMI:        822
LOC:    3350073
ERR:         35
MIS:      15818

cd's on amd74xx onboard, amd74xx onboard is always solid, 4 ide hd's on
two promise cards. Not many nmi ticks without the better patch there.

bonnie++ smooth, then hdparm up the settings, udma6, bonnie++ again,
saw a few "APIC error on CPU0: 02(02)" but no lockup. Not sure if data
was lost since it was a test. The APIC error might be fixed by changing
hdparm settings. This second test was with unmasked irq and udma6.

I have to patch to get ioapic edge timer on.

This 11/7/2003 updated award bios does not have a cpu disconnect option,
but it does eliminate the crashes with no patch, and it is no longer
impossible to use promise ide udma133 controller cards.

MSI K7N2 Delta MCP2-T mboard

I don't have the promise patch in yet, either, so the APIC error might
be from that, or hdparm unmasked irq.

-Bob
* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
From: Maciej W. Rozycki @ 2003-12-15 13:11 UTC
To: Bob; +Cc: linux-kernel

On Sat, 13 Dec 2003, Bob wrote:

> APIC error on CPU0: 02(02)
> what?? no crash though.
[...]
> bob@where cat /proc/interrupts
>            CPU0
>   0:    3350153    IO-APIC-edge  timer
>   1:       5775    IO-APIC-edge  i8042
>   2:          0          XT-PIC  cascade
>   8:          1    IO-APIC-edge  rtc
>   9:          0   IO-APIC-level  acpi
>  12:       5385    IO-APIC-edge  i8042
>  14:         10    IO-APIC-edge  ide0
>  15:         10    IO-APIC-edge  ide1
>  16:    1717957   IO-APIC-level  ide2, ide3, eth0
>  19:     472929   IO-APIC-level  ide4, ide5
>  21:          0   IO-APIC-level  NVidia nForce2
> NMI:        822
> LOC:    3350073
> ERR:         35
> MIS:      15818

 It looks like the infamous APIC delivery bug -- the "MIS" counter shows
how many level-triggered interrupts have been erroneously delivered as
edge-triggered ones.  No wonder the system shows instability -- you have
noise problems at the APIC bus.

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +
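For anyone who wants to track the MIS and ERR counters across test runs rather than reading them off by eye, here is a minimal C sketch. It is not from any of the patches in this thread: the function name `interrupts_counter` is made up for illustration, and it assumes the text has already been read from /proc/interrupts into a string in the layout shown above.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the sum of the per-CPU counts on the line
 * that starts with `label` followed by ':' (for example "MIS" or "0"),
 * or -1 if no such line exists in `text`. */
long interrupts_counter(const char *text, const char *label)
{
    size_t len = strlen(label);
    const char *line = text;

    while (line && *line) {
        const char *p = line;

        /* Row labels are right-aligned, so skip leading blanks. */
        while (*p == ' ' || *p == '\t')
            p++;

        if (strncmp(p, label, len) == 0 && p[len] == ':') {
            long total = 0;

            p += len + 1;
            for (;;) {
                char *end;

                while (*p == ' ' || *p == '\t')
                    p++;
                /* Stop at the first non-numeric column (the PIC type). */
                if (!isdigit((unsigned char)*p))
                    break;
                total += strtol(p, &end, 10);
                p = end;
            }
            return total;
        }

        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}
```

Calling it with "MIS" before and after a bonnie++ run and subtracting gives the number of mis-delivered interrupts for that run, which is easier to compare between patch versions than totals taken at different uptimes.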
* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
From: Bob @ 2003-12-16 7:18 UTC
To: linux-kernel

apic.c patch needs reload:%lu instead of %u
----------> printk("..APIC TIMER ack delay, reload:%lu, safe:%u\n",

amd xp3000+, 1:1 333mhz fsb to ram, 166mhz cpu bus clock x
dual channel 2-512mb pc3200 tested cas2 sticks, 1:1 fsb to ram for 333mhz,
Award bios with update that works for non-crashing but not for edge
timer without patch.
MSI K7N2 Delta MCP2-T mbo
linux-2.6.0-test11

This was with the 3ware controller and unpatched 2.6.0-test11.
Note low MIS score but PIC timer and no nmi--

           CPU0
  0:  244393560          XT-PIC  timer
  1:      31963    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          1    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:     251884    IO-APIC-edge  i8042
 14:         22    IO-APIC-edge  ide0
 15:         24    IO-APIC-edge  ide1
 16:    4290216   IO-APIC-level  3ware Storage Controller, yenta, yenta
 17:    5929405   IO-APIC-level  eth0
 21:          0   IO-APIC-level  NVidia nForce2
NMI:          0
LOC:  244378698
ERR:          0
MIS:          6

Next is with the first edge timer patch. nmi_watchdog=2 works but =1
does not, MIS really high ("noisy bus"). Replacing 3ware with promise
cards and hdparm udma133 causes an apic error logged to console during
the bonnie++ test--

>>APIC error on CPU0: 02(02)
>>what?? no crash though.
>>
>>
>>bob@where cat /proc/interrupts
>>           CPU0
>>  0:    3350153    IO-APIC-edge  timer
>>  1:       5775    IO-APIC-edge  i8042
>>  2:          0          XT-PIC  cascade
>>  8:          1    IO-APIC-edge  rtc
>>  9:          0   IO-APIC-level  acpi
>> 12:       5385    IO-APIC-edge  i8042
>> 14:         10    IO-APIC-edge  ide0
>> 15:         10    IO-APIC-edge  ide1
>> 16:    1717957   IO-APIC-level  ide2, ide3, eth0
>> 19:     472929   IO-APIC-level  ide4, ide5
>> 21:          0   IO-APIC-level  NVidia nForce2
>>NMI:        822
>>LOC:    3350073
>>ERR:         35
>>MIS:      15818
>>
>>

Now with promise controllers again. The new edge timer patch permits
nmi_watchdog=1 but not =2, lots of nmi ticks, MIS count is only half of
what it was with the first timer patch, NMI ticks = LOC?

bob@where cat /proc/interrupts
           CPU0
  0:   46188571    IO-APIC-edge  timer
  1:      12396    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          1    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:     147429    IO-APIC-edge  i8042
 14:         10    IO-APIC-edge  ide0
 15:         10    IO-APIC-edge  ide1
 16:    1413705   IO-APIC-level  ide2, ide3, eth0
 17:          0   IO-APIC-level  yenta, yenta
 19:     258804   IO-APIC-level  ide4, ide5
 21:          0   IO-APIC-level  NVidia nForce2
NMI:   46188592
LOC:   46188482
ERR:         36
MIS:       6877

Now I'll try 800UL/100ndelay to see if it helps with the MIS count
(pseudo-sci masochism), be back in a while.

Oh, by the way, I set debug 1 in apic.h but I don't see anything, and I
thought I saw a compile error flash by, so now I'll compile
> logfile 2>&1 and might see why I don't see--
"..APIC TIMER ack delay, predelay count: 20769"

I don't see any of that debug stuff. Maybe the compile errors I found
were it, see my previous message about "unsigned in format", maybe
printk needs %lu (I don't know hardly nuffing yet). I'm going to boot
800UL/100ndelay now.

it needs reload:%lu instead of %u
----------> printk("..APIC TIMER ack delay, reload:%lu, safe:%u\n",

Ross: "Can you also advise if your bios setting of the "C1 disconnect"
is set"

I can only guess, from my 41C low load and 48C high load temps being
exactly equal to the range for the "2.1Ghz 333mhz" of Ian Kumlien
(his?), which is the same speed as mine, that cpu disconnect is probably
not on. I have no visible choice in setup for cpu disconnect. I'll try
athcool to see how disconnect is set.

Ross: "I have heard lockups are not supposed to happen at all if the fsb
(host bus clock speed) matches the ddr speed. One of my systems went
about 4 hours (xp2500 333fsb, DDR333) without the apic delay patch on a
phoenix bios before lockup"

A couple of months ago, before the bios update, I was overly optimistic
a couple of times. It seemed to work to use 1:1 and only the amd74xx
onboard hd controller, no hd cards, pre-emptive, anticipatory sched not
deadline, apic off in setup but on in linux, lapic off, acpi on. It was
almost stable if using only one drive, but I really can't go without hd
cards for software raid, and with an hd card the first fsck on boot
would crash. I could finesse stability by using options but never quite
reach reliability without a bios update. Certain functions still need
patching, and I still have "MIS count, noisy bus" and the agp8 crash (I
can use the X nv driver and agpgart no problem, but not the nvidia
drivers for X and agp8).
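The %lu point above can be sketched outside the kernel with snprintf, which follows the same rule as printk: the conversion has to match the argument type. This is only an illustration under assumptions. The parameter names `reload` and `safe` are taken from the format string, their types (unsigned long and unsigned int) are a guess based on the compiler complaint, and `format_ack_delay` is a made-up name, not a function in the patch.

```c
#include <stdio.h>

/* Assumed types: `reload` is unsigned long, so it needs %lu; `safe` is
 * unsigned int and keeps %u. Passing an unsigned long to %u is what
 * makes the compiler complain about the format. */
int format_ack_delay(char *buf, size_t size,
                     unsigned long reload, unsigned int safe)
{
    return snprintf(buf, size,
                    "..APIC TIMER ack delay, reload:%lu, safe:%u\n",
                    reload, safe);
}
```

On 32-bit x86 both types happen to be the same width, so the mismatch is a warning rather than a visible misprint, but the format is still wrong and would print garbage on a platform where the widths differ.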
* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
From: Ross Dickson @ 2003-12-15 14:30 UTC
To: Maciej W. Rozycki; +Cc: recbo, linux-kernel

> > APIC error on CPU0: 02(02)
> > what?? no crash though.
> [...]
> > bob@where cat /proc/interrupts
> >            CPU0
> >   0:    3350153    IO-APIC-edge  timer
> >   1:       5775    IO-APIC-edge  i8042
> >   2:          0          XT-PIC  cascade
> >   8:          1    IO-APIC-edge  rtc
> >   9:          0   IO-APIC-level  acpi
> >  12:       5385    IO-APIC-edge  i8042
> >  14:         10    IO-APIC-edge  ide0
> >  15:         10    IO-APIC-edge  ide1
> >  16:    1717957   IO-APIC-level  ide2, ide3, eth0
> >  19:     472929   IO-APIC-level  ide4, ide5
> >  21:          0   IO-APIC-level  NVidia nForce2
> > NMI:        822
> > LOC:    3350073
> > ERR:         35
> > MIS:      15818

> It looks like the infamous APIC delivery bug -- the "MIS" counter shows
> how many level-triggered interrupts has been erronously delivered as
> edge-triggered ones.  No wonder the system shows instability -- you have
> noise problems at the APIC bus.

Thanks Maciej
I was wondering about those. I had seen the workaround code and would
not have thought it needed to apply to recent athlon chipsets?

For comparison here is my /proc/interrupts

           CPU0
  0:   50462204    IO-APIC-edge  timer
  1:      49153    IO-APIC-edge  keyboard
  2:          0          XT-PIC  cascade
  9:          0   IO-APIC-level  acpi
 12:     395912    IO-APIC-edge  PS/2 Mouse
 14:     995872    IO-APIC-edge  ide0
 15:        283    IO-APIC-edge  ide1
 16:    3921102   IO-APIC-level  nvidia
 18:          2   IO-APIC-level  bttv
 20:     136325   IO-APIC-level  eth0, usb-ohci
 21:     146903   IO-APIC-level  ehci_hcd, NVIDIA nForce Audio
 22:          0   IO-APIC-level  usb-ohci
NMI:          0
LOC:   50457798
ERR:          0
MIS:          0

Albatron KM18G-Pro, nforce2, Phoenix bios, 2200XP, 255fsb, ddr400,
ide0 is hard drive, ide1 is cdrom, nmi watchdog off

The report seems OK, but this machine locks up hard without the apic
delay patch.

I am currently trying the simpler v1 (always add a delay) patch, but on
all apic acks, as per this posting

http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/3291.html

which is a reply to an earlier posting of the same name, but I
accidentally omitted the Re in the subject.

Regards,
Ross.
* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
From: Craig Bradney @ 2003-12-15 15:02 UTC
To: ross; +Cc: Maciej W. Rozycki, recbo, linux-kernel

Just to give the status here ...
I'm still running the original 2.6 test 11 patches for apic and ioapic.
Uptime is now 2d 20h with lots of idle time and hard work too.

/proc/interrupts as follows:

           CPU0
  0:  245382420    IO-APIC-edge  timer
  1:     139577    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          3    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:    1478615    IO-APIC-edge  i8042
 14:    1055548    IO-APIC-edge  ide0
 15:     737664    IO-APIC-edge  ide1
 19:   18405692   IO-APIC-level  radeon@PCI:3:0:0
 21:    5257090   IO-APIC-level  ehci_hcd, NVidia nForce2, eth0
 22:          3   IO-APIC-level  ohci1394
NMI:      14944
LOC:  245087891
ERR:          0
MIS:          6

As for NMI.. I actually forget which I booted from... I think =1, but
NMI is a small number now.. would it have wrapped?

Craig
A7N8X Deluxe V2 BIOS 1007

On Mon, 2003-12-15 at 15:30, Ross Dickson wrote:
> [...]
* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
From: Maciej W. Rozycki @ 2003-12-15 15:56 UTC
To: Craig Bradney; +Cc: ross, recbo, linux-kernel

On Mon, 15 Dec 2003, Craig Bradney wrote:

>            CPU0
>   0:  245382420    IO-APIC-edge  timer
>   1:     139577    IO-APIC-edge  i8042
>   2:          0          XT-PIC  cascade
>   8:          3    IO-APIC-edge  rtc
>   9:          0   IO-APIC-level  acpi
>  12:    1478615    IO-APIC-edge  i8042
>  14:    1055548    IO-APIC-edge  ide0
>  15:     737664    IO-APIC-edge  ide1
>  19:   18405692   IO-APIC-level  radeon@PCI:3:0:0
>  21:    5257090   IO-APIC-level  ehci_hcd, NVidia nForce2, eth0
>  22:          3   IO-APIC-level  ohci1394
> NMI:      14944
> LOC:  245087891
> ERR:          0
> MIS:          6
>
> As for NMI.. I actually forget which I booted from... I think =1, but
> NMI is a small number now.. would it have wrapped?

 That's "=2" -- otherwise the NMI count would be roughly the same as the
sum of counts for IRQ 0 for all processors.  And you can actually get
your kernel's command line from /proc/cmdline.

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +
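The rule of thumb above (with =1 the NMI count roughly follows the IRQ 0 count, with =2 it stays much smaller) can be written down as a small C heuristic. This is a sketch only: the function name `guess_nmi_watchdog` and the percentage tolerance are invented here, it merely restates the comparison from the message, and /proc/cmdline remains the authoritative place to look.

```c
/* Hypothetical heuristic, not kernel code.
 * Returns 0 if no NMIs were counted (watchdog apparently off),
 *         1 if the NMI count is within `tolerance_pct` percent of the
 *           IRQ 0 (timer) count, as expected for nmi_watchdog=1,
 *         2 otherwise, which is consistent with nmi_watchdog=2.
 * `irq0` should be the sum of the IRQ 0 counts over all processors. */
int guess_nmi_watchdog(unsigned long long nmi, unsigned long long irq0,
                       unsigned int tolerance_pct)
{
    unsigned long long diff;

    if (nmi == 0)
        return 0;

    diff = nmi > irq0 ? nmi - irq0 : irq0 - nmi;
    if (diff * 100 <= irq0 * tolerance_pct)
        return 1;
    return 2;
}
```

Fed the numbers posted in this thread with a 5 percent tolerance, it classifies NMI 14944 against timer 245382420 as mode 2 and NMI 46188592 against timer 46188571 as mode 1, matching what the posters reported.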
* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
From: Ross Dickson @ 2003-12-15 16:54 UTC
To: Craig Bradney; +Cc: recbo, linux-kernel, Ian Kumlien

On Tuesday 16 December 2003 01:02, you wrote:
> Just to give the status here ...
> Im still running the original 2.6 test 11 patches for apic and ioapic.
> Uptime is now 2d 20h with lots of idle time and hard work too..
>
> /proc/interrupts as follows:
>
>            CPU0
>   0:  245382420    IO-APIC-edge  timer
>   1:     139577    IO-APIC-edge  i8042
>   2:          0          XT-PIC  cascade
>   8:          3    IO-APIC-edge  rtc
>   9:          0   IO-APIC-level  acpi
>  12:    1478615    IO-APIC-edge  i8042
>  14:    1055548    IO-APIC-edge  ide0
>  15:     737664    IO-APIC-edge  ide1
>  19:   18405692   IO-APIC-level  radeon@PCI:3:0:0
>  21:    5257090   IO-APIC-level  ehci_hcd, NVidia nForce2, eth0
>  22:          3   IO-APIC-level  ohci1394
> NMI:      14944
> LOC:  245087891
> ERR:          0
> MIS:          6

Uptime sounds good so far.
I am not convinced my v2 apic patch is a great overall improvement; I am
thinking v1 apic is safer for now.
Having said that, Ian Kumlien currently has an uptime of 1 day, 15 hours +
on v2 patches, but with the apic delay timeout increased from 600UL to
800UL. He has a Barton core - see below.

>
> Craig
> A7N8X Deluxe V2 BIOS 1007
>
> <snip>
> > I am currently trying the simpler v1 (always add a delay) patch but on all apic
> > acks as per this posting
> >
> > http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/3291.html
> >
> > which is a reply to an earlier posting of the same name but I accidently
> > omitted the Re in the subject.
> >

I don't think it is necessary to put the delay in all apic acks - I just
tried it to see if it worked and have not yet put my code back the way
it was. My hard lockups went away with the original v1 apic timer delay
patch anyway.

Please note that in the (above) posting I write that I stuffed up the
#ifdefs in my v1 and v2 patches; please adjust the code accordingly.
The patches worked but were only testing the first config item after
the #ifdef.

apic code should have had
#if defined(CONFIG_MK7) && defined(CONFIG_BLK_DEV_AMD74XX)

ioapic code should have had
#if defined(CONFIG_ACPI_BOOT) && defined(CONFIG_X86_UP_IOAPIC)

Brief summary at this point

1) 2? reports are in that the latest award bios with "C1 disconnect" set
   to "auto?" may remove the need for the apic ack delay patch and still
   keep the cpu thermo managed

2) apic ack delay v1 patch seems safe for all cpu cores but introduces a
   small delay of about half the time of an XTPIC access on each apic
   timer interrupt

3) apic ack delay v2 patch seems safe only on barton cores and gives
   more debugging info and wastes less time than the apic v1 patch

4) io-apic v2 patch gives more debugging info but functions the same as
   the io-apic v1 patch

Regards
Ross
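To see why the #ifdef form only tested the first config item, here is a small standalone C sketch. The function names are made up, and the config symbols are defined by hand as stand-ins (in a real build they come from the kernel configuration). A directive such as `#ifdef CONFIG_MK7 && ...` looks only at CONFIG_MK7; GCC, for one, warns about the extra tokens and otherwise ignores them, so the guarded code is compiled whenever the first symbol is set.

```c
/* Stand-ins for kernel config symbols, for illustration only. */
#define CONFIG_MK7 1
/* CONFIG_BLK_DEV_AMD74XX is deliberately left undefined here. */
#define CONFIG_ACPI_BOOT 1
#define CONFIG_X86_UP_IOAPIC 1

/* Returns 1 only when both symbols are defined. With the broken #ifdef
 * form this would have returned 1 because CONFIG_MK7 alone is set. */
int apic_delay_compiled_in(void)
{
#if defined(CONFIG_MK7) && defined(CONFIG_BLK_DEV_AMD74XX)
    return 1;
#else
    return 0;
#endif
}

/* Both symbols are defined above, so this branch is compiled in. */
int ioapic_fix_compiled_in(void)
{
#if defined(CONFIG_ACPI_BOOT) && defined(CONFIG_X86_UP_IOAPIC)
    return 1;
#else
    return 0;
#endif
}
```

With the definitions as written, the first function returns 0 and the second returns 1, which is the behaviour the corrected guards are meant to give.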
Thread overview: 7+ messages
[not found] <200312132040.00875.ross@datscreative.com.au>
2003-12-13 12:00 ` Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered Bob
2003-12-15 13:11 ` Maciej W. Rozycki
2003-12-16 7:18 ` Bob
2003-12-15 14:30 Fwd: " Ross Dickson
2003-12-15 15:02 ` Craig Bradney
2003-12-15 15:56 ` Maciej W. Rozycki
2003-12-15 16:54 ` Ross Dickson