* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered @ 2003-12-13 5:16 Ross Dickson 2003-12-13 6:04 ` Jesse Allen 0 siblings, 1 reply; 13+ messages in thread From: Ross Dickson @ 2003-12-13 5:16 UTC (permalink / raw) To: cbradney; +Cc: linux-kernel, AMartin, Ian Kumlien <snip> >> The thing that strikes me funny is that you get no crashes with the > > updated BIOS and Disconnect on, but without the updated BIOS we have > > to turn disconnect off with athcool or the patch? This makes me think > > that there is some voodoo going on in the BIOS update that they aren't > > saying, surprise surprise, or something is just slowing down the time > > it takes for it to crash. I say this because I have gone 5+ days > > without any of the patches from these threads, acpi apic lapic > > enabled, and CPU disconnect on as stated by athcool. This was with > > much stress testing, idle time, etc. One day I just ran a grep that I > > have done probably 30 times and boom, hang. >> Good luck, hope the BIOS is the trick, now off to see how I can get > > ASUS to put the C1 Disconnect in the next revision. >Yes, thats how it was for me.. I was the only one here saying "no > problems, la la la", then at about 5.25 days.. boom. Then the next day > it crashed twice. Hopefully you make some progress with ASUS.. (for the > A7N8X Deluxe as well as you mobo please :) ). >Ive been playing with hardware in the past few days (new quieter Zalman > PSU, and Zalman 7000 Cu fan etc) so no uptime to speak of here now. I > did compile KDE 3.2 beta 2 last night though.. 6 hours of solid > compilation.. no hassles. I have never turned off Disconnect either. >Thanks to all you guys who are working on this one. Seems to be getting > somewhere. 
>Craig I wonder about the "voodoo" because my apic ack delay patch was developed without knowledge of the C1 disconnect bit and reports I have received so far are that the hard lockups go away when using it independent of the state of the disconnect bit. Apparently the bit was on in my test systems. Ian Kumlien pointed out the linkage with the northbridge timing signals to the CPU to do with the connect disconnect handshake so I now wonder just how programmable the nforce2 northbridge is? Is it a bit fpga'ish in that they may be using the bios boot to alter the handshake timing enough to accomplish what the ack delay does but like it should be - transparent to the OS? Of course they -the makers- have access to knowledge we don't so it could be something completely different that they are doing! In short I agree with the suggestion that the new bios options do more behind the scenes than what the athcool and disconnect patches do. I am pretty sure that I read somewhere that when the epox boards were first released the epox 8rda bios started out with it (the disconnect bit) off then the 8rga+ came out with it on by default? So back then people were wanting to turn it on in the 8rda to lower their CPU temperature - now some want it off in search of stability? Back then under win.... some experienced lockups depending on which IDE driver was used and which state the bit was in! Out of interest has anyone seen new disconnect bit options in the Pheonix bios or only in the award bios? Finally I have done some more work and found that the ack delay patch on my system is about 13 apic timer counts, about half that required to write a byte directly outb(0x00, 0x378) to the printer port at 28 apic timer counts. So the ack delay is about twice as quick as writing a single EOI to the 8259 in XTPIC mode provided the 8259 accesses are not souped up under the hood. 
In other words, whilst it is a timing hit, it is not much of one, and it won't be needed once this is all fixed by the respective manufacturers. Let's hope they can do it on the hardware we have already bought.

Regards
Ross Dickson
* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered 2003-12-13 5:16 Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered Ross Dickson @ 2003-12-13 6:04 ` Jesse Allen 0 siblings, 0 replies; 13+ messages in thread From: Jesse Allen @ 2003-12-13 6:04 UTC (permalink / raw) To: Ross Dickson; +Cc: AMartin, linux-kernel On Sat, Dec 13, 2003 at 03:16:51PM +1000, Ross Dickson wrote: > I wonder about the "voodoo" because my apic ack delay patch was developed > without knowledge of the C1 disconnect bit and reports I have received so far > are that the hard lockups go away when using it independent of the state of the > disconnect bit. Apparently the bit was on in my test systems. > > Ian Kumlien pointed out the linkage with the northbridge timing signals > to the CPU to do with the connect disconnect handshake This is what the item help for C1 Disconnect in my BIOS said: "Force En/Disabled or Auto mode: C17 IGP/SPP NB A03 C18D SPP NB A01 (C01) enabled C1 disconnect otherwise disabled it" I was thinking NB referred to northbridge. SPP is the type of NForce chip. IGP would be a graphics chip(?), though this board don't have that. So yes, we do have at least some relationship with the northbridge and disconnect. This BIOS update probably addressed that, and the BIOS changelog is just a summary. > so I now wonder just how > programmable the nforce2 northbridge is? Is it a bit fpga'ish in that they may be > using the bios boot to alter the handshake timing enough to accomplish what > the ack delay does but like it should be - transparent to the OS? Probably. That's what I'm thinking too now. > > Of course they -the makers- have access to knowledge we don't so it could be > something completely different that they are doing! > > In short I agree with the suggestion that the new bios options do more behind > the scenes than what the athcool and disconnect patches do. That's why I'm trying to contact shuttle. 
> > I am pretty sure that I read somewhere that when the epox boards > were first released the epox 8rda bios started out with it (the disconnect bit) off > then the 8rga+ came out with it on by default? So back then people were wanting > to turn it on in the 8rda to lower their CPU temperature - now some want it off > in search of stability? Ah, that reminds me. The very first day I ran this board last week, I was very worried on how high the system temperature was getting -- above 40 deg C. CPU was getting up to 49 deg C. Not that it was locking up because of temperature - it would on a cold-boot - but that I was experiencing lock ups and higher than normal temperatures which indicates to me now on how poorly it's thermal management was operating then. Now with the new patches, and ultimately, BIOS update, system temperature is about 35 deg C, which aint too bad =) > Back then under win.... some experienced lockups depending > on which IDE driver was used and which state the bit was in! Good point! I was reading some message boards discussing nforce2s yesterday. And they pretty much unaminiously said, don't use NForce IDE driver, use windows provided IDE driver, because the NForce IDE _locks up_. So windows does have the same problem after all. I wouldn't know because I don't have windows... but you can find this same issue everywhere then. > > Out of interest has anyone seen new disconnect bit options in the Pheonix bios or > only in the award bios? I have an award bios. > > Finally I have done some more work and found that the ack delay patch on my > system is about 13 apic timer counts, about half that required to write a byte > directly outb(0x00, 0x378) to the printer port at 28 apic timer counts. > So the ack delay is about twice as quick as writing a single EOI to the 8259 in > XTPIC mode provided the 8259 accesses are not souped up under the hood. 
> In other words whilst it is a timing hit it is not much of one and it won't be
> needed once this is all fixed by the respective manufacturers -lets hope they
> can do it on the hardware we have already bought.
>
> Regards
> Ross Dickson
>

Good work. Let's hope the hardware manufacturers come through.

Jesse
[parent not found: <200312132040.00875.ross@datscreative.com.au>]
* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered [not found] <200312132040.00875.ross@datscreative.com.au> @ 2003-12-13 12:00 ` Bob 2003-12-15 13:11 ` Maciej W. Rozycki 0 siblings, 1 reply; 13+ messages in thread From: Bob @ 2003-12-13 12:00 UTC (permalink / raw) To: linux-kernel udma133 with Award bios update and nforce2 APIC error on CPU0: 02(02) what?? no crash though. Ross Dickson wrote: >Hi Bob > >Jesse has award bios, see attached >Ross. > Months ago I thought using a 3ware card might help with nforce2 crashes so I gave up on promise and sii hd cards after a lot of experiments(hdparm, no lapic, no acpi, apic off in bios) and put in a 3ware card but I flashed the bios at the same time so didn't know if the 3ware card helped with the nforce2 crashing or not, since the bios flash did the job. With 3ware I couldn't use hdparm to see what udma settings the drives were set to. Now I can report. Just now I took the 3ware card out and went back to promise cards(using 4 hd's either method, 2 cd's on mboard amd74xx, onboard sata disabled). bob@where cat /proc/interrupts CPU0 0: 3350153 IO-APIC-edge timer 1: 5775 IO-APIC-edge i8042 2: 0 XT-PIC cascade 8: 1 IO-APIC-edge rtc 9: 0 IO-APIC-level acpi 12: 5385 IO-APIC-edge i8042 14: 10 IO-APIC-edge ide0 15: 10 IO-APIC-edge ide1 16: 1717957 IO-APIC-level ide2, ide3, eth0 19: 472929 IO-APIC-level ide4, ide5 21: 0 IO-APIC-level NVidia nForce2 NMI: 822 LOC: 3350073 ERR: 35 MIS: 15818 cd's on amd74xx onboard, amd74xx onboard is always solid, 4 ide hd's on two promise cards. not many nmi ticks without the better patch there. bonnie++ smooth, then hdparm up the settings, udma6, bonnie++ again, saw a few "APIC error on CPU0: 02(02)" but no lockup. not sure if data lost since it was a test. APIC error might be fixed by changing hdparm settings. This second test was with unmasked irq and udma6. I have to patch to get ioapic edge timer on. 
This 11/7/2003 updated Award BIOS does not have a CPU disconnect option, but it does eliminate the crashes with no patch, and it is no longer impossible to use Promise IDE udma133 controller cards.

MSI K7N2 Delta MCP2-T mboard

I don't have the Promise patch in yet, either, so the APIC error might be from that, or from the hdparm unmasked irq setting.

-Bob
* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-13 12:00 ` Fwd: " Bob
@ 2003-12-15 13:11 ` Maciej W. Rozycki
2003-12-16 7:18 ` Bob
0 siblings, 1 reply; 13+ messages in thread
From: Maciej W. Rozycki @ 2003-12-15 13:11 UTC (permalink / raw)
To: Bob; +Cc: linux-kernel

On Sat, 13 Dec 2003, Bob wrote:

> APIC error on CPU0: 02(02)
> what?? no crash though.
[...]
> bob@where cat /proc/interrupts
>            CPU0
>   0:    3350153    IO-APIC-edge  timer
>   1:       5775    IO-APIC-edge  i8042
>   2:          0          XT-PIC  cascade
>   8:          1    IO-APIC-edge  rtc
>   9:          0   IO-APIC-level  acpi
>  12:       5385    IO-APIC-edge  i8042
>  14:         10    IO-APIC-edge  ide0
>  15:         10    IO-APIC-edge  ide1
>  16:    1717957   IO-APIC-level  ide2, ide3, eth0
>  19:     472929   IO-APIC-level  ide4, ide5
>  21:          0   IO-APIC-level  NVidia nForce2
> NMI:        822
> LOC:    3350073
> ERR:         35
> MIS:      15818

It looks like the infamous APIC delivery bug -- the "MIS" counter shows how many level-triggered interrupts have been erroneously delivered as edge-triggered ones. No wonder the system shows instability -- you have noise problems at the APIC bus.

--
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: macro@ds2.pg.gda.pl, PGP key available        +
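For anyone wanting to compare the ERR and MIS rows across boots without eyeballing the listings, here is a minimal sketch in C that pulls one counter out of /proc/interrupts-style text. The function name `read_irq_counter` and its interface are made up for illustration; it only assumes the row layout seen in the dumps in this thread (a label such as "MIS:" followed by a decimal count).

```c
#include <stdlib.h>
#include <string.h>

/*
 * Find a row such as "MIS:" in /proc/interrupts-style text and parse
 * the first decimal number after the label.
 * Returns 0 on success, -1 if the row is missing or carries no number.
 * (Hypothetical helper, not part of any kernel or libc API.)
 */
int read_irq_counter(const char *text, const char *label, unsigned long *out)
{
	size_t len = strlen(label);
	const char *line = text;

	while (line && *line) {
		const char *p = line;

		/* Skip leading blanks; the numbered IRQ rows are right-aligned. */
		while (*p == ' ' || *p == '\t')
			p++;

		if (strncmp(p, label, len) == 0) {
			char *end;
			unsigned long value = strtoul(p + len, &end, 10);

			if (end == p + len)
				return -1;	/* label found but no digits follow */
			*out = value;
			return 0;
		}

		/* Advance to the start of the next line, if there is one. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

Reading the whole of /proc/interrupts into a buffer and calling `read_irq_counter(buf, "MIS:", &mis)` before and after a bonnie++ run would give the number of mis-delivered interrupts that run produced.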
* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered 2003-12-15 13:11 ` Maciej W. Rozycki @ 2003-12-16 7:18 ` Bob 0 siblings, 0 replies; 13+ messages in thread From: Bob @ 2003-12-16 7:18 UTC (permalink / raw) To: linux-kernel apic.c patch needs reload:%lu instead of %u ----------> printk("..APIC TIMER ack delay, reload:%lu, safe:%u\n", amd xp3000+, 1:1 333mhz fsb to ram, 166mhz cpu bus clock x dual channel 2-512mb pc3200 tested cas2 sticks, 1:1 fsb to ram for 333mhz, Award bios with update that works for non-crashing but not for edge timer without patch. MSI K7N2 Delta MCP2-T mbo linux-2.6.0-test11 This was with 3ware controller and unpatched 2.6.0-test11 Note low MIS score but PIC timer and no nmi-- CPU0 0: 244393560 XT-PIC timer 1: 31963 IO-APIC-edge i8042 2: 0 XT-PIC cascade 8: 1 IO-APIC-edge rtc 9: 0 IO-APIC-level acpi 12: 251884 IO-APIC-edge i8042 14: 22 IO-APIC-edge ide0 15: 24 IO-APIC-edge ide1 16: 4290216 IO-APIC-level 3ware Storage Controller, yenta, yenta 17: 5929405 IO-APIC-level eth0 21: 0 IO-APIC-level NVidia nForce2 NMI: 0 LOC: 244378698 ERR: 0 MIS: 6 Next is with the first edge timer patch, nmi_watchdog=2 works but =1 does not, MIS really high("noisy bus"), replacing 3ware with promise cards and hdparm udma133 causes apic error logged to console during bonnie++ test-- >>APIC error on CPU0: 02(02) >>what?? no crash though. 
>> >> >>bob@where cat /proc/interrupts >> CPU0 >> 0: 3350153 IO-APIC-edge timer >> 1: 5775 IO-APIC-edge i8042 >> 2: 0 XT-PIC cascade >> 8: 1 IO-APIC-edge rtc >> 9: 0 IO-APIC-level acpi >> 12: 5385 IO-APIC-edge i8042 >> 14: 10 IO-APIC-edge ide0 >> 15: 10 IO-APIC-edge ide1 >> 16: 1717957 IO-APIC-level ide2, ide3, eth0 >> 19: 472929 IO-APIC-level ide4, ide5 >> 21: 0 IO-APIC-level NVidia nForce2 >>NMI: 822 >>LOC: 3350073 >>ERR: 35 >>MIS: 15818 >> >> now with promise controllers again, new edge timer patch permits nmi_watchdog=1 not =2, lots of nmi ticks, MIS count is only half with first timer patch, NMI ticks = LOC? bob@where cat /proc/interrupts CPU0 0: 46188571 IO-APIC-edge timer 1: 12396 IO-APIC-edge i8042 2: 0 XT-PIC cascade 8: 1 IO-APIC-edge rtc 9: 0 IO-APIC-level acpi 12: 147429 IO-APIC-edge i8042 14: 10 IO-APIC-edge ide0 15: 10 IO-APIC-edge ide1 16: 1413705 IO-APIC-level ide2, ide3, eth0 17: 0 IO-APIC-level yenta, yenta 19: 258804 IO-APIC-level ide4, ide5 21: 0 IO-APIC-level NVidia nForce2 NMI: 46188592 LOC: 46188482 ERR: 36 MIS: 6877 Now I'll try 800UL/100ndelay to see if it helps with MIS count(pseudo-sci masochism), be back in a while. Oh, by the way, I set debug 1 in apic.h but I don't see anything, and I thought I saw a compile error flash by, so now I'll compile > logfile 2>&1 and might see why I don't see-- "..APIC TIMER ack delay, predelay count: 20769" I don't see any of that debug stuff. Maybe the compile errors I found were it, see my previous message about "unsigned in format", maybe printk needs %lu(I don't know hardly nuffing yet). I'm going to boot 800UL/100ndelay now. it needs reload:%lu instead of %u ----------> printk("..APIC TIMER ack delay, reload:%lu, safe:%u\n", Ross: "Can you also advise if your bios setting of the "C1 disconnect" is set" I can only guess by my 41C low load 48C high load temps exactly equal to range for "2.1Ghz 333mhz" of Ian Kumlien(his?) which is same speed as mine, that probably cpu disconnect is not on. 
I have no visible choice in setup for cpu disconnect. I'll try athcool to see how disconnect is set.

Ross: "I have heard lockups are not supposed to happen at all if the fsb (host bus clock speed) matches the ddr speed. One of my systems went about 4 hours (xp2500 333fsb, DDR333) without the apic delay patch on a phoenix bios before lockup"

A couple of months ago, before the BIOS update, I was overly optimistic a couple of times: it seemed to work to use 1:1 and only the amd74xx onboard hd controller, no hd cards, with pre-emptive, anticipatory sched not deadline, apic off in setup but on in linux, lapic off, acpi on. It was almost stable if using only one drive, but I really can't go without hd cards for software raid, so at the first fsck on boot when using an hd card, it would crash. I could finesse stability by using options but never quite reach reliability without a BIOS update. Certain functions still need patching, and I still have the "MIS count, noisy bus" and the agp8 crash (I can use the X nv driver and agpgart no problem, but not the nvidia drivers for X with agp8).
* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered @ 2003-12-15 14:30 Ross Dickson 2003-12-15 15:02 ` Craig Bradney 0 siblings, 1 reply; 13+ messages in thread From: Ross Dickson @ 2003-12-15 14:30 UTC (permalink / raw) To: Maciej W. Rozycki; +Cc: recbo, linux-kernel >> APIC error on CPU0: 02(02) > > what?? no crash though. > [...] > > bob@where cat /proc/interrupts > > CPU0 > > 0: 3350153 IO-APIC-edge timer > > 1: 5775 IO-APIC-edge i8042 > > 2: 0 XT-PIC cascade > > 8: 1 IO-APIC-edge rtc > > 9: 0 IO-APIC-level acpi > > 12: 5385 IO-APIC-edge i8042 > > 14: 10 IO-APIC-edge ide0 > > 15: 10 IO-APIC-edge ide1 > > 16: 1717957 IO-APIC-level ide2, ide3, eth0 > > 19: 472929 IO-APIC-level ide4, ide5 > > 21: 0 IO-APIC-level NVidia nForce2 > > NMI: 822 > > LOC: 3350073 > > ERR: 35 > > MIS: 15818 >It looks like the infamous APIC delivery bug -- the "MIS" counter shows >how many level-triggered interrupts has been erronously delivered as >edge-triggered ones. No wonder the system shows instability -- you have >noise problems at the APIC bus. Thanks Maciej I was wondering about those, I had seen the work around code and would not have thought it need apply to recent athlon chipsets? For comparison here is my proc/interrupts CPU0 0: 50462204 IO-APIC-edge timer 1: 49153 IO-APIC-edge keyboard 2: 0 XT-PIC cascade 9: 0 IO-APIC-level acpi 12: 395912 IO-APIC-edge PS/2 Mouse 14: 995872 IO-APIC-edge ide0 15: 283 IO-APIC-edge ide1 16: 3921102 IO-APIC-level nvidia 18: 2 IO-APIC-level bttv 20: 136325 IO-APIC-level eth0, usb-ohci 21: 146903 IO-APIC-level ehci_hcd, NVIDIA nForce Audio 22: 0 IO-APIC-level usb-ohci NMI: 0 LOC: 50457798 ERR: 0 MIS: 0 Albatron KM18G-Pro, nforce2, pheonix bios, 2200XP, 255fsb, ddr400, ide0 is hard drive, ide1 is cdrom, nmi watchdog off Report seems OK but this machine locks up hard without the apic delay patch. 
I am currently trying the simpler v1 (always add a delay) patch, but on all apic acks, as per this posting:

http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/3291.html

which is a reply to an earlier posting of the same name, but I accidentally omitted the Re in the subject.

Regards,
Ross.
* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered 2003-12-15 14:30 Fwd: " Ross Dickson @ 2003-12-15 15:02 ` Craig Bradney 2003-12-15 16:54 ` Ross Dickson 0 siblings, 1 reply; 13+ messages in thread From: Craig Bradney @ 2003-12-15 15:02 UTC (permalink / raw) To: ross; +Cc: Maciej W. Rozycki, recbo, linux-kernel Just to give the status here ... Im still running the original 2.6 test 11 patches for apic and ioapic. Uptime is now 2d 20h with lots of idle time and hard work too.. /proc/interrupts as follows: CPU0 0: 245382420 IO-APIC-edge timer 1: 139577 IO-APIC-edge i8042 2: 0 XT-PIC cascade 8: 3 IO-APIC-edge rtc 9: 0 IO-APIC-level acpi 12: 1478615 IO-APIC-edge i8042 14: 1055548 IO-APIC-edge ide0 15: 737664 IO-APIC-edge ide1 19: 18405692 IO-APIC-level radeon@PCI:3:0:0 21: 5257090 IO-APIC-level ehci_hcd, NVidia nForce2, eth0 22: 3 IO-APIC-level ohci1394 NMI: 14944 LOC: 245087891 ERR: 0 MIS: 6 As for NMI.. I actually forget which I booted from... I think =1, but NMI is a small number now.. would it have wrapped? Craig A7N8X Deluxe V2 BIOS 1007 On Mon, 2003-12-15 at 15:30, Ross Dickson wrote: > >> APIC error on CPU0: 02(02) > > > what?? no crash though. > > [...] > > > bob@where cat /proc/interrupts > > > CPU0 > > > 0: 3350153 IO-APIC-edge timer > > > 1: 5775 IO-APIC-edge i8042 > > > 2: 0 XT-PIC cascade > > > 8: 1 IO-APIC-edge rtc > > > 9: 0 IO-APIC-level acpi > > > 12: 5385 IO-APIC-edge i8042 > > > 14: 10 IO-APIC-edge ide0 > > > 15: 10 IO-APIC-edge ide1 > > > 16: 1717957 IO-APIC-level ide2, ide3, eth0 > > > 19: 472929 IO-APIC-level ide4, ide5 > > > 21: 0 IO-APIC-level NVidia nForce2 > > > NMI: 822 > > > LOC: 3350073 > > > ERR: 35 > > > MIS: 15818 > > >It looks like the infamous APIC delivery bug -- the "MIS" counter shows > >how many level-triggered interrupts has been erronously delivered as > >edge-triggered ones. No wonder the system shows instability -- you have > >noise problems at the APIC bus. 
> > Thanks Maciej > I was wondering about those, I had seen the work around code and would not > have thought it need apply to recent athlon chipsets? > > > For comparison here is my proc/interrupts > CPU0 > 0: 50462204 IO-APIC-edge timer > 1: 49153 IO-APIC-edge keyboard > 2: 0 XT-PIC cascade > 9: 0 IO-APIC-level acpi > 12: 395912 IO-APIC-edge PS/2 Mouse > 14: 995872 IO-APIC-edge ide0 > 15: 283 IO-APIC-edge ide1 > 16: 3921102 IO-APIC-level nvidia > 18: 2 IO-APIC-level bttv > 20: 136325 IO-APIC-level eth0, usb-ohci > 21: 146903 IO-APIC-level ehci_hcd, NVIDIA nForce Audio > 22: 0 IO-APIC-level usb-ohci > NMI: 0 > LOC: 50457798 > ERR: 0 > MIS: 0 > > Albatron KM18G-Pro, nforce2, pheonix bios, 2200XP, 255fsb, ddr400, > ide0 is hard drive, ide1 is cdrom, nmi watchdog off > > Report seems OK but this machine locks up hard without the apic delay patch. > > I am currently trying the simpler v1 (always add a delay) patch but on all apic > acks as per this posting > > http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/3291.html > > which is a reply to an earlier posting of the same name but I accidently > omitted the Re in the subject. > > Regards, > Ross. > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 13+ messages in thread
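On Craig's question of whether the NMI count could have wrapped, a rough back-of-envelope check can be sketched in C. This assumes, without confirmation from the thread, that the counter is a 32-bit unsigned value and that it advances at most once per timer tick; the function names are invented for illustration.

```c
#include <stdint.h>

/* Average events per second implied by a counter value and an uptime. */
double average_rate(uint64_t count, uint64_t uptime_seconds)
{
	return (double)count / (double)uptime_seconds;
}

/*
 * Seconds for a 32-bit counter that starts at zero to wrap around,
 * given a steady event rate in Hz. ASSUMPTION: the counter is 32 bits
 * wide; the thread does not state its width.
 */
double seconds_to_wrap_u32(double rate_hz)
{
	return 4294967296.0 / rate_hz;	/* 2^32 events */
}
```

With the timer count of 245382420 over 2d 20h (244800 seconds), `average_rate` gives roughly 1000 ticks per second, and at that rate `seconds_to_wrap_u32` gives a little under 50 days. Under these assumptions a wrap after less than three days of uptime looks unlikely, so the small NMI figure probably has another cause.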
* Re: Fwd: Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered 2003-12-15 15:02 ` Craig Bradney @ 2003-12-15 16:54 ` Ross Dickson 2003-12-16 6:07 ` Bob 0 siblings, 1 reply; 13+ messages in thread From: Ross Dickson @ 2003-12-15 16:54 UTC (permalink / raw) To: Craig Bradney; +Cc: recbo, linux-kernel, Ian Kumlien On Tuesday 16 December 2003 01:02, you wrote: > Just to give the status here ... > Im still running the original 2.6 test 11 patches for apic and ioapic. > Uptime is now 2d 20h with lots of idle time and hard work too.. > > /proc/interrupts as follows: > > CPU0 > 0: 245382420 IO-APIC-edge timer > 1: 139577 IO-APIC-edge i8042 > 2: 0 XT-PIC cascade > 8: 3 IO-APIC-edge rtc > 9: 0 IO-APIC-level acpi > 12: 1478615 IO-APIC-edge i8042 > 14: 1055548 IO-APIC-edge ide0 > 15: 737664 IO-APIC-edge ide1 > 19: 18405692 IO-APIC-level radeon@PCI:3:0:0 > 21: 5257090 IO-APIC-level ehci_hcd, NVidia nForce2, eth0 > 22: 3 IO-APIC-level ohci1394 > NMI: 14944 > LOC: 245087891 > ERR: 0 > MIS: 6 Uptime sounds good so far. I am not convinced my v2 apic patch is a great overall improvement, I am thinking v1 apic, is safer for now. Having said that Ian Kumlien currently has an uptime of 1 day, 15 hours + on v2 patches but with the apic delay timeout increased from 600UL to 800UL. He has a Barton core - see below. > > Craig > A7N8X Deluxe V2 BIOS 1007 > > <snip> > > I am currently trying the simpler v1 (always add a delay) patch but on all apic > > acks as per this posting > > > > http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/3291.html > > > > which is a reply to an earlier posting of the same name but I accidently > > omitted the Re in the subject. > > I don't think it is necessary to put the delay in all apic acks - I just tried it to see if it worked and have not yet put my code back the way it was. My hard lockups went away with the original v1 apic timer delay patch anyway. 
Please note that in the (above) posting I write that I stuffed up the #ifdefs in my v1 and v2 patches, so please adjust your code accordingly. The patches worked but were only testing the first config item after #ifdef.

apic code should have had
#if defined(CONFIG_MK7) && defined(CONFIG_BLK_DEV_AMD74XX)

ioapic code should have had
#if defined(CONFIG_ACPI_BOOT) && defined(CONFIG_X86_UP_IOAPIC)

Brief summary at this point

1) 2? reports are in that the latest Award BIOS with "C1 disconnect" set to "auto?" may remove the need for the apic ack delay patch and still keep the CPU thermally managed

2) the apic ack delay v1 patch seems safe for all CPU cores but introduces a small delay of about half the time of an XTPIC access on each apic timer interrupt

3) the apic ack delay v2 patch seems safe only on Barton cores, gives more debugging info and wastes less time than the apic v1 patch

4) the io-apic v2 patch gives more debugging info but functions the same as the io-apic v1 patch

Regards
Ross
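The #ifdef point can be illustrated with a small standalone sketch. `#ifdef` takes a single identifier, so a line like `#ifdef A && B` typically draws an "extra tokens" warning from GCC and ends up testing only A, which matches the behaviour described above. The CONFIG_DEMO_* names below are hypothetical stand-ins, not real kernel options.

```c
/*
 * Hypothetical config switches for illustration only; a real kernel
 * build gets its CONFIG_* symbols from the generated configuration.
 */
#define CONFIG_DEMO_FIRST 1
/* CONFIG_DEMO_SECOND is deliberately left undefined. */

/* Returns 1 only when BOTH switches are defined, else 0. */
int workaround_enabled(void)
{
#if defined(CONFIG_DEMO_FIRST) && defined(CONFIG_DEMO_SECOND)
	return 1;
#else
	return 0;
#endif
}

/* Returns 1 when the first switch is defined, whatever the second is. */
int first_switch_only(void)
{
#ifdef CONFIG_DEMO_FIRST
	return 1;
#else
	return 0;
#endif
}
```

Here `workaround_enabled()` returns 0 because the second switch is missing, while `first_switch_only()` returns 1. The latter is effectively what a patch guarded by a two-name `#ifdef` would get.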
* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-15 16:54 ` Ross Dickson
@ 2003-12-16 6:07 ` Bob
0 siblings, 0 replies; 13+ messages in thread
From: Bob @ 2003-12-16 6:07 UTC (permalink / raw)
To: linux-kernel

Ross,

my_make_script nf2-800UL 2>&1 | tee /tmp/make.err

#/tmp/make.err
<snip>
  CC      arch/i386/kernel/apic.o
arch/i386/kernel/apic.c: In function `smp_apic_timer_interrupt':
arch/i386/kernel/apic.c:1105: warning: unsigned int format, long unsigned int arg (arg 2)

...which is around the printk line here--
printk("..APIC TIMER ack delay, reload:%u, safe:%u\n",

+		if(!passno) {	/* calculate timing */
+			safecnt = apic_read(APIC_TMICT) -
+				( (800UL * apic_read(APIC_TMICT) ) /
+				(1000000000UL/HZ) );
+			printk("..APIC TIMER ack delay, reload:%u, safe:%u\n",
+				apic_read(APIC_TMICT), safecnt);
+			passno++;

Here are the two patches with "#ifdef N" to "#if defined(N)" change
but not the unsigned int change --

diff -urN linux-2.6.0-test11/arch/i386/kernel/apic.c linux-2.6.0-test11-nf2/arch/i386/kernel/apic.c
--- linux-2.6.0-test11/arch/i386/kernel/apic.c	2003-11-26 15:46:07.000000000 -0500
+++ linux-2.6.0-test11-nf2/arch/i386/kernel/apic.c	2003-12-13 23:48:30.000000000 -0500
@@ -1089,6 +1089,37 @@
 	 */
 	irq_stat[cpu].apic_timer_irqs++;
 
+#if defined(CONFIG_MK7) && defined(CONFIG_BLK_DEV_AMD74XX)
+	/*
+	 * on 2200XP & nforce2 chipset we need 600ns? 800? 1000? 1100?
+	 * from timer irq start to apic irq ack to prevent
+	 * hard lockups, use apic timer itself.
+	 * C1 disconnect bit related. Ross Dickson.
+	 */
+	{
+		static unsigned int passno, safecnt;
+		if(!passno) {	/* calculate timing */
+			safecnt = apic_read(APIC_TMICT) -
+				( (800UL * apic_read(APIC_TMICT) ) /
+				(1000000000UL/HZ) );
+			printk("..APIC TIMER ack delay, reload:%u, safe:%u\n",
+				apic_read(APIC_TMICT), safecnt);
+			passno++;
+		}
+#if APIC_DEBUG
+		if(passno<12) {
+			unsigned int at1 = apic_read(APIC_TMCCT);
+			if( passno > 1 )
+				Dprintk("..APIC TIMER ack delay, predelay count:%u \n", at1 );
+			passno++;
+		}
+#endif
+		/* delay only if required */
+		while( apic_read(APIC_TMCCT) > safecnt )
+			ndelay(100);
+	}
+#endif
+
 	/*
 	 * NOTE! We'd better ACK the irq immediately,
 	 * because timer handling can be slow.*/
diff -urN linux-2.6.0-test11/arch/i386/kernel/io_apic.c linux-2.6.0-test11-nf2/arch/i386/kernel/io_apic.c
--- linux-2.6.0-test11/arch/i386/kernel/io_apic.c	2003-11-26 15:43:32.000000000 -0500
+++ linux-2.6.0-test11-nf2/arch/i386/kernel/io_apic.c	2003-12-13 15:14:25.000000000 -0500
@@ -2128,6 +2128,54 @@
 		printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to IO-APIC\n");
 	}
 
+#if defined (CONFIG_ACPI_BOOT) && (CONFIG_X86_UP_IOAPIC)
+	/* for nforce2 try vector 0 on pin0
+	 * Note 8259a is already masked, also by default
+	 * the io_apic_set_pci_routing call disables the 8259 irq 0
+	 * so we must be connected directly to the 8254 timer if this works
+	 * Note2: this violates the above comment re Subtle but works!
+	 */
+	printk(KERN_INFO "..TIMER: Is timer irq0 connected to IOAPIC Pin0? ...\n");
+	if (pin1 != -1) {
+		extern spinlock_t i8259A_lock;
+		unsigned long flags;
+		int tok, saved_timer_ack = timer_ack;
+		/*
+		 * Ok, does IRQ0 through the IOAPIC work?
+		 */
+		io_apic_set_pci_routing ( 0, 0, 0, 0, 0); /* connect pin */
+		unmask_IO_APIC_irq(0);
+		timer_ack = 0;
+
+		/*
+
+
+
+		 * Ok, does IRQ0 through the IOAPIC work?
+		 */
+		spin_lock_irqsave(&i8259A_lock, flags);
+		Dprintk("..TIMER check 8259 ints disabled, imr1:%02x, imr2:%02x\n", inb(0x21), inb(0xA1));
+		tok = timer_irq_works();
+		spin_unlock_irqrestore(&i8259A_lock, flags);
+		if (tok) {
+			if (nmi_watchdog == NMI_IO_APIC) {
+				disable_8259A_irq(0);
+				setup_nmi();
+				enable_8259A_irq(0);
+				check_nmi_watchdog();
+			}
+			printk(KERN_INFO "..TIMER: works OK on apic pin0 irq0\n" );
+			return;
+		}
+		/* failed */
+		timer_ack = saved_timer_ack;
+		clear_IO_APIC_pin(0, 0);
+		io_apic_set_pci_routing ( 0, pin1, 0, 0, 0);
+		printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to IO-APIC Pin 0\n");
+	}
+/* end new stuff for nforce2 */
+#endif
+
 	printk(KERN_INFO "...trying to set up timer (IRQ0) through the 8259A ... ");
 	if (pin2 != -1) {
 		printk("\n..... (found pin %d) ...", pin2);
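To make the safecnt arithmetic in the apic.c hunk easier to check by hand, here is a standalone sketch of the same calculation. The function name `safe_count` and its parameters are hypothetical; it assumes the APIC timer counts down from the reload value once per tick of 1e9/HZ nanoseconds, as the patch itself does, and it does the multiply in 64 bits because 800UL * reload would overflow a 32-bit unsigned long once reload exceeds about 5.3 million.

```c
#include <stdint.h>

/*
 * Timer count below which at least `delay_ns` nanoseconds have passed
 * since the timer reloaded. Mirrors the patch's
 *   safecnt = reload - (delay * reload) / (1000000000 / HZ)
 * but widens the intermediate product to 64 bits.
 * ASSUMPTION: hz is non-zero and the timer counts down linearly over
 * one tick. Hypothetical helper, not kernel code.
 */
uint32_t safe_count(uint32_t reload, uint32_t hz, uint32_t delay_ns)
{
	uint64_t tick_ns = 1000000000ULL / hz;
	uint64_t delay_counts = ((uint64_t)delay_ns * reload) / tick_ns;

	/* A delay longer than one tick cannot be expressed; clamp it. */
	if (delay_counts > reload)
		delay_counts = reload;

	return (uint32_t)(reload - delay_counts);
}
```

For an illustrative reload of 100000 at HZ=1000, an 800 ns delay is 80 counts, so the interrupt would be acked once the current count has dropped to 99920 or below. The reload value here is invented for the example; the real figure is whatever the printk in the patch reports on a given board.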
* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered @ 2003-12-13 9:20 Ross Dickson 2003-12-13 9:51 ` Bob 0 siblings, 1 reply; 13+ messages in thread From: Ross Dickson @ 2003-12-13 9:20 UTC (permalink / raw) To: linux-kernel <snip> >>I decided to wait till this morning, to try the BIOS "C1 Disconnect" set to >> enabled. Still no lockups under this kernel. Tried a vanilla kernel, no >> lockups (but timer and watchdog messed up still). Now that I read >> your message Bob, I understand what you are saying. Luckily, the >>updated BIOS changelog states "Add C1 disconnect item." And this exact >> version seems to have fixed it, and now we have an exact fix (another one?) >>to refer to. > > > >So the fix was absolutely a BIOS fix. > > <snip> ==That's why I'm trying to contact shuttle. Jesse Good Work Jesse, I hope shuttle give up some info - especially as I have pheonix bioses and they are doing ?? about it? > ...but we're stuck looking at smoke and mirrors, > when the kernel might be able to work around > bioses that have not been "updated". Or to put > it another way, "voodoo" may be done by > kernel if not done by bios. Whatever is being > tweaked may be accessible to kernel code. <snip> Bob Please ignore the following if you are already up to speed on SMM. Some readers may not know why we cannot do all that the bios can do aside from a lack of information. Agreed but the keywords are might and may. I remember doing dos based data acquisition with 486SX laptops and then Intel brought out the 486Sl and our pulse counting went bad because of the power saving core. I got the data book from Intel and was very dismayed to see that bios code was being executed when I thought our code was running and there was not a darn thing I could do about it and keep the laptop warranty intact. Its offspring as you may already know is SMM. It is a priviledged mode that we can do pretty much squat about. 
It can pop up anywhere in the middle of our code, and the only thing we will know about it, aside from missing time, is when it has stuffed something up, like setting registers back to the wrong values. Think of it like a kernel within our kernel, with permissions set so it can hack us but we cannot hack it.

Maciej recently wrote of its continuing effect on NMI debug here:

http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/2940.html

Regards
Ross.
* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-13 9:20 Ross Dickson
@ 2003-12-13 9:51 ` Bob
0 siblings, 0 replies; 13+ messages in thread
From: Bob @ 2003-12-13 9:51 UTC (permalink / raw)
To: linux-kernel
Ross Dickson wrote:
>>>So the fix was absolutely a BIOS fix.
>>>
><snip>
>
>==That's why I'm trying to contact shuttle.
>Jesse
>
R> Good Work Jesse, I hope shuttle give up some info - especially as I have
Phoenix BIOSes and they are doing ?? about it? -Ross
B> I was expecting to hear that. I have an Award bios on MSI nforce2 mboard.
Their bios flash file begins with
"w" for Award  w6570nms.760 (W6570 v760 bios flash file)
"p" phoenix
"a" ami
The bios type also appears at boot but goes by in a flash, and appears on the
first cmos setup page.
So Award bios has a fix for the nforce2. How about Jesse's bios that can fix
the problem without a kernel patch, as my Award bios is doing? What kind of
bios is that you have, Jesse?
My Award bios does not give me any way to turn on the ioapic edge timer,
though. I need a patch to get that on. Also I don't have a cpu disconnect
choice in setup, and with temps running in the range 41C to 48C I guess cpu
disconnect is not on. 48C once in a while does not hurt anything though.
-Bob
>>...but we're stuck looking at smoke and mirrors,
>>when the kernel might be able to work around
>>bioses that have not been "updated". Or to put
>>it another way, "voodoo" may be done by
>>kernel if not done by bios. Whatever is being
>>tweaked may be accessible to kernel code.
>>
><snip>
>Bob
>
>Please ignore the following if you are already up to speed on SMM. Some
>readers may not know why we cannot do all that the bios can do aside from
>a lack of information.
>
>Agreed but the keywords are might and may. I remember doing dos based data acquisition
>with 486SX laptops and then Intel brought out the 486SL and our pulse counting
>went bad because of the power saving core. I got the data book from Intel and
>was very dismayed to see that bios code was being executed when I thought our code
>was running and there was not a darn thing I could do about it and keep the
>laptop warranty intact.
>
>Its offspring as you may already know is SMM. It is a privileged mode that we can
>do pretty much squat about. It can pop up anywhere in the middle of our code
>and the only thing we will know about it aside from missing time is when it has
>stuffed something up - like setting registers back to the wrong values. Think of
>it like a kernel within our kernel with permissions set so it can hack us but we
>cannot hack it.
>
>Maciej recently wrote of its continuing effect on NMI debug here.
>
>http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/2940.html
>
>Regards
>Ross
>
Thanks for explaining. We got some new functionality just by turning
nmi_watchdog on, but I don't know if anybody has learned anything from the
extra debug as far as this nforce2 timing thing goes. Have they?
-Bob
^ permalink raw reply [flat|nested] 13+ messages in thread
* Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
@ 2003-12-07 13:12 Ross Dickson
2003-12-11 6:55 ` Ross Dickson
0 siblings, 1 reply; 13+ messages in thread
From: Ross Dickson @ 2003-12-07 13:12 UTC (permalink / raw)
To: linux-kernel; +Cc: AMartin, ross, andre, kernel
[-- Attachment #1: Type: text/plain, Size: 5471 bytes --]
Greetings,
I am not subscribed so please cc responses.
I have monitored the list and know my nforce2 experiences have been common.
Attached patches are in a single bzip tar ball.
I have Albatron KM18G Pro & Epox 8RGA+ MOBOs both using nforce2 chipsets.
I made up a kernel as follows.
Get std 2.4.22 src
apply patch-2.4.23
apply 2.4.22-low-latency.patch
apply preempt-kernel-rml-2.4.23-pre5-1.patch
apply vhz-j64-2.4.22.patch
One patch fails on inode.c, dispose_list() so I placed conditional_schedule() as follows
=static void dispose_list(struct list_head *head)
={
= int nr_disposed = 0;
=
= while (!list_empty(head)) {
= struct inode *inode;
= conditional_schedule();
Config for athlon with 1000 Hz ticks, preempt & low-lat on.
Compiled and installed nvnet & nvidia video driver.
Disclaimer: The following information and code patches are not fully tested and may be
dangerous, also these are the first patches I have made for public consumption so I hope
that their format works.
Note also that the patches are against 2.4.22 even though they were developed
against the heavily patched 2.4.23 mentioned above. The patch code is the same for both
kernels but at different line numbers.
When I enabled either apic or io-apic in kern config, lockups came hard and fast.
Particularly bad under hard disk load. Heaps of lost ints on irq7 in apic and ioapic mode.
Lockups disappeared when I lowered the ide hda udma speed to mode 3 with hdparm so
I went looking for answers which now follow.
There are three parts to this email.
a) apic mods.
b) io-apic mods
c) ide driver mods
a) Lockups are due to too fast an apic acknowledge of apic timer int.
Apic hard locked up the system - no nmi debug available.
Fixed it by introducing a delay of at least 500ns into smp_apic_timer_interrupt()
just prior to ack_APIC_irq().
See attached diff file "nforce2-apic.c-2.4.22.patch" for details.
I have guessed at a suitable cpu speed dependent delay.
Perhaps someone with AMD cpu docs (apic timing specs) & analyser tools could refine it.
Maybe nforce2 chipset really is very quick accessing ram in dual dimm mode?
Or AMD 2200XP has a really slow APIC?
--- linux-2.4.22/arch/i386/kernel/apic.c 2003-06-14 00:51:29.000000000 +1000
+++ linux-2.4.22-rd/arch/i386/kernel/apic.c 2003-12-07 18:27:32.000000000 +1000
@@ -1078,6 +1078,15 @@
*/
apic_timer_irqs[cpu]++;
+#if defined(CONFIG_MK7) && defined(CONFIG_BLK_DEV_AMD74XX)
+ /*
+ * on 2200XP & nforce2 chipset we need at least 500ns delay here
+ * to stop lockups with udma100 drive. try to scale delay time
+ * with cpu speed. Ross Dickson.
+ */
+ ndelay((cpu_khz >> 12)+200 ); /* don't ack too soon or hard lockup */
+#endif
+
/*
* NOTE! We'd better ACK the irq immediately,
* because timer handling can be slow.
b) I was also disappointed to see I could not have irq0 timer IO-APIC-edge.
So I have fixed it too (tested on both my epox and albatron MOBOs).
Firstly I found 8254 connected directly to pin 0 not pin 2 of io-apic.
I have modified check_timer() in io_apic.c to trial-connect pin 0 and test it,
after the existing test for connection to io-apic.
See attached diff file nforce2-io-apic.c-2.4.22 for details.
--- linux-2.4.22/arch/i386/kernel/io_apic.c 2003-08-25 21:44:39.000000000 +1000
+++ linux-2.4.22-rd/arch/i386/kernel/io_apic.c 2003-12-07 18:40:40.000000000 +1000
@@ -1614,9 +1614,44 @@
return;
}
clear_IO_APIC_pin(0, pin1);
- printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to IO-APIC\n");
+ printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to IO-APIC pin%d\n",pin1);
}
+#if defined(CONFIG_ACPI_BOOT) && defined(CONFIG_X86_UP_IOAPIC)
+ /* for nforce2 try vector 0 on pin0
+ * Note the io_apic_set_pci_routing call disables the 8259 irq 0
+ * so we must be connected directly to the 8254 timer if this works
+ * Note2: this violates the above comment re Subtle but works!
+ */
+ printk(KERN_INFO "..TIMER: Is timer irq0 connected to IOAPIC Pin0? ...\n");
+ if ( pin1 != -1 && nr_ioapics ) {
+ int saved_timer_ack = timer_ack;
+ /* next call also disables 8259 irq0 */
+ int result = io_apic_set_pci_routing ( 0, 0, 0, 0, 0);
+ /*
+ * Ok, does IRQ0 through the IOAPIC work?
+ */
+ unmask_IO_APIC_irq(0);
+ timer_ack = 0 ;
+ if (timer_irq_works()) {
+ if (nmi_watchdog == NMI_IO_APIC) {
+ disable_8259A_irq(0);
+ setup_nmi();
+ enable_8259A_irq(0);
+ check_nmi_watchdog();
+ }
+ printk(KERN_INFO "..TIMER: works OK on apic pin0 irq0\n" );
+ return;
+ }
+ /* failed */
+ timer_ack = saved_timer_ack;
+ clear_IO_APIC_pin(0, 0);
+ result = io_apic_set_pci_routing ( 0, pin1, 0, 0, 0);
+ printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to IO-APIC Pin 0\n");
+ }
+#endif
+/* end new stuff for nforce2 */
+
printk(KERN_INFO "...trying to set up timer (IRQ0) through the 8259A ... ");
if (pin2 != -1) {
printk("\n..... (found pin %d) ...", pin2);
c) Finally during my fault finding I merged A. Martin's patches for the nforce2 IDE driver.
I note that the nforce2 address setup timing bits are different to the AMD ones.
I have assumed the nforce2 address timings apply to nforce and nforce3 chipsets.
I could be wrong so if someone with the nvidia docs could check it please.
I have also not tested it with anything but a WDC ata100 hard drive.
For info see attached patch files (I think pci ids are already in 2.4.23)
nforce2-amd74xx.c-2.4.22.patch, nforce2-amd74xx.h-2.4.22.patch, nforce2-pci_ids.h-2.4.22.patch
Thanks
Ross Dickson
[-- Attachment #2: ross-diffs.tar.bz2 --]
[-- Type: application/x-tbz, Size: 4375 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
@ 2003-12-11 6:55 ` Ross Dickson
2003-12-11 11:47 ` Ian Kumlien
0 siblings, 1 reply; 13+ messages in thread
From: Ross Dickson @ 2003-12-11 6:55 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: linux-kernel, AMartin, kernel, Ian Kumlien
On Thursday 11 December 2003 02:06, Maciej W. Rozycki wrote:
> On Wed, 10 Dec 2003, Ross Dickson wrote:
>
> > Relevant dmesg output from Albatron KM18G Pro ( this is different MOBO (same type) but
> > this time has a barton core 2500 XP cpu).
> >
> > enabled ExtINT on CPU#0
> > ESR value before enabling vector: 00000000
> > ESR value after enabling vector: 00000000
> > ENABLING IO-APIC IRQs
> > init IO_APIC IRQs
> > IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
> > ..TIMER: vector=0x31 pin1=2 pin2=-1
> > ..MP-BIOS bug: 8254 timer not connected to IO-APIC pin2
> > ..TIMER: Is timer irq0 connected to IOAPIC Pin0? ...
> > IOAPIC[0]: Set PCI routing entry (2-0 -> 0x31 -> IRQ 0 Mode:0 Active:0)
> > ..TIMER check 8259 ints disabled, imr1:ff, imr2:ff
> > ..TIMER: works OK on apic pin0 irq0
> > Using local APIC timer interrupts.
> > calibrating APIC timer ...
> > ..... CPU clock speed is 1829.0708 MHz.
> > ..... host bus clock speed is 332.0674 MHz.
> > cpu: 0, clocks: 332674, slice: 166337
> > CPU0<T0:332672,T1:166320,D:15,S:166337,C:332674>
>
> Hmm, while this is different from what is documented in the MP Spec, it
> looks like the 8254 IRQ is connected to INTIN0 indeed. We can handle such
> a setup if the BIOS reports routing correctly. Since you invoke
> io_apic_set_pci_routing() I assume you use ACPI for IRQ routing
> information. Can you please rebuild the kernel with APIC_DEBUG set to 1
> in include/asm-i386/apic.h and send me the bootstrap log? Can you please
> send me the output of a tool called `mptable' as well, so that I can
> compare the results?
>
> Maciej
>
> --
> + Maciej W. Rozycki, Technical University of Gdansk, Poland +
> +--------------------------------------------------------------+
> + e-mail: macro@ds2.pg.gda.pl, PGP key available +
>
Thanks Maciej, bootstrap log follows
ACPI: RSDT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x1dff3000
ACPI: FADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x1dff3040
ACPI: MADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x1dff7980
ACPI: DSDT (v001 NVIDIA AWRDACPI 0x00001000 MSFT 0x0100000e) @ 0x00000000
ACPI: Local APIC address 0xfee00000
Boot CPU = 0
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 Pentium(tm) Pro APIC version 16
    Floating point unit present.
    Machine Exception supported.
    64 bit compare & exchange supported.
    Internal APIC present.
    SEP present.
    MTRR present.
    PGE present.
    MCA present.
    CMOV present.
    PAT present.
    PSE present.
    MMX present.
    FXSR present.
    XMM present.
    Bootup CPU
ACPI: LAPIC_NMI (acpi_id[0x00] polarity[0x1] trigger[0x1] lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] global_irq_base[0x0])
IOAPIC[0]: Assigned apic_id 2
IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, IRQ 0-23
Bus #0 is ISA
Int: type 3, pol 0, trig 0, bus 0, irq 0, 2-0
Int: type 0, pol 0, trig 0, bus 0, irq 1, 2-1
Int: type 0, pol 0, trig 0, bus 0, irq 3, 2-3
Int: type 0, pol 0, trig 0, bus 0, irq 4, 2-4
Int: type 0, pol 0, trig 0, bus 0, irq 5, 2-5
Int: type 0, pol 0, trig 0, bus 0, irq 6, 2-6
Int: type 0, pol 0, trig 0, bus 0, irq 7, 2-7
Int: type 0, pol 0, trig 0, bus 0, irq 8, 2-8
Int: type 0, pol 0, trig 0, bus 0, irq 9, 2-9
Int: type 0, pol 0, trig 0, bus 0, irq 10, 2-10
Int: type 0, pol 0, trig 0, bus 0, irq 11, 2-11
Int: type 0, pol 0, trig 0, bus 0, irq 12, 2-12
Int: type 0, pol 0, trig 0, bus 0, irq 13, 2-13
Int: type 0, pol 0, trig 0, bus 0, irq 14, 2-14
Int: type 0, pol 0, trig 0, bus 0, irq 15, 2-15
ACPI: INT_SRC_OVR (bus[0] irq[0x0] global_irq[0x2] polarity[0x0] trigger[0x0])
Int: type 0, pol 0, trig 0, bus 0, irq 0, 2-2
ACPI: INT_SRC_OVR (bus[0] irq[0x9] global_irq[0x9] polarity[0x1] trigger[0x3])
Int: type 0, pol 1, trig 3, bus 0, irq 9, 2-9
ACPI BALANCE SET
Using ACPI (MADT) for SMP configuration information
Kernel command line: splash=silent root=/dev/hda2 hdc=ide-scsi hdclun=0
ide_setup: hdc=ide-scsi
ide_setup: hdclun=0
mapped APIC to ffffe000 (fee00000)
mapped IOAPIC to ffffd000 (fec00000)
Initializing CPU#0
Detected 1830.076 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 3620.86 BogoMIPS
Memory: 482980k/491456k available (1800k kernel code, 8088k reserved, 622k data, 112k init, 0k highmem)
Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
Inode cache hash table entries: 32768 (order: 6, 262144 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer cache hash table entries: 32768 (order: 5, 131072 bytes)
Page-cache hash table entries: 131072 (order: 7, 524288 bytes)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: After generic, caps: 0383fbff c1c3fbff 00000000 00000000
CPU: Common caps: 0383fbff c1c3fbff 00000000 00000000
CPU: AMD Athlon(tm) XP 2500+ stepping 00
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
Getting VERSION: 40010
Getting VERSION: 40010
Getting ID: 0
Getting ID: f000000
Getting LVT0: 700
Getting LVT1: 400
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
ENABLING IO-APIC IRQs
Synchronizing Arb IDs.
init IO_APIC IRQs
IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
..TIMER: vector=0x31 pin1=2 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC pin2
..TIMER: Is timer irq0 connected to IOAPIC Pin0? ...
IOAPIC[0]: Set PCI routing entry (2-0 -> 0x31 -> IRQ 0 Mode:0 Active:0)
..TIMER check 8259 ints disabled, imr1:ff, imr2:ff
..TIMER: works OK on apic pin0 irq0
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 1829.0813 MHz.
..... host bus clock speed is 332.0693 MHz.
cpu: 0, clocks: 332693, slice: 166346
CPU0<T0:332688,T1:166336,D:6,S:166346,C:332693>
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
ACPI: Subsystem revision 20031002
PCI: PCI BIOS revision 2.10 entry at 0xfb4e0, last bus=2
PCI: Using configuration type 1
IOAPIC[0]: Set PCI routing entry (2-9 -> 0x71 -> IRQ 9 Mode:1 Active:0)
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: System [ACPI] (supports S0 S1 S4 S5)
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.HUB0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.AGPB._PRT]
ACPI: PCI Interrupt Link [LNK1] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNK2] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNK3] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNK4] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNK5] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LUBA] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LUBB] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LMAC] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LAPU] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LACI] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LMCI] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LSMB] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LUB2] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LFIR] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [L3CM] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LIDE] (IRQs 3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [APC1] (IRQs 16)
ACPI: PCI Interrupt Link [APC2] (IRQs 17)
ACPI: PCI Interrupt Link [APC3] (IRQs 18)
ACPI: PCI Interrupt Link [APC4] (IRQs 19)
ACPI: PCI Interrupt Link [APC5] (IRQs *16)
ACPI: PCI Interrupt Link [APCF] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCG] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCH] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCI] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCJ] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCK] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCS] (IRQs *23)
ACPI: PCI Interrupt Link [APCL] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCM] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [AP3C] (IRQs 20 21 22)
ACPI: PCI Interrupt Link [APCZ] (IRQs 20 21 22)
PCI: Probing PCI hardware
ACPI: PCI Interrupt Link [APCS] enabled at IRQ 23
IOAPIC[0]: Set PCI routing entry (2-23 -> 0xa9 -> IRQ 23 Mode:1 Active:0)
00:00:01[A] -> 2-23 -> IRQ 23
Pin 2-23 already programmed
ACPI: PCI Interrupt Link [APCF] enabled at IRQ 20
IOAPIC[0]: Set PCI routing entry (2-20 -> 0xb1 -> IRQ 20 Mode:1 Active:0)
00:00:02[A] -> 2-20 -> IRQ 20
ACPI: PCI Interrupt Link [APCG] enabled at IRQ 22
IOAPIC[0]: Set PCI routing entry (2-22 -> 0xb9 -> IRQ 22 Mode:1 Active:0)
00:00:02[B] -> 2-22 -> IRQ 22
ACPI: PCI Interrupt Link [APCL] enabled at IRQ 21
IOAPIC[0]: Set PCI routing entry (2-21 -> 0xc1 -> IRQ 21 Mode:1 Active:0)
00:00:02[C] -> 2-21 -> IRQ 21
ACPI: PCI Interrupt Link [APCH] enabled at IRQ 20
Pin 2-20 already programmed
ACPI: PCI Interrupt Link [APCI] enabled at IRQ 22
Pin 2-22 already programmed
ACPI: PCI Interrupt Link [APCJ] enabled at IRQ 21
Pin 2-21 already programmed
ACPI: PCI Interrupt Link [APCK] enabled at IRQ 20
Pin 2-20 already programmed
ACPI: PCI Interrupt Link [APCM] enabled at IRQ 22
Pin 2-22 already programmed
ACPI: PCI Interrupt Link [APCZ] enabled at IRQ 21
Pin 2-21 already programmed
ACPI: PCI Interrupt Link [APC3] enabled at IRQ 18
IOAPIC[0]: Set PCI routing entry (2-18 -> 0xc9 -> IRQ 18 Mode:1 Active:0)
00:01:06[A] -> 2-18 -> IRQ 18
ACPI: PCI Interrupt Link [APC4] enabled at IRQ 19
IOAPIC[0]: Set PCI routing entry (2-19 -> 0xd1 -> IRQ 19 Mode:1 Active:0)
00:01:06[B] -> 2-19 -> IRQ 19
ACPI: PCI Interrupt Link [APC1] enabled at IRQ 16
IOAPIC[0]: Set PCI routing entry (2-16 -> 0xd9 -> IRQ 16 Mode:1 Active:0)
00:01:06[C] -> 2-16 -> IRQ 16
ACPI: PCI Interrupt Link [APC2] enabled at IRQ 17
IOAPIC[0]: Set PCI routing entry (2-17 -> 0xe1 -> IRQ 17 Mode:1 Active:0)
00:01:06[D] -> 2-17 -> IRQ 17
Pin 2-19 already programmed
Pin 2-16 already programmed
Pin 2-17 already programmed
Pin 2-18 already programmed
Pin 2-16 already programmed
Pin 2-17 already programmed
Pin 2-18 already programmed
Pin 2-19 already programmed
ACPI: PCI Interrupt Link [APC5] enabled at IRQ 16
Pin 2-16 already programmed
number of MP IRQ sources: 15.
number of IO-APIC #2 registers: 24.
testing the IO APIC.......................
IO APIC #2......
.... register #00: 02000000
....... : physical APIC id: 02
....... : Delivery Type: 0
....... : LTS : 0
.... register #01: 00170011
....... : max redirection entries: 0017
....... : PRQ implemented: 0
....... : IO APIC version: 0011
.... register #02: 00000000
....... : arbitration: 00
.... IRQ redirection table:
 NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
 00 001 01  0    0    0   0   0    1    1    31
 01 001 01  0    0    0   0   0    1    1    39
 02 000 00  0    0    0   0   0    0    0    00
 03 001 01  0    0    0   0   0    1    1    41
 04 001 01  0    0    0   0   0    1    1    49
 05 001 01  0    0    0   0   0    1    1    51
 06 001 01  0    0    0   0   0    1    1    59
 07 001 01  0    0    0   0   0    1    1    61
 08 001 01  0    0    0   0   0    1    1    69
 09 001 01  0    1    0   0   0    1    1    71
 0a 001 01  0    0    0   0   0    1    1    79
 0b 001 01  0    0    0   0   0    1    1    81
 0c 001 01  0    0    0   0   0    1    1    89
 0d 001 01  0    0    0   0   0    1    1    91
 0e 001 01  0    0    0   0   0    1    1    99
 0f 001 01  0    0    0   0   0    1    1    A1
 10 001 01  1    1    0   0   0    1    1    D9
 11 001 01  1    1    0   0   0    1    1    E1
 12 001 01  1    1    0   0   0    1    1    C9
 13 001 01  1    1    0   0   0    1    1    D1
 14 001 01  1    1    0   0   0    1    1    B1
 15 001 01  1    1    0   0   0    1    1    C1
 16 001 01  1    1    0   0   0    1    1    B9
 17 001 01  1    1    0   0   0    1    1    A9
IRQ to pin mappings:
IRQ0 -> 0:2-> 0:0
IRQ1 -> 0:1
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ5 -> 0:5
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ9 -> 0:9-> 0:9
IRQ10 -> 0:10
IRQ11 -> 0:11
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ15 -> 0:15
IRQ16 -> 0:16
IRQ17 -> 0:17
IRQ18 -> 0:18
IRQ19 -> 0:19
IRQ20 -> 0:20
IRQ21 -> 0:21
IRQ22 -> 0:22
IRQ23 -> 0:23
.................................... done.
PCI: Using ACPI for IRQ routing
PCI: if you experience problems, try using option 'pci=noacpi' or even 'acpi=off'
mptable doesn't like my bios
I tried setting bios mp versions to both 1.1 and 1.4
albatron:/usr/src/mptable-2.0.15a # ./mptable -verbose
===============================================================================
MPTable, version 2.0.15 Linux
looking for EBDA pointer @ 0x040e, found, searching EBDA @ 0x0009fc00
searching CMOS 'top of mem' @ 0x0009f800 (638K)
searching default 'top of mem' @ 0x0009fc00 (639K)
searching BIOS @ 0x000f0000
MP FPS found in BIOS @ physical addr: 0x000f50b0
-------------------------------------------------------------------------------
MP Floating Pointer Structure:
  location: BIOS
  physical address: 0x000f50b0
  signature: '_MP_'
  length: 16 bytes
  version: 1.1
  checksum: 0x00
  mode: Virtual Wire
-------------------------------------------------------------------------------
MP Config Table Header:
  physical address: 0x0xf0c00
  signature: '$ml$'
  base table length: 0
  version: 1.6
  checksum: 0x00
  OEM ID: 'Ä
¸§'
°öProduct ID: '(
m'P
  OEM table pointer: 0x12d90e22
  OEM table size: 7964
  entry count: 7964
  local APIC address: 0x1f1c1f1c
  extended table length: 65284
  extended table checksum: 255
-------------------------------------------------------------------------------
MP Config Base Table Entries:
--
MPTABLE HOSED! record type = 55
albatron:/usr/src/mptable-2.0.15a #
Finally others working with kern 2.6 earlier trialled the following patch which may
provide some more clues:
retrieved from:
http://www.kernel.org/pub/linux/kernel/people/bart/2.6.0-test11-bart1/broken-out/nforce2-apic.patch
[x86] do not wrongly override mp_ExtINT IRQ
From: Mathieu <cheuche+lkml@free.fr>.
With this patch timer IRQ0 is correctly set to IO-APIC-edge (not XT-PIC) on
nForce2 boards when using APIC and ACPI.
 arch/i386/kernel/mpparse.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)
diff -puN arch/i386/kernel/mpparse.c~nforce2-apic arch/i386/kernel/mpparse.c
--- linux-2.6.0-test11/arch/i386/kernel/mpparse.c~nforce2-apic 2003-12-08 00:12:25.782597272 +0100
+++ linux-2.6.0-test11-root/arch/i386/kernel/mpparse.c 2003-12-08 00:12:25.786596664 +0100
@@ -962,7 +962,8 @@ void __init mp_override_legacy_irq (
 	 */
 	for (i = 0; i < mp_irq_entries; i++) {
 		if ((mp_irqs[i].mpc_dstapic == intsrc.mpc_dstapic)
-			&& (mp_irqs[i].mpc_srcbusirq == intsrc.mpc_srcbusirq)) {
+			&& (mp_irqs[i].mpc_srcbusirq == intsrc.mpc_srcbusirq)
+			&& (mp_irqs[i].mpc_irqtype == intsrc.mpc_irqtype)) {
 			mp_irqs[i] = intsrc;
 			found = 1;
 			break;
_
however the results were not completely successful as this posting shows it routing
through the 8259?
http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/1303.html
dmesg differences:
1.
after:
..TIMER: vector=0x31 pin1=2 pin2=0
before:
..TIMER: vector=0x31 pin1=2 pin2=-1
2.
after:
...trying to set up timer (IRQ0) through the 8259A ...
..... (found pin 0) ...works.
number of MP IRQ sources: 16.
before:
...trying to set up timer (IRQ0) through the 8259A ... failed.
...trying to set up timer as Virtual Wire IRQ... failed.
...trying to set up timer as ExtINT IRQ... works.
number of MP IRQ sources: 15.
Perhaps someone else could get mptable to run on their machine and send you
the result.
Regards
Ross
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-11 6:55 ` Ross Dickson
@ 2003-12-11 11:47 ` Ian Kumlien
2003-12-11 9:12 ` Ross Dickson
0 siblings, 1 reply; 13+ messages in thread
From: Ian Kumlien @ 2003-12-11 11:47 UTC (permalink / raw)
To: ross; +Cc: Maciej W. Rozycki, linux-kernel, AMartin, kernel
[-- Attachment #1: Type: text/plain, Size: 4023 bytes --]
On Thu, 2003-12-11 at 07:55, Ross Dickson wrote:
> albatron:/usr/src/mptable-2.0.15a # ./mptable -verbose
> ===============================================================================
> MPTable, version 2.0.15 Linux
> looking for EBDA pointer @ 0x040e, found, searching EBDA @ 0x0009fc00
> searching CMOS 'top of mem' @ 0x0009f800 (638K)
> searching default 'top of mem' @ 0x0009fc00 (639K)
> searching BIOS @ 0x000f0000
> MP FPS found in BIOS @ physical addr: 0x000f50b0
> -------------------------------------------------------------------------------
> MP Floating Pointer Structure:
>   location: BIOS
>   physical address: 0x000f50b0
>   signature: '_MP_'
>   length: 16 bytes
>   version: 1.1
>   checksum: 0x00
>   mode: Virtual Wire
> -------------------------------------------------------------------------------
> MP Config Table Header:
>   physical address: 0x0xf0c00
>   signature: '$ml$'
>   base table length: 0
>   version: 1.6
>   checksum: 0x00
>   OEM ID: 'Ä
> ¸§'
> °öProduct ID: '(
> m'P
>   OEM table pointer: 0x12d90e22
>   OEM table size: 7964
>   entry count: 7964
>   local APIC address: 0x1f1c1f1c
>   extended table length: 65284
>   extended table checksum: 255
> -------------------------------------------------------------------------------
> MP Config Base Table Entries:
> --
> MPTABLE HOSED! record type = 55
> albatron:/usr/src/mptable-2.0.15a #
>
> Perhaps someone else could get mptable to run on their machine and send you
> the result.
mptable doesn't seem to accept its own options; anyway, here's the output.
mptable -extra -verbose -pirq
===============================================================================
MPTable, version 2.0.15 Linux
looking for EBDA pointer @ 0x040e, found, searching EBDA @ 0x0009fc00
searching CMOS 'top of mem' @ 0x0009f800 (638K)
searching default 'top of mem' @ 0x0009fc00 (639K)
searching BIOS @ 0x000f0000
MP FPS found in BIOS @ physical addr: 0x000f5ce0
-------------------------------------------------------------------------------
MP Floating Pointer Structure:
  location: BIOS
  physical address: 0x000f5ce0
  signature: '_MP_'
  length: 16 bytes
  version: 1.1
  checksum: 0x00
  mode: Virtual Wire
-------------------------------------------------------------------------------
MP Config Table Header:
  physical address: 0x0xf0c00
  signature: ''
  base table length: 1280
  version: 1.7
  checksum: 0x00
  OEM ID: ''
  Product ID: ''
  OEM table pointer: 0x0000ffff
  OEM table size: 0
  entry count: 65535
  local APIC address: 0x000000c4
  extended table length: 1
  extended table checksum: 0
-------------------------------------------------------------------------------
MP Config Base Table Entries:
--
Processors:  APIC ID  Version  State         Family  Model  Step  Flags
             0        0x 7     BSP, usable   15      15     15    0x1a00c035
             0        0x 0     AP, unusable  0       0      10    0x78ffff0a
--
MPTABLE HOSED! record type = 15
I couldn't find the source so I used an old RedHat rpm...
(Asus A7N8X-X bios 1007)
--
Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-11 11:47 ` Ian Kumlien
@ 2003-12-11 9:12 ` Ross Dickson
2003-12-11 17:52 ` Ian Kumlien
0 siblings, 1 reply; 13+ messages in thread
From: Ross Dickson @ 2003-12-11 9:12 UTC (permalink / raw)
To: Ian Kumlien; +Cc: Maciej W. Rozycki, linux-kernel, AMartin, kernel
On Thursday 11 December 2003 21:47, Ian Kumlien wrote:
> On Thu, 2003-12-11 at 07:55, Ross Dickson wrote:
> > albatron:/usr/src/mptable-2.0.15a # ./mptable -verbose
> > ===============================================================================
> > MPTable, version 2.0.15 Linux
> > looking for EBDA pointer @ 0x040e, found, searching EBDA @ 0x0009fc00
> > searching CMOS 'top of mem' @ 0x0009f800 (638K)
> > searching default 'top of mem' @ 0x0009fc00 (639K)
> > searching BIOS @ 0x000f0000
> > MP FPS found in BIOS @ physical addr: 0x000f50b0
> > -------------------------------------------------------------------------------
> > MP Floating Pointer Structure:
> >   location: BIOS
> >   physical address: 0x000f50b0
> >   signature: '_MP_'
> >   length: 16 bytes
> >   version: 1.1
> >   checksum: 0x00
> >   mode: Virtual Wire
> > -------------------------------------------------------------------------------
> > MP Config Table Header:
> >   physical address: 0x0xf0c00
> >   signature: '$ml$'
> >   base table length: 0
> >   version: 1.6
> >   checksum: 0x00
> >   OEM ID: 'Ä
> > ¸§'
> > °öProduct ID: '(
> > m'P
> >   OEM table pointer: 0x12d90e22
> >   OEM table size: 7964
> >   entry count: 7964
> >   local APIC address: 0x1f1c1f1c
> >   extended table length: 65284
> >   extended table checksum: 255
> > -------------------------------------------------------------------------------
> > MP Config Base Table Entries:
> > --
> > MPTABLE HOSED! record type = 55
> > albatron:/usr/src/mptable-2.0.15a #
> >
> > Perhaps someone else could get mptable to run on their machine and send you
> > the result.
>
> mptable doesn't seem to accept its own options; anyway, here's the
> output.
>
> mptable -extra -verbose -pirq
> ===============================================================================
> MPTable, version 2.0.15 Linux
> looking for EBDA pointer @ 0x040e, found, searching EBDA @ 0x0009fc00
> searching CMOS 'top of mem' @ 0x0009f800 (638K)
> searching default 'top of mem' @ 0x0009fc00 (639K)
> searching BIOS @ 0x000f0000
> MP FPS found in BIOS @ physical addr: 0x000f5ce0
> -------------------------------------------------------------------------------
> MP Floating Pointer Structure:
>   location: BIOS
>   physical address: 0x000f5ce0
>   signature: '_MP_'
>   length: 16 bytes
>   version: 1.1
>   checksum: 0x00
>   mode: Virtual Wire
> -------------------------------------------------------------------------------
> MP Config Table Header:
>   physical address: 0x0xf0c00
>   signature: ''
>   base table length: 1280
>   version: 1.7
>   checksum: 0x00
>   OEM ID: ''
>   Product ID: ''
>   OEM table pointer: 0x0000ffff
>   OEM table size: 0
>   entry count: 65535
>   local APIC address: 0x000000c4
>   extended table length: 1
>   extended table checksum: 0
> -------------------------------------------------------------------------------
> MP Config Base Table Entries:
> --
> Processors:  APIC ID  Version  State         Family  Model  Step  Flags
>              0        0x 7     BSP, usable   15      15     15    0x1a00c035
>              0        0x 0     AP, unusable  0       0      10    0x78ffff0a
> --
> MPTABLE HOSED! record type = 15
>
> I couldn't find the source so I used an old RedHat rpm...
> (Asus A7N8X-X bios 1007)
>
> --
> Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net
>
Thanks Ian
Also many thanks for pointing out the relevant section to look in with the AMD
cpu link that you sent - Credit where credit is due (assuming we are both on the
right track). I had a read and refined your surmisings. I think the
problem appears synchronous with the apic timer because of two reasons.
1) any apic irq can cause re-connection of the system bus after disconnect.
2) the apic timer irq in my examinations has the shortest path to an ack.
I also had a look back through the athlon cooler and power management
postings and web site articles. I was blissfully ignorant of these issues when I
started and now I wonder what I have stepped into... Yuk
I submitted a support request to AMD, apologies for not cc'ing you, I kept
the cc's down to just nvidia and the mailing list. If you have not seen it yet
then it is here
http://lkml.org/lkml/2003/12/11/17
We hope....
Regards
Ross
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
2003-12-11 9:12 ` Ross Dickson
@ 2003-12-11 17:52 ` Ian Kumlien
2003-12-11 18:21 ` Jesse Allen
0 siblings, 1 reply; 13+ messages in thread
From: Ian Kumlien @ 2003-12-11 17:52 UTC (permalink / raw)
To: ross; +Cc: macro, linux-kernel, AMartin, kernel
[-- Attachment #1: Type: text/plain, Size: 1456 bytes --]
On Thu, 2003-12-11 at 10:12, Ross Dickson wrote:
> On Thursday 11 December 2003 21:47, Ian Kumlien wrote:
> Thanks Ian
>
> Also many thanks for pointing out the relevant section to look in with the AMD
> cpu link that you sent - Credit where credit is due (assuming we are both on the
> right track).
Heh, thanks, feels nice to have someone who agrees with you =).
> I had a read and refined your surmisings. I think the
> problem appears synchronous with the apic timer because of two reasons.
> 1) any apic irq can cause re-connection of the system bus after disconnect.
> 2) the apic timer irq in my examinations has the shortest path to an ack.
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24416.pdf
Page 42 and 94 might help as well. I haven't grasped it all or had any
food yet but I hope I'm right =)
> I also had a look back through the athlon cooler and power management
> postings and web site articles. I was blissfully ignorant of these issues when I
> started and now I wonder what I have stepped into... Yuk
Heh, yeah, the need for disconnect is somewhat dodgy, I haven't read up
on the rest.
> I submitted a support request to AMD, apologies for not cc'ing you, I kept
> the cc's down to just nvidia and the mailing list. If you have not seen it yet
> then it is here
Thanks
> We hope....
Yup...
--
Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
From: Jesse Allen @ 2003-12-11 18:21 UTC
To: Ian Kumlien; +Cc: linux-kernel

On Thu, Dec 11, 2003 at 06:52:41PM +0100, Ian Kumlien wrote:
> Heh, yeah, the need for disconnect is somewhat dodgy, i haven't read up
> on th rest.
>

Hmm, weird. I went to look at the Shuttle motherboard maker's site - maybe so that I can bug them for a BIOS disconnect option - but I checked for a BIOS update first. And sure enough, as if they had read my mind, an update was posted online just today. Here are the details of the fixes:

" Checksum: 8B00H Date Code: 12/05/03
1.Support 0.18 micron AMD Duron (Palomino) CPU.
2.Add C1 disconnect item."

It's almost as if they're reading this list. This disconnect problem was discovered on the 5th (well, the 5th in my timezone). Perhaps they're aware of this issue... I'm gonna talk to them.

Jesse

* Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
From: Bob @ 2003-12-12 9:27 UTC
To: linux-kernel

Jesse Allen wrote:

>On Thu, Dec 11, 2003 at 06:52:41PM +0100, Ian Kumlien wrote:
>
>>Heh, yeah, the need for disconnect is somewhat dodgy, i haven't read up
>>on th rest.
>>
>Hmm, weird. I went to go look at the Shuttle motherboard maker's site - maybe so that I can bug them for a bios disconnect option - but I checked for a bios update first. And sure enough like they read my mind, just posted online today, an update. Here are the details of fixes:
>
>" Checksum: 8B00H Date Code: 12/05/03
>1.Support 0.18 micron AMD Duron (Palomino) CPU.
>2.Add C1 disconnect item."
>
>It's almost as they're reading this list. This disconnect problem was discovered on the 5th (well the 5th in my timezone). Perhaps they're aware of this issue... I'm gonna talk to them.
>
>Jesse
>

A BIOS update for the MSI K7N2 MCP2-T nforce2 board
fixed the crashing BEFORE these patches were developed,
but there was no documentation relating the update to this
problem or explaining it.

http://www.msi.com.tw/program/support/bios/bos/spt_bos_detail.php?UID=436&kind=1
http://download.msi.com.tw/support/bos_exe/6570v76.exe

Award 7.6 is at the top of the list. Maybe somebody can figure
out what they're doing.

The Nvidia X driver for ti4200 agp8 still locks up linux though,
but X nv works fine. agp8 3d may expose the timer issue.

-Bob

* Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
From: Jesse Allen @ 2003-12-12 16:59 UTC
To: linux-kernel

On Fri, Dec 12, 2003 at 04:27:59AM -0500, Bob wrote:
> Jesse Allen wrote:
>
> >On Thu, Dec 11, 2003 at 06:52:41PM +0100, Ian Kumlien wrote:
> >
> >>Heh, yeah, the need for disconnect is somewhat dodgy, i haven't read up
> >>on th rest.
> >>
> >Hmm, weird. I went to go look at the Shuttle motherboard maker's site -
> >maybe so that I can bug them for a bios disconnect option - but I checked
> >for a bios update first. And sure enough like they read my mind, just
> >posted online today, an update. Here are the details of fixes:
> >
> >" Checksum: 8B00H Date Code: 12/05/03
> >1.Support 0.18 micron AMD Duron (Palomino) CPU.
> >2.Add C1 disconnect item."
> >
> >It's almost as they're reading this list. This disconnect problem was
> >discovered on the 5th (well the 5th in my timezone). Perhaps they're
> >aware of this issue... I'm gonna talk to them.
> >
> >Jesse
> >
> A bios update for MSI K7N2 MCP2-T nforce2 board
> fixed the crashing BEFORE these patches were developed,
> but there was no documentation that would relate or explain.

Last night, I updated the BIOS to the 12-5-03 version released yesterday (see above). I looked at the new option under Advanced Chipset Features, "C1 Disconnect". It has three selections: Auto, Enabled, Disabled. There seems to be no default. The item help says:

"Force En/Disabled
or Auto mode:
C17 IGP/SPP NB A03
C18D SPP NM A01 (C01)
enabled C1 disconnect
otherwise disabled it"

Auto sounded nice, so I selected that first. I compiled a new kernel without the disconnect-off patch or the ack delay. These are the exact patches I used on 2.6.0-test11:

patch-2.6.0-test11-bk8.bz2
acpi-2.6.0t11.patch                 acpi bugfixes from Maciej.
nforce-ioapic-timer-2.6t11.patch    from Ross Dickson. Timer patch.
forcedeth.patch                     Patch stolen from -test10-mm1? Unused.
forcedeth-update-2.patch            Same.

Sure enough, under this kernel, no lockups. Athcool reported Disconnect was "on".

I decided to wait till this morning to try the BIOS "C1 Disconnect" set to Enabled. Still no lockups under this kernel. Tried a vanilla kernel, no lockups (but timer and watchdog are still messed up). Now that I read your message, Bob, I understand what you are saying. Luckily, the updated BIOS changelog states "Add C1 disconnect item." And this exact version seems to have fixed it, and now we have an exact fix (another one?) to refer to.

So the fix was absolutely a BIOS fix. It seems a lot of people have buggy BIOSes on nforce2 boards. Even some that have the option. I guess I haven't proved that it was the BIOS fix, because I haven't stressed it for a long period of time. But I don't believe I have to, because I can do greps and kernel compiles with disconnect on now, where before I couldn't (it has always been very easy to reproduce the lockup).

>
> http://www.msi.com.tw/program/support/bios/bos/spt_bos_detail.php?UID=436&kind=1
> http://download.msi.com.tw/support/bos_exe/6570v76.exe
>
> Award 7.6 at the top of the list. Maybe somebody can figure
> out what they're doing.

I think I'll continue contacting Shuttle and ask them why they added the option, and how they added it. Maybe that will give us the right information.

>
> Nvidia X driver for ti4200 agp8 still locks up linux though,
> but X nv works fine. agp8 3d may expose the timer issue.
>

That's either an nvidia driver problem or an agpgart-nforce problem. I'd try 4x agp, and/or NVAGP (or agpgart, if already using NVAGP). If you think it's the timer, try the timer patch, or boot with nolapic noapic.

Jesse

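A minimal sketch for anyone following the "nolapic noapic" suggestion above: assuming a Linux system that exposes the running kernel's command line at /proc/cmdline, the Python below reports whether those two boot parameters were actually passed. The helper names (apic_flags_present, read_cmdline) are made up for illustration and are not part of any tool mentioned in this thread.

```python
from pathlib import Path

# Boot parameters suggested in the message above.
APIC_FLAGS = ("noapic", "nolapic")


def apic_flags_present(cmdline: str) -> dict:
    """Report which APIC-disabling flags appear as whole words in cmdline."""
    tokens = cmdline.split()
    return {flag: flag in tokens for flag in APIC_FLAGS}


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Return the running kernel's command line, or '' if it cannot be read."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


if __name__ == "__main__":
    for flag, present in apic_flags_present(read_cmdline()).items():
        print(f"{flag}: {'set' if present else 'not set'}")
```

Matching whole tokens rather than substrings avoids counting a longer parameter that merely starts with the same letters.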
* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
From: Jesse Allen @ 2003-12-12 17:18 UTC
To: linux-kernel

Oops, typo: NM is supposed to be NB.

On Fri, Dec 12, 2003 at 09:59:29AM -0700, Jesse Allen wrote:
> The item help says:
> "Force En/Disabled
> or Auto mode:
> C17 IGP/SPP NB A03
> C18D SPP NM A01 (C01)
  C18D SPP /NB/ A01 (C01)
> enabled C1 disconnect
> otherwise disabled it"
>

Maybe NB means northbridge?

* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
From: Josh McKinney @ 2003-12-12 18:18 UTC
To: linux-kernel

On approximately Fri, Dec 12, 2003 at 09:59:29AM -0700, Jesse Allen wrote:
> On Fri, Dec 12, 2003 at 04:27:59AM -0500, Bob wrote:
> > Jesse Allen wrote:
> >
> > >On Thu, Dec 11, 2003 at 06:52:41PM +0100, Ian Kumlien wrote:
> > >
> > >>Heh, yeah, the need for disconnect is somewhat dodgy, i haven't read up
> > >>on th rest.
> > >>
> > >Hmm, weird. I went to go look at the Shuttle motherboard maker's site -
> > >maybe so that I can bug them for a bios disconnect option - but I checked
> > >for a bios update first. And sure enough like they read my mind, just
> > >posted online today, an update. Here are the details of fixes:
> > >
> > >" Checksum: 8B00H Date Code: 12/05/03
> > >1.Support 0.18 micron AMD Duron (Palomino) CPU.
> > >2.Add C1 disconnect item."
> > >
> > >It's almost as they're reading this list. This disconnect problem was
> > >discovered on the 5th (well the 5th in my timezone). Perhaps they're
> > >aware of this issue... I'm gonna talk to them.
> > >
> > >Jesse
> > >
> > A bios update for MSI K7N2 MCP2-T nforce2 board
> > fixed the crashing BEFORE these patches were developed,
> > but there was no documentation that would relate or explain.
>
> Last night, I updated the bios to the 12-5-03 released yesterday (see above). I looked at the new option under Advanced Chipset Features, "C1 Disconnect". It has three selections: Auto, Enabled, Disabled. There seems to be no default. The item help says:
> "Force En/Disabled
> or Auto mode:
> C17 IGP/SPP NB A03
> C18D SPP NM A01 (C01)
> enabled C1 disconnect
> otherwise disabled it"
>
> Auto sounded nice, so I selected that first. I compiled a new kernel without the disconnect off patch, or the ack delay. These are the exact patches I used on 2.6.0-test11:
> patch-2.6.0-test11-bk8.bz2
> acpi-2.6.0t11.patch acpi bugfixes from Maciej.
> nforce-ioapic-timer-2.6t11.patch from Ross Dickson. Timer patch.
> forcedeth.patch Patch stolen from -test10-mm1? Unused.
> forcedeth-update-2.patch Same.
>
> Sure enough, under this kernel, no lockups. Athcool reported Disconnect was "on".
>
<snip>
> So the fix was absolutely a BIOS fix. It seems a lot of people have buggy BIOSes on nforce2 boards. Even some that have the option. I guess I haven't proved that it was the BIOS fix, because I haven't stressed it for a long period of time. But I don't believe I have to because I can do grep's and kernel compiles with disconnect on now, where before I couldn't (always been very easy to reproduce lockup).
<snip>

The thing that strikes me as funny is that you get no crashes with the
updated BIOS and Disconnect on, but without the updated BIOS we have
to turn disconnect off with athcool or the patch? This makes me think
that there is some voodoo going on in the BIOS update that they aren't
saying, surprise surprise, or something is just slowing down the time
it takes for it to crash. I say this because I have gone 5+ days
without any of the patches from these threads, acpi apic lapic
enabled, and CPU disconnect on as stated by athcool. This was with
much stress testing, idle time, etc. One day I just ran a grep that I
have done probably 30 times and boom, hang.

Good luck, hope the BIOS is the trick. Now off to see how I can get
ASUS to put the C1 Disconnect in the next revision.

--
Josh McKinney                | Webmaster: http://joshandangie.org
--------------------------------------------------------------------------
                             | They that can give up essential liberty
Linux, the choice       -o)  | to obtain a little temporary safety deserve
of the GNU generation    /\  | neither liberty or safety.
                        _\_v |                          -Benjamin Franklin

* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
From: Jesse Allen @ 2003-12-12 19:29 UTC
To: Josh McKinney; +Cc: linux-kernel

On Fri, Dec 12, 2003 at 01:18:27PM -0500, Josh McKinney wrote:
>
> The thing that strikes me funny is that you get no crashes with the
> updated BIOS and Disconnect on, but without the updated BIOS we have
> to turn disconnect off with athcool or the patch? This makes me think
> that there is some voodoo going on in the BIOS update that they aren't
> saying, surprise surprise,

Yes, it is weird. I've now asked Shuttle for more information.

> or something is just slowing down the time
> it takes for it to crash. I say this because I have gone 5+ days
> without any of the patches from these threads, acpi apic lapic
> enabled, and CPU disconnect on as stated by athcool. This was with
> much stress testing, idle time, etc. One day I just ran a grep that I
> have done probably 30 times and boom, hang.

I hope this is not the case! The one/two grep test worked flawlessly, but if the lockup is only delayed, then I can't rely on that test anymore. (But at least I have the BIOS option now! heh)

I suggest you reference the Shuttle AN35 12-05-2003 BIOS, and maybe Bob's MSI, when you talk to Asus. If they can do it, then Asus should be able to as well.

Jesse

* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
From: Craig Bradney @ 2003-12-12 21:42 UTC
To: Josh McKinney; +Cc: linux-kernel

On Fri, 2003-12-12 at 19:18, Josh McKinney wrote:
> On approximately Fri, Dec 12, 2003 at 09:59:29AM -0700, Jesse Allen wrote:
> > On Fri, Dec 12, 2003 at 04:27:59AM -0500, Bob wrote:
> > > Jesse Allen wrote:
> > >
> > > >On Thu, Dec 11, 2003 at 06:52:41PM +0100, Ian Kumlien wrote:
> > > >
> > > >>Heh, yeah, the need for disconnect is somewhat dodgy, i haven't read up
> > > >>on th rest.
> > > >>
> > > >Hmm, weird. I went to go look at the Shuttle motherboard maker's site -
> > > >maybe so that I can bug them for a bios disconnect option - but I checked
> > > >for a bios update first. And sure enough like they read my mind, just
> > > >posted online today, an update. Here are the details of fixes:
> > > >
> > > >" Checksum: 8B00H Date Code: 12/05/03
> > > >1.Support 0.18 micron AMD Duron (Palomino) CPU.
> > > >2.Add C1 disconnect item."
> > > >
> > > >It's almost as they're reading this list. This disconnect problem was
> > > >discovered on the 5th (well the 5th in my timezone). Perhaps they're
> > > >aware of this issue... I'm gonna talk to them.
> > > >
> > > >Jesse
> > > >
> > > A bios update for MSI K7N2 MCP2-T nforce2 board
> > > fixed the crashing BEFORE these patches were developed,
> > > but there was no documentation that would relate or explain.
> >
> > Last night, I updated the bios to the 12-5-03 released yesterday (see above). I looked at the new option under Advanced Chipset Features, "C1 Disconnect". It has three selections: Auto, Enabled, Disabled. There seems to be no default. The item help says:
> > "Force En/Disabled
> > or Auto mode:
> > C17 IGP/SPP NB A03
> > C18D SPP NM A01 (C01)
> > enabled C1 disconnect
> > otherwise disabled it"
> >
> > Auto sounded nice, so I selected that first. I compiled a new kernel without the disconnect off patch, or the ack delay. These are the exact patches I used on 2.6.0-test11:
> > patch-2.6.0-test11-bk8.bz2
> > acpi-2.6.0t11.patch acpi bugfixes from Maciej.
> > nforce-ioapic-timer-2.6t11.patch from Ross Dickson. Timer patch.
> > forcedeth.patch Patch stolen from -test10-mm1? Unused.
> > forcedeth-update-2.patch Same.
> >
> > Sure enough, under this kernel, no lockups. Athcool reported Disconnect was "on".
> >
> <snip>
> > So the fix was absolutely a BIOS fix. It seems a lot of people have buggy BIOSes on nforce2 boards. Even some that have the option. I guess I haven't proved that it was the BIOS fix, because I haven't stressed it for a long period of time. But I don't believe I have to because I can do grep's and kernel compiles with disconnect on now, where before I couldn't (always been very easy to reproduce lockup).
> <snip>
>
> The thing that strikes me funny is that you get no crashes with the
> updated BIOS and Disconnect on, but without the updated BIOS we have
> to turn disconnect off with athcool or the patch? This makes me think
> that there is some voodoo going on in the BIOS update that they aren't
> saying, surprise surprise, or something is just slowing down the time
> it takes for it to crash. I say this because I have gone 5+ days
> without any of the patches from these threads, acpi apic lapic
> enabled, and CPU disconnect on as stated by athcool. This was with
> much stress testing, idle time, etc. One day I just ran a grep that I
> have done probably 30 times and boom, hang.
>
> Good luck, hope the BIOS is the trick, now off to see how I can get
> ASUS to put the C1 Disconnect in the next revision.

Yes, that's how it was for me.. I was the only one here saying "no
problems, la la la", then at about 5.25 days.. boom. Then the next day
it crashed twice. Hopefully you make some progress with ASUS.. (for the
A7N8X Deluxe as well as your mobo please :) ).

I've been playing with hardware in the past few days (new quieter Zalman
PSU, and Zalman 7000 Cu fan etc) so no uptime to speak of here now. I
did compile KDE 3.2 beta 2 last night though.. 6 hours of solid
compilation.. no hassles. I have never turned off Disconnect either.

Thanks to all you guys who are working on this one. Seems to be getting
somewhere.

Craig

* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
From: Bob @ 2003-12-13 4:18 UTC
To: linux-kernel

Re: two instances of good but undocumented bios voodoo

Josh McKinney wrote:

>On approximately Fri, Dec 12, 2003 at 09:59:29AM -0700, Jesse Allen wrote:
>
>>On Fri, Dec 12, 2003 at 04:27:59AM -0500, Bob wrote:
>>
>>>Jesse Allen wrote:
>>>
>>>>On Thu, Dec 11, 2003 at 06:52:41PM +0100, Ian Kumlien wrote:
>>>>
>>>> ............
>>>>
>>>>but I checked
>>>>for a bios update first. And sure enough like they read my mind, just
>>>>posted online today, an update. Here are the details of fixes:
>>>>
>>>>" Checksum: 8B00H Date Code: 12/05/03
>>>>1.Support 0.18 micron AMD Duron (Palomino) CPU.
>>>>2.Add C1 disconnect item."..........Jesse
>>>>

-Jesse got a BIOS update that now gives him a cpu disconnect option in setup.

>>>A bios update for MSI K7N2 MCP2-T nforce2 board
>>>fixed the crashing BEFORE these patches were developed,
>>>but there was no documentation that would relate or explain.
>>>

-Bob said that about his BIOS update fixing the lockup problem entirely, but with no doc, needing no patch except to turn on the ioapic edge timer (another clue -- without the ioapic edge timer working, the BIOS update fixed this nforce2 situation!), no clue as to whether the BIOS update sets cpu disconnect one way or the other, and no option to choose cpu disconnect in the new or old setup.

Jesse continues--

>>Last night, I updated the bios to the 12-5-03 released yesterday (see above). I looked at the new option under Advanced Chipset Features, "C1 Disconnect". It has three selections: Auto, Enabled, Disabled. There seems to be no default. The item help says:
>>"Force En/Disabled
>> or Auto mode:
>> C17 IGP/SPP NB A03
>> C18D SPP NM A01 (C01)
>> enabled C1 disconnect
>> otherwise disabled it"
>>
>>Auto sounded nice, so I selected that first. I compiled a new kernel without the disconnect off patch, or the ack delay. These are the exact patches I used on 2.6.0-test11:
>>patch-2.6.0-test11-bk8.bz2
>>acpi-2.6.0t11.patch acpi bugfixes from Maciej.
>>nforce-ioapic-timer-2.6t11.patch from Ross Dickson. Timer patch.
>>forcedeth.patch Patch stolen from -test10-mm1? Unused.
>>forcedeth-update-2.patch Same.
>>
>>Sure enough, under this kernel, no lockups. Athcool reported Disconnect was "on".
>>

Disconnect was ON!!!

> <snip>

...in one case the BIOS update fixed the problem without needing cpu disconnect off; in the other case we don't know how, or whether cpu disconnect is on or off now, but the BIOS update fixed nforce2 without turning the ioapic edge timer on. I guess these two cases prove that neither cpu disconnect = on nor ioapic timer = off is causing the problem directly.

>The thing that strikes me funny is that you get no crashes with the
>updated BIOS and Disconnect on, but without the updated BIOS we have
>to turn disconnect off with athcool or the patch? This makes me think
>that there is some voodoo going on in the BIOS update that they aren't
>saying, surprise surprise, or something is just slowing down the time
>it takes for it to crash. I say this because I have gone 5+ days
>without any of the patches from these threads, acpi apic lapic
>enabled, and CPU disconnect on as stated by athcool. This was with
>much stress testing, idle time, etc. One day I just ran a grep that I
>have done probably 30 times and boom, hang.
>
>Good luck, hope the BIOS is the trick, now off to see how I can get
>ASUS to put the C1 Disconnect in the next revision.
>

...and at least two motherboard makers have voodoo to fix the problem.

* Re: Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered
From: Bob @ 2003-12-13 6:34 UTC
To: linux-kernel

hackers be clever--

"system temperature was getting -- above 40 deg C. CPU was getting up to 49 deg C...how poorly it's thermal management was operating then. Now with the new patches, and ultimately, BIOS update, system temperature is about 35 deg C -JesseAllen"

Maybe that tells me that my BIOS update fixed my lockup problems without turning on cpu disconnect, or even by turning it off, with no doc as a face-saver and without letting me see a choice in setup. Like yours before cpu disconnect was working, my temp is 41C most of the time and 48C under a heavy load, possibly 49C: the exact range you were looking at before you had cpu disconnect working. Or they turned cpu disconnect off without saying anything, buying time, saving embarrassment. Anyway, it's probably off here, since I have exactly the same heat profile.

I have 120mm fans, one in, one out, blowing air across
Zalman cpu and gpu heatsinks, no 80mm extra Zalman fan.
amd xp 3000+ 333mhz 1:1
arctic silver compound on heatsinks

Thermal 1: ok, 41.0 degrees C 105.8 degrees F
- 41C in X, running realplayer
- 48C compile a fat kernel or several heavy tasks

-Bob

Jesse Allen wrote:

> ....I compiled a new kernel without the disconnect off patch, or the
> ack delay. These are the exact patches I used on 2.6.0-test11:
>
>patch-2.6.0-test11-bk8.bz2
>acpi-2.6.0t11.patch acpi bugfixes from Maciej.
>nforce-ioapic-timer-2.6t11.patch from Ross Dickson. Timer patch.
>forcedeth.patch Patch stolen from -test10-mm1? Unused.
>forcedeth-update-2.patch Same.
>
>Sure enough, under this kernel, no lockups. Athcool reported Disconnect was "on".
>
>I decided to wait till this morning, to try the BIOS "C1 Disconnect" set to enabled. Still no lockups under this kernel. Tried a vanilla kernel, no lockups (but timer and watchdog messed up still). Now that I read your message Bob, I understand what you are saying. Luckily, the updated BIOS changelog states "Add C1 disconnect item." And this exact version seems to have fixed it, and now we have an exact fix (another one?) to refer to.
>
>So the fix was absolutely a BIOS fix.
>

...but we're stuck looking at smoke and mirrors, when the kernel might be able to work around BIOSes that have not been "updated". Or to put it another way, the "voodoo" may be done by the kernel if it is not done by the BIOS. Whatever is being tweaked may be accessible to kernel code.

I can't read anything useful in my BIOS flash file w6570nms.760, which is contained in--

>>http://download.msi.com.tw/support/bos_exe/6570v76.exe
>>
>>Nvidia X driver for ti4200 agp8 still locks up linux though,
>>but X nv works fine. agp8 3d may expose the timer issue.
>>
>
>That's either an nvidia driver problem, or agpgart-nforce problem. I'd try 4x agp, and or NVAGP (or agpgart, if already using NVAGP). If you think it's the timer, try the timer patch, or with nolapic noapic.
>
>Jesse
>

Thanks, I've tried all of those except passing agp4 or agp2 to the nvidia X "nvidia" driver. Another clue that it's related to interrupts, or to the timing of access to interrupts: before I put another card on the pci bus, I could get into X for a few seconds with the nvidia driver before linux locked up. Now, with an elan pcmcia 32-bit cardbus pci card that claims it needs its own interrupt (can't give it one yet!), X just locks up linux on load.

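As a small arithmetic check on the thermal reading quoted in the message above ("41.0 degrees C 105.8 degrees F"), here is a sketch of the Celsius-to-Fahrenheit conversion in Python. The function name is hypothetical, and rounding to one decimal place is an assumption chosen to match the precision of the quoted reading.

```python
def celsius_to_fahrenheit(deg_c: float) -> float:
    """Convert Celsius to Fahrenheit, rounded to one decimal place."""
    return round(deg_c * 9.0 / 5.0 + 32.0, 1)


if __name__ == "__main__":
    # Readings mentioned in the message: idle/desktop and heavy-load temperatures.
    for deg_c in (41.0, 48.0):
        print(f"{deg_c:.1f} C = {celsius_to_fahrenheit(deg_c):.1f} F")
```

The 41.0 C input reproduces the 105.8 F figure from the quoted reading; the 48 C load figure corresponds to 118.4 F.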
Thread overview: 13+ messages
2003-12-13 5:16 Working nforce2, was Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered Ross Dickson
2003-12-13 6:04 ` Jesse Allen
[not found] <200312132040.00875.ross@datscreative.com.au>
2003-12-13 12:00 ` Fwd: " Bob
2003-12-15 13:11 ` Maciej W. Rozycki
2003-12-16 7:18 ` Bob
-- strict thread matches above, loose matches on Subject: below --
2003-12-15 14:30 Fwd: " Ross Dickson
2003-12-15 15:02 ` Craig Bradney
2003-12-15 16:54 ` Ross Dickson
2003-12-16 6:07 ` Bob
2003-12-13 9:20 Ross Dickson
2003-12-13 9:51 ` Bob
2003-12-07 13:12 Ross Dickson
2003-12-11 6:55 ` Ross Dickson
2003-12-11 11:47 ` Ian Kumlien
2003-12-11 9:12 ` Ross Dickson
2003-12-11 17:52 ` Ian Kumlien
2003-12-11 18:21 ` Jesse Allen
2003-12-12 9:27 ` Bob
2003-12-12 16:59 ` Working nforce2, was " Jesse Allen
2003-12-12 17:18 ` Jesse Allen
2003-12-12 18:18 ` Josh McKinney
2003-12-12 19:29 ` Jesse Allen
2003-12-12 21:42 ` Craig Bradney
2003-12-13 4:18 ` Bob
2003-12-13 6:34 ` Bob