[2.4] Nforce2 oops and occasional hang (tried the lockups patch, no difference)

public inbox for linux-kernel@vger.kernel.org

* [2.4] Nforce2 oops and occasional hang (tried the lockups patch, no difference)
@ 2003-12-11 16:16 Disconnect
  2003-12-11 16:38 ` Josh McKinney
  2003-12-11 17:22 ` Disconnect
  0 siblings, 2 replies; 12+ messages in thread
From: Disconnect @ 2003-12-11 16:16 UTC (permalink / raw)
  To: lkml

I've posted this a couple of times, with no response.

So long as memory pressure is kept to a minimum, the system is stable. 
Running without swap and without serious work kept it going for a couple
of weeks.

Running it (currently with the nforce2-lockups patches and HZ=1000, but
without APIC/IO-APIC) produces the same oopses as every other kernel I've
tried, including 2.4.22.

Suggestions? I'd really love to be able to use this thing reliably under
Linux :(

Unable to handle kernel NULL pointer dereference at virtual address 00000089
c012dff7
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c012dff7>]    Tainted: P
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010206
eax: 00000081   ebx: cff72b58   ecx: cee3a000   edx: 0000cad6
esi: 00001000   edi: cbabd9b4   ebp: 00000081   esp: cee3bf08
ds: 0018   es: 0018   ss: 0018
Process giftd (pid: 29738, stackpage=cee3b000)
Stack: c03155c0 c238b1c0 c44671c0 00000038 cee3a000 00000eea 00000000 00000000
       00000116 cbabd900 c012e6a0 cee3bf74 c2974660 c2974640 00001000 00001000
       00000000 00000000 c012e7f2 c2974640 c2974660 cee3bf74 c012e6a0 c0222e67
Call Trace:    [<c012e6a0>] [<c012e7f2>] [<c012e6a0>] [<c0222e67>] [<c013d413>]
  [<c0108fdf>]
Code: 39 78 08 74 05 8b 40 10 eb f2 39 68 0c 75 f6 85 c0 89 c6 0f
 
 
>>EIP; c012dff7 <do_generic_file_read+157/4e0>   <=====
 
>>ebx; cff72b58 <_end+fc02f6c/104a5494>
>>ecx; cee3a000 <_end+eaca414/104a5494>
>>edi; cbabd9b4 <_end+b74ddc8/104a5494>
>>esp; cee3bf08 <_end+eacc31c/104a5494>
 
Trace; c012e6a0 <file_read_actor+0/a0>
Trace; c012e7f2 <generic_file_read+b2/1a0>
Trace; c012e6a0 <file_read_actor+0/a0>
Trace; c0222e67 <sys_send+37/40>
Trace; c013d413 <sys_read+a3/110>
Trace; c0108fdf <system_call+33/38>
 
Code;  c012dff7 <do_generic_file_read+157/4e0>
00000000 <_EIP>:
Code;  c012dff7 <do_generic_file_read+157/4e0>   <=====
   0:   39 78 08                  cmp    %edi,0x8(%eax)   <=====
Code;  c012dffa <do_generic_file_read+15a/4e0>
   3:   74 05                     je     a <_EIP+0xa>
Code;  c012dffc <do_generic_file_read+15c/4e0>
   5:   8b 40 10                  mov    0x10(%eax),%eax
Code;  c012dfff <do_generic_file_read+15f/4e0>
   8:   eb f2                     jmp    fffffffc <_EIP+0xfffffffc>
Code;  c012e001 <do_generic_file_read+161/4e0>
   a:   39 68 0c                  cmp    %ebp,0xc(%eax)
Code;  c012e004 <do_generic_file_read+164/4e0>
   d:   75 f6                     jne    5 <_EIP+0x5>
Code;  c012e006 <do_generic_file_read+166/4e0>
   f:   85 c0                     test   %eax,%eax
Code;  c012e008 <do_generic_file_read+168/4e0>
  11:   89 c6                     mov    %eax,%esi
Code;  c012e00a <do_generic_file_read+16a/4e0>
  13:   0f 00 00                  sldtl  (%eax)
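
As a reading aid for the dump above: the faulting instruction is `cmp %edi,0x8(%eax)` and the register dump shows eax = 00000081, so the memory operand is 0x81 + 0x8 = 0x89, which matches the reported fault address. The same kind of arithmetic recovers the start of do_generic_file_read from the `+157` offset that ksymoops prints. A minimal sketch, assuming only the values printed in the oops; the helper names are made up for illustration and are not part of ksymoops:

```python
import re

# Register lines copied from the oops above.
REGS = """
eax: 00000081   ebx: cff72b58   ecx: cee3a000   edx: 0000cad6
esi: 00001000   edi: cbabd9b4   ebp: 00000081   esp: cee3bf08
"""


def parse_registers(text):
    """Return {register name: value} from 'name: hexvalue' pairs."""
    return {name: int(value, 16)
            for name, value in re.findall(r"(\w+):\s+([0-9a-f]{8})", text)}


def effective_address(regs, base, displacement):
    """Address touched by an AT&T operand such as 0x8(%eax), wrapped to 32 bits."""
    return (regs[base] + displacement) & 0xFFFFFFFF


regs = parse_registers(REGS)

# cmp %edi,0x8(%eax): the memory operand is eax + 8.
fault = effective_address(regs, "eax", 0x8)
print(f"fault address: {fault:08x}")

# EIP c012dff7 is reported as do_generic_file_read+157.
function_start = 0xC012DFF7 - 0x157
print(f"do_generic_file_read starts at: {function_start:08x}")
```

Note that eax and ebp both hold 0x81 in the dump, so the value being dereferenced is a small integer rather than a kernel address. That would be consistent with corrupted memory, though the dump alone does not prove it.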

-- 
Disconnect <lkml@sigkill.net>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [2.4] Nforce2 oops and occasional hang (tried the lockups patch, no difference)
  2003-12-11 16:16 [2.4] Nforce2 oops and occasional hang (tried the lockups patch, no difference) Disconnect
@ 2003-12-11 16:38 ` Josh McKinney
  2003-12-11 17:19   ` Disconnect
  2003-12-11 17:22 ` Disconnect
  1 sibling, 1 reply; 12+ messages in thread
From: Josh McKinney @ 2003-12-11 16:38 UTC (permalink / raw)
  To: lkml

On approximately Thu, Dec 11, 2003 at 11:16:20AM -0500, Disconnect wrote:
> I've posted this a couple of times, with no response.
> 
> So long as memory pressure is kept to a minimum, the system is stable. 
> Running without swap and without serious work kept it going for a couple
> of weeks.
> 
> Running (currently with the nforce2-lockups patches and HZ=1000 but no
> APIC/IO-APIC) results in the same oopses as every kernel I've tried,
> including 2.4.22.
> 
> Suggestions? I'd really love to be able to use this thing reliably under
> Linux :(
> 

Do you see hard locks with APIC/IO-APIC enabled?  Do you see this oops
with APIC/IO-APIC enabled?  Just curious because I can reproduce the
hard locks eventually, but never get an oops.  Also, what patches are
you running and what kernel version?

> Unable to handle kernel NULL pointer dereference at virtual address 00000089
> c012dff7
> *pde = 00000000
> Oops: 0000
> CPU:    0
> EIP:    0010:[<c012dff7>]    Tainted: P
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00010206
> eax: 00000081   ebx: cff72b58   ecx: cee3a000   edx: 0000cad6
> esi: 00001000   edi: cbabd9b4   ebp: 00000081   esp: cee3bf08
> ds: 0018   es: 0018   ss: 0018
> Process giftd (pid: 29738, stackpage=cee3b000)
> Stack: c03155c0 c238b1c0 c44671c0 00000038 cee3a000 00000eea 00000000 00000000
>        00000116 cbabd900 c012e6a0 cee3bf74 c2974660 c2974640 00001000 00001000
>        00000000 00000000 c012e7f2 c2974640 c2974660 cee3bf74 c012e6a0 c0222e67
> Call Trace:    [<c012e6a0>] [<c012e7f2>] [<c012e6a0>] [<c0222e67>] [<c013d413>]
>   [<c0108fdf>]
> Code: 39 78 08 74 05 8b 40 10 eb f2 39 68 0c 75 f6 85 c0 89 c6 0f
>  
>  
> >>EIP; c012dff7 <do_generic_file_read+157/4e0>   <=====
>  
> >>ebx; cff72b58 <_end+fc02f6c/104a5494>
> >>ecx; cee3a000 <_end+eaca414/104a5494>
> >>edi; cbabd9b4 <_end+b74ddc8/104a5494>
> >>esp; cee3bf08 <_end+eacc31c/104a5494>
>  
> Trace; c012e6a0 <file_read_actor+0/a0>
> Trace; c012e7f2 <generic_file_read+b2/1a0>
> Trace; c012e6a0 <file_read_actor+0/a0>
> Trace; c0222e67 <sys_send+37/40>
> Trace; c013d413 <sys_read+a3/110>
> Trace; c0108fdf <system_call+33/38>
>  
> Code;  c012dff7 <do_generic_file_read+157/4e0>
> 00000000 <_EIP>:
> Code;  c012dff7 <do_generic_file_read+157/4e0>   <=====
>    0:   39 78 08                  cmp    %edi,0x8(%eax)   <=====
> Code;  c012dffa <do_generic_file_read+15a/4e0>
>    3:   74 05                     je     a <_EIP+0xa>
> Code;  c012dffc <do_generic_file_read+15c/4e0>
>    5:   8b 40 10                  mov    0x10(%eax),%eax
> Code;  c012dfff <do_generic_file_read+15f/4e0>
>    8:   eb f2                     jmp    fffffffc <_EIP+0xfffffffc>
> Code;  c012e001 <do_generic_file_read+161/4e0>
>    a:   39 68 0c                  cmp    %ebp,0xc(%eax)
> Code;  c012e004 <do_generic_file_read+164/4e0>
>    d:   75 f6                     jne    5 <_EIP+0x5>
> Code;  c012e006 <do_generic_file_read+166/4e0>
>    f:   85 c0                     test   %eax,%eax
> Code;  c012e008 <do_generic_file_read+168/4e0>
>   11:   89 c6                     mov    %eax,%esi
> Code;  c012e00a <do_generic_file_read+16a/4e0>
>   13:   0f 00 00                  sldtl  (%eax)
> 
> -- 
> Disconnect <lkml@sigkill.net>
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Josh McKinney		     |	Webmaster: http://joshandangie.org
--------------------------------------------------------------------------
                             | They that can give up essential liberty
Linux, the choice       -o)  | to obtain a little temporary safety deserve 
of the GNU generation    /\  | neither liberty or safety. 
                        _\_v |                          -Benjamin Franklin


* Re: [2.4] Nforce2 oops and occasional hang (tried the lockups patch, no difference)
  2003-12-11 16:38 ` Josh McKinney
@ 2003-12-11 17:19   ` Disconnect
  0 siblings, 0 replies; 12+ messages in thread
From: Disconnect @ 2003-12-11 17:19 UTC (permalink / raw)
  To: lkml

On Thu, 2003-12-11 at 11:38, Josh McKinney wrote:
> Do you see hard locks with APIC/IO-APIC enabled?  Do you see this oops
> with APIC/IO-APIC enabled?  Just curious because I can reproduce the
> hard locks eventually, but never get an oops.  Also, what patches are
> you running and what kernel version?

With apic/io-apic enabled it crashes hard during boot (with oopses, as I
recall).  I haven't tested apic support with the fixup patches.

The kernel is 2.4.23 plus the patches from "Fixes for nforce2 hard
lockup, apic, io-apic, udma133 covered"
(http://marc.theaimsgroup.com/?l=linux-kernel&m=107080280512734&w=2), and
athcool was running. (It's currently disabled, to see if that might have
something to do with it.)  The machine hasn't been rebooted since the
oops, since it seems to be limping along.  (I did find some slightly
corrupted data earlier; libbz2 had to be reinstalled.  Once it's
straightened out I'll run debsums against it and fix up anything that
got mangled.)

The same oops was occurring in stock 2.4.22.

-- 
Disconnect <lkml@sigkill.net>


* Re: [2.4] Nforce2 oops and occasional hang (tried the lockups patch, no difference)
  2003-12-11 16:16 [2.4] Nforce2 oops and occasional hang (tried the lockups patch, no difference) Disconnect
  2003-12-11 16:38 ` Josh McKinney
@ 2003-12-11 17:22 ` Disconnect
  1 sibling, 0 replies; 12+ messages in thread
From: Disconnect @ 2003-12-11 17:22 UTC (permalink / raw)
  To: lkml

Oh, and the modules list:
Module                  Size  Used by    Tainted: P
i2c-dev                 4548   0 (unused)
i2c-core               13604   0 [i2c-dev]
ipt_mark                 472  18
ipt_state                568   0 (unused)
ipt_mac                  664   3
ip_nat_irc              2288   0 (unused)
ip_conntrack_irc        3120   1
ip_conntrack_ftp        4144   1 (autoclean)
ip_nat_ftp              3024   0 (unused)
iptable_filter          1772   1
iptable_mangle          2168   1
ipt_LOG                 3480   0 (unused)
ipt_TOS                 1016   0 (unused)
ipt_REJECT              3544   0 (unused)
ipt_MARK                 792   4
ipt_MASQUERADE          1592   8
ipt_REDIRECT             824   4
iptable_nat            17102   3 [ip_nat_irc ip_nat_ftp ipt_MASQUERADE ipt_REDIRECT]
ip_conntrack           21892   4 [ipt_state ip_nat_irc ip_conntrack_irc ip_conntrack_ftp ip_nat_ftp ipt_MASQUERADE ipt_REDIRECT iptable_nat]
ip_tables              12544  14 [ipt_mark ipt_state ipt_mac iptable_filter iptable_mangle ipt_LOG ipt_TOS ipt_REJECT ipt_MARK ipt_MASQUERADE ipt_REDIRECT iptable_nat]
soundcore               4036   0 (autoclean)
sd_mod                 11084   0 (autoclean) (unused)
sg                     28540   0 (autoclean) (unused)
sr_mod                 15640   0 (autoclean) (unused)
scsi_mod               89056   3 (autoclean) [sd_mod sg sr_mod]
ide-cd                 32096   0 (autoclean)
cdrom                  28320   0 (autoclean) [sr_mod ide-cd]
nvnet                  26400   1
mousedev                4276   0 (unused)
hid                    21572   0 (unused)
usbmouse                2264   0 (unused)
ehci-hcd               17868   0 (unused)
ipv6                  182548  -1
tulip                  40960   1
crc32                   2880   0 [tulip]
keybdev                 2084   0 (unused)
usbkbd                  3640   0 (unused)
input                   3488   0 [mousedev hid usbmouse keybdev usbkbd]
usb-ohci               19656   0 (unused)
usbcore                65100   1 [hid usbmouse ehci-hcd usbkbd usb-ohci]

-- 
Disconnect <lkml@sigkill.net>



* Re: [2.4] Nforce2 oops and occasional hang (tried the lockups patch, no difference)
@ 2003-12-13  2:25 Ross Dickson
  2003-12-13  5:02 ` Bob
  2003-12-15 16:40 ` Disconnect
  0 siblings, 2 replies; 12+ messages in thread
From: Ross Dickson @ 2003-12-13  2:25 UTC (permalink / raw)
  To: lkml; +Cc: linux-kernel

> Oh, and the modules list:
> Module Size Used by Tainted: P
> i2c-dev 4548 0 (unused)
> i2c-core 13604 0 [i2c-dev]
> <snip>

I am not certain your problems are nforce2 type specific.
Standard response: I don't suppose you can try a different stick of ram?

The reason I say that is that oopses were very uncommon on either the
Epox 8rga+ or the Albatron km18G-pro motherboards on which I developed my
patches. Hard lockups were pretty much all I experienced prior to the
patches, except for an occasional X failure. The base OS I use is
SuSE 8.2, including its gcc version (with web updates applied).

The udma patches are really just a cleanup on the address setup timing so
I do not think that they are a factor. 

The local apic ack delay timing patch needs an Athlon CPU and AMD/NVIDIA IDE
enabled in the kernel config to kick in. If you are using it, I highly
recommend the uniprocessor IO-APIC config as well, to route the 8254 timer
irq0 through pin 0 of the IO-APIC. Using the APIC config alone leaves a lot
of interrupts generated on irq7, which can cause problems. (The reason the
8259 makes them spurious on irq7 is explained in the 8259A data sheet.)

Also I now use a small patch to fix up proc info - only needed if you are
using the 64-bit jiffies variable-HZ patch. It is available here:

http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/0838.html

If you boot with acpi=off and it is still not very stable, then I think it has
little to do with the lockups patch, as that is my fallback mode when I am
playing with the apic/ioapic code.

Another fallback I use at times is 

hdparm -Xudma3 /dev/hda
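
For context, forcing udma3 caps the drive at a lower nominal burst rate than the faster UDMA modes, which is presumably why it serves as a conservative fallback. A small sketch of the mode numbering, assuming the `64 + n` encoding that the hdparm manual page documents for the numeric form of `-X` (for example `-X66` for UltraDMA mode 2); the table and helper names are illustrative only:

```python
# Nominal Ultra DMA burst rates in MB/s, indexed by mode number.
UDMA_RATE_MBPS = {0: 16.7, 1: 25.0, 2: 33.3, 3: 44.4, 4: 66.7, 5: 100.0, 6: 133.0}


def hdparm_x_value(udma_mode):
    """Numeric -X argument for a UDMA mode: hdparm encodes UDMA mode n as 64 + n."""
    if udma_mode not in UDMA_RATE_MBPS:
        raise ValueError(f"unknown UDMA mode: {udma_mode}")
    return 64 + udma_mode


def describe(udma_mode):
    """One-line summary of a UDMA mode: numeric hdparm form and nominal rate."""
    return (f"udma{udma_mode}: hdparm -X{hdparm_x_value(udma_mode)}, "
            f"about {UDMA_RATE_MBPS[udma_mode]} MB/s")


print(describe(3))  # the fallback mode used in the command above
print(describe(5))  # UDMA100, for comparison
```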

Hope this helps with the confusion

Regards
Ross.


* Re: [2.4] Nforce2 oops and occasional hang (tried the lockups patch, no difference)
  2003-12-13  2:25 Ross Dickson
@ 2003-12-13  5:02 ` Bob
  2003-12-15 16:40 ` Disconnect
  1 sibling, 0 replies; 12+ messages in thread
From: Bob @ 2003-12-13  5:02 UTC (permalink / raw)
  To: linux-kernel

Ross Dickson wrote:

>Oh, and the modules list: 
> Module Size Used by Tainted: P 
> i2c-dev 4548 0 (unused) 
> i2c-core 13604 0 [i2c-dev]
><snip>
>
>
>I am not certain your problems are nforce2 type specific.
>Standard response: I don't suppose you can try a different stick of ram?
>  
>
Yes, and stock settings with tested ram may
be necessary with nforce2, possibly related to our
timing-related voodoo culture (might overclock
later on in life as timing-related patches evolve).

I have a via board that only recognizes two of
four generic ram sticks, but the second stick will cause
an oops soon after. Another via setup will oops
if any fast ram settings (4-way, csl2, etc.) are attempted,
even though it only uses tested cas2 ram. "Try a different
stick of ram".

On nforce2 I'm able to use bios "performance"
ram timing but if I manually tweak all the ram
settings up like I can do on other systems, I
get mem-related OOPS with nforce2.

acpi apic lapic amd pre-empt nforce2ide
(once you start you have to go all the way)

>The reason I say that is that oops were very uncommon on either the 
>epox 8rga+ or albatron km18G-pro MOBOS upon which I developed my
>patches. Hard lockups were pretty much all I experienced prior to the 
>patches except for an occasional X fail. Base OS flavour I
>use is Suse 8.2 including gcc version (web updates utilised)
>
>The udma patches are really just a cleanup on the address setup timing so
>I do not think that they are a factor. 
>
>The local apic ack delay timing patch needs athlon cpu and amd/nvidia ide on in 
>kern config to kick in. If you are using it then I highly recommend uniprocessor 
>ioapic config as well to go with it to route the 8254 timer irq0 through pin 0 of 
>ioapic as using the apic config alone leaves a lot of ints generated on irq7 
>which can cause problems. (Reason for 8259 making them spurious on irq7 
>is explained in 8259A data sheet)
>
>Also I now use a small patch to fixup proc info - only if you are using 
>the 64 bit jiffies var hz patch, avail here:
>
>http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-12/0838.html
>
>If you try acpi=off on boot and it is then not very stable then I think it has 
>little to do with lockups patch as that is my fallback mode when I am 
>playing with apic ioapic code. 
>
>Another fallback I use at times is 
>
>hdparm -Xudma3 /dev/hda
>
>Hope this helps the confusion
>
>Regards
>Ross
>
>  
>



* Re: [2.4] Nforce2 oops and occasional hang (tried the lockups patch, no difference)
  2003-12-13  2:25 Ross Dickson
  2003-12-13  5:02 ` Bob
@ 2003-12-15 16:40 ` Disconnect
  2003-12-18 18:52   ` Disconnect
  1 sibling, 1 reply; 12+ messages in thread
From: Disconnect @ 2003-12-15 16:40 UTC (permalink / raw)
  To: ross; +Cc: lkml

Thanks greatly for the tips, btw.  It's much appreciated :)

Also, I think I forgot to mention (doh) it's an Epox 8rda+.

On Fri, 2003-12-12 at 21:25, Ross Dickson wrote:
> I am not certain your problems are nforce2 type specific.
> Standard response: I don't suppose you can try a different stick of ram?

Unfortunately not - it's a single Kingston HyperX 3200 stick.  (In
theory I'll be moving it up to 700M or a gig in the next couple months,
at which point I'll be in a position to swap ram around and so forth.)  I
did get the bios all tuned and ran memtest-mmx on it for 24 hours before
the system installation though, and it passed.  What I did yesterday is
turn the memory frequency down from 200 to 166. (Which leaves the cpu
overclocked by about 33 MHz, something I think it will survive just fine
;) ..)

> The local apic ack delay timing patch needs athlon cpu and amd/nvidia ide on in 
> kern config to kick in. If you are using it then I highly recommend uniprocessor 
> ioapic config as well to go with it to route the 8254 timer irq0 through pin 0 of 
> ioapic as using the apic config alone leaves a lot of ints generated on irq7 
> which can cause problems. (Reason for 8259 making them spurious on irq7 
> is explained in 8259A data sheet)

That's how I'm running it now - it's gone about 1 day without any oopses.
(In the past it would go anywhere from hours to about a week, so the
results aren't in yet.)

On the basis of it being a memory issue I poked around the epox site and
noticed something I hadn't seen before - they recommend what might be
different memory timings (I'll have to check if/when it crashes again):

If your PC3200 memory is not stable try the following BIOS settings:

Memory Frequency = 100%
Memory Timing = Expert
T(RAS) = 7
T(RCD) = 3
T(RP) = 3
CAS Latency = 2.5

Adjust the memory frequency above until you reach the resulting
frequency of 200MHz (for PC3200).
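
To put those settings in perspective: at 200 MHz one memory clock lasts 5 ns, so the cycle counts above translate directly into nanoseconds, and the "3200" in PC3200 is the nominal peak bandwidth in MB/s. A minimal sketch of that arithmetic, assuming standard DDR behaviour (two transfers per clock on a 64-bit bus); the function names are made up for illustration:

```python
def cycle_time_ns(clock_mhz):
    """Length of one memory clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def timings_in_ns(clock_mhz, timings_cycles):
    """Convert timings given in clock cycles to nanoseconds."""
    cycle = cycle_time_ns(clock_mhz)
    return {name: cycles * cycle for name, cycles in timings_cycles.items()}


def peak_bandwidth_mb_s(clock_mhz, bus_bytes=8, transfers_per_clock=2):
    """Peak DDR bandwidth: clock x 2 transfers per clock x 8-byte (64-bit) bus."""
    return clock_mhz * transfers_per_clock * bus_bytes


# Cycle counts from the Epox recommendation quoted above.
EPOX_TIMINGS = {"tRAS": 7, "tRCD": 3, "tRP": 3, "CAS": 2.5}

for clock in (200, 166):
    ns = timings_in_ns(clock, EPOX_TIMINGS)
    summary = ", ".join(f"{name}={value:.1f}ns" for name, value in ns.items())
    print(f"{clock} MHz: {peak_bandwidth_mb_s(clock):.0f} MB/s peak, {summary}")
```

Running the same cycle counts at 166 MHz stretches every timing in absolute terms, which is one reason lowering the memory clock is a common first step when chasing instability.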

-- 
Disconnect <lkml@sigkill.net>


* Re: [2.4] Nforce2 oops and occasional hang (tried the lockups patch, no difference)
  2003-12-15 16:40 ` Disconnect
@ 2003-12-18 18:52   ` Disconnect
  2003-12-19 17:24     ` Disconnect
  0 siblings, 1 reply; 12+ messages in thread
From: Disconnect @ 2003-12-18 18:52 UTC (permalink / raw)
  To: lkml

Tuesday evening it hardlocked - no messages, no blinking keyboard lights,
nothing.  Yesterday it did it again, and this time it wouldn't get more
than 5-10 minutes of uptime without hanging no matter what I did to
memory/cpu timings.  (Even underclocked it to 133 and 1G with no
change.)  So it's unfortunately back on a stock 2.4.23-pre9 with
noapic/noacpi. (That disables one of the sets of usb ports, as I recall,
but it mostly works...)

In theory I'll have the ram upgrades and such tonight, so I can swap the
services to another machine and stress it a bit with different sticks,
new drives, different controller, etc.  Other than memtest86 (and some
kernel builds, bonnie, etc), are there any recommended tests?

On Mon, 2003-12-15 at 11:40, Disconnect wrote:
> Thanks greatly for the tips, btw.  Its much appreciated :)
> 
> Also, I think I forgot to mention (doh) its an Epox 8rda+.
> 
> On Fri, 2003-12-12 at 21:25, Ross Dickson wrote:
> > I am not certain your problems are nforce2 type specific.
> > Standard response: I don't suppose you can try a different stick of ram?
> 
> Unfortunately not - its a single kingston hyper-x 3200 stick.  (In
> theory I'll be moving it up to 700M or a gig in the next couple months,
> at which point I'll be in position to swap ram around and so forth.)  I
> did get the bios all tuned and run memtest-mmx on it for 24 hours before
> the system installation though, and it passed.  What I did yesterday is
> turn the memory frequency down from 200 to 166. (Which leaves the cpu
> overclocked by about 33 mhz, something I think it will survive just fine
> ;) ..)
> 
> > The local apic ack delay timing patch needs athlon cpu and amd/nvidia ide on in 
> > kern config to kick in. If you are using it then I highly recommend uniprocessor 
> > ioapic config as well to go with it to route the 8254 timer irq0 through pin 0 of 
> > ioapic as using the apic config alone leaves a lot of ints generated on irq7 
> > which can cause problems. (Reason for 8259 making them spurious on irq7 
> > is explained in 8259A data sheet)
> 
> Thats how I'm running it now - its gone about 1 day without any oopses. 
> (In the past it would go anywhere from hours to about a week, so the
> results aren't in yet.)  
> 
> On the basis of it being a memory issue I poked around the epox site and
> noticed something I hadn't seen before - they recommend what might be
> different memory timings (I'll have to check if/when it crashes again):
> 
> If your PC3200 memory is not stable try the following BIOS settings:
> 
> Memory Frequency = 100%
> Memory Timing = Expert
> T(RAS) = 7
> T(RCD) = 3
> T(RP) = 3
> CAS Latency = 2.5
> 
> Adjust the memory frequency above until you reach the resulting
> frequency of 200MHz (for PC3200).
-- 
Disconnect <lkml@sigkill.net>



* Re: [2.4] Nforce2 oops and occasional hang (tried the lockups patch, no difference)
  2003-12-18 18:52   ` Disconnect
@ 2003-12-19 17:24     ` Disconnect
  2003-12-19 20:22       ` Craig Bradney
  0 siblings, 1 reply; 12+ messages in thread
From: Disconnect @ 2003-12-19 17:24 UTC (permalink / raw)
  To: lkml

On Thu, 2003-12-18 at 13:52, Disconnect wrote:
> memory/cpu timings.  (Even underclocked it to 133 and 1G with no
> change.)  So its unfortunately back on a stock 2.4.23-pre9 with
> noapic/noacpi. (It disables one of the sets of usb ports, as I recall,
> but it mostly works...)

Update: Underclocked from 1.8G to 1.2G (whoops, meant to go down only
200-300 MHz) and it's been vaguely stable for about 1.5 days.  I don't have
another week (yet..) to run it under its normal load and wait for a
crash, so what I'm going to do is:
 - Move the workload (web/mail/..) to a different machine so this one
can be down for an extended period
 - Replace the ram with new sticks (they arrived this morning)
 - Reclock everything to stock (1.83G cpu, 200mhz ram and verify the
timings from kingston)
 - Replace the video card
 - Memtest86 until it cries
 - If it passes, bonnie++ on the new drives
 - If that passes, usb/acpi/apic testing with the associated patches
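
The gating in the plan above (each stage runs only if the previous one passed) can be sketched as a small runner. This is only an illustration under stated assumptions: the checks are placeholders that always succeed, and `run_stages` is a hypothetical helper, not an existing tool. Real checks would record the result of memtest86, bonnie++ and so on, several of which have to be run by hand.

```python
def run_stages(stages):
    """Run (name, check) pairs in order and stop at the first failing check.

    Returns the list of stage names that passed and the name of the
    failing stage (or None if every stage passed).
    """
    passed = []
    for name, check in stages:
        if not check():
            return passed, name
        passed.append(name)
    return passed, None


# Placeholder checks: each lambda stands in for a manual or scripted test
# and returns True on success.
stages = [
    ("memtest86", lambda: True),
    ("bonnie++ on the new drives", lambda: True),
    ("usb/acpi/apic testing with patches", lambda: True),
]

passed, failed = run_stages(stages)
print("passed:", passed)
print("first failure:", failed)
```

Stopping at the first failure keeps later stages from muddying the result: if memtest86 fails, a subsequent disk or APIC failure says little about the kernel.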

Anyone still watching this?  Tips and suggestions on what else might be
useful/informative are more than welcome.  The tests above mostly
replicate what I did when building this box, and it passed them then..

Recap:
 Epox 8rda+ nforce2 mobo
 AMD Athlon XP 2500+ (Barton) 1.83G
 Kingston HyperX PC3200
 WD Caviar WD1200JB 8M/UDMA100
 Antec case w/ 350W AMD-certified PSU

Oopses and occasional hangs, usually in do_generic_file_read, using
stock kernel.org 2.4.2x kernels.  Hardware passed testing (memtest86,
bonnie++) before I put Linux on it.

-- 
Disconnect <lkml@sigkill.net>



* Re: [2.4] Nforce2 oops and occasional hang (tried the lockups patch, no difference)
  2003-12-19 17:24     ` Disconnect
@ 2003-12-19 20:22       ` Craig Bradney
  2003-12-19 20:32         ` Disconnect
  2003-12-20 12:30         ` Voicu Liviu
  0 siblings, 2 replies; 12+ messages in thread
From: Craig Bradney @ 2003-12-19 20:22 UTC (permalink / raw)
  To: Disconnect; +Cc: lkml

On Fri, 2003-12-19 at 18:24, Disconnect wrote:
> On Thu, 2003-12-18 at 13:52, Disconnect wrote:
> > memory/cpu timings.  (Even underclocked it to 133 and 1G with no
> > change.)  So its unfortunately back on a stock 2.4.23-pre9 with
> > noapic/noacpi. (It disables one of the sets of usb ports, as I recall,
> > but it mostly works...)
> 
> Update: Underclocked from 1.8G to 1.2G (whups, meant to go down only
> 2-300mhz) and its been vaguely stable for about 1.5 days.  I don't have
> another week (yet..) to run it under its normal load and wait for a
> crash, so what I'm going to do is:
>  - Move the workload (web/mail/..) to a different machine so this one
> can be down for an extended period
>  - Replace the ram with new sticks (they arrived this morning)
>  - Reclock everything to stock (1.83G cpu, 200mhz ram and verify the
> timings from kingston)
>  - Replace the video card
>  - Memtest86 until it cries
>  - If it passes, bonnie++ on the new drives
>  - If that passes, usb/acpi/apic testing with the associated patches
> 
> Anyone still watching this?  Tips and suggestions on what else might be
> useful/informative are more than welcome.  The tests above mostly
> replicate what I did when building this box, and it passed them then..
> 
> Recap:
>  Epox 8rda+ nforce2 mobo
>  AMD Athlon XP 2500+ (Barton) 1.83G
>  Kingston HyperX PC3200
>  WD Caviar WD1200JB 8M/UDMA100
>  Antec case w/ 350W AMD-certified PSU
> 
> Oopses and occasional hangs, usually in do_generic_file_read, using
> stock kernel.org 2.4.2x kernels.  Hardware passed testing (memtest86,
> bonnie++) before I put Linux on it.

Does this not relate directly to the APIC/IOAPIC issues with 2.6 kernel
and nforce chipset motherboards? 

Craig



* Re: [2.4] Nforce2 oops and occasional hang (tried the lockups patch, no difference)
  2003-12-19 20:22       ` Craig Bradney
@ 2003-12-19 20:32         ` Disconnect
  2003-12-20 12:30         ` Voicu Liviu
  1 sibling, 0 replies; 12+ messages in thread
From: Disconnect @ 2003-12-19 20:32 UTC (permalink / raw)
  To: Craig Bradney; +Cc: lkml

On Fri, 2003-12-19 at 15:22, Craig Bradney wrote:
> Does this not relate directly to the APIC/IOAPIC issues with 2.6 kernel
> and nforce chipset motherboards? 

Not when apic is disabled. (And I'm guessing you mean 2.4, right?
Although IIRC the patches are available for both.)

The patches for enabling/using apic/io-apic on nforce2 work fine, except
instead of oopsing it hardlocks semi-randomly. (It ran fine for a few
days, then hardlocked, and upon reboot it only made it about 10 minutes,
then another 30 minutes, so I went back to an unpatched 2.4.23 with
'noapic'.  Now that the hardware has arrived I can move production
elsewhere and do further testing.)  I was under the impression that
nmi-watchdog was supposed to prevent the hardlocks (well, turn them into
oopses), but no such luck here.

-- 
Disconnect <lkml@sigkill.net>


* Re: [2.4] Nforce2 oops and occasional hang (tried the lockups patch, no difference)
  2003-12-19 20:22       ` Craig Bradney
  2003-12-19 20:32         ` Disconnect
@ 2003-12-20 12:30         ` Voicu Liviu
  1 sibling, 0 replies; 12+ messages in thread
From: Voicu Liviu @ 2003-12-20 12:30 UTC (permalink / raw)
  To: Craig Bradney; +Cc: Disconnect, lkml

On Fri, 19 Dec 2003, Craig Bradney wrote:

> On Fri, 2003-12-19 at 18:24, Disconnect wrote:
> > On Thu, 2003-12-18 at 13:52, Disconnect wrote:
> > > memory/cpu timings.  (Even underclocked it to 133 and 1G with no
> > > change.)  So its unfortunately back on a stock 2.4.23-pre9 with
> > > noapic/noacpi. (It disables one of the sets of usb ports, as I recall,
> > > but it mostly works...)
> > 
> > Update: Underclocked from 1.8G to 1.2G (whups, meant to go down only
> > 2-300mhz) and its been vaguely stable for about 1.5 days.  I don't have
> > another week (yet..) to run it under its normal load and wait for a
> > crash, so what I'm going to do is:
> >  - Move the workload (web/mail/..) to a different machine so this one
> > can be down for an extended period
> >  - Replace the ram with new sticks (they arrived this morning)
> >  - Reclock everything to stock (1.83G cpu, 200mhz ram and verify the
> > timings from kingston)
> >  - Replace the video card
> >  - Memtest86 until it cries
> >  - If it passes, bonnie++ on the new drives
> >  - If that passes, usb/acpi/apic testing with the associated patches
> > 
> > Anyone still watching this?  Tips and suggestions on what else might be
> > useful/informative are more than welcome.  The tests above mostly
> > replicate what I did when building this box, and it passed them then..
> > 
> > Recap:
> >  Epox 8rda+ nforce2 mobo
I have Epox 8rda3+
> >  AMD Athlon XP 2500+ (Barton) 1.83G
same
> >  Kingston HyperX PC3200
Corsair TwinX 512 (2 sticks of 256)
> >  WD Caviar WD1200JB 8M/UDMA100
seagate
> >  Antec case w/ 350W AMD-certified PSU
black case (not something special)
My system works with 2.4 and 2.6, even overclocked to 10x190 (1900 MHz)
Cheers

> > 
> > Oopses and occasional hangs, usually in do_generic_file_read, using
> > stock kernel.org 2.4.2x kernels.  Hardware passed testing (memtest86,
> > bonnie++) before I put Linux on it.
> 
> Does this not relate directly to the APIC/IOAPIC issues with 2.6 kernel
> and nforce chipset motherboards? 
> 
> Craig
> 

-- 
Liviu Voicu
Assistant Programmer and network support
Computation Center, Mount Scopus
Hebrew University of Jerusalem
Tel: 972(2)-5881253
E-mail: "Liviu Voicu"<pacman@mscc.huji.ac.il>

/**
 * cat /usr/src/linux/arch/i386/boot/bzImage > /dev/dsp
 * ( and the voice of God will be heard! )
 *
 */

Click here to see my GPG signature:
----------------------------------
	http://search.keyserver.net:11371/pks/lookup?template=netensearch%2Cnetennomatch%2Cnetenerror&search=pacman%40mscc.huji.ac.il&op=vindex&fingerprint=on&submit=Get+List



end of thread, other threads:[~2003-12-20 12:30 UTC | newest]

Thread overview: 12+ messages
2003-12-11 16:16 [2.4] Nforce2 oops and occasional hang (tried the lockups patch, no difference) Disconnect
2003-12-11 16:38 ` Josh McKinney
2003-12-11 17:19   ` Disconnect
2003-12-11 17:22 ` Disconnect
  -- strict thread matches above, loose matches on Subject: below --
2003-12-13  2:25 Ross Dickson
2003-12-13  5:02 ` Bob
2003-12-15 16:40 ` Disconnect
2003-12-18 18:52   ` Disconnect
2003-12-19 17:24     ` Disconnect
2003-12-19 20:22       ` Craig Bradney
2003-12-19 20:32         ` Disconnect
2003-12-20 12:30         ` Voicu Liviu
