* kernel crashes with 2.5/2.8
@ 2004-12-01 17:09 John Newman
[not found] ` <Pine.LNX.4.53.0412011815090.1909@yvahk01.tjqt.qr>
2004-12-01 19:16 ` Chris Wright
0 siblings, 2 replies; 4+ messages in thread
From: John Newman @ 2004-12-01 17:09 UTC (permalink / raw)
To: linux-kernel; +Cc: mhenderson
Hello everyone,
I've been experiencing (seemingly) random kernel oopses with various
versions of 2.5 and 2.8 on a Dell PowerEdge 2650. The machine is
configured with 2GB of RAM, a 5GB swap file and two 2.8GHz Xeons. It
is a NIS/NFS server with an internal hardware RAID 5 array on the
Dell PERC card, using the aacraid driver. The RAID 5 array comes out
to about 500GB and is where all our user data is stored for NIS users.
That data and the NIS information are accessed by a variety of older
and newer Linux and Sun boxes.
Here is the latest oops from my klog, which happened about a week ago:
Nov 24 21:34:10 ptscorp-nis01 kernel: Unable to handle kernel paging
request at virtual address 01000004
Nov 24 21:34:10 ptscorp-nis01 kernel: printing eip:
Nov 24 21:34:10 ptscorp-nis01 kernel: 02144496
Nov 24 21:34:10 ptscorp-nis01 kernel: *pde = 00005001
Nov 24 21:34:10 ptscorp-nis01 kernel: Oops: 0002 [#1]
Nov 24 21:34:10 ptscorp-nis01 kernel: SMP
Nov 24 21:34:10 ptscorp-nis01 kernel: Modules linked in:
iptable_filter ip_tables nfsd exportfs lockd md5 ipv6 autofs4 sunrpc
tg3 microcode dm_mod ohci_hcd button battery asus_acpi ac ext3 jbd
aacraid sd_mod scsi_mod
Nov 24 21:34:10 ptscorp-nis01 kernel: CPU: 3
Nov 24 21:34:10 ptscorp-nis01 kernel: EIP: 0060:[<02144496>] Not tainted
Nov 24 21:34:10 ptscorp-nis01 kernel: EFLAGS: 00010006 (2.6.8-1.521smp)
Nov 24 21:34:10 ptscorp-nis01 kernel: EIP is at free_block+0x3d/0xd9
Nov 24 21:34:10 ptscorp-nis01 kernel: eax: 01000000 ebx: 784cb000
ecx: 784cbe38 edx: 413bd000
Nov 24 21:34:10 ptscorp-nis01 kernel: esi: 81fa0c80 edi: 0000001b
ebp: 00000006 esp: 39e32de4
Nov 24 21:34:10 ptscorp-nis01 kernel: ds: 007b es: 007b ss: 0068
Nov 24 21:34:10 ptscorp-nis01 kernel: Process pdflush (pid: 57,
threadinfo=39e32000 task=39e1e230)
Nov 24 21:34:10 ptscorp-nis01 kernel: Stack: 0426b090 81f96800
0426b090 2d1c2148 0000001b 021445c9 0426b080 81fa0c80
Nov 24 21:34:10 ptscorp-nis01 kernel: 0000616d 0232cd88
0426b080 0426b090 2d1c2148 00000006 021449cc 39e32e44
Nov 24 21:34:10 ptscorp-nis01 kernel: 00000002 00000010
00000002 021c4776 03b80220 39e32e50 175c8d58 00000202
Nov 24 21:34:10 ptscorp-nis01 kernel: Call Trace:
Nov 24 21:34:10 ptscorp-nis01 kernel: [<021445c9>] cache_flusharray+0x97/0xf4
Nov 24 21:34:10 ptscorp-nis01 kernel: [<021449cc>] kmem_cache_free+0x2b/0x39
Nov 24 21:34:10 ptscorp-nis01 kernel: [<021c4776>]
radix_tree_delete+0x120/0x14a
Nov 24 21:34:10 ptscorp-nis01 kernel: [<0215f806>]
try_to_free_buffers+0xb9/0xcc
Nov 24 21:34:10 ptscorp-nis01 kernel: [<021c44d8>]
radix_tree_gang_lookup+0x39/0x52
Nov 24 21:34:10 ptscorp-nis01 kernel: [<8285e1bb>]
journal_try_to_free_buffers+0xc4/0xd1 [jbd]
Nov 24 21:34:10 ptscorp-nis01 kernel: [<0213bfa2>]
__remove_from_page_cache+0x12/0x51
Nov 24 21:34:10 ptscorp-nis01 kernel: [<0214651e>]
invalidate_complete_page+0x90/0xd0
Nov 24 21:34:10 ptscorp-nis01 kernel: [<021467e1>]
invalidate_mapping_pages+0x65/0xc9
Nov 24 21:34:10 ptscorp-nis01 kernel: [<0215d594>]
remove_inode_buffers+0x12/0xa9
Nov 24 21:34:10 ptscorp-nis01 kernel: [<021745aa>] prune_icache+0x141/0x28b
Nov 24 21:34:10 ptscorp-nis01 kernel: [<02142000>] pdflush+0x0/0x1e
Nov 24 21:34:10 ptscorp-nis01 kernel: [<02173d3c>] try_to_clip_inodes+0x90/0x94
Nov 24 21:34:10 ptscorp-nis01 kernel: [<02141f1a>] __pdflush+0x1be/0x2a4
Nov 24 21:34:10 ptscorp-nis01 kernel: [<0214201a>] pdflush+0x1a/0x1e
Nov 24 21:34:10 ptscorp-nis01 kernel: [<0214132f>] wb_kupdate+0x0/0xee
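For readers following the trace, the "Oops: 0002" line carries the x86 page-fault error code. The sketch below is not part of the original message; it is a minimal Python illustration, and the helper name decode_oops_code is hypothetical. It decodes the three low bits of that code as defined for 32-bit x86 page faults.

```python
# Decode the low bits of the x86 page-fault error code shown on the
# "Oops:" line. decode_oops_code is a hypothetical helper written for
# illustration; it is not a kernel or library function.

def decode_oops_code(code: int) -> dict:
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "protection_violation": bool(code & 0x1),
        # bit 1: 0 = read access, 1 = write access
        "write": bool(code & 0x2),
        # bit 2: 0 = kernel mode, 1 = user mode
        "user_mode": bool(code & 0x4),
    }

# The oops prints the code in hexadecimal: "Oops: 0002".
info = decode_oops_code(0x0002)
print(info)
```

Read this way, the fault in the trace above was a kernel-mode write to a page that was not present, which fits the "Unable to handle kernel paging request" message.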
I've gotten crashes like this with both 2.5 and 2.8. I'm sorry I
don't have a copy of an oops from the 2.5 crashes. The kernel I'm
using is the stock Fedora Core 2 2.6.8-1.521smp. I've also compiled
my own kernels, various versions from 2.5 - 2.9, but NFS never works
quite right with my kernels. That's even when I'm using the exact
config file that comes with the Fedora kernel RPMs. The NFS problems
when I compile my own kernel are interoperability issues with our
Suns and other, older Linux boxes. With the RPM kernels NFS works
great, but the crashes are really messing us up and scaring me about
potential data loss!
Here is a list of loaded modules:
Module Size Used by
nfsd 169441 9
exportfs 9281 1 nfsd
lockd 53513 2 nfsd
md5 7745 1
ipv6 233701 24
autofs4 20165 0
sunrpc 128805 19 nfsd,lockd
tg3 79045 0
microcode 10209 0
dm_mod 49477 0
ohci_hcd 22097 0
button 8793 0
battery 11085 0
asus_acpi 13017 0
ac 7373 0
ext3 99497 2
jbd 58457 1 ext3
aacraid 41073 3
sd_mod 20801 4
scsi_mod 102025 2 aacraid,sd_mod
uname -a output:
Linux ptscorp-nis01 2.6.8-1.521smp #1 SMP Mon Aug 16 09:25:06 EDT 2004
i686 i686 i386 GNU/Linux
Does anyone have any idea what is causing the panics? I appreciate
any feedback. Is it possible this is a bug within some of the patches
that Fedora is using? I was hopeful going from 2.5 to 2.8 would fix
the issue. Since it didn't, I haven't yet jumped to 2.9, but I am
planning on doing so tonight.
--
John
* Re: kernel crashes with 2.5/2.8
[not found] ` <abc33982041201093665e56767@mail.gmail.com>
@ 2004-12-01 17:37 ` John Newman
0 siblings, 0 replies; 4+ messages in thread
From: John Newman @ 2004-12-01 17:37 UTC (permalink / raw)
To: linux-kernel
Whoops! I look quite dumb with this one. Obviously I meant 2.6.5 and
2.6.8. Sorry.
--
john
---------- Forwarded message ----------
From: John Newman <cachehit@gmail.com>
Date: Wed, 1 Dec 2004 11:36:39 -0600
Subject: Re: kernel crashes with 2.5/2.8
To: Jan Engelhardt <jengelh@linux01.gwdg.de>
Sorry I'm stupid :) 2.6.5 and 2.6.8. I don't know why I said that.
On Wed, 1 Dec 2004 18:15:32 +0100 (MET), Jan Engelhardt
<jengelh@linux01.gwdg.de> wrote:
> >Hello everyone,
> >
> >I've been experiencing (seemingly) random kernel Oops with various
> >versions of 2.5 and 2.8 on a Dell Poweredge 2650. The machine is
>
> 2.8's out?
>
> >my own kernels, various versions from 2.5 - 2.9, but NFS never works
>
> 2.9's out?
>
>
> Jan Engelhardt
> --
> ENOSPC
>
--
John
* Re: kernel crashes with 2.5/2.8
2004-12-01 17:09 kernel crashes with 2.5/2.8 John Newman
[not found] ` <Pine.LNX.4.53.0412011815090.1909@yvahk01.tjqt.qr>
@ 2004-12-01 19:16 ` Chris Wright
2004-12-03 1:03 ` John Newman
1 sibling, 1 reply; 4+ messages in thread
From: Chris Wright @ 2004-12-01 19:16 UTC (permalink / raw)
To: John Newman; +Cc: linux-kernel, mhenderson
* John Newman (cachehit@gmail.com) wrote:
> Nov 24 21:34:10 ptscorp-nis01 kernel: Unable to handle kernel paging
> request at virtual address 01000004
Possible bad memory. This could be a 4-byte offset from NULL with one
bit flipped. Have you run memtest86?
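The single-bit-flip reading can be checked with a little arithmetic. The following is a minimal Python sketch, not part of the original reply; the constant names are made up for illustration, and the "expected" address of 0x4 is the assumption stated above (a 4-byte offset from a NULL pointer).

```python
# Values copied from the oops in the first message.
FAULT_ADDR = 0x01000004  # "virtual address 01000004"
EAX = 0x01000000         # "eax: 01000000" in the register dump

# Assumption being tested: the intended access was NULL + 4.
EXPECTED_ADDR = 0x00000004

# XOR leaves a 1 in every bit position where the two addresses differ.
diff = FAULT_ADDR ^ EXPECTED_ADDR
flipped_bits = [bit for bit in range(32) if (diff >> bit) & 1]

print(flipped_bits)            # positions of differing bits
print(EAX + 4 == FAULT_ADDR)   # does eax + 4 land on the faulting address?
```

Exactly one bit (bit 24) differs, and eax + 4 equals the faulting address. That is consistent with a corrupted pointer held in eax, though it does not by itself prove a hardware memory error.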
Also, it'd be useful to keep tabs on the Oopsen. Are they totally
random, always at the same location, etc.?
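One way to keep tabs on the oopses is to tally the "EIP is at" lines from the kernel log. This is a rough Python sketch, not something from the thread: the function name tally_oops_sites is hypothetical, and it assumes the syslog line format seen in the oops above, with each "EIP is at" entry on a single line.

```python
import re
from collections import Counter

# Matches e.g. "... kernel: EIP is at free_block+0x3d/0xd9" and captures
# the symbol name. The pattern assumes the format shown in this thread.
EIP_RE = re.compile(r"kernel: EIP is at (\S+)\+0x[0-9a-f]+/0x[0-9a-f]+")

def tally_oops_sites(lines):
    """Count how often each function appears as the faulting location."""
    counts = Counter()
    for line in lines:
        match = EIP_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

sample = [
    "Nov 24 21:34:10 ptscorp-nis01 kernel: CPU: 3",
    "Nov 24 21:34:10 ptscorp-nis01 kernel: EIP is at free_block+0x3d/0xd9",
]
counts = tally_oops_sites(sample)
print(counts.most_common())
```

To run it over a real log, pass an open file object instead of the sample list; the log path depends on the syslog configuration. If one symbol dominates, the crashes are probably not random; a wide spread of unrelated locations would point more toward hardware.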
thanks,
-chris
* Re: kernel crashes with 2.5/2.8
2004-12-01 19:16 ` Chris Wright
@ 2004-12-03 1:03 ` John Newman
0 siblings, 0 replies; 4+ messages in thread
From: John Newman @ 2004-12-03 1:03 UTC (permalink / raw)
To: Chris Wright; +Cc: linux-kernel, mhenderson
Ok, I got some time this evening to run memtest86. With the 2650
configured with two 1GB sticks, memtest86 started reporting errors almost
immediately. "Terrific!" I thought, "this is my problem!" The address
seemed suspicious though: 0007ffedc80 (2047.8MB) is where all the
errors were reported. I decided to let it run all the way through, but
it froze at 34% of the first test. Memtest86 also said it did not
support this chipset when I tried to dink with advanced options.
So, I replaced the RAM with four 512MB sticks I had lying around from a
different Dell machine. I booted memtest86 again and what do you
know... errors @ 0007ffedc80, with completely different RAM. So
obviously these results are fishy.
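A quick calculation shows where the reported address sits relative to the 2GB of installed RAM. This is a small Python sketch for illustration only, not part of the original message; the constant names are invented here.

```python
# Address reported by memtest86 in both RAM configurations.
ADDR = 0x7FFEDC80            # written as 0007ffedc80 above

MIB = 1024 * 1024
TWO_GIB = 2 * 1024 * MIB     # both configurations total 2GB

addr_mib = ADDR / MIB                 # position in MiB
bytes_below_top = TWO_GIB - ADDR      # distance from the 2 GiB boundary

print(round(addr_mib, 2))
print(bytes_below_top)
```

The address works out to roughly 2047.9 MiB, close to the figure memtest86 printed, and lies about 73 KiB below the 2 GiB boundary. One possible reading, not confirmed anywhere in this thread, is that memtest86 was touching a region at the top of memory that the firmware reserves, which would explain the same address failing with two different sets of sticks.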
Someone replied to me off-list and said the only way they could get
their Dell 2650 stable was by running 2.4.28.
--
john
On Wed, 1 Dec 2004 11:16:17 -0800, Chris Wright <chrisw@osdl.org> wrote:
> * John Newman (cachehit@gmail.com) wrote:
> > Nov 24 21:34:10 ptscorp-nis01 kernel: Unable to handle kernel paging
> > request at virtual address 01000004
>
> Possible bad memory. This could be 4 byte offset of NULL with one bit
> flipped. Have you run memtest86?
>
> Also, it'd be useful to keep tabs on the Oopsen. Are they totally
> random, same location, etc.
>
> thanks,
> -chris
>
--
John