kernel crashes with 2.5/2.8

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* kernel crashes with 2.5/2.8
@ 2004-12-01 17:09 John Newman
       [not found] ` <Pine.LNX.4.53.0412011815090.1909@yvahk01.tjqt.qr>
  2004-12-01 19:16 ` Chris Wright
  0 siblings, 2 replies; 4+ messages in thread
From: John Newman @ 2004-12-01 17:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: mhenderson

Hello everyone,

I've been experiencing (seemingly) random kernel Oops with various
versions of 2.5 and 2.8 on a Dell Poweredge 2650.  The machine is
configured with 2GB of ram, a 5GB swap file and two 2.8GHZ Xeons.   It
is a NIS/NFS server with an internal hardware raid 5 array  on the
Dell Perc card, using the aacraid driver.  The RAID5 array comes out
to about 500GB and is where all our user data is stored for NIS users.
  That data and the NIS information is accessed by a variety of older
and newer Linux and Sun boxes.

Here is the latest oops from my klog, which happened about a week ago:

Nov 24 21:34:10 ptscorp-nis01 kernel: Unable to handle kernel paging
request at virtual address 01000004
Nov 24 21:34:10 ptscorp-nis01 kernel:  printing eip:
Nov 24 21:34:10 ptscorp-nis01 kernel: 02144496
Nov 24 21:34:10 ptscorp-nis01 kernel: *pde = 00005001
Nov 24 21:34:10 ptscorp-nis01 kernel: Oops: 0002 [#1]
Nov 24 21:34:10 ptscorp-nis01 kernel: SMP
Nov 24 21:34:10 ptscorp-nis01 kernel: Modules linked in:
iptable_filter ip_tables nfsd exportfs lockd md5 ipv6 autofs4 sunrpc
tg3 microcode dm_mod ohci_hcd button battery asus_acpi ac ext3 jbd
aacraid sd_mod scsi_mod
Nov 24 21:34:10 ptscorp-nis01 kernel: CPU:    3
Nov 24 21:34:10 ptscorp-nis01 kernel: EIP:    0060:[<02144496>]    Not tainted
Nov 24 21:34:10 ptscorp-nis01 kernel: EFLAGS: 00010006   (2.6.8-1.521smp)
Nov 24 21:34:10 ptscorp-nis01 kernel: EIP is at free_block+0x3d/0xd9
Nov 24 21:34:10 ptscorp-nis01 kernel: eax: 01000000   ebx: 784cb000  
ecx: 784cbe38   edx: 413bd000
Nov 24 21:34:10 ptscorp-nis01 kernel: esi: 81fa0c80   edi: 0000001b  
ebp: 00000006   esp: 39e32de4
Nov 24 21:34:10 ptscorp-nis01 kernel: ds: 007b   es: 007b   ss: 0068
Nov 24 21:34:10 ptscorp-nis01 kernel: Process pdflush (pid: 57,
threadinfo=39e32000 task=39e1e230)
Nov 24 21:34:10 ptscorp-nis01 kernel: Stack: 0426b090 81f96800
0426b090 2d1c2148 0000001b 021445c9 0426b080 81fa0c80
Nov 24 21:34:10 ptscorp-nis01 kernel:        0000616d 0232cd88
0426b080 0426b090 2d1c2148 00000006 021449cc 39e32e44
Nov 24 21:34:10 ptscorp-nis01 kernel:        00000002 00000010
00000002 021c4776 03b80220 39e32e50 175c8d58 00000202
Nov 24 21:34:10 ptscorp-nis01 kernel: Call Trace:
Nov 24 21:34:10 ptscorp-nis01 kernel:  [<021445c9>] cache_flusharray+0x97/0xf4
Nov 24 21:34:10 ptscorp-nis01 kernel:  [<021449cc>] kmem_cache_free+0x2b/0x39
Nov 24 21:34:10 ptscorp-nis01 kernel:  [<021c4776>]
radix_tree_delete+0x120/0x14a
Nov 24 21:34:10 ptscorp-nis01 kernel:  [<0215f806>]
try_to_free_buffers+0xb9/0xcc
Nov 24 21:34:10 ptscorp-nis01 kernel:  [<021c44d8>]
radix_tree_gang_lookup+0x39/0x52
Nov 24 21:34:10 ptscorp-nis01 kernel:  [<8285e1bb>]
journal_try_to_free_buffers+0xc4/0xd1 [jbd]
Nov 24 21:34:10 ptscorp-nis01 kernel:  [<0213bfa2>]
__remove_from_page_cache+0x12/0x51
Nov 24 21:34:10 ptscorp-nis01 kernel:  [<0214651e>]
invalidate_complete_page+0x90/0xd0
Nov 24 21:34:10 ptscorp-nis01 kernel:  [<021467e1>]
invalidate_mapping_pages+0x65/0xc9
Nov 24 21:34:10 ptscorp-nis01 kernel:  [<0215d594>]
remove_inode_buffers+0x12/0xa9
Nov 24 21:34:10 ptscorp-nis01 kernel:  [<021745aa>] prune_icache+0x141/0x28b
Nov 24 21:34:10 ptscorp-nis01 kernel:  [<02142000>] pdflush+0x0/0x1e
Nov 24 21:34:10 ptscorp-nis01 kernel:  [<02173d3c>] try_to_clip_inodes+0x90/0x94
Nov 24 21:34:10 ptscorp-nis01 kernel:  [<02141f1a>] __pdflush+0x1be/0x2a4
Nov 24 21:34:10 ptscorp-nis01 kernel:  [<0214201a>] pdflush+0x1a/0x1e
Nov 24 21:34:10 ptscorp-nis01 kernel:  [<0214132f>] wb_kupdate+0x0/0xee

I've gotten crashes like this with both 2.5 and 2.8.  I'm sorry I
don't have a copy of an oops from the 2.5 crashes.   The kermel I'm
using is the stock Fedora Core 2  2.6.8-1.521smp.   I've also compiled
my own kernels, various versions from 2.5 - 2.9, but NFS never works
quite right with my kernels.   That's even when I'm using the exact
config file that comes with the feodra kernel rpm's.  The NFs problems
when I compile my own kernel are inter-operability issues with our
Sun's and other, older, Linux boxes.  With the RPM kernels NFS works
great but the crashes are  really messings us up, and scaring me about
potential data loss!

Here is a list of loaded modules:

Module                  Size  Used by
nfsd                  169441  9
exportfs                9281  1 nfsd
lockd                  53513  2 nfsd
md5                     7745  1
ipv6                  233701  24
autofs4                20165  0
sunrpc                128805  19 nfsd,lockd
tg3                    79045  0
microcode              10209  0
dm_mod                 49477  0
ohci_hcd               22097  0
button                  8793  0
battery                11085  0
asus_acpi              13017  0
ac                      7373  0
ext3                   99497  2
jbd                    58457  1 ext3
aacraid                41073  3
sd_mod                 20801  4
scsi_mod              102025  2 aacraid,sd_mod

uname -a output:

Linux ptscorp-nis01 2.6.8-1.521smp #1 SMP Mon Aug 16 09:25:06 EDT 2004
i686 i686 i386 GNU/Linux

Does anyone have any idea what is causing the panics?    I appreciate
any feedback.  Is it possible this is a bug within some of the patches
that Fedora is using?   I was hopeful going from 2.5 to 2.8 would fix
the issue.  Since it didn't I haven't yet jumped to 2.9 but am
planning on doing it tonight.

-- 
John

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: kernel crashes with 2.5/2.8
       [not found]   ` <abc33982041201093665e56767@mail.gmail.com>
@ 2004-12-01 17:37     ` John Newman
  0 siblings, 0 replies; 4+ messages in thread
From: John Newman @ 2004-12-01 17:37 UTC (permalink / raw)
  To: linux-kernel

Whoops!  I look quite dumb with this one.  Obviously I meant 2.6.5 and
2.6.8.  Sorry.

--
john


---------- Forwarded message ----------
From: John Newman <cachehit@gmail.com>
Date: Wed, 1 Dec 2004 11:36:39 -0600
Subject: Re: kernel crashes with 2.5/2.8
To: Jan Engelhardt <jengelh@linux01.gwdg.de>


Sorry I'm stupid :)  2.6.5 and 2.6.8.  I don't know why I said that.




On Wed, 1 Dec 2004 18:15:32 +0100 (MET), Jan Engelhardt
<jengelh@linux01.gwdg.de> wrote:
> >Hello everyone,
> >
> >I've been experiencing (seemingly) random kernel Oops with various
> >versions of 2.5 and 2.8 on a Dell Poweredge 2650.  The machine is
>
> 2.8's out?
>
> >my own kernels, various versions from 2.5 - 2.9, but NFS never works
>
> 2.9's out?
>
>
> Jan Engelhardt
> --
> ENOSPC
>


--
John


-- 
John

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: kernel crashes with 2.5/2.8
  2004-12-01 17:09 kernel crashes with 2.5/2.8 John Newman
       [not found] ` <Pine.LNX.4.53.0412011815090.1909@yvahk01.tjqt.qr>
@ 2004-12-01 19:16 ` Chris Wright
  2004-12-03  1:03   ` John Newman
  1 sibling, 1 reply; 4+ messages in thread
From: Chris Wright @ 2004-12-01 19:16 UTC (permalink / raw)
  To: John Newman; +Cc: linux-kernel, mhenderson

* John Newman (cachehit@gmail.com) wrote:
> Nov 24 21:34:10 ptscorp-nis01 kernel: Unable to handle kernel paging
> request at virtual address 01000004

Possible bad memory.  This could be 4 byte offset of NULL with one bit
flipped.  Have you run memtest86?

Also, it'd be useful to keep tabs on the Oopsen.  Are they totally
random, same location, etc.

thanks,
-chris

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: kernel crashes with 2.5/2.8
  2004-12-01 19:16 ` Chris Wright
@ 2004-12-03  1:03   ` John Newman
  0 siblings, 0 replies; 4+ messages in thread
From: John Newman @ 2004-12-03  1:03 UTC (permalink / raw)
  To: Chris Wright; +Cc: linux-kernel, mhenderson

Ok, I got some time this evening to run memtest86.  With the 2650
configured with 2 1GB sticks memtest86 started reprting errors almost
immediately.  "Terrfic!" I thought, "this is my problem!"  The address
seemed suspicious though:  0007ffedc80 (2047.8MB) is where all the
errors were reported.  I decided to let it run all the way through but
it froze at 34% of the first test.  Memtest86 also said it did not
support this chipset when I tried to dink with advanced options.

So, I replaced the ram with 4 512MB sticks I had laying around from a
different Dell machine.  I booted memtest86 again and what do you
know.... errors @ 0007ffedc80, with completely different RAM.  So
obviously these results are fishy.

Someone replied to me off-list and said the only way they could get
their Dell 2650 stable was by running 2.4.28.

--
john

On Wed, 1 Dec 2004 11:16:17 -0800, Chris Wright <chrisw@osdl.org> wrote:
> * John Newman (cachehit@gmail.com) wrote:
> > Nov 24 21:34:10 ptscorp-nis01 kernel: Unable to handle kernel paging
> > request at virtual address 01000004
> 
> Possible bad memory.  This could be 4 byte offset of NULL with one bit
> flipped.  Have you run memtest86?
> 
> Also, it'd be useful to keep tabs on the Oopsen.  Are they totally
> random, same location, etc.
> 
> thanks,
> -chris
> 

-- 
John

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2004-12-03  1:03 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-12-01 17:09 kernel crashes with 2.5/2.8 John Newman
     [not found] ` <Pine.LNX.4.53.0412011815090.1909@yvahk01.tjqt.qr>
     [not found]   ` <abc33982041201093665e56767@mail.gmail.com>
2004-12-01 17:37     ` John Newman
2004-12-01 19:16 ` Chris Wright
2004-12-03  1:03   ` John Newman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox