2.6.21.14 NFS related oops

public inbox for linux-kernel@vger.kernel.org

* 2.6.21.14 NFS related oops
@ 2007-06-13 12:00 Maciej Soltysiak
  2007-06-13 19:17 ` Trond Myklebust
  0 siblings, 1 reply; 7+ messages in thread
From: Maciej Soltysiak @ 2007-06-13 12:00 UTC (permalink / raw)
  To: linux-kernel

Hi,

If anyone is interested, I got this oops while running a torrent application
(btdownloadcurses) writing directly to a NAS mounted via NFSv3.

The client machine is 2.6.21.14 and it is mounted with options:
wsize=8192,rsize=8192,hard,intr,tcp

After that, the application hung, and I am unable to cd into the mounted NFS
directory, unmount it (busy), or kill the app (kill -9 fails; the process is
in D state).

Best regards,
Maciej

BUG: unable to handle kernel paging request at virtual address 5018f248
 printing eip:
f0a93c94
*pde = 00000000
Oops: 0002 [#1]
Modules linked in: binfmt_misc sit nfs lockd nfs_acl sunrpc w83627ehf i2c_isa i2c_viapro i2c_core via_agp agpgart rtc
CPU:    0
EIP:    0060:[<f0a93c94>]    Not tainted VLI
EFLAGS: 00010206   (2.6.20.14-cks1 #15)
EIP is at rpcauth_checkverf+0x34/0x70 [sunrpc]
eax: d2f4447c   ebx: c655d584   ecx: 00000000   edx: f0aa9f60
esi: e91ea640   edi: d2f44474   ebp: ede2f228   esp: e64b5eec
ds: 007b   es: 007b   ss: 0068
Process rpciod/0 (pid: 1005, ti=e64b4000 task=efe95a90 task.ti=e64b4000)
Stack: 00000286 ede2f8a0 ede2f8a0 00000286 c655d584 121d0da3 00000820 f0a8d7fd
       f0a93d60 f08bae07 00000286 c655d5cc 00000286 00000286 f08c0520 c655d584
       00000000 c655d5ec f0a93260 f0a9306f efe95a90 ee2d5740 e092ffb0 c034e11c
Call Trace:
 [<f0a8d7fd>] call_decode+0x27d/0x5e0 [sunrpc]
 [<f0a93d60>] rpcauth_unbindcred+0x20/0x60 [sunrpc]
 [<f08bae07>] nfs_readpage_result_full+0xf7/0x120 [nfs]
 [<f08c0520>] nfs3_xdr_readres+0x0/0x160 [nfs]
 [<f0a93260>] rpc_async_schedule+0x0/0x10 [sunrpc]
 [<f0a9306f>] __rpc_execute+0x5f/0x250 [sunrpc]
 [<c034e11c>] schedule+0x21c/0x450
 [<c01283aa>] run_workqueue+0x7a/0x110
 [<c0128a07>] worker_thread+0x137/0x160
 [<c01176b0>] default_wake_function+0x0/0x10
 [<c01288d0>] worker_thread+0x0/0x160
 [<c012b329>] kthread+0xa9/0xe0
 [<c012b280>] kthread+0x0/0xe0
 [<c0103a97>] kernel_thread_helper+0x7/0x10
 =======================
Code: 10 89 5c 24 10 89 c3 89 7c 24 18 89 d7 89 74 24 14 8b 70 28 75 1a 8b
4e 08 89 fa 89 d8 ff 51 18 8b 5c 24 10 83 74 24 14 8b 7c 24 <18> 83 c4 1c c3
89 74 24 0c 8b 40 10 8b 40 24 8b 40 10 8b 40 08
EIP: [<f0a93c94>] rpcauth_checkverf+0x34/0x70 [sunrpc] SS:ESP 0068:e64b5eec
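
For readers unfamiliar with the "Oops: 0002" line above: on 32-bit x86 the number is the page-fault error code, whose low three bits describe the faulting access. The sketch below decodes those bits; the function name decode_oops_code is made up for illustration and is not a kernel interface.

```python
def decode_oops_code(code: int) -> dict:
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 0x1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = kernel mode, 1 = user mode
        "mode": "user" if code & 0x4 else "kernel",
    }


info = decode_oops_code(0x0002)
print(info)
```

For this report the decode reads as a kernel-mode write to a page that is not present, which fits the "*pde = 00000000" line in the oops.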



* Re: 2.6.21.14 NFS related oops
  2007-06-13 12:00 2.6.21.14 NFS related oops Maciej Soltysiak
@ 2007-06-13 19:17 ` Trond Myklebust
  2007-06-13 20:35   ` Chuck Ebbert
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Trond Myklebust @ 2007-06-13 19:17 UTC (permalink / raw)
  To: Maciej Soltysiak; +Cc: linux-kernel

On Wed, 2007-06-13 at 14:00 +0200, Maciej Soltysiak wrote:
> Hi,
> 
> If anyone is interested, I got this oops while running a torrent application
> (btdownloadcurses) writing directly to a NAS mounted via NFSv3.
> 
> The client machine is 2.6.21.14 and it is mounted with options:
> wsize=8192,rsize=8192,hard,intr,tcp

Hmm. The Oops says '2.6.20.14-cks1'

Firstly, does that have any extra out-of-tree patches?
Secondly, is it reproducible with 2.6.21 or a more recent kernel?

> After that, the application hung, and I am unable to cd into the mounted NFS
> directory, unmount it (busy), or kill the app (kill -9 fails; the process is
> in D state).
> 
> Best regards,
> Maciej
> 
> BUG: unable to handle kernel paging request at virtual address 5018f248
>  printing eip:
> f0a93c94
> *pde = 00000000
> Oops: 0002 [#1]
> Modules linked in: binfmt_misc sit nfs lockd nfs_acl sunrpc w83627ehf 
> i2c_isa i2c_viapro i2c_core via_agp agpgart rtc
> CPU:    0
> EIP:    0060:[<f0a93c94>]    Not tainted VLI
> EFLAGS: 00010206   (2.6.20.14-cks1 #15)
> EIP is at rpcauth_checkverf+0x34/0x70 [sunrpc]
> eax: d2f4447c   ebx: c655d584   ecx: 00000000   edx: f0aa9f60
> esi: e91ea640   edi: d2f44474   ebp: ede2f228   esp: e64b5eec
> ds: 007b   es: 007b   ss: 0068
> Process rpciod/0 (pid: 1005, ti=e64b4000 task=efe95a90 task.ti=e64b4000)
> Stack: 00000286 ede2f8a0 ede2f8a0 00000286 c655d584 121d0da3 00000820 
> f0a8d7fd
>        f0a93d60 f08bae07 00000286 c655d5cc 00000286 00000286 f08c0520 
> c655d584
>        00000000 c655d5ec f0a93260 f0a9306f efe95a90 ee2d5740 e092ffb0 
> c034e11c
> Call Trace:
>  [<f0a8d7fd>] call_decode+0x27d/0x5e0 [sunrpc]
>  [<f0a93d60>] rpcauth_unbindcred+0x20/0x60 [sunrpc]
>  [<f08bae07>] nfs_readpage_result_full+0xf7/0x120 [nfs]
>  [<f08c0520>] nfs3_xdr_readres+0x0/0x160 [nfs]
>  [<f0a93260>] rpc_async_schedule+0x0/0x10 [sunrpc]
>  [<f0a9306f>] __rpc_execute+0x5f/0x250 [sunrpc]
>  [<c034e11c>] schedule+0x21c/0x450
>  [<c01283aa>] run_workqueue+0x7a/0x110
>  [<c0128a07>] worker_thread+0x137/0x160
>  [<c01176b0>] default_wake_function+0x0/0x10
>  [<c01288d0>] worker_thread+0x0/0x160
>  [<c012b329>] kthread+0xa9/0xe0
>  [<c012b280>] kthread+0x0/0xe0
>  [<c0103a97>] kernel_thread_helper+0x7/0x10
>  =======================
> Code: 10 89 5c 24 10 89 c3 89 7c 24 18 89 d7 89 74 24 14 8b 70 28 75 1a 8b
> 4e 08 89 fa 89 d8 ff 51 18 8b 5c 24 10 83 74 24 14 8b 7c 24 <18> 83 c4 1c c3
> 89 74 24 0c 8b 40 10 8b 40 24 8b 40 10 8b 40 08 EIP: [<f0a93c94>]
> rpcauth_checkverf+0x34/0x70 [sunrpc] SS:ESP 0068:e64b5eec

At a first guess, it looks as though something has scribbled over your
credential. Have you tried running this kernel with slab debugging
enabled?

Cheers
  Trond



* Re: 2.6.21.14 NFS related oops
  2007-06-13 19:17 ` Trond Myklebust
@ 2007-06-13 20:35   ` Chuck Ebbert
  2007-06-14 15:34   ` Maciej Soltysiak
  2007-06-16  9:26   ` Maciej Sołtysiak
  2 siblings, 0 replies; 7+ messages in thread
From: Chuck Ebbert @ 2007-06-13 20:35 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Maciej Soltysiak, linux-kernel

On 06/13/2007 03:17 PM, Trond Myklebust wrote:
> On Wed, 2007-06-13 at 14:00 +0200, Maciej Soltysiak wrote:
>>  =======================
>> Code: 10 89 5c 24 10 89 c3 89 7c 24 18 89 d7 89 74 24 14 8b 70 28 75 1a 8b
>> 4e 08 89 fa 89 d8 ff 51 18 8b 5c 24 10 83 74 24 14 8b 7c 24 <18> 83 c4 1c c3
>> 89 74 24 0c 8b 40 10 8b 40 24 8b 40 10 8b 40 08 EIP: [<f0a93c94>]
>> rpcauth_checkverf+0x34/0x70 [sunrpc] SS:ESP 0068:e64b5eec
> 
> At a first guess, it looks as though something has scribbled over your
> credential. Have you tried running this kernel with slab debugging
> enabled?
> 

Disassembly of this code yields gibberish, like a bit got flipped
somewhere:

  1c:   ff 51 18                  call   *0x18(%ecx)
  1f:   8b 5c 24 10               mov    0x10(%esp),%ebx
  23:   83 74 24 14 8b            xorl   $0xffffff8b,0x14(%esp)
  28:   7c 24                     jl     4e <_EIP+0x4e>
   0:   18 83 c4 1c c3 89         sbb    %al,0x89c31cc4(%ebx)   <=====
   6:   74 24                     je     2c <_EIP+0x2c>
   8:   0c 8b                     or     $0x8b,%al
   a:   40                        inc    %eax
   b:   10 8b 40 24 8b 40         adc    %cl,0x408b2440(%ebx)
  11:   10                        .byte 0x10
  12:   8b 40 08                  mov    0x8(%eax),%eax

Somewhere around 23: things went horribly wrong.
At 12: it starts to make sense again.
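
The "bit got flipped" reading can be checked with a little arithmetic. The sketch below rests on one assumption that is not stated in the thread: the byte 0x83 at offset 23 was meant to be 0x8b, so that "8b 74 24 14" restores %esi from 0x14(%esp), mirroring the "89 74 24 14" save earlier in the code dump. It also recomputes the address that the mis-decoded sbb instruction would touch, using the ebx value from the register dump.

```python
def flipped_bits(observed: int, expected: int) -> list[int]:
    """Return the bit positions (0 = least significant) where two bytes differ."""
    diff = observed ^ expected
    return [bit for bit in range(8) if (diff >> bit) & 1]


observed_opcode = 0x83  # byte at offset 23 in the oops code dump
expected_opcode = 0x8B  # assumption: "mov 0x14(%esp),%esi", matching the earlier save
bits = flipped_bits(observed_opcode, expected_opcode)
print("differing bits:", bits)

# "sbb %al,0x89c31cc4(%ebx)" addresses ebx + displacement, truncated to 32 bits.
ebx = 0xC655D584           # from the register dump
displacement = 0x89C31CC4  # from the disassembly
fault_address = (ebx + displacement) & 0xFFFFFFFF
print("fault address:", hex(fault_address))
```

Under that assumption the two bytes differ in exactly one bit (bit 3), and the computed address is 0x5018f248, the same virtual address the oops reports as the faulting one.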



* Re: 2.6.21.14 NFS related oops
  2007-06-13 19:17 ` Trond Myklebust
  2007-06-13 20:35   ` Chuck Ebbert
@ 2007-06-14 15:34   ` Maciej Soltysiak
  2007-06-16  9:26   ` Maciej Sołtysiak
  2 siblings, 0 replies; 7+ messages in thread
From: Maciej Soltysiak @ 2007-06-14 15:34 UTC (permalink / raw)
  To: Trond Myklebust, linux-kernel

Trond Myklebust wrote:
> On Wed, 2007-06-13 at 14:00 +0200, Maciej Soltysiak wrote:
>   
>> Hi,
>>
>> If anyone is interested, I got this oops while running a torrent application
>> (btdownloadcurses) writing directly to a NAS mounted via NFSv3.
>>
>> The client machine is 2.6.21.14 and it is mounted with options:
>> wsize=8192,rsize=8192,hard,intr,tcp
>>     
>
> Hmm. The Oops says '2.6.20.14-cks1'
>
> Firstly, does that have any extra out-of-tree patches?
> Secondly, is it reproducible with 2.6.21 or a more recent kernel?
>
>   

Ah, yes, it is 2.6.20.14, not 2.6.21.14, and it does contain two extra things:
- Con Kolivas' -cks1 patch set (server version)
- reiser4 code (one filesystem mounted)

>> After that, the application hung, and I am unable to cd into the mounted NFS
>> directory, unmount it (busy), or kill the app (kill -9 fails; the process is
>> in D state).
>>
>> Best regards,
>> Maciej
>>
>> BUG: unable to handle kernel paging request at virtual address 5018f248
>>  printing eip:
>> f0a93c94
>> *pde = 00000000
>> Oops: 0002 [#1]
>> Modules linked in: binfmt_misc sit nfs lockd nfs_acl sunrpc w83627ehf 
>> i2c_isa i2c_viapro i2c_core via_agp agpgart rtc
>> CPU:    0
>> EIP:    0060:[<f0a93c94>]    Not tainted VLI
>> EFLAGS: 00010206   (2.6.20.14-cks1 #15)
>> EIP is at rpcauth_checkverf+0x34/0x70 [sunrpc]
>> eax: d2f4447c   ebx: c655d584   ecx: 00000000   edx: f0aa9f60
>> esi: e91ea640   edi: d2f44474   ebp: ede2f228   esp: e64b5eec
>> ds: 007b   es: 007b   ss: 0068
>> Process rpciod/0 (pid: 1005, ti=e64b4000 task=efe95a90 task.ti=e64b4000)
>> Stack: 00000286 ede2f8a0 ede2f8a0 00000286 c655d584 121d0da3 00000820 
>> f0a8d7fd
>>        f0a93d60 f08bae07 00000286 c655d5cc 00000286 00000286 f08c0520 
>> c655d584
>>        00000000 c655d5ec f0a93260 f0a9306f efe95a90 ee2d5740 e092ffb0 
>> c034e11c
>> Call Trace:
>>  [<f0a8d7fd>] call_decode+0x27d/0x5e0 [sunrpc]
>>  [<f0a93d60>] rpcauth_unbindcred+0x20/0x60 [sunrpc]
>>  [<f08bae07>] nfs_readpage_result_full+0xf7/0x120 [nfs]
>>  [<f08c0520>] nfs3_xdr_readres+0x0/0x160 [nfs]
>>  [<f0a93260>] rpc_async_schedule+0x0/0x10 [sunrpc]
>>  [<f0a9306f>] __rpc_execute+0x5f/0x250 [sunrpc]
>>  [<c034e11c>] schedule+0x21c/0x450
>>  [<c01283aa>] run_workqueue+0x7a/0x110
>>  [<c0128a07>] worker_thread+0x137/0x160
>>  [<c01176b0>] default_wake_function+0x0/0x10
>>  [<c01288d0>] worker_thread+0x0/0x160
>>  [<c012b329>] kthread+0xa9/0xe0
>>  [<c012b280>] kthread+0x0/0xe0
>>  [<c0103a97>] kernel_thread_helper+0x7/0x10
>>  =======================
>> Code: 10 89 5c 24 10 89 c3 89 7c 24 18 89 d7 89 74 24 14 8b 70 28 75 1a 8b
>> 4e 08 89 fa 89 d8 ff 51 18 8b 5c 24 10 83 74 24 14 8b 7c 24 <18> 83 c4 1c c3
>> 89 74 24 0c 8b 40 10 8b 40 24 8b 40 10 8b 40 08 EIP: [<f0a93c94>]
>> rpcauth_checkverf+0x34/0x70 [sunrpc] SS:ESP 0068:e64b5eec
>>     
>
> At a first guess, it looks as though something has scribbled over your
> credential. Have you tried running this kernel with slab debugging
> enabled?
>
>   

No, but I will turn it on. The server crashes under heavy NFS traffic
(e.g. the nightly rsync backup). It crashed again today, but the oops
did not get written to kern.log.

> Cheers
>   Trond
>   
Thanks for your reply and best regards,
Maciej



* Re: 2.6.21.14 NFS related oops
  2007-06-13 19:17 ` Trond Myklebust
  2007-06-13 20:35   ` Chuck Ebbert
  2007-06-14 15:34   ` Maciej Soltysiak
@ 2007-06-16  9:26   ` Maciej Sołtysiak
  2007-06-16 15:08     ` Trond Myklebust
  2 siblings, 1 reply; 7+ messages in thread
From: Maciej Sołtysiak @ 2007-06-16  9:26 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-kernel

>>  =======================
>> Code: 10 89 5c 24 10 89 c3 89 7c 24 18 89 d7 89 74 24 14 8b 70 28 75 1a 
>> 8b
>> 4e 08 89 fa 89 d8 ff 51 18 8b 5c 24 10 83 74 24 14 8b 7c 24 <18> 83 c4 1c 
>> c3
>> 89 74 24 0c 8b 40 10 8b 40 24 8b 40 10 8b 40 08 EIP: [<f0a93c94>]
>> rpcauth_checkverf+0x34/0x70 [sunrpc] SS:ESP 0068:e64b5eec
>
> At a first guess, it looks as though something has scribbled over your
> credential. Have you tried running this kernel with slab debugging
> enabled?

I'm running 2.6.21.5 now with slab debugging on, here's what I got about
slab corruption:

Slab corruption: skbuff_head_cache start=ef287b78, len=164
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c031710c>](kfree_skbmem+0x3c/0x90)
090: 6b 6b 6b 6b 6b 63 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Single bit error detected. Probably bad RAM.
Run memtest86+ or a similar memory test tool.
Prev obj: start=ef287ac8, len=164
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c031798b>](__alloc_skb+0x2b/0x100)
000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
010: 00 00 00 00 e0 71 e6 ef 00 00 00 00 00 00 00 00
Next obj: start=ef287c28, len=164
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<c031798b>](__alloc_skb+0x2b/0x100)
000: 84 d0 85 c5 84 d0 85 c5 04 d0 85 c5 2c 0a 73 46
010: 6f cd 09 00 00 00 00 00 01 00 00 00 08 e5 72 ee

How probable is it that this is really a bad-memory issue?
Does this report say anything about which RAM chip I should
investigate or replace? I have 1x512MB + 1x256MB.

Best Regards,
Maciej
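
The "Single bit error detected" verdict in the report above can be reproduced by hand. With slab debugging on, a freed object is filled with the poison byte 0x6b (POISON_FREE in the kernel sources), so any other value in the dump is a corrupted byte. The sketch below scans the dumped line for such bytes; the helper name find_poison_errors is hypothetical, and it assumes the "090:" prefix is an offset relative to the object's start= address.

```python
POISON_FREE = 0x6B  # fill pattern the slab allocator writes into freed objects


def find_poison_errors(start: int, line_offset: int, hex_bytes: str) -> list[tuple[int, int, int]]:
    """Return (address, value, number_of_flipped_bits) for each non-poison byte."""
    errors = []
    for index, token in enumerate(hex_bytes.split()):
        value = int(token, 16)
        diff = value ^ POISON_FREE
        if diff:
            errors.append((start + line_offset + index, value, bin(diff).count("1")))
    return errors


errors = find_poison_errors(
    start=0xEF287B78,   # "start=" from the slab corruption report
    line_offset=0x090,  # prefix of the dumped line
    hex_bytes="6b 6b 6b 6b 6b 63 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b",
)
for address, value, nbits in errors:
    print(hex(address), hex(value), "bits flipped:", nbits)
```

Only one byte deviates (0x63 instead of 0x6b, at object offset 0x95), and it differs from the poison value in a single bit, which is the pattern the kernel message attributes to bad RAM.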



* Re: 2.6.21.14 NFS related oops
  2007-06-16  9:26   ` Maciej Sołtysiak
@ 2007-06-16 15:08     ` Trond Myklebust
  0 siblings, 0 replies; 7+ messages in thread
From: Trond Myklebust @ 2007-06-16 15:08 UTC (permalink / raw)
  To: Maciej Sołtysiak; +Cc: linux-kernel

On Sat, 2007-06-16 at 11:26 +0200, Maciej Sołtysiak wrote:
> >>  =======================
> >> Code: 10 89 5c 24 10 89 c3 89 7c 24 18 89 d7 89 74 24 14 8b 70 28 75 1a 
> >> 8b
> >> 4e 08 89 fa 89 d8 ff 51 18 8b 5c 24 10 83 74 24 14 8b 7c 24 <18> 83 c4 1c 
> >> c3
> >> 89 74 24 0c 8b 40 10 8b 40 24 8b 40 10 8b 40 08 EIP: [<f0a93c94>]
> >> rpcauth_checkverf+0x34/0x70 [sunrpc] SS:ESP 0068:e64b5eec
> >
> > At a first guess, it looks as though something has scribbled over your
> > credential. Have you tried running this kernel with slab debugging
> > enabled?
> 
> I'm running 2.6.21.5 now with slab debugging on, here's what I got about
> slab corruption:
> 
> Slab corruption: skbuff_head_cache start=ef287b78, len=164
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<c031710c>](kfree_skbmem+0x3c/0x90)
> 090: 6b 6b 6b 6b 6b 63 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> Single bit error detected. Probably bad RAM.
> Run memtest86+ or a similar memory test tool.
> Prev obj: start=ef287ac8, len=164
> Redzone: 0x170fc2a5/0x170fc2a5.
> Last user: [<c031798b>](__alloc_skb+0x2b/0x100)
> 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 010: 00 00 00 00 e0 71 e6 ef 00 00 00 00 00 00 00 00
> Next obj: start=ef287c28, len=164
> Redzone: 0x170fc2a5/0x170fc2a5.
> Last user: [<c031798b>](__alloc_skb+0x2b/0x100)
> 000: 84 d0 85 c5 84 d0 85 c5 04 d0 85 c5 2c 0a 73 46
> 010: 6f cd 09 00 00 00 00 00 01 00 00 00 08 e5 72 ee
> 
> How probable is that it is really a bad memory issue?
> Does this report say anything about which RAM chip I should
> investigate/replace ? I have 1x512MB+1x256MB
> 
> Best Regards,
> Maciej

I'd try doing as suggested above: run memtest86 on the computer for a
couple of hours and see what it tells you. That should hopefully give
you enough information to figure out which chips need replacing.

Cheers
  Trond
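
As a rough complement to memtest86, the corrupted byte's address can be turned into a guess about which module holds it. This is only a sketch under several assumptions that the thread does not confirm: the slab object lives in directly mapped low memory, the kernel uses the default 3G/1G split (PAGE_OFFSET = 0xC0000000), the "090:" dump prefix is an offset from start=, and the two modules are mapped back to back with no interleaving or holes. The helper names are made up for illustration.

```python
PAGE_OFFSET = 0xC0000000  # assumption: default 3G/1G split on 32-bit x86
MIB = 1 << 20


def lowmem_virt_to_phys(vaddr: int) -> int:
    """Translate a directly mapped (lowmem) kernel virtual address to physical."""
    return vaddr - PAGE_OFFSET


def module_index(phys: int, sizes_mib: list[int]) -> int:
    """Return the index of the module covering phys if modules are laid out in order."""
    base = 0
    for index, size in enumerate(sizes_mib):
        if base <= phys < base + size * MIB:
            return index
        base += size * MIB
    raise ValueError("address beyond installed memory")


# Object start from the slab report plus the offset of the bad byte (0x63).
bad_virt = 0xEF287B78 + 0x95
bad_phys = lowmem_virt_to_phys(bad_virt)
print(hex(bad_phys), "is in MiB number", bad_phys // MIB)

# The physical ordering of the two modules is unknown, so try both layouts.
for layout in ([512, 256], [256, 512]):
    index = module_index(bad_phys, layout)
    print(layout, "-> module", index, f"({layout[index]} MiB)")
```

Under these assumptions the byte sits near the top of the 768 MiB of installed memory, so it would fall in whichever module is mapped second. A memory tester that reports failing physical addresses remains the reliable way to identify the module.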



* Re: 2.6.21.14 NFS related oops
@ 2007-06-20 10:35 Maciej Sołtysiak
  0 siblings, 0 replies; 7+ messages in thread
From: Maciej Sołtysiak @ 2007-06-20 10:35 UTC (permalink / raw)
  To: linux-kernel

 > > I'm running 2.6.21.5 now with slab debugging on, here's what I got about
 > > slab corruption:
 > >
 > > Slab corruption: skbuff_head_cache start=ef287b78, len=164
 > > Redzone: 0x5a2cf071/0x5a2cf071.
 > > Last user: [<c031710c>](kfree_skbmem+0x3c/0x90)
 > > 090: 6b 6b 6b 6b 6b 63 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 > > Single bit error detected. Probably bad RAM.
 > > Run memtest86+ or a similar memory test tool.
 > > Prev obj: start=ef287ac8, len=164
 > > Redzone: 0x170fc2a5/0x170fc2a5.
 > > Last user: [<c031798b>](__alloc_skb+0x2b/0x100)
 > > 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 > > 010: 00 00 00 00 e0 71 e6 ef 00 00 00 00 00 00 00 00
 > > Next obj: start=ef287c28, len=164
 > > Redzone: 0x170fc2a5/0x170fc2a5.
 > > Last user: [<c031798b>](__alloc_skb+0x2b/0x100)
 > > 000: 84 d0 85 c5 84 d0 85 c5 04 d0 85 c5 2c 0a 73 46
 > > 010: 6f cd 09 00 00 00 00 00 01 00 00 00 08 e5 72 ee
 > >
 > > How probable is that it is really a bad memory issue?
 > > Does this report say anything about which RAM chip I should
 > > investigate/replace ? I have 1x512MB+1x256MB
 > >
 > > Best Regards,
 > > Maciej
 >
 > I'd try doing as suggested above: run memtest86 on the computer for a
 > couple of hours and see what it tells you. That should hopefully give
 > you enough information to figure out which chips need replacing.

I am also getting BAD CRC errors on the disk that holds my swap partition.
I was wondering whether slab debugging could be reporting slab corruption
not because my RAM chips are bad, but because the swap partition has bad
blocks, so that the whole problem might be related to the swap disk rather
than the RAM.

 > Cheers
 >   Trond
Regards,
Maciej


