* NFS oops in 2.6.26rc4
@ 2008-05-27 19:04 Dave Jones
2008-05-29 11:48 ` Jeff Layton
2008-05-30 17:59 ` Chuck Lever
0 siblings, 2 replies; 16+ messages in thread
From: Dave Jones @ 2008-05-27 19:04 UTC
To: Linux Kernel; +Cc: Trond Myklebust
When trying to mount an nfs export, I got this oops..
BUG: unable to handle kernel paging request at f4569000
IP: [<f8daac01>] :sunrpc:xdr_encode_opaque_fixed+0x2d/0x69
*pde = 34c23163 *pte = 34569160
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: nfs nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ext2 sg button via_rhine via_ircc pcspkr r8169 mii pata_sil680 irda crc_ccitt i2c_viapro i2c_core dm_snapshot dm_zero dm_mirror dm_log dm_mod pata_via ata_generic pata_acpi libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
Pid: 2046, comm: mount.nfs Not tainted (2.6.26-0.33.rc4.fc10.i686 #1)
EIP: 0060:[<f8daac01>] EFLAGS: 00210212 CPU: 0
EIP is at xdr_encode_opaque_fixed+0x2d/0x69 [sunrpc]
EAX: 0000f455 EBX: 00003d16 ECX: 0000349c EDX: 00000003
ESI: f4569000 EDI: f4d2e450 EBP: f4566a78 ESP: f4566a68
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process mount.nfs (pid: 2046, ti=f4566000 task=f4580000 task.ti=f4566000)
Stack: f4d2c26c 55f40000 f4e740c0 f4e740c0 f4566a84 f8daac4f 0000f455 f4566a94
f8e7ec28 00000000 f4d00600 f4566aac f8da4db8 f8e7ec12 f4e740c0 f4e740c0
f4d00600 f4566acc f8d9ea9d f4d2c268 f4566e1a f8e7ec12 f4d00600 00000000
Call Trace:
[<f8daac4f>] ? xdr_encode_opaque+0x12/0x15 [sunrpc]
[<f8e7ec28>] ? nfs3_xdr_fhandle+0x16/0x25 [nfs]
[<f8da4db8>] ? rpcauth_wrap_req+0x66/0x77 [sunrpc]
[<f8e7ec12>] ? nfs3_xdr_fhandle+0x0/0x25 [nfs]
[<f8d9ea9d>] ? call_transmit+0x18a/0x1eb [sunrpc]
[<f8e7ec12>] ? nfs3_xdr_fhandle+0x0/0x25 [nfs]
[<f8da4450>] ? __rpc_execute+0x69/0x1e1 [sunrpc]
[<f8da45e3>] ? rpc_execute+0x1b/0x1e [sunrpc]
[<f8d9f260>] ? rpc_run_task+0x43/0x49 [sunrpc]
[<f8d9f368>] ? rpc_call_sync+0x43/0x5e [sunrpc]
[<f8e7cf05>] ? nfs3_rpc_wrapper+0x17/0x4d [nfs]
[<f8e7d014>] ? nfs3_proc_fsinfo+0x5e/0x80 [nfs]
[<f8e6c64c>] ? nfs_probe_fsinfo+0x75/0x462 [nfs]
[<f8d9f3c4>] ? rpc_ping+0x41/0x4b [sunrpc]
[<f8d9f7c7>] ? rpc_bind_new_program+0x5b/0x71 [sunrpc]
[<f8e6de14>] ? nfs_create_server+0x451/0x5fd [nfs]
[<f8d9f4ef>] ? rpc_free_auth+0x33/0x36 [sunrpc]
[<c05025e5>] ? kref_put+0x39/0x44
[<f8d9f415>] ? rpc_release_client+0x47/0x4c [sunrpc]
[<f8d9f5a6>] ? rpc_shutdown_client+0xb4/0xbc [sunrpc]
[<f8e7cd39>] ? nfs_mount+0x12b/0x131 [nfs]
[<f8e74eb8>] ? nfs_get_sb+0x599/0x830 [nfs]
[<c04887c7>] ? check_object+0x134/0x18b
[<c0489995>] ? __slab_alloc+0x45c/0x4ea
[<c048a3a0>] ? __kmalloc+0xbc/0xfb
[<c044788f>] ? trace_hardirqs_on+0xe9/0x10a
[<c04a280c>] ? alloc_vfsmnt+0xe3/0x10a
[<c048f6b1>] ? vfs_kern_mount+0x82/0xf5
[<c048f768>] ? do_kern_mount+0x32/0xba
[<c04a2520>] ? do_new_mount+0x42/0x6c
[<c04a2fa0>] ? do_mount+0x199/0x1b7
[<c04a1626>] ? copy_mount_options+0x79/0xf9
[<c04a3024>] ? sys_mount+0x66/0x9e
[<c0404c3a>] ? syscall_call+0x7/0xb
=======================
Code: e5 57 56 89 d6 53 83 ec 04 85 c9 89 45 f0 89 c8 74 4c 8d 59 03 c1 eb 02 8d 14 9d 00 00 00 00 29 ca 85 f6 74 11 c1 e9 02 8b 7d f0 <f3> a5 89 c1 83 e1 03 74 02 f3 a4 85 d2 74 1b 8b 7d f0 89 d1 c1
EIP: [<f8daac01>] xdr_encode_opaque_fixed+0x2d/0x69 [sunrpc] SS:ESP 0068:f4566a68
---[ end trace a8a691a45122c25a ]---
mount.nfs used greatest stack depth: 812 bytes left
--
http://www.codemonkey.org.uk
* Re: NFS oops in 2.6.26rc4
2008-05-27 19:04 NFS oops in 2.6.26rc4 Dave Jones
@ 2008-05-29 11:48 ` Jeff Layton
2008-05-30 17:59 ` Chuck Lever
1 sibling, 0 replies; 16+ messages in thread
From: Jeff Layton @ 2008-05-29 11:48 UTC
To: Dave Jones; +Cc: Linux Kernel, Trond Myklebust
On Tue, 27 May 2008 15:04:20 -0400
Dave Jones <davej@redhat.com> wrote:
> When trying to mount an nfs export, I got this oops..
>
> BUG: unable to handle kernel paging request at f4569000
> IP: [<f8daac01>] :sunrpc:xdr_encode_opaque_fixed+0x2d/0x69
> *pde = 34c23163 *pte = 34569160
> Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> Modules linked in: nfs nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ext2 sg button via_rhine via_ircc pcspkr r8169 mii pata_sil680 irda crc_ccitt i2c_viapro i2c_core dm_snapshot dm_zero dm_mirror dm_log dm_mod pata_via ata_generic pata_acpi libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
>
> Pid: 2046, comm: mount.nfs Not tainted (2.6.26-0.33.rc4.fc10.i686 #1)
> EIP: 0060:[<f8daac01>] EFLAGS: 00210212 CPU: 0
> EIP is at xdr_encode_opaque_fixed+0x2d/0x69 [sunrpc]
> EAX: 0000f455 EBX: 00003d16 ECX: 0000349c EDX: 00000003
> ESI: f4569000 EDI: f4d2e450 EBP: f4566a78 ESP: f4566a68
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process mount.nfs (pid: 2046, ti=f4566000 task=f4580000 task.ti=f4566000)
> Stack: f4d2c26c 55f40000 f4e740c0 f4e740c0 f4566a84 f8daac4f 0000f455 f4566a94
> f8e7ec28 00000000 f4d00600 f4566aac f8da4db8 f8e7ec12 f4e740c0 f4e740c0
> f4d00600 f4566acc f8d9ea9d f4d2c268 f4566e1a f8e7ec12 f4d00600 00000000
> Call Trace:
> [<f8daac4f>] ? xdr_encode_opaque+0x12/0x15 [sunrpc]
> [<f8e7ec28>] ? nfs3_xdr_fhandle+0x16/0x25 [nfs]
> [<f8da4db8>] ? rpcauth_wrap_req+0x66/0x77 [sunrpc]
> [<f8e7ec12>] ? nfs3_xdr_fhandle+0x0/0x25 [nfs]
> [<f8d9ea9d>] ? call_transmit+0x18a/0x1eb [sunrpc]
> [<f8e7ec12>] ? nfs3_xdr_fhandle+0x0/0x25 [nfs]
> [<f8da4450>] ? __rpc_execute+0x69/0x1e1 [sunrpc]
> [<f8da45e3>] ? rpc_execute+0x1b/0x1e [sunrpc]
> [<f8d9f260>] ? rpc_run_task+0x43/0x49 [sunrpc]
> [<f8d9f368>] ? rpc_call_sync+0x43/0x5e [sunrpc]
> [<f8e7cf05>] ? nfs3_rpc_wrapper+0x17/0x4d [nfs]
> [<f8e7d014>] ? nfs3_proc_fsinfo+0x5e/0x80 [nfs]
> [<f8e6c64c>] ? nfs_probe_fsinfo+0x75/0x462 [nfs]
> [<f8d9f3c4>] ? rpc_ping+0x41/0x4b [sunrpc]
> [<f8d9f7c7>] ? rpc_bind_new_program+0x5b/0x71 [sunrpc]
> [<f8e6de14>] ? nfs_create_server+0x451/0x5fd [nfs]
> [<f8d9f4ef>] ? rpc_free_auth+0x33/0x36 [sunrpc]
> [<c05025e5>] ? kref_put+0x39/0x44
> [<f8d9f415>] ? rpc_release_client+0x47/0x4c [sunrpc]
> [<f8d9f5a6>] ? rpc_shutdown_client+0xb4/0xbc [sunrpc]
> [<f8e7cd39>] ? nfs_mount+0x12b/0x131 [nfs]
> [<f8e74eb8>] ? nfs_get_sb+0x599/0x830 [nfs]
> [<c04887c7>] ? check_object+0x134/0x18b
> [<c0489995>] ? __slab_alloc+0x45c/0x4ea
> [<c048a3a0>] ? __kmalloc+0xbc/0xfb
> [<c044788f>] ? trace_hardirqs_on+0xe9/0x10a
> [<c04a280c>] ? alloc_vfsmnt+0xe3/0x10a
> [<c048f6b1>] ? vfs_kern_mount+0x82/0xf5
> [<c048f768>] ? do_kern_mount+0x32/0xba
> [<c04a2520>] ? do_new_mount+0x42/0x6c
> [<c04a2fa0>] ? do_mount+0x199/0x1b7
> [<c04a1626>] ? copy_mount_options+0x79/0xf9
> [<c04a3024>] ? sys_mount+0x66/0x9e
> [<c0404c3a>] ? syscall_call+0x7/0xb
> =======================
> Code: e5 57 56 89 d6 53 83 ec 04 85 c9 89 45 f0 89 c8 74 4c 8d 59 03 c1 eb 02 8d 14 9d 00 00 00 00 29 ca 85 f6 74 11 c1 e9 02 8b 7d f0 <f3> a5 89 c1 83 e1 03 74 02 f3 a4 85 d2 74 1b 8b 7d f0 89 d1 c1
> EIP: [<f8daac01>] xdr_encode_opaque_fixed+0x2d/0x69 [sunrpc] SS:ESP 0068:f4566a68
> ---[ end trace a8a691a45122c25a ]---
> mount.nfs used greatest stack depth: 812 bytes left
>
>
Here's some disassembly from that function:
0000cbd4 <xdr_encode_opaque_fixed>:
cbd4: 55 push %ebp
cbd5: 89 e5 mov %esp,%ebp
cbd7: 57 push %edi
cbd8: 56 push %esi
cbd9: 89 d6 mov %edx,%esi
cbdb: 53 push %ebx
cbdc: 83 ec 04 sub $0x4,%esp
cbdf: 85 c9 test %ecx,%ecx
cbe1: 89 45 f0 mov %eax,-0x10(%ebp)
cbe4: 89 c8 mov %ecx,%eax
cbe6: 74 4c je cc34 <xdr_encode_opaque_fixed+0x60>
cbe8: 8d 59 03 lea 0x3(%ecx),%ebx
cbeb: c1 eb 02 shr $0x2,%ebx
cbee: 8d 14 9d 00 00 00 00 lea 0x0(,%ebx,4),%edx
cbf5: 29 ca sub %ecx,%edx
cbf7: 85 f6 test %esi,%esi
cbf9: 74 11 je cc0c <xdr_encode_opaque_fixed+0x38>
cbfb: c1 e9 02 shr $0x2,%ecx
cbfe: 8b 7d f0 mov -0x10(%ebp),%edi
cc01: f3 a5 rep movsl %ds:(%esi),%es:(%edi) <<< CRASH HERE
cc03: 89 c1 mov %eax,%ecx
cc05: 83 e1 03 and $0x3,%ecx
cc08: 74 02 je cc0c <xdr_encode_opaque_fixed+0x38>
cc0a: f3 a4 rep movsb %ds:(%esi),%es:(%edi)
cc0c: 85 d2 test %edx,%edx
cc0e: 74 1b je cc2b <xdr_encode_opaque_fixed+0x57>
...I think that corresponds to the memcpy here:
__be32 *xdr_encode_opaque_fixed(__be32 *p, const void *ptr, unsigned int nbytes)
{
if (likely(nbytes != 0)) {
unsigned int quadlen = XDR_QUADLEN(nbytes);
unsigned int padding = (quadlen << 2) - nbytes;
if (ptr != NULL)
memcpy(p, ptr, nbytes); <<<< CRASH HERE
if (padding != 0)
memset((char *)p + nbytes, 0, padding);
...and I think that would mean that %esi held the value of "ptr" at the
time. Looks like it was a bad pointer then? If I'm backtracking through
the stack correctly, then it looks like the nfs_fh pointer passed
in from upper layers was bad? I could be wrong though -- I always have
a hard time unrolling rep instructions.
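The register dump can be cross-checked against the disassembly with a little arithmetic. The sketch below is only an illustration, not kernel code: the helper names are made up here, and it assumes the registers in the oops reflect the state at the faulting rep movsl (EAX still holding nbytes from the mov at cbe4, ECX the 32-bit words left to copy, ESI and EDI the current source and destination).

```c
#include <assert.h>
#include <stdint.h>

/* Same rounding as the kernel's XDR_QUADLEN(): bytes -> 32-bit words, rounded up. */
static uint32_t xdr_quadlen(uint32_t nbytes)
{
    return (nbytes + 3) >> 2;
}

/* Zero bytes needed to pad an opaque of nbytes up to a 4-byte boundary. */
static uint32_t xdr_padding(uint32_t nbytes)
{
    return (xdr_quadlen(nbytes) << 2) - nbytes;
}

/* Hypothetical helper: words "rep movsl" had already moved when it faulted.
 * The copy starts with ECX = nbytes >> 2 and counts down. */
static uint32_t words_copied(uint32_t nbytes, uint32_t ecx_at_fault)
{
    assert(ecx_at_fault <= (nbytes >> 2));
    return (nbytes >> 2) - ecx_at_fault;
}

/* Hypothetical helper: where ESI or EDI pointed before the copy began. */
static uint32_t copy_start(uint32_t ptr_at_fault, uint32_t words_done)
{
    return ptr_at_fault - 4 * words_done;
}
```

Fed the values from the oops (EAX=0xf455, ECX=0x349c, ESI=0xf4569000, EDI=0xf4d2e450), these helpers give a quadlen of 0x3d16 and a padding of 3, which match EBX and EDX, and they put the start of the copy at 0xf4566e1c for the source and 0xf4d2c26c for the destination. If that reading is right, the length handed to the encoder was 0xf455 bytes, far larger than any NFSv3 file handle (64 bytes at most), which would fit the theory that the nfs_fh passed down was bad.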
Cheers,
--
Jeff Layton <jlayton@redhat.com>
* Re: NFS oops in 2.6.26rc4
2008-05-27 19:04 NFS oops in 2.6.26rc4 Dave Jones
2008-05-29 11:48 ` Jeff Layton
@ 2008-05-30 17:59 ` Chuck Lever
2008-05-30 18:21 ` Dave Jones
1 sibling, 1 reply; 16+ messages in thread
From: Chuck Lever @ 2008-05-30 17:59 UTC
To: Dave Jones; +Cc: Chuck Lever, Linux Kernel, Trond Myklebust
Hi Dave-
On Tue, May 27, 2008 at 3:04 PM, Dave Jones <davej@redhat.com> wrote:
> When trying to mount an nfs export, I got this oops..
>
> BUG: unable to handle kernel paging request at f4569000
> IP: [<f8daac01>] :sunrpc:xdr_encode_opaque_fixed+0x2d/0x69
> *pde = 34c23163 *pte = 34569160
> Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> Modules linked in: nfs nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ext2 sg button via_rhine via_ircc pcspkr r8169 mii pata_sil680 irda crc_ccitt i2c_viapro i2c_core dm_snapshot dm_zero dm_mirror dm_log dm_mod pata_via ata_generic pata_acpi libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
>
> Pid: 2046, comm: mount.nfs Not tainted (2.6.26-0.33.rc4.fc10.i686 #1)
> EIP: 0060:[<f8daac01>] EFLAGS: 00210212 CPU: 0
> EIP is at xdr_encode_opaque_fixed+0x2d/0x69 [sunrpc]
> EAX: 0000f455 EBX: 00003d16 ECX: 0000349c EDX: 00000003
> ESI: f4569000 EDI: f4d2e450 EBP: f4566a78 ESP: f4566a68
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process mount.nfs (pid: 2046, ti=f4566000 task=f4580000 task.ti=f4566000)
> Stack: f4d2c26c 55f40000 f4e740c0 f4e740c0 f4566a84 f8daac4f 0000f455 f4566a94
> f8e7ec28 00000000 f4d00600 f4566aac f8da4db8 f8e7ec12 f4e740c0 f4e740c0
> f4d00600 f4566acc f8d9ea9d f4d2c268 f4566e1a f8e7ec12 f4d00600 00000000
> Call Trace:
> [<f8daac4f>] ? xdr_encode_opaque+0x12/0x15 [sunrpc]
> [<f8e7ec28>] ? nfs3_xdr_fhandle+0x16/0x25 [nfs]
> [<f8da4db8>] ? rpcauth_wrap_req+0x66/0x77 [sunrpc]
> [<f8e7ec12>] ? nfs3_xdr_fhandle+0x0/0x25 [nfs]
> [<f8d9ea9d>] ? call_transmit+0x18a/0x1eb [sunrpc]
> [<f8e7ec12>] ? nfs3_xdr_fhandle+0x0/0x25 [nfs]
> [<f8da4450>] ? __rpc_execute+0x69/0x1e1 [sunrpc]
> [<f8da45e3>] ? rpc_execute+0x1b/0x1e [sunrpc]
> [<f8d9f260>] ? rpc_run_task+0x43/0x49 [sunrpc]
> [<f8d9f368>] ? rpc_call_sync+0x43/0x5e [sunrpc]
> [<f8e7cf05>] ? nfs3_rpc_wrapper+0x17/0x4d [nfs]
> [<f8e7d014>] ? nfs3_proc_fsinfo+0x5e/0x80 [nfs]
> [<f8e6c64c>] ? nfs_probe_fsinfo+0x75/0x462 [nfs]
> [<f8d9f3c4>] ? rpc_ping+0x41/0x4b [sunrpc]
> [<f8d9f7c7>] ? rpc_bind_new_program+0x5b/0x71 [sunrpc]
> [<f8e6de14>] ? nfs_create_server+0x451/0x5fd [nfs]
> [<f8d9f4ef>] ? rpc_free_auth+0x33/0x36 [sunrpc]
> [<c05025e5>] ? kref_put+0x39/0x44
> [<f8d9f415>] ? rpc_release_client+0x47/0x4c [sunrpc]
> [<f8d9f5a6>] ? rpc_shutdown_client+0xb4/0xbc [sunrpc]
> [<f8e7cd39>] ? nfs_mount+0x12b/0x131 [nfs]
> [<f8e74eb8>] ? nfs_get_sb+0x599/0x830 [nfs]
> [<c04887c7>] ? check_object+0x134/0x18b
> [<c0489995>] ? __slab_alloc+0x45c/0x4ea
> [<c048a3a0>] ? __kmalloc+0xbc/0xfb
> [<c044788f>] ? trace_hardirqs_on+0xe9/0x10a
> [<c04a280c>] ? alloc_vfsmnt+0xe3/0x10a
> [<c048f6b1>] ? vfs_kern_mount+0x82/0xf5
> [<c048f768>] ? do_kern_mount+0x32/0xba
> [<c04a2520>] ? do_new_mount+0x42/0x6c
> [<c04a2fa0>] ? do_mount+0x199/0x1b7
> [<c04a1626>] ? copy_mount_options+0x79/0xf9
> [<c04a3024>] ? sys_mount+0x66/0x9e
> [<c0404c3a>] ? syscall_call+0x7/0xb
> =======================
> Code: e5 57 56 89 d6 53 83 ec 04 85 c9 89 45 f0 89 c8 74 4c 8d 59 03 c1 eb 02 8d 14 9d 00 00 00 00 29 ca 85 f6 74 11 c1 e9 02 8b 7d f0 <f3> a5 89 c1 83 e1 03 74 02 f3 a4 85 d2 74 1b 8b 7d f0 89 d1 c1
> EIP: [<f8daac01>] xdr_encode_opaque_fixed+0x2d/0x69 [sunrpc] SS:ESP 0068:f4566a68
> ---[ end trace a8a691a45122c25a ]---
> mount.nfs used greatest stack depth: 812 bytes left
The last line suggests you are trying this with 4KB kernel stacks. I
have patches queued for .27 that provide some stack relief in this
code path. If you hit this often, you might want to try with 8KB
stacks to see if that helps.
In the meantime, the traceback is a little funky, so I can't see
directly what the root cause is. Can you provide the full command
line of the mount command that caused this? What "brand" of server
were you trying to mount? How often can you reproduce this?
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
* Re: NFS oops in 2.6.26rc4
2008-05-30 17:59 ` Chuck Lever
@ 2008-05-30 18:21 ` Dave Jones
2008-05-30 18:31 ` Trond Myklebust
0 siblings, 1 reply; 16+ messages in thread
From: Dave Jones @ 2008-05-30 18:21 UTC
To: chucklever; +Cc: Chuck Lever, Linux Kernel, Trond Myklebust
On Fri, May 30, 2008 at 01:59:12PM -0400, Chuck Lever wrote:
> > When trying to mount an nfs export, I got this oops..
> >
> > BUG: unable to handle kernel paging request at f4569000
> > IP: [<f8daac01>] :sunrpc:xdr_encode_opaque_fixed+0x2d/0x69
> > *pde = 34c23163 *pte = 34569160
> > Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> > Modules linked in: nfs nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ext2 sg button via_rhine via_ircc pcspkr r8169 mii pata_sil680 irda crc_ccitt i2c_viapro i2c_core dm_snapshot dm_zero dm_mirror dm_log dm_mod pata_via ata_generic pata_acpi libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
> > ...
> > Code: e5 57 56 89 d6 53 83 ec 04 85 c9 89 45 f0 89 c8 74 4c 8d 59 03 c1 eb 02 8d 14 9d 00 00 00 00 29 ca 85 f6 74 11 c1 e9 02 8b 7d f0 <f3> a5 89 c1 83 e1 03 74 02 f3 a4 85 d2 74 1b 8b 7d f0 89 d1 c1
> > EIP: [<f8daac01>] xdr_encode_opaque_fixed+0x2d/0x69 [sunrpc] SS:ESP 0068:f4566a68
> > ---[ end trace a8a691a45122c25a ]---
> > mount.nfs used greatest stack depth: 812 bytes left
>
> The last line suggests you are trying this with 4KB kernel stacks. I
> have patches queued for .27 that provide some stack relief in this
> code path. If you hit this often, you might want to try with 8KB
> stacks to see if that helps.
Yes, Fedora kernels have been using 4K stacks for some time.
From the trace though, it doesn't look like we actually ran out of
stack space?
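As a rough check of that impression, the stack position at the time of the oops can be worked out from the ti= and ESP values in the dump. This is a sketch under assumptions: the function names are invented for illustration, and it assumes 4KB stacks with thread_info at the lowest address of the stack area, so the real headroom is a little smaller than the figure it returns.

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_SIZE_4K 4096u /* assumption: 4KB kernel stacks */

/* Bytes between the stack pointer and the base of the stack area.
 * thread_info occupies the start of that area, so usable headroom is less. */
static uint32_t stack_bytes_left(uint32_t ti_base, uint32_t esp)
{
    assert(esp >= ti_base && esp <= ti_base + THREAD_SIZE_4K);
    return esp - ti_base;
}

/* Bytes consumed so far; the stack grows down from ti_base + THREAD_SIZE_4K. */
static uint32_t stack_bytes_used(uint32_t ti_base, uint32_t esp)
{
    assert(esp >= ti_base && esp <= ti_base + THREAD_SIZE_4K);
    return (ti_base + THREAD_SIZE_4K) - esp;
}
```

With ti=f4566000 and ESP=f4566a68 from the first oops, this gives 2664 bytes left and 1432 bytes used at the faulting instruction, so the stack pointer itself was nowhere near the bottom.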
> In the meantime, the traceback is a little funky, so I can't see
> directly what the root cause is. Can you provide the full command
> line of the mount command that caused this?
mount point in the fstab is ..
gelk:/mnt/data /mnt/nfs/gelk nfs nfsvers=3,tcp 0 0
> What "brand" of server were you trying to mount?
It's just another linux box. A no-name core2 duo, running 2.6.25.
> How often can you reproduce this?
Seems to do it every time I ask it to.
Dave
--
http://www.codemonkey.org.uk
* Re: NFS oops in 2.6.26rc4
2008-05-30 18:21 ` Dave Jones
@ 2008-05-30 18:31 ` Trond Myklebust
2008-05-30 19:03 ` Dave Jones
0 siblings, 1 reply; 16+ messages in thread
From: Trond Myklebust @ 2008-05-30 18:31 UTC
To: Dave Jones; +Cc: chucklever, Chuck Lever, Linux Kernel
On Fri, 2008-05-30 at 14:21 -0400, Dave Jones wrote:
> mount point in the fstab is ..
>
> gelk:/mnt/data /mnt/nfs/gelk nfs nfsvers=3,tcp 0 0
>
> > What "brand" of server were you trying to mount?
>
> It's just another linux box. A no-name core2 duo, running 2.6.25.
>
> > How often can you reproduce this?
>
> Seems to do it every time I ask it to.
Could you provide us with a binary tcpdump in that case? I'd love to
have a look at the actual filehandle the server is producing.
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
* Re: NFS oops in 2.6.26rc4
2008-05-30 18:31 ` Trond Myklebust
@ 2008-05-30 19:03 ` Dave Jones
2008-05-30 19:37 ` Chuck Lever
0 siblings, 1 reply; 16+ messages in thread
From: Dave Jones @ 2008-05-30 19:03 UTC
To: Trond Myklebust; +Cc: chucklever, Chuck Lever, Linux Kernel
On Fri, May 30, 2008 at 11:31:48AM -0700, Trond Myklebust wrote:
> On Fri, 2008-05-30 at 14:21 -0400, Dave Jones wrote:
>
> > mount point in the fstab is ..
> >
> > gelk:/mnt/data /mnt/nfs/gelk nfs nfsvers=3,tcp 0 0
> >
> > > What "brand" of server were you trying to mount?
> >
> > It's just another linux box. A no-name core2 duo, running 2.6.25.
> >
> > > How often can you reproduce this?
> >
> > Seems to do it every time I ask it to.
>
> Could you provide us with a binary tcpdump in that case? I'd love to
> have a look at the actual filehandle the server is producing.
This is from the client side: http://www.codemonkey.org.uk/junk/tcp.out
Wireshark picks up some of those packets as being 'malformed', which
could be a clue?
Something else of note, which I hadn't seen before: usually things lock
up just after that first oops. For some reason, today it survived
a little longer, but things really went downhill fast.
After that first oops scrolled off the screen, and I saved the wireshark
output to disk, I got this..
=============================================================================
BUG files_cache (Tainted: G D ): Padding overwritten. 0xf4d31fc0-0xf4d31fff
-----------------------------------------------------------------------------
INFO: Slab 0xc1b902b8 objects=9 used=7 fp=0xf4d31a80 flags=0x400000c3
Pid: 1910, comm: hald-runner Tainted: G D 2.6.26-0.33.rc4.fc10.i686 #1
[<c04882df>] slab_err+0x51/0x58
[<c041b326>] ? kernel_map_pages+0xf2/0x109
[<c041d019>] ? kmap_atomic_prot+0x1dc/0x1de
[<c048836b>] slab_pad_check+0x85/0xc1
[<c0488439>] check_slab+0x92/0x9f
[<c04898aa>] __slab_alloc+0x371/0x4ea
[<c0489ced>] kmem_cache_alloc+0x62/0xc4
[<c049ffa0>] ? dup_fd+0x22/0x2d4
[<c049ffa0>] ? dup_fd+0x22/0x2d4
[<c049ffa0>] dup_fd+0x22/0x2d4
[<c0461d87>] ? audit_alloc+0xa7/0xec
[<c0461d87>] ? audit_alloc+0xa7/0xec
[<c0429b53>] copy_process+0x64c/0x1130
[<c042a6e6>] do_fork+0xaf/0x1e4
[<c0461cb6>] ? audit_syscall_entry+0xf9/0x123
[<c0403556>] sys_clone+0x1f/0x21
[<c0404c3a>] syscall_call+0x7/0xb
=======================
Padding 0xf4d31fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Padding 0xf4d31fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Padding 0xf4d31fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Padding 0xf4d31ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
FIX files_cache: Restoring 0xf4d31000-0xf4d31fff=0x5a
=============================================================================
BUG files_cache (Tainted: G D ): Redzone overwritten
-----------------------------------------------------------------------------
INFO: 0xf4d31c00-0xf4d31c03. First byte 0x5a instead of 0xbb
INFO: Allocated in 0x5a5a5a5a age=2782247384 cpu=1515870810 pid=1515870810
INFO: Freed in 0x5a5a5a5a age=2782247384 cpu=1515870810 pid=1515870810
INFO: Slab 0xc1b902b8 objects=9 used=7 fp=0xf4d31a80 flags=0x400000c3
INFO: Object 0xf4d31a80 @offset=2688 fp=0x5a5a5a5a
Bytes b4 0xf4d31a70: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Object 0xf4d31a80: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Object 0xf4d31a90: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Object 0xf4d31aa0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Object 0xf4d31ab0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Object 0xf4d31ac0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Object 0xf4d31ad0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Object 0xf4d31ae0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Object 0xf4d31af0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Redzone 0xf4d31c00: 5a 5a 5a 5a ZZZZ
Padding 0xf4d31c28: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
Padding 0xf4d31c38: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
Pid: 1910, comm: hald-runner Tainted: G D 2.6.26-0.33.rc4.fc10.i686 #1
[<c04885b7>] print_trailer+0xe1/0xe9
[<c0488640>] check_bytes_and_report+0x81/0xa4
[<c04886d7>] check_object+0x44/0x18b
[<c0489912>] __slab_alloc+0x3d9/0x4ea
[<c0489ced>] kmem_cache_alloc+0x62/0xc4
[<c049ffa0>] ? dup_fd+0x22/0x2d4
[<c049ffa0>] ? dup_fd+0x22/0x2d4
[<c049ffa0>] dup_fd+0x22/0x2d4
[<c0461d87>] ? audit_alloc+0xa7/0xec
[<c0461d87>] ? audit_alloc+0xa7/0xec
[<c0429b53>] copy_process+0x64c/0x1130
[<c042a6e6>] do_fork+0xaf/0x1e4
[<c0461cb6>] ? audit_syscall_entry+0xf9/0x123
[<c0403556>] sys_clone+0x1f/0x21
[<c0404c3a>] syscall_call+0x7/0xb
=======================
FIX files_cache: Restoring 0xf4d31c00-0xf4d31c03=0xbb
FIX files_cache: Marking all objects used
BUG: unable to handle kernel paging request at 5a5a5a5a
IP: [<c048e39a>] fget_light+0x59/0xb9
*pde = 00000000
Oops: 0000 [#2] SMP DEBUG_PAGEALLOC
Modules linked in: nfs nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ext2 sg via_ircc button irda crc_ccitt pata_sil680 via_rhine r8169 i2c_viapro pcspkr mii i2c_core dm_snapshot dm_zero dm_mirror dm_log dm_mod pata_via ata_generic pata_acpi libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
Pid: 1817, comm: sendmail Tainted: G D (2.6.26-0.33.rc4.fc10.i686 #1)
EIP: 0060:[<c048e39a>] EFLAGS: 00210292 CPU: 0
EIP is at fget_light+0x59/0xb9
EAX: 00200246 EBX: f4d318c0 ECX: 00000000 EDX: 5a5a5a5a
ESI: 00000004 EDI: f4ee3c00 EBP: f4d43b70 ESP: f4d43b58
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process sendmail (pid: 1817, ti=f4d43000 task=f4ef8000 task.ti=f4d43000)
Stack: f4d43e18 006c6900 00000000 00000010 00000000 f4ee3c00 f4d43e28 c0499337
f4d43e44 c041ad7a f4d43e48 00000020 f4d43f9c f4d43f44 00000000 00000000
f4d43e50 f4d43e54 f4d43e58 f4d43e44 f4d43e48 f4d43e4c 00000010 00000000
Call Trace:
[<c0499337>] ? do_select+0x2e1/0x4f5
[<c041ad7a>] ? __change_page_attr_set_clr+0x1c3/0x67d
[<c0499ba9>] ? __pollwait+0x0/0xb3
[<c0422900>] ? default_wake_function+0x0/0xd
[<c040974f>] ? native_sched_clock+0xac/0xc8
[<c040974f>] ? native_sched_clock+0xac/0xc8
[<c041b326>] ? kernel_map_pages+0xf2/0x109
[<c040974f>] ? native_sched_clock+0xac/0xc8
[<c0447707>] ? mark_held_locks+0x4e/0x66
[<c04479be>] ? debug_check_no_locks_freed+0x10e/0x123
[<c044788f>] ? trace_hardirqs_on+0xe9/0x10a
[<c041b326>] ? kernel_map_pages+0xf2/0x109
[<c040974f>] ? native_sched_clock+0xac/0xc8
[<c040974f>] ? native_sched_clock+0xac/0xc8
[<c0448313>] ? __lock_acquire+0x564/0xc18
[<c040974f>] ? native_sched_clock+0xac/0xc8
[<c041aa06>] ? lookup_address+0x68/0x88
[<c0448313>] ? __lock_acquire+0x564/0xc18
[<c041ad7a>] ? __change_page_attr_set_clr+0x1c3/0x67d
[<c040974f>] ? native_sched_clock+0xac/0xc8
[<c040974f>] ? native_sched_clock+0xac/0xc8
[<c0448313>] ? __lock_acquire+0x564/0xc18
[<c040974f>] ? native_sched_clock+0xac/0xc8
[<c040946e>] ? sched_clock+0x8/0xb
[<c04460d1>] ? lock_release_holdtime+0x1a/0x115
[<c0499577>] ? core_sys_select+0x2c/0x2d8
[<c0499738>] ? core_sys_select+0x1ed/0x2d8
[<c04887c7>] ? check_object+0x134/0x18b
[<c0489c53>] ? __slab_free+0x230/0x268
[<c040974f>] ? native_sched_clock+0xac/0xc8
[<c04be9f2>] ? proc_destroy_inode+0x10/0x12
[<c040974f>] ? native_sched_clock+0xac/0xc8
[<c040946e>] ? sched_clock+0x8/0xb
[<c0499ae9>] ? sys_select+0x88/0x148
[<c040ac0b>] ? do_syscall_trace+0x138/0x17f
[<c0404c3a>] ? syscall_call+0x7/0xb
=======================
Code: 8d 04 b5 00 00 00 00 03 42 04 8b 18 eb 68 68 7d e3 48 c0 31 d2 6a 01 31 c9 6a 02 b8 80 62 74 c0 e8 33 a6 fb ff 8b 53 04 83 c4 0c <3b> 32 73 31 8d 04 b5 00 00 00 00 03 42 04 8b 18 85 db 74 23 8d
EIP: [<c048e39a>] fget_light+0x59/0xb9 SS:ESP 0068:f4d43b58
---[ end trace 013a4d2d914c796e ]---
BUG: unable to handle kernel paging request at 5a5a5a5a
IP: [<c048e39a>] fget_light+0x59/0xb9
*pde = 00000000
Oops: 0000 [#3] SMP DEBUG_PAGEALLOC
Modules linked in: nfs nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ext2 sg via_ircc button irda crc_ccitt pata_sil680 via_rhine r8169 i2c_viapro pcspkr mii i2c_core dm_snapshot dm_zero dm_mirror dm_log dm_mod pata_via ata_generic pata_acpi libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
Pid: 1721, comm: ntpd Tainted: G D (2.6.26-0.33.rc4.fc10.i686 #1)
EIP: 0060:[<c048e39a>] EFLAGS: 00210292 CPU: 0
EIP is at fget_light+0x59/0xb9
EAX: 00200246 EBX: f4d31000 ECX: 00000000 EDX: 5a5a5a5a
ESI: 00000010 EDI: f4c4bb00 EBP: f4d4eb70 ESP: f4d4eb58
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process ntpd (pid: 1721, ti=f4d4e000 task=f4d02fc0 task.ti=f4d4e000)
Stack: f4d4ee18 00000000 00000000 00010000 00000000 f4c4bb00 f4d4ee28 c0499337
f4d4ee44 f4d03030 f4d4ee48 00000020 f4d4ef9c f4d4ef44 00000000 00000000
f4d4ee50 f4d4ee54 f4d4ee58 f4d4ee44 f4d4ee48 f4d4ee4c 00070000 00000000
Call Trace:
[<c0499337>] ? do_select+0x2e1/0x4f5
[<c0499ba9>] ? __pollwait+0x0/0xb3
[<c0422900>] ? default_wake_function+0x0/0xd
[<c0422900>] ? default_wake_function+0x0/0xd
[<c0422900>] ? default_wake_function+0x0/0xd
[<c040946e>] ? sched_clock+0x8/0xb
[<c04dd3c8>] ? avc_has_perm_noaudit+0x3a6/0x3c4
[<c040974f>] ? native_sched_clock+0xac/0xc8
[<c040946e>] ? sched_clock+0x8/0xb
[<c04460d1>] ? lock_release_holdtime+0x1a/0x115
[<c0644bb8>] ? _spin_unlock_irqrestore+0x40/0x50
[<c044788f>] ? trace_hardirqs_on+0xe9/0x10a
[<c05ce52f>] ? __skb_recv_datagram+0xd8/0x1f5
[<c04de040>] ? socket_has_perm+0x53/0x5d
[<c060a0ce>] ? udp_recvmsg+0x5e/0x21e
[<c05c8909>] ? sock_common_recvmsg+0x31/0x4a
[<c040974f>] ? native_sched_clock+0xac/0xc8
[<c0448313>] ? __lock_acquire+0x564/0xc18
[<c040974f>] ? native_sched_clock+0xac/0xc8
[<c040946e>] ? sched_clock+0x8/0xb
[<c04460d1>] ? lock_release_holdtime+0x1a/0x115
[<c0499577>] ? core_sys_select+0x2c/0x2d8
[<c0499738>] ? core_sys_select+0x1ed/0x2d8
[<c05cdd01>] ? verify_iovec+0x40/0x6f
[<c05c8155>] ? sys_recvmsg+0xe8/0x17b
[<c05c7ed0>] ? sys_sendto+0xa4/0xc3
[<c04887c7>] ? check_object+0x134/0x18b
[<c0489c53>] ? __slab_free+0x230/0x268
[<c048af05>] ? kmem_cache_free+0xc9/0xde
[<c044788f>] ? trace_hardirqs_on+0xe9/0x10a
[<c049c77d>] ? d_free+0x3b/0x4d
[<c0499ae9>] ? sys_select+0x88/0x148
[<c040ac0b>] ? do_syscall_trace+0x138/0x17f
[<c0404c3a>] ? syscall_call+0x7/0xb
=======================
Code: 8d 04 b5 00 00 00 00 03 42 04 8b 18 eb 68 68 7d e3 48 c0 31 d2 6a 01 31 c9 6a 02 b8 80 62 74 c0 e8 33 a6 fb ff 8b 53 04 83 c4 0c <3b> 32 73 31 8d 04 b5 00 00 00 00 03 42 04 8b 18 85 db 74 23 8d
EIP: [<c048e39a>] fget_light+0x59/0xb9 SS:ESP 0068:f4d4eb58
---[ end trace 013a4d2d914c796e ]---
------------[ cut here ]------------
kernel BUG at mm/mmap.c:2160!
invalid opcode: 0000 [#4] SMP DEBUG_PAGEALLOC
Modules linked in: nfs nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ext2 sg via_ircc button irda crc_ccitt pata_sil680 via_rhine r8169 i2c_viapro pcspkr mii i2c_core dm_snapshot dm_zero dm_mirror dm_log dm_mod pata_via ata_generic pata_acpi libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
Pid: 1721, comm: ntpd Tainted: G D (2.6.26-0.33.rc4.fc10.i686 #1)
EIP: 0060:[<c047b51e>] EFLAGS: 00210202 CPU: 0
EIP is at exit_mmap+0xca/0xd6
EAX: 00000000 EBX: c1e1364c ECX: 00000000 EDX: f4d40a00
ESI: 00000000 EDI: f731afc0 EBP: f4d4e948 ESP: f4d4e934
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process ntpd (pid: 1721, ti=f4d4e000 task=f4d02fc0 task.ti=f4d4e000)
Stack: 000000c1 c1e1364c f731afc0 f731b058 f4d02fc0 f4d4e958 c0428f2d f731afc0
f731affc f4d4e970 c042c79c f4d033ec 0000000b f4d02fc0 00000000 f4d4e9a4
c042de84 00200206 00000001 f4d4e990 c0648c47 c06eb561 f4d4e99c f4d4e9a4
Call Trace:
[<c0428f2d>] ? mmput+0x3a/0x8b
[<c042c79c>] ? exit_mm+0xd8/0xde
[<c042de84>] ? do_exit+0x1fc/0x635
[<c042ad7a>] ? oops_exit+0x23/0x28
[<c0406262>] ? die+0x15c/0x164
[<c0646e58>] ? do_page_fault+0x618/0x70a
[<c040974f>] ? native_sched_clock+0xac/0xc8
[<c040974f>] ? native_sched_clock+0xac/0xc8
[<c0500b99>] ? __next_cpu+0x15/0x25
[<c041f172>] ? find_busiest_group+0x23f/0x5d3
[<c0423826>] ? hrtick_set+0x80/0xe5
[<c043316b>] ? lock_timer_base+0x1f/0x3e
[<c0646840>] ? do_page_fault+0x0/0x70a
[<c064521a>] ? error_code+0x72/0x78
[<c048e39a>] ? fget_light+0x59/0xb9
[<c0499337>] ? do_select+0x2e1/0x4f5
[<c0499ba9>] ? __pollwait+0x0/0xb3
[<c0422900>] ? default_wake_function+0x0/0xd
[<c0422900>] ? default_wake_function+0x0/0xd
[<c0422900>] ? default_wake_function+0x0/0xd
[<c040946e>] ? sched_clock+0x8/0xb
[<c04dd3c8>] ? avc_has_perm_noaudit+0x3a6/0x3c4
[<c040974f>] ? native_sched_clock+0xac/0xc8
[<c040946e>] ? sched_clock+0x8/0xb
[<c04460d1>] ? lock_release_holdtime+0x1a/0x115
[<c0644bb8>] ? _spin_unlock_irqrestore+0x40/0x50
[<c044788f>] ? trace_hardirqs_on+0xe9/0x10a
[<c05ce52f>] ? __skb_recv_datagram+0xd8/0x1f5
[<c04de040>] ? socket_has_perm+0x53/0x5d
[<c060a0ce>] ? udp_recvmsg+0x5e/0x21e
[<c05c8909>] ? sock_common_recvmsg+0x31/0x4a
[<c040974f>] ? native_sched_clock+0xac/0xc8
[<c0448313>] ? __lock_acquire+0x564/0xc18
[<c040974f>] ? native_sched_clock+0xac/0xc8
[<c040946e>] ? sched_clock+0x8/0xb
[<c04460d1>] ? lock_release_holdtime+0x1a/0x115
[<c0499577>] ? core_sys_select+0x2c/0x2d8
[<c0499738>] ? core_sys_select+0x1ed/0x2d8
[<c05cdd01>] ? verify_iovec+0x40/0x6f
[<c05c8155>] ? sys_recvmsg+0xe8/0x17b
[<c05c7ed0>] ? sys_sendto+0xa4/0xc3
[<c04887c7>] ? check_object+0x134/0x18b
[<c0489c53>] ? __slab_free+0x230/0x268
[<c048af05>] ? kmem_cache_free+0xc9/0xde
[<c044788f>] ? trace_hardirqs_on+0xe9/0x10a
[<c049c77d>] ? d_free+0x3b/0x4d
[<c0499ae9>] ? sys_select+0x88/0x148
[<c040ac0b>] ? do_syscall_trace+0x138/0x17f
[<c0404c3a>] ? syscall_call+0x7/0xb
=======================
Code: e8 95 50 00 00 c7 43 04 00 00 00 00 89 f8 e8 d2 7e f8 ff eb 09 89 f0 e8 ea fe ff ff 89 c6 85 f6 75 f3 83 bf cc 00 00 00 00 74 04 <0f> 0b eb fe 8d 65 f4 5b 5e 5f 5d c3 55 89 e5 57 89 c7 56 53 89
EIP: [<c047b51e>] exit_mmap+0xca/0xd6 SS:ESP 0068:f4d4e934
---[ end trace 013a4d2d914c796e ]---
Fixing recursive fault but reboot is needed!
device eth1 left promiscuous mode
BUG: unable to handle kernel paging request at 5a5a5a5a
IP: [<c048bc69>] get_unused_fd_flags+0x3d/0xd3
*pde = 00000000
Oops: 0000 [#5] SMP DEBUG_PAGEALLOC
Modules linked in: nfs nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ext2 sg via_ircc button irda crc_ccitt pata_sil680 via_rhine r8169 i2c_viapro pcspkr mii i2c_core dm_snapshot dm_zero dm_mirror dm_log dm_mod pata_via ata_generic pata_acpi libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
Pid: 1837, comm: crond Tainted: G D (2.6.26-0.33.rc4.fc10.i686 #1)
EIP: 0060:[<c048bc69>] EFLAGS: 00010282 CPU: 0
EIP is at get_unused_fd_flags+0x3d/0xd3
EAX: f4d31e00 EBX: 00098800 ECX: 5a5a5a5a EDX: f4d31e04
ESI: f788d140 EDI: 5a5a5a5a EBP: f4f18f7c ESP: f4f18f60
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process crond (pid: 1837, ti=f4f18000 task=f4f2df80 task.ti=f4f18000)
Stack: 00098800 f4d31e80 f4d31e04 f4d31e00 00098800 f788d140 ffffff9c f4f18f98
c048bd2a f788d140 00000000 09aa3708 00098800 080514b5 f4f18fb0 c048bdf7
09aa3708 080514b5 48401455 080514b5 f4f18000 c0404c3a 080514b5 00098800
Call Trace:
[<c048bd2a>] ? do_sys_open+0x2b/0xb6
[<c048bdf7>] ? sys_open+0x1e/0x26
[<c0404c3a>] ? syscall_call+0x7/0xb
=======================
Code: 8b 80 c4 03 00 00 89 45 f0 83 e8 80 89 45 e8 e8 4c 90 1b 00 8b 45 f0 83 c0 04 89 45 ec 8b 55 ec 8b 45 f0 8b 3a 8b 88 a0 00 00 00 <8b> 17 8b 47 0c e8 b5 50 07 00 89 c3 64 a1 00 a0 7b c0 8b 80 cc
EIP: [<c048bc69>] get_unused_fd_flags+0x3d/0xd3 SS:ESP 0068:f4f18f60
---[ end trace 013a4d2d914c796e ]---
It survived a 'dmesg ; scp dmesg davej@gelk', and then wedged solid.
So as well as the oops, it seems we're corrupting memory too.
For reference, this kernel has both SLUB_DEBUG and PAGEALLOC_DEBUG enabled.
Dave
--
http://www.codemonkey.org.uk
^ permalink raw reply [flat|nested] 16+ messages in thread
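Editor's note: the faulting address in the second oops above, 5a5a5a5a, is worth a closer look. In the kernel's include/linux/poison.h, 0x5a is POISON_INUSE, 0x6b is POISON_FREE and 0xa5 is POISON_END. A pointer made up entirely of one of these bytes is consistent with code dereferencing slab-poisoned memory, which fits the report that slab debugging was enabled. The helper below is only a userspace sketch for triaging such addresses; the function name poison_name is hypothetical and not part of any kernel API.

```c
#include <stddef.h>
#include <stdint.h>

/* Slab poison bytes as defined in the kernel's include/linux/poison.h. */
#define POISON_INUSE 0x5a
#define POISON_FREE  0x6b
#define POISON_END   0xa5

/*
 * Hypothetical helper: return the poison name if all four bytes of a
 * 32-bit faulting address are the same known poison byte, else NULL.
 */
static const char *poison_name(uint32_t addr)
{
	uint8_t b = addr & 0xff;

	/* Multiplying by 0x01010101 replicates the low byte into all four bytes. */
	if (addr != b * 0x01010101u)
		return NULL;

	switch (b) {
	case POISON_INUSE:
		return "POISON_INUSE";
	case POISON_FREE:
		return "POISON_FREE";
	case POISON_END:
		return "POISON_END";
	default:
		return NULL;
	}
}
```

Passing 0x5a5a5a5a (the address from the crond oops) returns "POISON_INUSE", while an ordinary kernel address such as 0xf4569000 from the first oops returns NULL.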
* Re: NFS oops in 2.6.26rc4
2008-05-30 19:03 ` Dave Jones
@ 2008-05-30 19:37 ` Chuck Lever
2008-06-04 14:19 ` Dave Jones
0 siblings, 1 reply; 16+ messages in thread
From: Chuck Lever @ 2008-05-30 19:37 UTC (permalink / raw)
To: Dave Jones; +Cc: Trond Myklebust, chucklever, Linux Kernel
On May 30, 2008, at 3:03 PM, Dave Jones wrote:
> On Fri, May 30, 2008 at 11:31:48AM -0700, Trond Myklebust wrote:
>> On Fri, 2008-05-30 at 14:21 -0400, Dave Jones wrote:
>>
>>> mount point in the fstab is ..
>>>
>>> gelk:/mnt/data /mnt/nfs/gelk nfs nfsvers=3,tcp 0 0
>>>
>>>> What "brand" of server were you trying to mount?
>>>
>>> It's just another linux box. A no-name core2 duo, running 2.6.25.
>>>
>>>> How often can you reproduce this?
>>>
>>> Seems to do it every time I ask it to.
>>
>> Could you provide us with a binary tcpdump in that case? I'd love to
>> have a look at the actual filehandle the server is producing.
>
> This is from the client side: http://www.codemonkey.org.uk/junk/tcp.out
> Wireshark picks up some of those packets as being 'malformed', which
> could be a clue ?
Wireshark is overly cautious, and sometimes throws spurious warnings.
> Something else of note which I hadn't seen before, usually things lock
> up just after that first oops. For some reason, today it survived
> a little longer, but things really went downhill fast.
> It survived a 'dmesg ; scp dmesg davej@gelk', and then wedged solid.
> So as well as the oops, it seems we're corrupting memory too.
> For reference, this kernel has both SLUB_DEBUG and PAGEALLOC_DEBUG
> enabled.
I haven't seen this kind of problem here with .26, but yes, it does
look like something is clobbering memory during an NFS mount.
I introduced some NFS mount parsing changes in this commit range:
2d767432..82d101d5
A quick bisect should show which, if any of these, is the guilty
party. If any of these are the problem, I suspect it's 3f8400d1.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: NFS oops in 2.6.26rc4
2008-05-30 19:37 ` Chuck Lever
@ 2008-06-04 14:19 ` Dave Jones
2008-06-04 18:13 ` Chuck Lever
0 siblings, 1 reply; 16+ messages in thread
From: Dave Jones @ 2008-06-04 14:19 UTC (permalink / raw)
To: Chuck Lever; +Cc: Trond Myklebust, chucklever, Linux Kernel
On Fri, May 30, 2008 at 03:37:01PM -0400, Chuck Lever wrote:
> > Something else of note which I hadn't seen before, usually things lock
> > up just after that first oops. For some reason, today it survived
> > a little longer, but things really went downhill fast.
> > It survived a 'dmesg ; scp dmesg davej@gelk', and then wedged solid.
> > So as well as the oops, it seems we're corrupting memory too.
> > For reference, this kernel has both SLUB_DEBUG and PAGEALLOC_DEBUG
> > enabled.
>
> I haven't seen this kind of problem here with .26, but yes, it does
> look like something is clobbering memory during an NFS mount.
>
> I introduced some NFS mount parsing changes in this commit range:
>
> 2d767432..82d101d5
>
> A quick bisect should show which, if any of these, is the guilty
> party. If any of these are the problem, I suspect it's 3f8400d1.
I didn't get time to try this out yet (hopefully tomorrow).
In the meantime, we've just gotten word of another user seeing memory
corruption with nfs - https://bugzilla.redhat.com/show_bug.cgi?id=449958
Dave
--
http://www.codemonkey.org.uk
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: NFS oops in 2.6.26rc4
2008-06-04 14:19 ` Dave Jones
@ 2008-06-04 18:13 ` Chuck Lever
2008-06-04 18:20 ` Dave Jones
0 siblings, 1 reply; 16+ messages in thread
From: Chuck Lever @ 2008-06-04 18:13 UTC (permalink / raw)
To: Dave Jones; +Cc: Trond Myklebust, chucklever, Linux Kernel
On Jun 4, 2008, at 10:19 AM, Dave Jones wrote:
> On Fri, May 30, 2008 at 03:37:01PM -0400, Chuck Lever wrote:
>
>>> Something else of note which I hadn't seen before, usually things
>>> lock
>>> up just after that first oops. For some reason, today it survived
>>> a little longer, but things really went downhill fast.
>>> It survived a 'dmesg ; scp dmesg davej@gelk', and then wedged solid.
>>> So as well as the oops, it seems we're corrupting memory too.
>>> For reference, this kernel has both SLUB_DEBUG and PAGEALLOC_DEBUG
>>> enabled.
>>
>> I haven't seen this kind of problem here with .26, but yes, it does
>> look like something is clobbering memory during an NFS mount.
>>
>> I introduced some NFS mount parsing changes in this commit range:
>>
>> 2d767432..82d101d5
>>
>> A quick bisect should show which, if any of these, is the guilty
>> party. If any of these are the problem, I suspect it's 3f8400d1.
>
> I didn't get time to try this out yet (hopefully tomorrow).
> In the meantime, we've just gotten word of another user seeing memory
> corruption with nfs - https://bugzilla.redhat.com/show_bug.cgi?id=449958
449958 could very well be the same problem. The stack traceback is a
lot cleaner than the one you originally sent, but there are a lot of
similarities. (I doubt this is related to symlinks, as the comment
suggests).
Is commit 86d61d863 applied to the current rawhide kernel?
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: NFS oops in 2.6.26rc4
2008-06-04 18:13 ` Chuck Lever
@ 2008-06-04 18:20 ` Dave Jones
2008-06-04 19:13 ` Chuck Lever
2008-06-23 15:40 ` Trond Myklebust
0 siblings, 2 replies; 16+ messages in thread
From: Dave Jones @ 2008-06-04 18:20 UTC (permalink / raw)
To: Chuck Lever; +Cc: Trond Myklebust, chucklever, Linux Kernel
On Wed, Jun 04, 2008 at 02:13:08PM -0400, Chuck Lever wrote:
>
> On Jun 4, 2008, at 10:19 AM, Dave Jones wrote:
>
> > On Fri, May 30, 2008 at 03:37:01PM -0400, Chuck Lever wrote:
> >
> >>> Something else of note which I hadn't seen before, usually things
> >>> lock
> >>> up just after that first oops. For some reason, today it survived
> >>> a little longer, but things really went downhill fast.
> >>> It survived a 'dmesg ; scp dmesg davej@gelk', and then wedged solid.
> >>> So as well as the oops, it seems we're corrupting memory too.
> >>> For reference, this kernel has both SLUB_DEBUG and PAGEALLOC_DEBUG
> >>> enabled.
> >>
> >> I haven't seen this kind of problem here with .26, but yes, it does
> >> look like something is clobbering memory during an NFS mount.
> >>
> >> I introduced some NFS mount parsing changes in this commit range:
> >>
> >> 2d767432..82d101d5
> >>
> >> A quick bisect should show which, if any of these, is the guilty
> >> party. If any of these are the problem, I suspect it's 3f8400d1.
> >
> > I didn't get time to try this out yet (hopefully tomorrow).
> > In the meantime, we've just gotten word of another user seeing memory
> > corruption with nfs - https://bugzilla.redhat.com/show_bug.cgi?id=449958
>
> 449958 could very well be the same problem. The stack traceback is a
> lot cleaner than the one you originally sent, but there are a lot of
> similarities. (I doubt this is related to symlinks, as the comment
> suggests).
>
> Is commit 86d61d863 applied to the current rawhide kernel?
That kernel was .26rc4.git2, so unless it's only gone in in the last day
or two, yes. (Bandwidth impaired right now, and no local git repo to check)
Dave
--
http://www.codemonkey.org.uk
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: NFS oops in 2.6.26rc4
2008-06-04 18:20 ` Dave Jones
@ 2008-06-04 19:13 ` Chuck Lever
2008-06-23 15:40 ` Trond Myklebust
1 sibling, 0 replies; 16+ messages in thread
From: Chuck Lever @ 2008-06-04 19:13 UTC (permalink / raw)
To: Dave Jones, Chuck Lever, Trond Myklebust, chucklever,
Linux Kernel
On Wed, Jun 4, 2008 at 2:20 PM, Dave Jones <davej@redhat.com> wrote:
> On Wed, Jun 04, 2008 at 02:13:08PM -0400, Chuck Lever wrote:
> >
> > On Jun 4, 2008, at 10:19 AM, Dave Jones wrote:
> >
> > > On Fri, May 30, 2008 at 03:37:01PM -0400, Chuck Lever wrote:
> > >
> > >>> Something else of note which I hadn't seen before, usually things
> > >>> lock
> > >>> up just after that first oops. For some reason, today it survived
> > >>> a little longer, but things really went downhill fast.
> > >>> It survived a 'dmesg ; scp dmesg davej@gelk', and then wedged solid.
> > >>> So as well as the oops, it seems we're corrupting memory too.
> > >>> For reference, this kernel has both SLUB_DEBUG and PAGEALLOC_DEBUG
> > >>> enabled.
> > >>
> > >> I haven't seen this kind of problem here with .26, but yes, it does
> > >> look like something is clobbering memory during an NFS mount.
> > >>
> > >> I introduced some NFS mount parsing changes in this commit range:
> > >>
> > >> 2d767432..82d101d5
> > >>
> > >> A quick bisect should show which, if any of these, is the guilty
> > >> party. If any of these are the problem, I suspect it's 3f8400d1.
> > >
> > > I didn't get time to try this out yet (hopefully tomorrow).
> > > In the meantime, we've just gotten word of another user seeing memory
> > > corruption with nfs - https://bugzilla.redhat.com/show_bug.cgi?id=449958
> >
> > 449958 could very well be the same problem. The stack traceback is a
> > lot cleaner than the one you originally sent, but there are a lot of
> > similarities. (I doubt this is related to symlinks, as the comment
> > suggests).
> >
> > Is commit 86d61d863 applied to the current rawhide kernel?
>
> That kernel was .26rc4.git2, so unless it's only gone in in the last day
> or two, yes. (Bandwidth impaired right now, and no local git repo to check)
Argh, I was afraid of that. I expected that commit to improve things.
Maybe it did, but this is a different problem? You're going to force
me to actually think about this. :-)
In any event, a bisect would be helpful here, when you can. I will
also stare at the traceback in 449958 and see if anything new jumps
out. It's certainly taken the heat off of the NFS client; it looks
like an rpcbind issue.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: NFS oops in 2.6.26rc4
2008-06-04 18:20 ` Dave Jones
2008-06-04 19:13 ` Chuck Lever
@ 2008-06-23 15:40 ` Trond Myklebust
2008-06-23 15:55 ` Dave Jones
2008-06-23 23:11 ` Dave Jones
1 sibling, 2 replies; 16+ messages in thread
From: Trond Myklebust @ 2008-06-23 15:40 UTC (permalink / raw)
To: Dave Jones; +Cc: Chuck Lever, chucklever, Linux Kernel
[-- Attachment #1: Type: text/plain, Size: 223 bytes --]
Hi Dave,
Any chance you could give the attached patch a whirl to see if it fixes
the NFS oops you reported?
Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
[-- Attachment #2: linux-2.6.26-001-reduce_mount_stack_usage.dif --]
[-- Type: message/rfc822, Size: 5179 bytes --]
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: NFS: Reduce the NFS mount code stack usage.
Date: Thu, 19 Jun 2008 14:20:11 -0400
Message-ID: <1214235616.7205.18.camel@localhost>
This appears to fix the Oops reported in
http://bugzilla.kernel.org/show_bug.cgi?id=10826
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
fs/nfs/super.c | 68 +++++++++++++++++++++++++++++++++-----------------------
1 files changed, 40 insertions(+), 28 deletions(-)
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 2a4a024..dac663d 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -1216,8 +1216,6 @@ static int nfs_validate_mount_data(void *options,
{
struct nfs_mount_data *data = (struct nfs_mount_data *)options;
- memset(args, 0, sizeof(*args));
-
if (data == NULL)
goto out_no_data;
@@ -1585,24 +1583,29 @@ static int nfs_get_sb(struct file_system_type *fs_type,
{
struct nfs_server *server = NULL;
struct super_block *s;
- struct nfs_fh mntfh;
- struct nfs_parsed_mount_data data;
+ struct nfs_parsed_mount_data *data;
+ struct nfs_fh *mntfh;
struct dentry *mntroot;
int (*compare_super)(struct super_block *, void *) = nfs_compare_super;
struct nfs_sb_mountdata sb_mntdata = {
.mntflags = flags,
};
- int error;
+ int error = -ENOMEM;
- security_init_mnt_opts(&data.lsm_opts);
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ mntfh = kzalloc(sizeof(*mntfh), GFP_KERNEL);
+ if (data == NULL || mntfh == NULL)
+ goto out_free_fh;
+
+ security_init_mnt_opts(&data->lsm_opts);
/* Validate the mount data */
- error = nfs_validate_mount_data(raw_data, &data, &mntfh, dev_name);
+ error = nfs_validate_mount_data(raw_data, data, mntfh, dev_name);
if (error < 0)
goto out;
/* Get a volume representation */
- server = nfs_create_server(&data, &mntfh);
+ server = nfs_create_server(data, mntfh);
if (IS_ERR(server)) {
error = PTR_ERR(server);
goto out;
@@ -1630,16 +1633,16 @@ static int nfs_get_sb(struct file_system_type *fs_type,
if (!s->s_root) {
/* initial superblock/root creation */
- nfs_fill_super(s, &data);
+ nfs_fill_super(s, data);
}
- mntroot = nfs_get_root(s, &mntfh);
+ mntroot = nfs_get_root(s, mntfh);
if (IS_ERR(mntroot)) {
error = PTR_ERR(mntroot);
goto error_splat_super;
}
- error = security_sb_set_mnt_opts(s, &data.lsm_opts);
+ error = security_sb_set_mnt_opts(s, &data->lsm_opts);
if (error)
goto error_splat_root;
@@ -1649,9 +1652,12 @@ static int nfs_get_sb(struct file_system_type *fs_type,
error = 0;
out:
- kfree(data.nfs_server.hostname);
- kfree(data.mount_server.hostname);
- security_free_mnt_opts(&data.lsm_opts);
+ kfree(data->nfs_server.hostname);
+ kfree(data->mount_server.hostname);
+ security_free_mnt_opts(&data->lsm_opts);
+out_free_fh:
+ kfree(mntfh);
+ kfree(data);
return error;
out_err_nosb:
@@ -1800,8 +1806,6 @@ static int nfs4_validate_mount_data(void *options,
struct nfs4_mount_data *data = (struct nfs4_mount_data *)options;
char *c;
- memset(args, 0, sizeof(*args));
-
if (data == NULL)
goto out_no_data;
@@ -1959,26 +1963,31 @@ out_no_client_address:
static int nfs4_get_sb(struct file_system_type *fs_type,
int flags, const char *dev_name, void *raw_data, struct vfsmount *mnt)
{
- struct nfs_parsed_mount_data data;
+ struct nfs_parsed_mount_data *data;
struct super_block *s;
struct nfs_server *server;
- struct nfs_fh mntfh;
+ struct nfs_fh *mntfh;
struct dentry *mntroot;
int (*compare_super)(struct super_block *, void *) = nfs_compare_super;
struct nfs_sb_mountdata sb_mntdata = {
.mntflags = flags,
};
- int error;
+ int error = -ENOMEM;
- security_init_mnt_opts(&data.lsm_opts);
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ mntfh = kzalloc(sizeof(*mntfh), GFP_KERNEL);
+ if (data == NULL || mntfh == NULL)
+ goto out_free_fh;
+
+ security_init_mnt_opts(&data->lsm_opts);
/* Validate the mount data */
- error = nfs4_validate_mount_data(raw_data, &data, dev_name);
+ error = nfs4_validate_mount_data(raw_data, data, dev_name);
if (error < 0)
goto out;
/* Get a volume representation */
- server = nfs4_create_server(&data, &mntfh);
+ server = nfs4_create_server(data, mntfh);
if (IS_ERR(server)) {
error = PTR_ERR(server);
goto out;
@@ -2009,13 +2018,13 @@ static int nfs4_get_sb(struct file_system_type *fs_type,
nfs4_fill_super(s);
}
- mntroot = nfs4_get_root(s, &mntfh);
+ mntroot = nfs4_get_root(s, mntfh);
if (IS_ERR(mntroot)) {
error = PTR_ERR(mntroot);
goto error_splat_super;
}
- error = security_sb_set_mnt_opts(s, &data.lsm_opts);
+ error = security_sb_set_mnt_opts(s, &data->lsm_opts);
if (error)
goto error_splat_root;
@@ -2025,10 +2034,13 @@ static int nfs4_get_sb(struct file_system_type *fs_type,
error = 0;
out:
- kfree(data.client_address);
- kfree(data.nfs_server.export_path);
- kfree(data.nfs_server.hostname);
- security_free_mnt_opts(&data.lsm_opts);
+ kfree(data->client_address);
+ kfree(data->nfs_server.export_path);
+ kfree(data->nfs_server.hostname);
+ security_free_mnt_opts(&data->lsm_opts);
+out_free_fh:
+ kfree(mntfh);
+ kfree(data);
return error;
out_free:
^ permalink raw reply related [flat|nested] 16+ messages in thread
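Editor's note: the patch above follows a common pattern. Two large structures that used to live on the stack are replaced by zeroed heap allocations, the error code defaults to -ENOMEM, and a second cleanup label frees both allocations on every exit path. Because the allocations come back zeroed, the memset() calls in the validators are no longer needed. The following is a minimal userspace sketch of that pattern only, not the NFS code itself: calloc() and free() stand in for kzalloc() and kfree(), and the struct and function names (parsed_mount_data, file_handle, validate_mount_data, mount_sketch) are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for struct nfs_parsed_mount_data and struct nfs_fh. */
struct parsed_mount_data {
	char *hostname;
	char options[1024];
};

struct file_handle {
	unsigned short size;
	unsigned char data[128];
};

/* Hypothetical validator: copies the host name into newly allocated memory. */
static int validate_mount_data(const char *host,
			       struct parsed_mount_data *data,
			       struct file_handle *fh)
{
	if (host == NULL || *host == '\0')
		return -EINVAL;

	data->hostname = malloc(strlen(host) + 1);
	if (data->hostname == NULL)
		return -ENOMEM;
	strcpy(data->hostname, host);
	fh->size = 0;
	return 0;
}

static int mount_sketch(const char *host)
{
	struct parsed_mount_data *data;
	struct file_handle *fh;
	int error = -ENOMEM;	/* default covers a failed allocation */

	/* Zeroed heap allocations replace the large on-stack structures. */
	data = calloc(1, sizeof(*data));
	fh = calloc(1, sizeof(*fh));
	if (data == NULL || fh == NULL)
		goto out_free_fh;

	error = validate_mount_data(host, data, fh);
	if (error < 0)
		goto out;

	error = 0;
out:
	/* Safe even if validation failed early: hostname is NULL from calloc(). */
	free(data->hostname);
out_free_fh:
	/* free(NULL) is a no-op, so a partial allocation failure is handled too. */
	free(fh);
	free(data);
	return error;
}
```

With these stand-ins, mount_sketch("gelk") returns 0, and an empty or NULL host returns -EINVAL after releasing both allocations. The two-label layout matters: jumping straight to out_free_fh on allocation failure avoids dereferencing data when it may be NULL.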
* Re: NFS oops in 2.6.26rc4
2008-06-23 15:40 ` Trond Myklebust
@ 2008-06-23 15:55 ` Dave Jones
2008-06-23 16:04 ` Trond Myklebust
2008-06-23 23:11 ` Dave Jones
1 sibling, 1 reply; 16+ messages in thread
From: Dave Jones @ 2008-06-23 15:55 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Chuck Lever, chucklever, Linux Kernel
On Mon, Jun 23, 2008 at 11:40:29AM -0400, Trond Myklebust wrote:
> Hi Dave,
>
> Any chance you could give the attached patch a whirl to see if it fixes
> the NFS oops you reported?
Yeah, I'll give it a shot, won't be until the end of the day/tomorrow though.
Dave
--
http://www.codemonkey.org.uk
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: NFS oops in 2.6.26rc4
2008-06-23 15:55 ` Dave Jones
@ 2008-06-23 16:04 ` Trond Myklebust
0 siblings, 0 replies; 16+ messages in thread
From: Trond Myklebust @ 2008-06-23 16:04 UTC (permalink / raw)
To: Dave Jones; +Cc: Chuck Lever, chucklever, Linux Kernel
On Mon, 2008-06-23 at 11:55 -0400, Dave Jones wrote:
> On Mon, Jun 23, 2008 at 11:40:29AM -0400, Trond Myklebust wrote:
> > Hi Dave,
> >
> > Any chance you could give the attached patch a whirl to see if it fixes
> > the NFS oops you reported?
>
> Yeah, I'll give it a shot, won't be until the end of the day/tomorrow though.
>
> Dave
That will be great. Thanks!
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: NFS oops in 2.6.26rc4
2008-06-23 15:40 ` Trond Myklebust
2008-06-23 15:55 ` Dave Jones
@ 2008-06-23 23:11 ` Dave Jones
2008-06-23 23:19 ` Trond Myklebust
1 sibling, 1 reply; 16+ messages in thread
From: Dave Jones @ 2008-06-23 23:11 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Chuck Lever, chucklever, Linux Kernel
On Mon, Jun 23, 2008 at 11:40:29AM -0400, Trond Myklebust wrote:
> Hi Dave,
>
> Any chance you could give the attached patch a whirl to see if it fixes
> the NFS oops you reported?
Seems to have done the trick for me.
Dave
--
http://www.codemonkey.org.uk
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: NFS oops in 2.6.26rc4
2008-06-23 23:11 ` Dave Jones
@ 2008-06-23 23:19 ` Trond Myklebust
0 siblings, 0 replies; 16+ messages in thread
From: Trond Myklebust @ 2008-06-23 23:19 UTC (permalink / raw)
To: Dave Jones; +Cc: Chuck Lever, chucklever, Linux Kernel
On Mon, 2008-06-23 at 19:11 -0400, Dave Jones wrote:
> On Mon, Jun 23, 2008 at 11:40:29AM -0400, Trond Myklebust wrote:
> > Hi Dave,
> >
> > Any chance you could give the attached patch a whirl to see if it fixes
> > the NFS oops you reported?
>
> Seems to have done the trick for me.
>
> Dave
Thanks Dave!
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2008-06-23 23:20 UTC | newest]
Thread overview: 16+ messages
2008-05-27 19:04 NFS oops in 2.6.26rc4 Dave Jones
2008-05-29 11:48 ` Jeff Layton
2008-05-30 17:59 ` Chuck Lever
2008-05-30 18:21 ` Dave Jones
2008-05-30 18:31 ` Trond Myklebust
2008-05-30 19:03 ` Dave Jones
2008-05-30 19:37 ` Chuck Lever
2008-06-04 14:19 ` Dave Jones
2008-06-04 18:13 ` Chuck Lever
2008-06-04 18:20 ` Dave Jones
2008-06-04 19:13 ` Chuck Lever
2008-06-23 15:40 ` Trond Myklebust
2008-06-23 15:55 ` Dave Jones
2008-06-23 16:04 ` Trond Myklebust
2008-06-23 23:11 ` Dave Jones
2008-06-23 23:19 ` Trond Myklebust