* stuck/hung nfsv4 mounts @ 2008-11-03 15:05 Brian J. Murrell 2008-11-03 16:59 ` Trond Myklebust 0 siblings, 1 reply; 19+ messages in thread From: Brian J. Murrell @ 2008-11-03 15:05 UTC (permalink / raw) To: linux-nfs I have a system which I migrated a couple of mounts from sys authentication on nfs3 to gssapi on nfsv4 to overcome the 16 supplementary groups limitation inherent in the "sys" security model. It was working for a while but now I have "stuck" (i.e. hung) NFS mounts. It seems that a mount issued from the automounter kicked all of this off but I cannot be sure of that. In my process table I have a number of processes blocked in the NFSv4 path including a mount command: /bin/mount -t nfs4 -s -o sec=krb5i,rw,soft,intr,rsize=8192,wsize=8192 linux:/home/share /autohome/share It's stack trace is: Nov 3 09:54:14 pc kernel: [474304.152005] mount.nfs4 D c012ad55 0 25903 25902 Nov 3 09:54:14 pc kernel: [474304.152005] dcf1bb34 00000086 dcf1badc c012ad55 00001eff 026b0b7a 00000000 dcf1badc Nov 3 09:54:14 pc kernel: [474304.152005] 00000086 f670d7f0 ec213ed0 003649ad 00000000 c012b4ee 00000000 ec2b5340 Nov 3 09:54:14 pc kernel: [474304.152005] c050f080 f3740000 dcf1a000 ec214148 c28d1d00 00000000 dcf1bb0c c28d1d00 Nov 3 09:54:14 pc kernel: [474304.152005] Call Trace: Nov 3 09:54:14 pc kernel: [474304.152005] [<c012ad55>] ? check_preempt_wakeup+0x175/0x1b0 Nov 3 09:54:14 pc kernel: [474304.152005] [<c012b4ee>] ? try_to_wake_up+0xde/0x290 Nov 3 09:54:14 pc kernel: [474304.152005] [<f9574c5d>] rpc_wait_bit_killable+0x1d/0x50 [sunrpc] Nov 3 09:54:14 pc kernel: [474304.152005] [<c037d2df>] __wait_on_bit+0x4f/0x70 Nov 3 09:54:14 pc kernel: [474304.152005] [<f9574c40>] ? rpc_wait_bit_killable+0x0/0x50 [sunrpc] Nov 3 09:54:14 pc kernel: [474304.152005] [<c037d3ad>] out_of_line_wait_on_bit+0xad/0xc0 Nov 3 09:54:14 pc kernel: [474304.152005] [<f9574c40>] ? rpc_wait_bit_killable+0x0/0x50 [sunrpc] Nov 3 09:54:14 pc kernel: [474304.152005] [<c0147500>] ? wake_bit_function+0x0/0x60 Nov 3 09:54:14 pc kernel: [474304.152005] [<f95756ae>] __rpc_execute+0xae/0x250 [sunrpc] Nov 3 09:54:14 pc kernel: [474304.152005] [<f95758c9>] rpc_execute+0x79/0x90 [sunrpc] Nov 3 09:54:14 pc kernel: [474304.152005] [<f956e743>] rpc_run_task+0x33/0x70 [sunrpc] Nov 3 09:54:14 pc kernel: [474304.152005] [<f956e8a0>] rpc_call_sync+0x40/0x70 [sunrpc] Nov 3 09:54:14 pc kernel: [474304.152005] [<f9cd0401>] nfs4_proc_get_root+0x71/0xc0 [nfs] Nov 3 09:54:14 pc kernel: [474304.152005] [<f9cba216>] nfs4_path_walk+0x86/0x340 [nfs] Nov 3 09:54:14 pc kernel: [474304.152005] [<c01cb0d2>] ? mntput_no_expire+0x22/0x120 Nov 3 09:54:14 pc kernel: [474304.152005] [<c01d1ab4>] ? simple_release_fs+0x54/0x70 Nov 3 09:54:14 pc kernel: [474304.152005] [<f95818bb>] ? rpc_mkdir+0x9b/0x110 [sunrpc] Nov 3 09:54:14 pc kernel: [474304.152005] [<c037e61d>] ? _spin_lock+0xd/0x10 Nov 3 09:54:14 pc kernel: [474304.152005] [<c01d1b14>] ? simple_pin_fs+0x44/0xb0 Nov 3 09:54:14 pc kernel: [474304.152005] [<f956ec30>] ? rpc_setup_pipedir+0x90/0xd0 [sunrpc] Nov 3 09:54:14 pc kernel: [474304.152005] [<f9574528>] ? rpciod_up+0x8/0x30 [sunrpc] Nov 3 09:54:14 pc kernel: [474304.152005] [<f956ed46>] ? rpc_clone_client+0xd6/0x130 [sunrpc] Nov 3 09:54:14 pc kernel: [474304.152005] [<f9cb373b>] ? nfs_init_server_rpcclient+0x2b/0xf0 [nfs] Nov 3 09:54:14 pc kernel: [474304.152005] [<f9cb4c0a>] ? nfs4_init_server+0xea/0x220 [nfs] Nov 3 09:54:14 pc kernel: [474304.152005] [<f9cb4dbc>] nfs4_create_server+0x7c/0x240 [nfs] Nov 3 09:54:14 pc kernel: [474304.152005] [<c01acc24>] ? kfree+0x14/0xd0 Nov 3 09:54:14 pc kernel: [474304.152005] [<c01ad388>] ? get_slab+0x8/0x60 Nov 3 09:54:14 pc kernel: [474304.152005] [<f9cbdcb3>] ? nfs_parse_simple_hostname+0x83/0x160 [nfs] Nov 3 09:54:14 pc kernel: [474304.152005] [<c0191496>] ? kstrndup+0x36/0x60 Nov 3 09:54:14 pc kernel: [474304.152005] [<f9cbdcb3>] ? nfs_parse_simple_hostname+0x83/0x160 [nfs] Nov 3 09:54:14 pc kernel: [474304.152005] [<f9cbddb3>] ? nfs_parse_devname+0x23/0x50 [nfs] Nov 3 09:54:14 pc kernel: [474304.152005] [<f9cbfe7e>] ? nfs4_validate_mount_data+0x31e/0x370 [nfs] Nov 3 09:54:14 pc kernel: [474304.152005] [<f9cbff65>] nfs4_get_sb+0x95/0x280 [nfs] Nov 3 09:54:14 pc kernel: [474304.152005] [<c01b43ee>] vfs_kern_mount+0x5e/0x130 Nov 3 09:54:14 pc kernel: [474304.152005] [<c01b451e>] do_kern_mount+0x3e/0xe0 Nov 3 09:54:14 pc kernel: [474304.152005] [<c01cc9ef>] do_new_mount+0x6f/0x90 Nov 3 09:54:14 pc kernel: [474304.152005] [<c01ccf32>] do_mount+0x1d2/0x1f0 Nov 3 09:54:14 pc kernel: [474304.152005] [<c01ca67d>] ? exact_copy_from_user+0x4d/0xa0 Nov 3 09:54:14 pc kernel: [474304.152005] [<c01cac8e>] ? copy_mount_options+0x6e/0xd0 Nov 3 09:54:14 pc kernel: [474304.152005] [<c01ccfe1>] sys_mount+0x91/0xc0 Nov 3 09:54:14 pc kernel: [474304.152005] [<c0103f7b>] sysenter_do_call+0x12/0x2f There are a number of other processes stuck in the nfs4 paths as well: Nov 3 09:54:13 pc kernel: [474304.152005] gnome-panel D cafc1ca4 0 7266 16025 Nov 3 09:54:13 pc kernel: [474304.152005] cafc1d08 00000086 f7070000 cafc1ca4 cafc1ca4 00000246 c2e6a204 f7070000 Nov 3 09:54:13 pc kernel: [474304.152005] f7070000 cafc1cc4 ec216480 000025ad 00000000 00000246 00000001 f63ec380 Nov 3 09:54:13 pc kernel: [474304.152005] c050f080 f0d30000 cafc0000 ec2166f8 c28dcd00 00000001 f9574ec0 c28dcd00 Nov 3 09:54:13 pc kernel: [474304.152005] Call Trace: Nov 3 09:54:13 pc kernel: [474304.152005] [<f9574ec0>] ? __rpc_sleep_on+0x230/0x270 [sunrpc] Nov 3 09:54:13 pc kernel: [474304.152005] [<f9574c5d>] rpc_wait_bit_killable+0x1d/0x50 [sunrpc] Nov 3 09:54:13 pc kernel: [474304.152005] [<c037d2df>] __wait_on_bit+0x4f/0x70 Nov 3 09:54:13 pc kernel: [474304.152005] [<f9574c40>] ? rpc_wait_bit_killable+0x0/0x50 [sunrpc] Nov 3 09:54:13 pc kernel: [474304.152005] [<c037d3ad>] out_of_line_wait_on_bit+0xad/0xc0 Nov 3 09:54:13 pc kernel: [474304.152005] [<f9574c40>] ? rpc_wait_bit_killable+0x0/0x50 [sunrpc] Nov 3 09:54:13 pc kernel: [474304.152005] [<c0147500>] ? wake_bit_function+0x0/0x60 Nov 3 09:54:13 pc kernel: [474304.152005] [<f95756ae>] __rpc_execute+0xae/0x250 [sunrpc] Nov 3 09:54:13 pc kernel: [474304.152005] [<f95758c9>] rpc_execute+0x79/0x90 [sunrpc] Nov 3 09:54:13 pc kernel: [474304.152005] [<f956e743>] rpc_run_task+0x33/0x70 [sunrpc] Nov 3 09:54:13 pc kernel: [474304.152005] [<f956e8a0>] rpc_call_sync+0x40/0x70 [sunrpc] Nov 3 09:54:13 pc kernel: [474304.152005] [<f9cd0206>] nfs4_proc_getattr+0x76/0xa0 [nfs] Nov 3 09:54:13 pc kernel: [474304.152005] [<f9cbbc86>] __nfs_revalidate_inode+0x156/0x2b0 [nfs] Nov 3 09:54:13 pc kernel: [474304.152005] [<c014b474>] ? hrtimer_start+0xc4/0x1c0 Nov 3 09:54:13 pc kernel: [474304.152005] [<c01cb0d2>] ? mntput_no_expire+0x22/0x120 Nov 3 09:54:13 pc kernel: [474304.152005] [<c01babc5>] ? path_put+0x25/0x30 Nov 3 09:54:13 pc kernel: [474304.152005] [<c01bca62>] ? path_walk+0xa2/0xb0 Nov 3 09:54:13 pc kernel: [474304.152005] [<c01ba9fa>] ? putname+0x2a/0x40 Nov 3 09:54:13 pc kernel: [474304.152005] [<c01ba9fa>] ? putname+0x2a/0x40 Nov 3 09:54:13 pc kernel: [474304.152005] [<c01bd7e3>] ? user_path_at+0x53/0x80 Nov 3 09:54:13 pc kernel: [474304.152005] [<f9cbcb2d>] nfs_revalidate_inode+0x3d/0x70 [nfs] Nov 3 09:54:13 pc kernel: [474304.152005] [<f9cbcc13>] nfs_getattr+0xb3/0x100 [nfs] Nov 3 09:54:13 pc kernel: [474304.152005] [<f9cbcb60>] ? nfs_getattr+0x0/0x100 [nfs] Nov 3 09:54:13 pc kernel: [474304.152005] [<c01b5bda>] vfs_getattr+0x4a/0x70 Nov 3 09:54:13 pc kernel: [474304.152005] [<c01b5e39>] vfs_stat_fd+0x39/0x60 Nov 3 09:54:13 pc kernel: [474304.152005] [<c01b5f66>] vfs_stat+0x16/0x20 Nov 3 09:54:13 pc kernel: [474304.152005] [<c01b5f89>] sys_stat64+0x19/0x30 Nov 3 09:54:13 pc kernel: [474304.152005] [<c0103f7b>] sysenter_do_call+0x12/0x2f Nov 3 09:54:14 pc kernel: [474304.152005] lsof D ece08000 0 18247 18245 Nov 3 09:54:14 pc kernel: [474304.152005] c0e0dd08 00200086 f0953a74 ece08000 c0e0dca0 f95c629b f0953a74 00000000 Nov 3 09:54:14 pc kernel: [474304.152005] c0e0dce0 f95c6a10 c2e657f0 00007541 00000000 e5f76cc0 00000000 ee177c00 Nov 3 09:54:14 pc kernel: [474304.152005] c050f080 ee066000 c0e0c000 c2e65a68 c28d1d00 00000000 c0e0dce0 c28d1d00 Nov 3 09:54:14 pc kernel: [474304.152005] Call Trace: Nov 3 09:54:14 pc kernel: [474304.152005] [<f95c629b>] ? gss_release_msg+0x4b/0x90 [auth_rpcgss] Nov 3 09:54:14 pc kernel: [474304.152005] [<f95c6a10>] ? gss_cred_init+0xe0/0x140 [auth_rpcgss] Nov 3 09:54:14 pc kernel: [474304.152005] [<f9574c5d>] rpc_wait_bit_killable+0x1d/0x50 [sunrpc] Nov 3 09:54:14 pc kernel: [474304.152005] [<c037d2df>] __wait_on_bit+0x4f/0x70 Nov 3 09:54:14 pc kernel: [474304.152005] [<f9574c40>] ? rpc_wait_bit_killable+0x0/0x50 [sunrpc] Nov 3 09:54:14 pc kernel: [474304.152005] [<c037d3ad>] out_of_line_wait_on_bit+0xad/0xc0 Nov 3 09:54:14 pc kernel: [474304.152005] [<f9574c40>] ? rpc_wait_bit_killable+0x0/0x50 [sunrpc] Nov 3 09:54:14 pc kernel: [474304.152005] [<c0147500>] ? wake_bit_function+0x0/0x60 Nov 3 09:54:14 pc kernel: [474304.152005] [<f95756ae>] __rpc_execute+0xae/0x250 [sunrpc] Nov 3 09:54:14 pc kernel: [474304.152005] [<f95758c9>] rpc_execute+0x79/0x90 [sunrpc] Nov 3 09:54:14 pc kernel: [474304.152005] [<f956e743>] rpc_run_task+0x33/0x70 [sunrpc] Nov 3 09:54:14 pc kernel: [474304.152005] [<f956e8a0>] rpc_call_sync+0x40/0x70 [sunrpc] Nov 3 09:54:14 pc kernel: [474304.152005] [<f9cd0206>] nfs4_proc_getattr+0x76/0xa0 [nfs] Nov 3 09:54:14 pc kernel: [474304.152005] [<f9cbbc86>] __nfs_revalidate_inode+0x156/0x2b0 [nfs] Nov 3 09:54:14 pc kernel: [474304.152005] [<c01bac46>] ? path_to_nameidata+0x36/0x50 Nov 3 09:54:14 pc kernel: [474304.152005] [<c01cb0d2>] ? mntput_no_expire+0x22/0x120 Nov 3 09:54:14 pc kernel: [474304.152005] [<c01babc5>] ? path_put+0x25/0x30 Nov 3 09:54:14 pc kernel: [474304.152005] [<c01bca62>] ? path_walk+0xa2/0xb0 Nov 3 09:54:14 pc kernel: [474304.152005] [<c01ba9fa>] ? putname+0x2a/0x40 Nov 3 09:54:14 pc kernel: [474304.152005] [<c01ba9fa>] ? putname+0x2a/0x40 Nov 3 09:54:14 pc kernel: [474304.152005] [<c01bd7e3>] ? user_path_at+0x53/0x80 Nov 3 09:54:14 pc kernel: [474304.152005] [<f9cbcb2d>] nfs_revalidate_inode+0x3d/0x70 [nfs] Nov 3 09:54:14 pc kernel: [474304.152005] [<f9cbcc13>] nfs_getattr+0xb3/0x100 [nfs] Nov 3 09:54:14 pc kernel: [474304.152005] [<f9cbcb60>] ? nfs_getattr+0x0/0x100 [nfs] Nov 3 09:54:14 pc kernel: [474304.152005] [<c01b5bda>] vfs_getattr+0x4a/0x70 Nov 3 09:54:14 pc kernel: [474304.152005] [<c01b5e39>] vfs_stat_fd+0x39/0x60 Nov 3 09:54:14 pc kernel: [474304.152005] [<c01b5f66>] vfs_stat+0x16/0x20 Nov 3 09:54:14 pc kernel: [474304.152005] [<c01b5f89>] sys_stat64+0x19/0x30 Nov 3 09:54:14 pc kernel: [474304.152005] [<c01b26db>] ? vfs_read+0xab/0x110 Nov 3 09:54:14 pc kernel: [474304.152005] [<c01b1e40>] ? do_sync_read+0x0/0x120 Nov 3 09:54:14 pc kernel: [474304.152005] [<c01b2812>] ? sys_read+0x42/0x70 Nov 3 09:54:14 pc kernel: [474304.152005] [<c0103f7b>] sysenter_do_call+0x12/0x2f I have used wireshark and filtered for NFS4 (given that I have lots of NFV3 still) with "rpc.programversion == 4" on the wire and noticed that only a couple of packets (1 each after the TCP setup and teardown) are sent between the client and server. It was many tens of minutes between these last night when I was debugging. For as long as I have been looking this morning there has not been a single NFS4 packet. Client machine is the Ubuntu 2.6.27-7-generic kernel and the server is the Ubuntu 2.6.24-16-generic kernel. I've tried restarting NFS on the server a number of times and that has not helped. Any ideas? b. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: stuck/hung nfsv4 mounts 2008-11-03 15:05 stuck/hung nfsv4 mounts Brian J. Murrell @ 2008-11-03 16:59 ` Trond Myklebust [not found] ` <1225731544.6958.6.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org> 0 siblings, 1 reply; 19+ messages in thread From: Trond Myklebust @ 2008-11-03 16:59 UTC (permalink / raw) To: Brian J. Murrell; +Cc: linux-nfs On Mon, 2008-11-03 at 10:05 -0500, Brian J. Murrell wrote: > I have a system which I migrated a couple of mounts from sys > authentication on nfs3 to gssapi on nfsv4 to overcome the 16 > supplementary groups limitation inherent in the "sys" security model. > > It was working for a while but now I have "stuck" (i.e. hung) NFS > mounts. It seems that a mount issued from the automounter kicked all of > this off but I cannot be sure of that. > > In my process table I have a number of processes blocked in the NFSv4 > path including a mount command: > > /bin/mount -t nfs4 -s -o sec=krb5i,rw,soft,intr,rsize=8192,wsize=8192 linux:/home/share /autohome/share BTW: NFSv4 + soft == BAD BAD BAD! You will see weird behaviour, as the client and the server will have divergent opinions on what happened if/when an RPC call fails. Otherwise, have you checked on the state of your rpc.gssd? It looked as if several of those traces were waiting around RPCSEC_GSS upcalls... Cheers Trond ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <1225731544.6958.6.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>]
* Re: stuck/hung nfsv4 mounts [not found] ` <1225731544.6958.6.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org> @ 2008-11-03 17:25 ` Jim Rees 2008-11-03 17:37 ` Trond Myklebust 2008-11-03 17:38 ` Benny Halevy 2008-11-03 17:50 ` Brian J. Murrell 1 sibling, 2 replies; 19+ messages in thread From: Jim Rees @ 2008-11-03 17:25 UTC (permalink / raw) To: Trond Myklebust; +Cc: Brian J. Murrell, linux-nfs Trond Myklebust wrote: BTW: NFSv4 + soft == BAD BAD BAD! Maybe this combination should be prohibited. Does it make any sense given the stateful nature of v4? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: stuck/hung nfsv4 mounts 2008-11-03 17:25 ` Jim Rees @ 2008-11-03 17:37 ` Trond Myklebust [not found] ` <1225733834.6958.12.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org> 2008-11-03 17:38 ` Benny Halevy 1 sibling, 1 reply; 19+ messages in thread From: Trond Myklebust @ 2008-11-03 17:37 UTC (permalink / raw) To: Jim Rees; +Cc: Brian J. Murrell, linux-nfs On Mon, 2008-11-03 at 12:25 -0500, Jim Rees wrote: > Trond Myklebust wrote: > > BTW: NFSv4 + soft == BAD BAD BAD! > > Maybe this combination should be prohibited. Does it make any sense given > the stateful nature of v4? It might make sense if we were to fix up the granularity of the recovery routines so that we are able to recover all the state associated with just a single open owner or lock owner. Currently we'd have to recover all the state associated with that server. IOW: we might be able to fix things up in the future, but right now, NFSv4+soft is not a good idea. Trond ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <1225733834.6958.12.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>]
* Re: stuck/hung nfsv4 mounts [not found] ` <1225733834.6958.12.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org> @ 2008-11-03 21:40 ` Chuck Lever 2008-11-03 22:20 ` Peter Staubach 0 siblings, 1 reply; 19+ messages in thread From: Chuck Lever @ 2008-11-03 21:40 UTC (permalink / raw) To: Trond Myklebust; +Cc: Jim Rees, Brian J. Murrell, linux-nfs On Nov 3, 2008, at Nov 3, 2008, 12:37 PM, Trond Myklebust wrote: > On Mon, 2008-11-03 at 12:25 -0500, Jim Rees wrote: >> Trond Myklebust wrote: >> >> BTW: NFSv4 + soft == BAD BAD BAD! >> >> Maybe this combination should be prohibited. Does it make any >> sense given >> the stateful nature of v4? > > It might make sense if we were to fix up the granularity of the > recovery > routines so that we are able to recover all the state associated with > just a single open owner or lock owner. Currently we'd have to recover > all the state associated with that server. > > IOW: we might be able to fix things up in the future, but right now, > NFSv4+soft is not a good idea. Two cents worth: Until NFSv4+soft works reasonably well, I wouldn't have any problem with making "soft" a no-op for nfs4 mounts. -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: stuck/hung nfsv4 mounts 2008-11-03 21:40 ` Chuck Lever @ 2008-11-03 22:20 ` Peter Staubach 2008-11-03 22:47 ` Chuck Lever 0 siblings, 1 reply; 19+ messages in thread From: Peter Staubach @ 2008-11-03 22:20 UTC (permalink / raw) To: Chuck Lever; +Cc: Trond Myklebust, Jim Rees, Brian J. Murrell, linux-nfs Chuck Lever wrote: > On Nov 3, 2008, at Nov 3, 2008, 12:37 PM, Trond Myklebust wrote: >> On Mon, 2008-11-03 at 12:25 -0500, Jim Rees wrote: >>> Trond Myklebust wrote: >>> >>> BTW: NFSv4 + soft == BAD BAD BAD! >>> >>> Maybe this combination should be prohibited. Does it make any sense >>> given >>> the stateful nature of v4? >> >> It might make sense if we were to fix up the granularity of the recovery >> routines so that we are able to recover all the state associated with >> just a single open owner or lock owner. Currently we'd have to recover >> all the state associated with that server. >> >> IOW: we might be able to fix things up in the future, but right now, >> NFSv4+soft is not a good idea. > > Two cents worth: Until NFSv4+soft works reasonably well, I wouldn't > have any problem with making "soft" a no-op for nfs4 mounts. It may be the right thing to do in the short term, but it will generate calls to support organizations. Anyone trying it will wonder why it doesn't work "as it used to". That sort of thing. When that happens, we will be forced to do something and not at our choice of time. Perhaps we should deny such mount attempts because at least that would be obvious what was happening? Perhaps we should also deny NFSv4 mounting over UDP? ps ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: stuck/hung nfsv4 mounts 2008-11-03 22:20 ` Peter Staubach @ 2008-11-03 22:47 ` Chuck Lever 2008-11-04 16:03 ` Peter Staubach 0 siblings, 1 reply; 19+ messages in thread From: Chuck Lever @ 2008-11-03 22:47 UTC (permalink / raw) To: Peter Staubach; +Cc: Trond Myklebust, Jim Rees, Brian J. Murrell, linux-nfs On Nov 3, 2008, at Nov 3, 2008, 5:20 PM, Peter Staubach wrote: > Chuck Lever wrote: >> On Nov 3, 2008, at Nov 3, 2008, 12:37 PM, Trond Myklebust wrote: >>> On Mon, 2008-11-03 at 12:25 -0500, Jim Rees wrote: >>>> Trond Myklebust wrote: >>>> >>>> BTW: NFSv4 + soft == BAD BAD BAD! >>>> >>>> Maybe this combination should be prohibited. Does it make any >>>> sense given >>>> the stateful nature of v4? >>> >>> It might make sense if we were to fix up the granularity of the >>> recovery >>> routines so that we are able to recover all the state associated >>> with >>> just a single open owner or lock owner. Currently we'd have to >>> recover >>> all the state associated with that server. >>> >>> IOW: we might be able to fix things up in the future, but right now, >>> NFSv4+soft is not a good idea. >> >> Two cents worth: Until NFSv4+soft works reasonably well, I >> wouldn't have any problem with making "soft" a no-op for nfs4 mounts. > > It may be the right thing to do in the short term, but it will > generate calls to support organizations. As opposed to the support "call" we got today? ;-) > Anyone trying it will > wonder why it doesn't work "as it used to". That sort of thing. > When that happens, we will be forced to do something and not at > our choice of time. I think our petard is hoist either way. Better to avoid data corruption, by whatever means. > Perhaps we should deny such mount attempts because at least that > would be obvious what was happening? Perhaps we should also > deny NFSv4 mounting over UDP? -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: stuck/hung nfsv4 mounts 2008-11-03 22:47 ` Chuck Lever @ 2008-11-04 16:03 ` Peter Staubach 0 siblings, 0 replies; 19+ messages in thread From: Peter Staubach @ 2008-11-04 16:03 UTC (permalink / raw) To: Chuck Lever; +Cc: Trond Myklebust, Jim Rees, Brian J. Murrell, linux-nfs Chuck Lever wrote: > On Nov 3, 2008, at Nov 3, 2008, 5:20 PM, Peter Staubach wrote: >> Chuck Lever wrote: >>> On Nov 3, 2008, at Nov 3, 2008, 12:37 PM, Trond Myklebust wrote: >>>> On Mon, 2008-11-03 at 12:25 -0500, Jim Rees wrote: >>>>> Trond Myklebust wrote: >>>>> >>>>> BTW: NFSv4 + soft == BAD BAD BAD! >>>>> >>>>> Maybe this combination should be prohibited. Does it make any >>>>> sense given >>>>> the stateful nature of v4? >>>> >>>> It might make sense if we were to fix up the granularity of the >>>> recovery >>>> routines so that we are able to recover all the state associated with >>>> just a single open owner or lock owner. Currently we'd have to recover >>>> all the state associated with that server. >>>> >>>> IOW: we might be able to fix things up in the future, but right now, >>>> NFSv4+soft is not a good idea. >>> >>> Two cents worth: Until NFSv4+soft works reasonably well, I wouldn't >>> have any problem with making "soft" a no-op for nfs4 mounts. >> >> It may be the right thing to do in the short term, but it will >> generate calls to support organizations. > > As opposed to the support "call" we got today? ;-) > Well, at least, you have an answer today. >> Anyone trying it will >> wonder why it doesn't work "as it used to". That sort of thing. >> When that happens, we will be forced to do something and not at >> our choice of time. > > I think our petard is hoist either way. Better to avoid data > corruption, by whatever means. > Then it is probably time to fix it, sooner as opposed to later. I don't have the resources at this time to put into doing this work, so I am just expressing opinions. I do know that it caused quite the flood of interest when it was discovered that the superblock sharing stuff had broken a lot of customers who expected to be able mount the same file system from a server, multiple times, in different places, with different options, and it stopped using the options beyond the first mount. I am thinking that perhaps we can take something from this experience and do better this next time. This time, we know that we are making an incompatible change and can do better. ps >> Perhaps we should deny such mount attempts because at least that >> would be obvious what was happening? Perhaps we should also >> deny NFSv4 mounting over UDP? > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: stuck/hung nfsv4 mounts 2008-11-03 17:25 ` Jim Rees 2008-11-03 17:37 ` Trond Myklebust @ 2008-11-03 17:38 ` Benny Halevy 1 sibling, 0 replies; 19+ messages in thread From: Benny Halevy @ 2008-11-03 17:38 UTC (permalink / raw) To: Jim Rees; +Cc: Trond Myklebust, Brian J. Murrell, linux-nfs On Nov. 03, 2008, 19:25 +0200, Jim Rees <rees@umich.edu> wrote: > Trond Myklebust wrote: > > BTW: NFSv4 + soft == BAD BAD BAD! > > Maybe this combination should be prohibited. Does it make any sense given > the stateful nature of v4? >From the usage perspective it allows the application to fail rather than stall. If it would have been implemented similar to "intr" and return an error to the app as if the system call had received an interrupt but under the covers the RPC will be processed to completion (or graceful cleanup) I think it might still make sense for nfs4. Until then this combination should at least be warned about. Benny > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: stuck/hung nfsv4 mounts [not found] ` <1225731544.6958.6.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org> 2008-11-03 17:25 ` Jim Rees @ 2008-11-03 17:50 ` Brian J. Murrell 2008-11-03 19:58 ` Kevin Coffman 1 sibling, 1 reply; 19+ messages in thread From: Brian J. Murrell @ 2008-11-03 17:50 UTC (permalink / raw) To: linux-nfs On Mon, 2008-11-03 at 11:59 -0500, Trond Myklebust wrote: > > BTW: NFSv4 + soft == BAD BAD BAD! Ahh. OK. > You will see weird behaviour, as the > client and the server will have divergent opinions on what happened > if/when an RPC call fails. I have removed the soft option from the auto map and killed the in-progress mount.nfs4. Seems the new one is not using soft any more: root 22731 22730 0 12:06 ? 00:00:00 /sbin/mount.nfs4 linux:/home/share /autohome/share -s -o rw,sec=krb5i,intr,rsize=8192,wsize=8192 > Otherwise, have you checked on the state of your rpc.gssd? It looked as > if several of those traces were waiting around RPCSEC_GSS upcalls... I thought it was working. I killed it and restarted it with -vvv -rrr and this is what it said when automount re-issued the above mount: Nov 3 12:02:15 pc rpc.gssd[21773]: rpcsec_gss: debug level is 3 Nov 3 12:06:00 pc rpc.gssd[21774]: in authgss_create_default() Nov 3 12:06:00 pc rpc.gssd[21774]: in authgss_create() Nov 3 12:06:00 pc rpc.gssd[21774]: authgss_create: name is 0x8ab8d88 Nov 3 12:06:00 pc rpc.gssd[21774]: authgss_create: gd->name is 0x8ab8ed0 Nov 3 12:06:00 pc rpc.gssd[21774]: in authgss_refresh() Nov 3 12:06:00 pc rpc.gssd[21774]: struct rpc_gss_sec: Nov 3 12:06:00 pc rpc.gssd[21774]: mechanism_OID: { 1 2 134 72 134 247 18 1 2 2 } Nov 3 12:06:00 pc rpc.gssd[21774]: qop: 0 Nov 3 12:06:00 pc rpc.gssd[21774]: service: 1 Nov 3 12:06:00 pc rpc.gssd[21774]: cred: 0x8abb830 Nov 3 12:06:00 pc rpc.gssd[21774]: req_flags: 00000002 Nov 3 12:06:00 pc rpc.gssd[21774]: in authgss_marshal() Nov 3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_buf: encode success ((nil):0) Nov 3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_cred: encode success (v 1, proc 1, seq 0, svc 1, ctx (nil):0) Nov 3 12:06:00 pc rpc.gssd[21774]: in authgss_wrap() Nov 3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_buf: encode success (0x8abbeb8:483) Nov 3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_init_args: encode success (token 0x8abbeb8:483) Nov 3 12:06:00 pc rpc.gssd[21774]: in authgss_validate() Nov 3 12:06:00 pc rpc.gssd[21774]: in authgss_unwrap() Nov 3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_buf: decode success (0x8abbe98:4) Nov 3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_buf: decode success (0x8abc110:114) Nov 3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_init_res decode success (ctx 0x8abbe98:4, maj 0, min 0, win 128, token 0x8abc110:114) Nov 3 12:06:00 pc rpc.gssd[21774]: authgss_create_default: freeing name 0x8ab8d88 Nov 3 12:06:00 pc rpc.gssd[21774]: in authgss_get_private_data() Nov 3 12:06:00 pc rpc.gssd[21774]: in authgss_free_private_data() Nov 3 12:06:00 pc rpc.gssd[21774]: in authgss_destroy() Nov 3 12:06:00 pc rpc.gssd[21774]: in authgss_destroy_context() Nov 3 12:06:00 pc rpc.gssd[21774]: authgss_destroy: freeing name 0x8ab8ed0 The kdc logged: Nov 3 12:06:00 linux krb5kdc[5006]: AS_REQ (1 etypes {1}) 10.75.22.1: ISSUE: authtime 1225731960, etypes {rep=1 tkt=16 ses=1}, nfs/pc.interlinx.bc.ca@ILINX for krbtgt/ILINX@ILINX Nov 3 12:06:00 linux krb5kdc[5006]: TGS_REQ (1 etypes {1}) 10.75.22.1: ISSUE: authtime 1225731960, etypes {rep=1 tkt=1 ses=1}, nfs/pc.interlinx.bc.ca@ILINX for nfs/linux.interlinx.bc.ca@ILINX in correlation to the new mount request, but the mount.nfs4 didn't complete. b. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: stuck/hung nfsv4 mounts 2008-11-03 17:50 ` Brian J. Murrell @ 2008-11-03 19:58 ` Kevin Coffman [not found] ` <4d569c330811031158r26963e0w5bcf8331e0fb14b7-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 19+ messages in thread From: Kevin Coffman @ 2008-11-03 19:58 UTC (permalink / raw) To: Brian J. Murrell; +Cc: linux-nfs On Mon, Nov 3, 2008 at 12:50 PM, Brian J. Murrell <brian-SquOHqY54CVWr29BmMi2cA@public.gmane.org> wrote: > On Mon, 2008-11-03 at 11:59 -0500, Trond Myklebust wrote: > >> Otherwise, have you checked on the state of your rpc.gssd? It looked as >> if several of those traces were waiting around RPCSEC_GSS upcalls... > > I thought it was working. I killed it and restarted it with -vvv -rrr > and this is what it said when automount re-issued the above mount: > > Nov 3 12:02:15 pc rpc.gssd[21773]: rpcsec_gss: debug level is 3 > Nov 3 12:06:00 pc rpc.gssd[21774]: in authgss_create_default() > Nov 3 12:06:00 pc rpc.gssd[21774]: in authgss_create() > Nov 3 12:06:00 pc rpc.gssd[21774]: authgss_create: name is 0x8ab8d88 > Nov 3 12:06:00 pc rpc.gssd[21774]: authgss_create: gd->name is 0x8ab8ed0 > Nov 3 12:06:00 pc rpc.gssd[21774]: in authgss_refresh() > Nov 3 12:06:00 pc rpc.gssd[21774]: struct rpc_gss_sec: > Nov 3 12:06:00 pc rpc.gssd[21774]: mechanism_OID: { 1 2 134 72 134 247 18 1 2 2 } > Nov 3 12:06:00 pc rpc.gssd[21774]: qop: 0 > Nov 3 12:06:00 pc rpc.gssd[21774]: service: 1 > Nov 3 12:06:00 pc rpc.gssd[21774]: cred: 0x8abb830 > Nov 3 12:06:00 pc rpc.gssd[21774]: req_flags: 00000002 > Nov 3 12:06:00 pc rpc.gssd[21774]: in authgss_marshal() > Nov 3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_buf: encode success ((nil):0) > Nov 3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_cred: encode success (v 1, proc 1, seq 0, svc 1, ctx (nil):0) > Nov 3 12:06:00 pc rpc.gssd[21774]: in authgss_wrap() > Nov 3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_buf: encode success (0x8abbeb8:483) > Nov 3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_init_args: encode success (token 0x8abbeb8:483) > Nov 3 12:06:00 pc rpc.gssd[21774]: in authgss_validate() > Nov 3 12:06:00 pc rpc.gssd[21774]: in authgss_unwrap() > Nov 3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_buf: decode success (0x8abbe98:4) > Nov 3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_buf: decode success (0x8abc110:114) > Nov 3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_init_res decode success (ctx 0x8abbe98:4, maj 0, min 0, win 128, token 0x8abc110:114) > Nov 3 12:06:00 pc rpc.gssd[21774]: authgss_create_default: freeing name 0x8ab8d88 > Nov 3 12:06:00 pc rpc.gssd[21774]: in authgss_get_private_data() > Nov 3 12:06:00 pc rpc.gssd[21774]: in authgss_free_private_data() > Nov 3 12:06:00 pc rpc.gssd[21774]: in authgss_destroy() > Nov 3 12:06:00 pc rpc.gssd[21774]: in authgss_destroy_context() > Nov 3 12:06:00 pc rpc.gssd[21774]: authgss_destroy: freeing name 0x8ab8ed0 These are all from librpcsecgss (from the -rrr). I don't see an error here. You should have gotten output from gssd itself as well. You might also look for any errors on the server from rpc.svcgssd. > The kdc logged: > > Nov 3 12:06:00 linux krb5kdc[5006]: AS_REQ (1 etypes {1}) 10.75.22.1: ISSUE: authtime 1225731960, etypes {rep=1 tkt=16 ses=1}, nfs/pc.interlinx.bc.ca@ILINX for krbtgt/ILINX@ILINX > Nov 3 12:06:00 linux krb5kdc[5006]: TGS_REQ (1 etypes {1}) 10.75.22.1: ISSUE: authtime 1225731960, etypes {rep=1 tkt=1 ses=1}, nfs/pc.interlinx.bc.ca@ILINX for nfs/linux.interlinx.bc.ca@ILINX This looks fine. (Assuming linux.interlinx.bc.ca is the NFS server and pc.interlinx.bc.ca is the client.) > in correlation to the new mount request, but the mount.nfs4 didn't > complete. > ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <4d569c330811031158r26963e0w5bcf8331e0fb14b7-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: stuck/hung nfsv4 mounts [not found] ` <4d569c330811031158r26963e0w5bcf8331e0fb14b7-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2008-11-03 20:09 ` Brian J. Murrell 2008-11-03 20:28 ` Kevin Coffman 0 siblings, 1 reply; 19+ messages in thread From: Brian J. Murrell @ 2008-11-03 20:09 UTC (permalink / raw) To: linux-nfs On Mon, 2008-11-03 at 14:58 -0500, Kevin Coffman wrote: > > These are all from librpcsecgss (from the -rrr). I don't see an error here. Good. > You should have gotten output from gssd itself as well. You might > also look for any errors on the server from rpc.svcgssd. Nothing. > This looks fine. (Assuming linux.interlinx.bc.ca is the NFS server > and pc.interlinx.bc.ca is the client.) Indeed, they both are as you assume. So I've restarted both the rpc.svcgssd and rpc.gssd with -rrr -vvv on the server and client (respectively) to see what more I can see but I can't get the upcall to trigger. I would have thought simply issuing a mount would do that, but it seems not. What exactly causes the upcall to be made and what might prevent it from being made? b. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: stuck/hung nfsv4 mounts 2008-11-03 20:09 ` Brian J. Murrell @ 2008-11-03 20:28 ` Kevin Coffman [not found] ` <4d569c330811031228r5bb9aefs7a970303910810e2-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 19+ messages in thread From: Kevin Coffman @ 2008-11-03 20:28 UTC (permalink / raw) To: Brian J. Murrell; +Cc: linux-nfs On Mon, Nov 3, 2008 at 3:09 PM, Brian J. Murrell <brian-SquOHqY54CVWr29BmMi2cA@public.gmane.org> wrote: > On Mon, 2008-11-03 at 14:58 -0500, Kevin Coffman wrote: >> >> These are all from librpcsecgss (from the -rrr). I don't see an error here. > > Good. > >> You should have gotten output from gssd itself as well. You might >> also look for any errors on the server from rpc.svcgssd. > > Nothing. > >> This looks fine. (Assuming linux.interlinx.bc.ca is the NFS server >> and pc.interlinx.bc.ca is the client.) > > Indeed, they both are as you assume. > > So I've restarted both the rpc.svcgssd and rpc.gssd with -rrr -vvv on > the server and client (respectively) to see what more I can see but I > can't get the upcall to trigger. I would have thought simply issuing a > mount would do that, but it seems not. What exactly causes the upcall > to be made and what might prevent it from being made? There should definitely be corresponding debug output from rpc.gssd (-vvv) if you got those librpcsecgss (-rrr) messages. I would suggest that you leave off the -rrr for now as it appears to be extra clutter at this point. It might help if you run rpc.gssd in the foreground (-f) while debugging. Also, make sure this is the only rpc.gssd running. Re: the upcall. Any new mount using Kerberos should cause the upcall to create a new context with the server. Make sure the filesystem is not already mounted. After the mount, any new user accessing that mounted filesystem should cause an upcall to create a context for that user. K.C. ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <4d569c330811031228r5bb9aefs7a970303910810e2-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: stuck/hung nfsv4 mounts [not found] ` <4d569c330811031228r5bb9aefs7a970303910810e2-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2008-11-03 21:12 ` Brian J. Murrell 2008-11-03 22:33 ` Kevin Coffman 0 siblings, 1 reply; 19+ messages in thread From: Brian J. Murrell @ 2008-11-03 21:12 UTC (permalink / raw) To: linux-nfs On Mon, 2008-11-03 at 15:28 -0500, Kevin Coffman wrote: > > There should definitely be corresponding debug output from rpc.gssd > (-vvv) if you got those librpcsecgss (-rrr) messages. Yes, but by my last post I had debug only on the client and not the server but when I enabled on the server I couldn't elicit the client at will. > Re: the upcall. Any new mount using Kerberos should cause the upcall > to create a new context with the server. Make sure the filesystem is > not already mounted. It was definitely not mounted the last time I tried to mount it and produce the upcall output but it just never happened. > After the mount, any new user accessing that > mounted filesystem should cause an upcall to create a context for that > user. I seem to have since gotten an upcall and here is the corresponding client and server debug: Nov 3 15:17:33 pc rpc.gssd[2937]: in authgss_create_default() Nov 3 15:17:33 pc rpc.gssd[2937]: in authgss_create() Nov 3 15:17:33 pc rpc.gssd[2937]: authgss_create: name is 0x98fd358 Nov 3 15:17:33 pc rpc.gssd[2937]: authgss_create: gd->name is 0x98f9cc8 Nov 3 15:17:33 pc rpc.gssd[2937]: in authgss_refresh() Nov 3 15:17:33 pc rpc.gssd[2937]: struct rpc_gss_sec: Nov 3 15:17:33 pc rpc.gssd[2937]: mechanism_OID: { 1 2 134 72 134 247 18 1 2 2 } Nov 3 15:17:33 pc rpc.gssd[2937]: qop: 0 Nov 3 15:17:33 pc rpc.gssd[2937]: service: 1 Nov 3 15:17:33 pc rpc.gssd[2937]: cred: 0x98fb568 Nov 3 15:17:33 pc rpc.gssd[2937]: req_flags: 00000002 Nov 3 15:17:33 pc rpc.gssd[2937]: in authgss_marshal() Nov 3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_buf: encode success ((nil):0) Nov 3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_cred: encode success (v 1, proc 1, seq 0, svc 1, ctx (nil):0) Nov 3 15:17:33 pc rpc.gssd[2937]: in authgss_wrap() Nov 3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_buf: encode success (0x98fbc08:459) Nov 3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_init_args: encode success (token 0x98fbc08:459) Nov 3 15:17:33 pc rpc.gssd[2937]: in authgss_validate() Nov 3 15:17:33 pc rpc.gssd[2937]: in authgss_unwrap() Nov 3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_buf: decode success (0x98fd428:4) Nov 3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_buf: decode success (0x990e428:114) Nov 3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_init_res decode success (ctx 0x98fd428:4, maj 0, min 0, win 128, token 0x990e428:114) Nov 3 15:17:33 pc rpc.gssd[2937]: authgss_create_default: freeing name 0x98fd358 Nov 3 15:17:33 pc rpc.gssd[2937]: in authgss_get_private_data() Nov 3 15:17:33 pc rpc.gssd[2937]: in authgss_free_private_data() Nov 3 15:17:33 pc rpc.gssd[2937]: in authgss_destroy() Nov 3 15:17:33 pc rpc.gssd[2937]: in authgss_destroy_context() Nov 3 15:17:33 pc rpc.gssd[2937]: authgss_destroy: freeing name 0x98f9cc8 Nov 3 15:17:33 linux krb5kdc[5006]: TGS_REQ (1 etypes {1}) 10.75.22.1: ISSUE: authtime 1225740577, etypes {rep=1 tkt=1 ses=1}, brian@ILINX for nfs/linux.interlinx.bc.ca@ILINX Nov 3 15:17:33 linux rpc.svcgssd[13724]: leaving poll Nov 3 15:17:33 linux rpc.svcgssd[13724]: handling null request Nov 3 15:17:33 linux rpc.svcgssd[13724]: sname = brian@ILINX Nov 3 15:17:33 linux rpc.svcgssd[13724]: DEBUG: serialize_krb5_ctx: lucid version! Nov 3 15:17:33 linux rpc.svcgssd[13724]: prepare_krb5_rfc1964_buffer: serializing keys with enctype 4 and length 8 Nov 3 15:17:33 linux rpc.svcgssd[13724]: doing downcall Nov 3 15:17:33 linux rpc.svcgssd[13724]: mech: krb5, hndl len: 4, ctx len 85, timeout: 2147483647, uid: 1001, gid: 1001, num aux grps: 13: Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 1) 1001 Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 2) 117 Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 3) 1010 Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 4) 1011 Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 5) 2000 Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 6) 29 Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 7) 20 Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 8) 24 Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 9) 25 Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 10) 30 Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 11) 44 Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 12) 46 Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 13) 111 Nov 3 15:17:33 linux rpc.svcgssd[13724]: sending null reply Nov 3 15:17:33 linux rpc.svcgssd[13724]: writing message: ... Nov 3 15:17:33 linux rpc.svcgssd[13724]: finished handling null request Nov 3 15:17:33 linux rpc.svcgssd[13724]: entering poll And I still have an uncompleted mount.nfs4 command in my process table. b. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: stuck/hung nfsv4 mounts 2008-11-03 21:12 ` Brian J. Murrell @ 2008-11-03 22:33 ` Kevin Coffman [not found] ` <4d569c330811031433k7ae18d4enfbda349e8f90a951-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 19+ messages in thread From: Kevin Coffman @ 2008-11-03 22:33 UTC (permalink / raw) To: Brian J. Murrell; +Cc: linux-nfs On Mon, Nov 3, 2008 at 4:12 PM, Brian J. Murrell <brian-SquOHqY54CVWr29BmMi2cA@public.gmane.org> wrote: > On Mon, 2008-11-03 at 15:28 -0500, Kevin Coffman wrote: >> >> There should definitely be corresponding debug output from rpc.gssd >> (-vvv) if you got those librpcsecgss (-rrr) messages. > > Yes, but by my last post I had debug only on the client and not the > server but when I enabled on the server I couldn't elicit the client at > will. > >> Re: the upcall. Any new mount using Kerberos should cause the upcall >> to create a new context with the server. Make sure the filesystem is >> not already mounted. > > It was definitely not mounted the last time I tried to mount it and > produce the upcall output but it just never happened. > >> After the mount, any new user accessing that >> mounted filesystem should cause an upcall to create a context for that >> user. > > I seem to have since gotten an upcall and here is the corresponding > client and server debug: > > Nov 3 15:17:33 pc rpc.gssd[2937]: in authgss_create_default() > Nov 3 15:17:33 pc rpc.gssd[2937]: in authgss_create() > Nov 3 15:17:33 pc rpc.gssd[2937]: authgss_create: name is 0x98fd358 > Nov 3 15:17:33 pc rpc.gssd[2937]: authgss_create: gd->name is 0x98f9cc8 > Nov 3 15:17:33 pc rpc.gssd[2937]: in authgss_refresh() > Nov 3 15:17:33 pc rpc.gssd[2937]: struct rpc_gss_sec: > Nov 3 15:17:33 pc rpc.gssd[2937]: mechanism_OID: { 1 2 134 72 134 247 18 1 2 2 } > Nov 3 15:17:33 pc rpc.gssd[2937]: qop: 0 > Nov 3 15:17:33 pc rpc.gssd[2937]: service: 1 > Nov 3 15:17:33 pc rpc.gssd[2937]: cred: 0x98fb568 > Nov 3 15:17:33 pc rpc.gssd[2937]: req_flags: 00000002 > Nov 3 15:17:33 pc rpc.gssd[2937]: in authgss_marshal() > Nov 3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_buf: encode success ((nil):0) > Nov 3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_cred: encode success (v 1, proc 1, seq 0, svc 1, ctx (nil):0) > Nov 3 15:17:33 pc rpc.gssd[2937]: in authgss_wrap() > Nov 3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_buf: encode success (0x98fbc08:459) > Nov 3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_init_args: encode success (token 0x98fbc08:459) > Nov 3 15:17:33 pc rpc.gssd[2937]: in authgss_validate() > Nov 3 15:17:33 pc rpc.gssd[2937]: in authgss_unwrap() > Nov 3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_buf: decode success (0x98fd428:4) > Nov 3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_buf: decode success (0x990e428:114) > Nov 3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_init_res decode success (ctx 0x98fd428:4, maj 0, min 0, win 128, token 0x990e428:114) > Nov 3 15:17:33 pc rpc.gssd[2937]: authgss_create_default: freeing name 0x98fd358 > Nov 3 15:17:33 pc rpc.gssd[2937]: in authgss_get_private_data() > Nov 3 15:17:33 pc rpc.gssd[2937]: in authgss_free_private_data() > Nov 3 15:17:33 pc rpc.gssd[2937]: in authgss_destroy() > Nov 3 15:17:33 pc rpc.gssd[2937]: in authgss_destroy_context() > Nov 3 15:17:33 pc rpc.gssd[2937]: authgss_destroy: freeing name 0x98f9cc8 Once again, this is all from librpcsecgss. No messages originating from rpc.gssd itself here. > Nov 3 15:17:33 linux krb5kdc[5006]: TGS_REQ (1 etypes {1}) 10.75.22.1: ISSUE: authtime 1225740577, etypes {rep=1 tkt=1 ses=1}, brian@ILINX for nfs/linux.interlinx.bc.ca@ILINX > Nov 3 15:17:33 linux rpc.svcgssd[13724]: leaving poll > Nov 3 15:17:33 linux rpc.svcgssd[13724]: handling null request > Nov 3 15:17:33 linux rpc.svcgssd[13724]: sname = brian@ILINX > Nov 3 15:17:33 linux rpc.svcgssd[13724]: DEBUG: serialize_krb5_ctx: lucid version! > Nov 3 15:17:33 linux rpc.svcgssd[13724]: prepare_krb5_rfc1964_buffer: serializing keys with enctype 4 and length 8 > Nov 3 15:17:33 linux rpc.svcgssd[13724]: doing downcall > Nov 3 15:17:33 linux rpc.svcgssd[13724]: mech: krb5, hndl len: 4, ctx len 85, timeout: 2147483647, uid: 1001, gid: 1001, num aux grps: 13: > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 1) 1001 > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 2) 117 > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 3) 1010 > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 4) 1011 > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 5) 2000 > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 6) 29 > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 7) 20 > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 8) 24 > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 9) 25 > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 10) 30 > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 11) 44 > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 12) 46 > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 13) 111 > Nov 3 15:17:33 linux rpc.svcgssd[13724]: sending null reply > Nov 3 15:17:33 linux rpc.svcgssd[13724]: writing message: ... > Nov 3 15:17:33 linux rpc.svcgssd[13724]: finished handling null request > Nov 3 15:17:33 linux rpc.svcgssd[13724]: entering poll This looks like the server (the svcgssd/user-land part anyway) was happy, and replied with a context. There were a couple of reports in the past where the server returned error 524288 (GSS_S_NO_CONTEXT) after getting through the user-land part successfully. If you can get a packet trace between the client and server during the mount / context negotiation, check the NULL reply from the server for this error code. K.C. ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <4d569c330811031433k7ae18d4enfbda349e8f90a951-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: stuck/hung nfsv4 mounts [not found] ` <4d569c330811031433k7ae18d4enfbda349e8f90a951-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2008-11-03 23:45 ` Brian J. Murrell 2008-11-04 1:24 ` Brian J. Murrell 1 sibling, 0 replies; 19+ messages in thread From: Brian J. Murrell @ 2008-11-03 23:45 UTC (permalink / raw) To: linux-nfs On Mon, 2008-11-03 at 17:33 -0500, Kevin Coffman wrote: > > Once again, this is all from librpcsecgss. No messages originating > from rpc.gssd itself here. Not sure why. FWIW, > > > Nov 3 15:17:33 linux krb5kdc[5006]: TGS_REQ (1 etypes {1}) 10.75.22.1: ISSUE: authtime 1225740577, etypes {rep=1 tkt=1 ses=1}, brian@ILINX for nfs/linux.interlinx.bc.ca@ILINX > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: leaving poll > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: handling null request > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: sname = brian@ILINX > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: DEBUG: serialize_krb5_ctx: lucid version! > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: prepare_krb5_rfc1964_buffer: serializing keys with enctype 4 and length 8 > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: doing downcall > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: mech: krb5, hndl len: 4, ctx len 85, timeout: 2147483647, uid: 1001, gid: 1001, num aux grps: 13: > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 1) 1001 > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 2) 117 > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 3) 1010 > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 4) 1011 > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 5) 2000 > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 6) 29 > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 7) 20 > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 8) 24 > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 9) 25 > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 10) 30 > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 11) 44 > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 12) 46 > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: ( 13) 111 > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: sending null reply > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: writing message: ... > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: finished handling null request > > Nov 3 15:17:33 linux rpc.svcgssd[13724]: entering poll > > If you can get > a packet trace between the client and server during the mount / > context negotiation, check the NULL reply from the server for this > error code. Well, this is just it. # mount -t nfs4 -s -o sec=krb5i,rw,intr,rsize=8192,wsize=8192 linux:/home/share /mnt/test on the client does not result in any NFS4 traffic between the client and the server. In fact if I ^C the above mount and issue it again, I don't get another gss/svcgssd upcall. Is there some kind of caching going on? How can I invalidate the cache? Probably it's moot though given that this sequence of events is not resulting in any NFS4 traffic anyway. This was all working before last night, then all of a sudden, it stopped. Nothing remarkable happened at the time that could/should/would have caused this silliness. ~sigh~ b. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: stuck/hung nfsv4 mounts [not found] ` <4d569c330811031433k7ae18d4enfbda349e8f90a951-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2008-11-03 23:45 ` Brian J. Murrell @ 2008-11-04 1:24 ` Brian J. Murrell 2008-11-04 15:14 ` Brian J. Murrell 2008-11-04 17:22 ` Kevin Coffman 1 sibling, 2 replies; 19+ messages in thread From: Brian J. Murrell @ 2008-11-04 1:24 UTC (permalink / raw) To: linux-nfs On Mon, 2008-11-03 at 17:33 -0500, Kevin Coffman wrote: > If you can get > a packet trace between the client and server during the mount / > context negotiation, check the NULL reply from the server for this > error code. Here is the NULL Call from client: No. Time Time Delta Source Destination Protocol Info 8868 20:07:43.933888 0.043671 10.75.22.1 10.75.22.3 NFS V4 NULL Call (Reply In 8870) Frame 8868 (606 bytes on wire, 606 bytes captured) Ethernet II, Src: AsustekC_e1:65:cc (00:18:f3:e1:65:cc), Dst: Intel_3f:8b:74 (00:d0:b7:3f:8b:74) Internet Protocol, Src: 10.75.22.1 (10.75.22.1), Dst: 10.75.22.3 (10.75.22.3) Transmission Control Protocol, Src Port: imaps (993), Dst Port: nfs (2049), Seq: 1, Ack: 1, Len: 552 Remote Procedure Call, Type:Call XID:0x48023eeb Network File System [Program Version: 4] [V4 Procedure: NULL (0)] GSS Token: ... GSS Token Length: 483 GSS-API Generic Security Service Application Program Interface OID: 1.2.840.113554.1.2.2 (KRB5 - Kerberos 5) krb5_blob: ... krb5_tok_id: KRB5_AP_REQ (0x0001) Kerberos AP-REQ Pvno: 5 MSG Type: AP-REQ (14) Padding: 0 APOptions: 20000000 (Mutual required) .0.. .... .... .... .... .... .... .... = Use Session Key: Do NOT use the session key to encrypt the ticket ..1. .... .... .... .... .... .... .... = Mutual required: MUTUAL authentication is REQUIRED Ticket Tkt-vno: 5 Realm: ILINX Server Name (Service and Host): nfs/linux.interlinx.bc.ca enc-part des-cbc-crc Authenticator des-cbc-crc Here is the NULL reply from the server: No. Time Time Delta Source Destination Protocol Info 8870 20:07:44.062206 0.128081 10.75.22.3 10.75.22.1 NFS V4 NULL Reply (Call In 8868) Frame 8870 (262 bytes on wire, 262 bytes captured) Ethernet II, Src: Intel_3f:8b:74 (00:d0:b7:3f:8b:74), Dst: AsustekC_e1:65:cc (00:18:f3:e1:65:cc) Internet Protocol, Src: 10.75.22.3 (10.75.22.3), Dst: 10.75.22.1 (10.75.22.1) Transmission Control Protocol, Src Port: nfs (2049), Dst Port: imaps (993), Seq: 1, Ack: 553, Len: 208 Remote Procedure Call, Type:Reply XID:0x48023eeb Network File System [Program Version: 4] [V4 Procedure: NULL (0)] GSS Context: <DATA> GSS Major Status: 0 GSS Minor Status: 0 GSS Sequence Window: 128 GSS Token: ... GSS Token Length: 114 GSS-API Generic Security Service Application Program Interface OID: 1.2.840.113554.1.2.2 (KRB5 - Kerberos 5) krb5_blob: ... krb5_tok_id: KRB5_AP_REP (0x0002) Kerberos AP-REP Pvno: 5 MSG Type: AP-REP (15) enc-part des-cbc-crc Encryption type: des-cbc-crc (1) enc-part: ... b. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: stuck/hung nfsv4 mounts 2008-11-04 1:24 ` Brian J. Murrell @ 2008-11-04 15:14 ` Brian J. Murrell 2008-11-04 17:22 ` Kevin Coffman 1 sibling, 0 replies; 19+ messages in thread From: Brian J. Murrell @ 2008-11-04 15:14 UTC (permalink / raw) To: linux-nfs Hi All, Is there any more debugging I can/you want me to do here or can/shall I just reboot this machine? Thanx, b. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: stuck/hung nfsv4 mounts 2008-11-04 1:24 ` Brian J. Murrell 2008-11-04 15:14 ` Brian J. Murrell @ 2008-11-04 17:22 ` Kevin Coffman 1 sibling, 0 replies; 19+ messages in thread From: Kevin Coffman @ 2008-11-04 17:22 UTC (permalink / raw) To: Brian J. Murrell; +Cc: linux-nfs On Mon, Nov 3, 2008 at 8:24 PM, Brian J. Murrell <brian-SquOHqY54CVWr29BmMi2cA@public.gmane.org> wrote: > On Mon, 2008-11-03 at 17:33 -0500, Kevin Coffman wrote: >> If you can get >> a packet trace between the client and server during the mount / >> context negotiation, check the NULL reply from the server for this >> error code. > > Here is the NULL Call from client: > No. Time Time Delta Source Destination Protocol Info > 8868 20:07:43.933888 0.043671 10.75.22.1 10.75.22.3 NFS V4 NULL Call (Reply In 8870) > > Frame 8868 (606 bytes on wire, 606 bytes captured) > Ethernet II, Src: AsustekC_e1:65:cc (00:18:f3:e1:65:cc), Dst: Intel_3f:8b:74 (00:d0:b7:3f:8b:74) > Internet Protocol, Src: 10.75.22.1 (10.75.22.1), Dst: 10.75.22.3 (10.75.22.3) > Transmission Control Protocol, Src Port: imaps (993), Dst Port: nfs (2049), Seq: 1, Ack: 1, Len: 552 > Remote Procedure Call, Type:Call XID:0x48023eeb > Network File System > [Program Version: 4] > [V4 Procedure: NULL (0)] > GSS Token: ... > GSS Token Length: 483 > GSS-API Generic Security Service Application Program Interface > OID: 1.2.840.113554.1.2.2 (KRB5 - Kerberos 5) > krb5_blob: ... > krb5_tok_id: KRB5_AP_REQ (0x0001) > Kerberos AP-REQ > Pvno: 5 > MSG Type: AP-REQ (14) > Padding: 0 > APOptions: 20000000 (Mutual required) > .0.. .... .... .... .... .... .... .... = Use Session Key: Do NOT use the session key to encrypt the ticket > ..1. .... .... .... .... .... .... .... = Mutual required: MUTUAL authentication is REQUIRED > Ticket > Tkt-vno: 5 > Realm: ILINX > Server Name (Service and Host): nfs/linux.interlinx.bc.ca > enc-part des-cbc-crc > Authenticator des-cbc-crc > > Here is the NULL reply from the server: > > No. Time Time Delta Source Destination Protocol Info > 8870 20:07:44.062206 0.128081 10.75.22.3 10.75.22.1 NFS V4 NULL Reply (Call In 8868) > > Frame 8870 (262 bytes on wire, 262 bytes captured) > Ethernet II, Src: Intel_3f:8b:74 (00:d0:b7:3f:8b:74), Dst: AsustekC_e1:65:cc (00:18:f3:e1:65:cc) > Internet Protocol, Src: 10.75.22.3 (10.75.22.3), Dst: 10.75.22.1 (10.75.22.1) > Transmission Control Protocol, Src Port: nfs (2049), Dst Port: imaps (993), Seq: 1, Ack: 553, Len: 208 > Remote Procedure Call, Type:Reply XID:0x48023eeb > Network File System > [Program Version: 4] > [V4 Procedure: NULL (0)] > GSS Context: <DATA> > GSS Major Status: 0 > GSS Minor Status: 0 > GSS Sequence Window: 128 > GSS Token: ... > GSS Token Length: 114 > GSS-API Generic Security Service Application Program Interface > OID: 1.2.840.113554.1.2.2 (KRB5 - Kerberos 5) > krb5_blob: ... > krb5_tok_id: KRB5_AP_REP (0x0002) > Kerberos AP-REP > Pvno: 5 > MSG Type: AP-REP (15) > enc-part des-cbc-crc > Encryption type: des-cbc-crc (1) > enc-part: ... > > b. >From this output, it looks like the response back from the server was a success. So it appears to be some other issue. Perhaps, back to the processes stuck in the kernel? Unless Trond has suggestions, I would reboot the client and see if it happens again (w/o soft mounts). K.C. ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2008-11-04 17:22 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-03 15:05 stuck/hung nfsv4 mounts Brian J. Murrell
2008-11-03 16:59 ` Trond Myklebust
[not found] ` <1225731544.6958.6.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2008-11-03 17:25 ` Jim Rees
2008-11-03 17:37 ` Trond Myklebust
[not found] ` <1225733834.6958.12.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2008-11-03 21:40 ` Chuck Lever
2008-11-03 22:20 ` Peter Staubach
2008-11-03 22:47 ` Chuck Lever
2008-11-04 16:03 ` Peter Staubach
2008-11-03 17:38 ` Benny Halevy
2008-11-03 17:50 ` Brian J. Murrell
2008-11-03 19:58 ` Kevin Coffman
[not found] ` <4d569c330811031158r26963e0w5bcf8331e0fb14b7-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-11-03 20:09 ` Brian J. Murrell
2008-11-03 20:28 ` Kevin Coffman
[not found] ` <4d569c330811031228r5bb9aefs7a970303910810e2-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-11-03 21:12 ` Brian J. Murrell
2008-11-03 22:33 ` Kevin Coffman
[not found] ` <4d569c330811031433k7ae18d4enfbda349e8f90a951-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-11-03 23:45 ` Brian J. Murrell
2008-11-04 1:24 ` Brian J. Murrell
2008-11-04 15:14 ` Brian J. Murrell
2008-11-04 17:22 ` Kevin Coffman
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.