stuck/hung nfsv4 mounts

All of lore.kernel.org
 help / color / mirror / Atom feed

* stuck/hung nfsv4 mounts
@ 2008-11-03 15:05 Brian J. Murrell
  2008-11-03 16:59 ` Trond Myklebust
  0 siblings, 1 reply; 19+ messages in thread
From: Brian J. Murrell @ 2008-11-03 15:05 UTC (permalink / raw)
  To: linux-nfs

I have a system which I migrated a couple of mounts from sys
authentication on nfs3 to gssapi on nfsv4 to overcome the 16
supplementary groups limitation inherent in the "sys" security model.

It was working for a while but now I have "stuck" (i.e. hung) NFS
mounts.  It seems that a mount issued from the automounter kicked all of
this off but I cannot be sure of that.

In my process table I have a number of processes blocked in the NFSv4
path including a mount command:

/bin/mount -t nfs4 -s -o sec=krb5i,rw,soft,intr,rsize=8192,wsize=8192 linux:/home/share /autohome/share

It's stack trace is:

Nov  3 09:54:14 pc kernel: [474304.152005] mount.nfs4    D c012ad55     0 25903  25902
Nov  3 09:54:14 pc kernel: [474304.152005]        dcf1bb34 00000086 dcf1badc c012ad55 00001eff 026b0b7a 00000000 dcf1badc 
Nov  3 09:54:14 pc kernel: [474304.152005]        00000086 f670d7f0 ec213ed0 003649ad 00000000 c012b4ee 00000000 ec2b5340 
Nov  3 09:54:14 pc kernel: [474304.152005]        c050f080 f3740000 dcf1a000 ec214148 c28d1d00 00000000 dcf1bb0c c28d1d00 
Nov  3 09:54:14 pc kernel: [474304.152005] Call Trace:
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c012ad55>] ? check_preempt_wakeup+0x175/0x1b0
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c012b4ee>] ? try_to_wake_up+0xde/0x290
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f9574c5d>] rpc_wait_bit_killable+0x1d/0x50 [sunrpc]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c037d2df>] __wait_on_bit+0x4f/0x70
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f9574c40>] ? rpc_wait_bit_killable+0x0/0x50 [sunrpc]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c037d3ad>] out_of_line_wait_on_bit+0xad/0xc0
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f9574c40>] ? rpc_wait_bit_killable+0x0/0x50 [sunrpc]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c0147500>] ? wake_bit_function+0x0/0x60
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f95756ae>] __rpc_execute+0xae/0x250 [sunrpc]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f95758c9>] rpc_execute+0x79/0x90 [sunrpc]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f956e743>] rpc_run_task+0x33/0x70 [sunrpc]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f956e8a0>] rpc_call_sync+0x40/0x70 [sunrpc]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f9cd0401>] nfs4_proc_get_root+0x71/0xc0 [nfs]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f9cba216>] nfs4_path_walk+0x86/0x340 [nfs]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01cb0d2>] ? mntput_no_expire+0x22/0x120
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01d1ab4>] ? simple_release_fs+0x54/0x70
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f95818bb>] ? rpc_mkdir+0x9b/0x110 [sunrpc]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c037e61d>] ? _spin_lock+0xd/0x10
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01d1b14>] ? simple_pin_fs+0x44/0xb0
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f956ec30>] ? rpc_setup_pipedir+0x90/0xd0 [sunrpc]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f9574528>] ? rpciod_up+0x8/0x30 [sunrpc]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f956ed46>] ? rpc_clone_client+0xd6/0x130 [sunrpc]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f9cb373b>] ? nfs_init_server_rpcclient+0x2b/0xf0 [nfs]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f9cb4c0a>] ? nfs4_init_server+0xea/0x220 [nfs]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f9cb4dbc>] nfs4_create_server+0x7c/0x240 [nfs]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01acc24>] ? kfree+0x14/0xd0
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01ad388>] ? get_slab+0x8/0x60
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f9cbdcb3>] ? nfs_parse_simple_hostname+0x83/0x160 [nfs]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c0191496>] ? kstrndup+0x36/0x60
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f9cbdcb3>] ? nfs_parse_simple_hostname+0x83/0x160 [nfs]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f9cbddb3>] ? nfs_parse_devname+0x23/0x50 [nfs]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f9cbfe7e>] ? nfs4_validate_mount_data+0x31e/0x370 [nfs]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f9cbff65>] nfs4_get_sb+0x95/0x280 [nfs]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01b43ee>] vfs_kern_mount+0x5e/0x130
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01b451e>] do_kern_mount+0x3e/0xe0
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01cc9ef>] do_new_mount+0x6f/0x90
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01ccf32>] do_mount+0x1d2/0x1f0
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01ca67d>] ? exact_copy_from_user+0x4d/0xa0
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01cac8e>] ? copy_mount_options+0x6e/0xd0
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01ccfe1>] sys_mount+0x91/0xc0
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c0103f7b>] sysenter_do_call+0x12/0x2f

There are a number of other processes stuck in the nfs4 paths as well:

Nov  3 09:54:13 pc kernel: [474304.152005] gnome-panel   D cafc1ca4     0  7266  16025
Nov  3 09:54:13 pc kernel: [474304.152005]        cafc1d08 00000086 f7070000 cafc1ca4 cafc1ca4 00000246 c2e6a204 f7070000 
Nov  3 09:54:13 pc kernel: [474304.152005]        f7070000 cafc1cc4 ec216480 000025ad 00000000 00000246 00000001 f63ec380 
Nov  3 09:54:13 pc kernel: [474304.152005]        c050f080 f0d30000 cafc0000 ec2166f8 c28dcd00 00000001 f9574ec0 c28dcd00 
Nov  3 09:54:13 pc kernel: [474304.152005] Call Trace:
Nov  3 09:54:13 pc kernel: [474304.152005]  [<f9574ec0>] ? __rpc_sleep_on+0x230/0x270 [sunrpc]
Nov  3 09:54:13 pc kernel: [474304.152005]  [<f9574c5d>] rpc_wait_bit_killable+0x1d/0x50 [sunrpc]
Nov  3 09:54:13 pc kernel: [474304.152005]  [<c037d2df>] __wait_on_bit+0x4f/0x70
Nov  3 09:54:13 pc kernel: [474304.152005]  [<f9574c40>] ? rpc_wait_bit_killable+0x0/0x50 [sunrpc]
Nov  3 09:54:13 pc kernel: [474304.152005]  [<c037d3ad>] out_of_line_wait_on_bit+0xad/0xc0
Nov  3 09:54:13 pc kernel: [474304.152005]  [<f9574c40>] ? rpc_wait_bit_killable+0x0/0x50 [sunrpc]
Nov  3 09:54:13 pc kernel: [474304.152005]  [<c0147500>] ? wake_bit_function+0x0/0x60
Nov  3 09:54:13 pc kernel: [474304.152005]  [<f95756ae>] __rpc_execute+0xae/0x250 [sunrpc]
Nov  3 09:54:13 pc kernel: [474304.152005]  [<f95758c9>] rpc_execute+0x79/0x90 [sunrpc]
Nov  3 09:54:13 pc kernel: [474304.152005]  [<f956e743>] rpc_run_task+0x33/0x70 [sunrpc]
Nov  3 09:54:13 pc kernel: [474304.152005]  [<f956e8a0>] rpc_call_sync+0x40/0x70 [sunrpc]
Nov  3 09:54:13 pc kernel: [474304.152005]  [<f9cd0206>] nfs4_proc_getattr+0x76/0xa0 [nfs]
Nov  3 09:54:13 pc kernel: [474304.152005]  [<f9cbbc86>] __nfs_revalidate_inode+0x156/0x2b0 [nfs]
Nov  3 09:54:13 pc kernel: [474304.152005]  [<c014b474>] ? hrtimer_start+0xc4/0x1c0
Nov  3 09:54:13 pc kernel: [474304.152005]  [<c01cb0d2>] ? mntput_no_expire+0x22/0x120
Nov  3 09:54:13 pc kernel: [474304.152005]  [<c01babc5>] ? path_put+0x25/0x30
Nov  3 09:54:13 pc kernel: [474304.152005]  [<c01bca62>] ? path_walk+0xa2/0xb0
Nov  3 09:54:13 pc kernel: [474304.152005]  [<c01ba9fa>] ? putname+0x2a/0x40
Nov  3 09:54:13 pc kernel: [474304.152005]  [<c01ba9fa>] ? putname+0x2a/0x40
Nov  3 09:54:13 pc kernel: [474304.152005]  [<c01bd7e3>] ? user_path_at+0x53/0x80
Nov  3 09:54:13 pc kernel: [474304.152005]  [<f9cbcb2d>] nfs_revalidate_inode+0x3d/0x70 [nfs]
Nov  3 09:54:13 pc kernel: [474304.152005]  [<f9cbcc13>] nfs_getattr+0xb3/0x100 [nfs]
Nov  3 09:54:13 pc kernel: [474304.152005]  [<f9cbcb60>] ? nfs_getattr+0x0/0x100 [nfs]
Nov  3 09:54:13 pc kernel: [474304.152005]  [<c01b5bda>] vfs_getattr+0x4a/0x70
Nov  3 09:54:13 pc kernel: [474304.152005]  [<c01b5e39>] vfs_stat_fd+0x39/0x60
Nov  3 09:54:13 pc kernel: [474304.152005]  [<c01b5f66>] vfs_stat+0x16/0x20
Nov  3 09:54:13 pc kernel: [474304.152005]  [<c01b5f89>] sys_stat64+0x19/0x30
Nov  3 09:54:13 pc kernel: [474304.152005]  [<c0103f7b>] sysenter_do_call+0x12/0x2f

Nov  3 09:54:14 pc kernel: [474304.152005] lsof          D ece08000     0 18247  18245
Nov  3 09:54:14 pc kernel: [474304.152005]        c0e0dd08 00200086 f0953a74 ece08000 c0e0dca0 f95c629b f0953a74 00000000 
Nov  3 09:54:14 pc kernel: [474304.152005]        c0e0dce0 f95c6a10 c2e657f0 00007541 00000000 e5f76cc0 00000000 ee177c00 
Nov  3 09:54:14 pc kernel: [474304.152005]        c050f080 ee066000 c0e0c000 c2e65a68 c28d1d00 00000000 c0e0dce0 c28d1d00 
Nov  3 09:54:14 pc kernel: [474304.152005] Call Trace:
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f95c629b>] ? gss_release_msg+0x4b/0x90 [auth_rpcgss]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f95c6a10>] ? gss_cred_init+0xe0/0x140 [auth_rpcgss]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f9574c5d>] rpc_wait_bit_killable+0x1d/0x50 [sunrpc]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c037d2df>] __wait_on_bit+0x4f/0x70
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f9574c40>] ? rpc_wait_bit_killable+0x0/0x50 [sunrpc]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c037d3ad>] out_of_line_wait_on_bit+0xad/0xc0
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f9574c40>] ? rpc_wait_bit_killable+0x0/0x50 [sunrpc]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c0147500>] ? wake_bit_function+0x0/0x60
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f95756ae>] __rpc_execute+0xae/0x250 [sunrpc]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f95758c9>] rpc_execute+0x79/0x90 [sunrpc]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f956e743>] rpc_run_task+0x33/0x70 [sunrpc]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f956e8a0>] rpc_call_sync+0x40/0x70 [sunrpc]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f9cd0206>] nfs4_proc_getattr+0x76/0xa0 [nfs]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f9cbbc86>] __nfs_revalidate_inode+0x156/0x2b0 [nfs]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01bac46>] ? path_to_nameidata+0x36/0x50
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01cb0d2>] ? mntput_no_expire+0x22/0x120
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01babc5>] ? path_put+0x25/0x30
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01bca62>] ? path_walk+0xa2/0xb0
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01ba9fa>] ? putname+0x2a/0x40
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01ba9fa>] ? putname+0x2a/0x40
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01bd7e3>] ? user_path_at+0x53/0x80
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f9cbcb2d>] nfs_revalidate_inode+0x3d/0x70 [nfs]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f9cbcc13>] nfs_getattr+0xb3/0x100 [nfs]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<f9cbcb60>] ? nfs_getattr+0x0/0x100 [nfs]
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01b5bda>] vfs_getattr+0x4a/0x70
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01b5e39>] vfs_stat_fd+0x39/0x60
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01b5f66>] vfs_stat+0x16/0x20
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01b5f89>] sys_stat64+0x19/0x30
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01b26db>] ? vfs_read+0xab/0x110
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01b1e40>] ? do_sync_read+0x0/0x120
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c01b2812>] ? sys_read+0x42/0x70
Nov  3 09:54:14 pc kernel: [474304.152005]  [<c0103f7b>] sysenter_do_call+0x12/0x2f

I have used wireshark and filtered for NFS4 (given that I have lots of
NFV3 still) with "rpc.programversion == 4" on the wire and noticed that
only a couple of packets (1 each after the TCP setup and teardown) are
sent between the client and server.  It was many tens of minutes between
these last night when I was debugging.  For as long as I have been
looking this morning there has not been a single NFS4 packet.

Client machine is the Ubuntu 2.6.27-7-generic kernel and the server is
the Ubuntu 2.6.24-16-generic kernel.

I've tried restarting NFS on the server a number of times and that has
not helped.

Any ideas?

b.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: stuck/hung nfsv4 mounts
  2008-11-03 15:05 stuck/hung nfsv4 mounts Brian J. Murrell
@ 2008-11-03 16:59 ` Trond Myklebust
       [not found]   ` <1225731544.6958.6.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Trond Myklebust @ 2008-11-03 16:59 UTC (permalink / raw)
  To: Brian J. Murrell; +Cc: linux-nfs

On Mon, 2008-11-03 at 10:05 -0500, Brian J. Murrell wrote:
> I have a system which I migrated a couple of mounts from sys
> authentication on nfs3 to gssapi on nfsv4 to overcome the 16
> supplementary groups limitation inherent in the "sys" security model.
> 
> It was working for a while but now I have "stuck" (i.e. hung) NFS
> mounts.  It seems that a mount issued from the automounter kicked all of
> this off but I cannot be sure of that.
> 
> In my process table I have a number of processes blocked in the NFSv4
> path including a mount command:
> 
> /bin/mount -t nfs4 -s -o sec=krb5i,rw,soft,intr,rsize=8192,wsize=8192 linux:/home/share /autohome/share

BTW: NFSv4 + soft == BAD BAD BAD! You will see weird behaviour, as the
client and the server will have divergent opinions on what happened
if/when an RPC call fails.

Otherwise, have you checked on the state of your rpc.gssd? It looked as
if several of those traces were waiting around RPCSEC_GSS upcalls...

Cheers
  Trond


^ permalink raw reply	[flat|nested] 19+ messages in thread

[parent not found: <1225731544.6958.6.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>]

* Re: stuck/hung nfsv4 mounts
       [not found]   ` <1225731544.6958.6.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2008-11-03 17:25     ` Jim Rees
  2008-11-03 17:37       ` Trond Myklebust
  2008-11-03 17:38       ` Benny Halevy
  2008-11-03 17:50     ` Brian J. Murrell
  1 sibling, 2 replies; 19+ messages in thread
From: Jim Rees @ 2008-11-03 17:25 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Brian J. Murrell, linux-nfs

Trond Myklebust wrote:

  BTW: NFSv4 + soft == BAD BAD BAD!

Maybe this combination should be prohibited.  Does it make any sense given
the stateful nature of v4?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: stuck/hung nfsv4 mounts
  2008-11-03 17:25     ` Jim Rees
@ 2008-11-03 17:37       ` Trond Myklebust
       [not found]         ` <1225733834.6958.12.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  2008-11-03 17:38       ` Benny Halevy
  1 sibling, 1 reply; 19+ messages in thread
From: Trond Myklebust @ 2008-11-03 17:37 UTC (permalink / raw)
  To: Jim Rees; +Cc: Brian J. Murrell, linux-nfs

On Mon, 2008-11-03 at 12:25 -0500, Jim Rees wrote:
> Trond Myklebust wrote:
> 
>   BTW: NFSv4 + soft == BAD BAD BAD!
> 
> Maybe this combination should be prohibited.  Does it make any sense given
> the stateful nature of v4?

It might make sense if we were to fix up the granularity of the recovery
routines so that we are able to recover all the state associated with
just a single open owner or lock owner. Currently we'd have to recover
all the state associated with that server.

IOW: we might be able to fix things up in the future, but right now,
NFSv4+soft is not a good idea.

Trond

^ permalink raw reply	[flat|nested] 19+ messages in thread

[parent not found: <1225733834.6958.12.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>]

* Re: stuck/hung nfsv4 mounts
       [not found]         ` <1225733834.6958.12.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2008-11-03 21:40           ` Chuck Lever
  2008-11-03 22:20             ` Peter Staubach
  0 siblings, 1 reply; 19+ messages in thread
From: Chuck Lever @ 2008-11-03 21:40 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Jim Rees, Brian J. Murrell, linux-nfs

On Nov 3, 2008, at Nov 3, 2008, 12:37 PM, Trond Myklebust wrote:
> On Mon, 2008-11-03 at 12:25 -0500, Jim Rees wrote:
>> Trond Myklebust wrote:
>>
>>  BTW: NFSv4 + soft == BAD BAD BAD!
>>
>> Maybe this combination should be prohibited.  Does it make any  
>> sense given
>> the stateful nature of v4?
>
> It might make sense if we were to fix up the granularity of the  
> recovery
> routines so that we are able to recover all the state associated with
> just a single open owner or lock owner. Currently we'd have to recover
> all the state associated with that server.
>
> IOW: we might be able to fix things up in the future, but right now,
> NFSv4+soft is not a good idea.

Two cents worth:  Until NFSv4+soft works reasonably well, I wouldn't  
have any problem with making "soft" a no-op for nfs4 mounts.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: stuck/hung nfsv4 mounts
  2008-11-03 21:40           ` Chuck Lever
@ 2008-11-03 22:20             ` Peter Staubach
  2008-11-03 22:47               ` Chuck Lever
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Staubach @ 2008-11-03 22:20 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Trond Myklebust, Jim Rees, Brian J. Murrell, linux-nfs

Chuck Lever wrote:
> On Nov 3, 2008, at Nov 3, 2008, 12:37 PM, Trond Myklebust wrote:
>> On Mon, 2008-11-03 at 12:25 -0500, Jim Rees wrote:
>>> Trond Myklebust wrote:
>>>
>>>  BTW: NFSv4 + soft == BAD BAD BAD!
>>>
>>> Maybe this combination should be prohibited.  Does it make any sense 
>>> given
>>> the stateful nature of v4?
>>
>> It might make sense if we were to fix up the granularity of the recovery
>> routines so that we are able to recover all the state associated with
>> just a single open owner or lock owner. Currently we'd have to recover
>> all the state associated with that server.
>>
>> IOW: we might be able to fix things up in the future, but right now,
>> NFSv4+soft is not a good idea.
>
> Two cents worth:  Until NFSv4+soft works reasonably well, I wouldn't 
> have any problem with making "soft" a no-op for nfs4 mounts. 

It may be the right thing to do in the short term, but it will
generate calls to support organizations.  Anyone trying it will
wonder why it doesn't work "as it used to".  That sort of thing.
When that happens, we will be forced to do something and not at
our choice of time.

Perhaps we should deny such mount attempts because at least that
would be obvious what was happening?  Perhaps we should also
deny NFSv4 mounting over UDP?

       ps

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: stuck/hung nfsv4 mounts
  2008-11-03 22:20             ` Peter Staubach
@ 2008-11-03 22:47               ` Chuck Lever
  2008-11-04 16:03                 ` Peter Staubach
  0 siblings, 1 reply; 19+ messages in thread
From: Chuck Lever @ 2008-11-03 22:47 UTC (permalink / raw)
  To: Peter Staubach; +Cc: Trond Myklebust, Jim Rees, Brian J. Murrell, linux-nfs

On Nov 3, 2008, at Nov 3, 2008, 5:20 PM, Peter Staubach wrote:
> Chuck Lever wrote:
>> On Nov 3, 2008, at Nov 3, 2008, 12:37 PM, Trond Myklebust wrote:
>>> On Mon, 2008-11-03 at 12:25 -0500, Jim Rees wrote:
>>>> Trond Myklebust wrote:
>>>>
>>>> BTW: NFSv4 + soft == BAD BAD BAD!
>>>>
>>>> Maybe this combination should be prohibited.  Does it make any  
>>>> sense given
>>>> the stateful nature of v4?
>>>
>>> It might make sense if we were to fix up the granularity of the  
>>> recovery
>>> routines so that we are able to recover all the state associated  
>>> with
>>> just a single open owner or lock owner. Currently we'd have to  
>>> recover
>>> all the state associated with that server.
>>>
>>> IOW: we might be able to fix things up in the future, but right now,
>>> NFSv4+soft is not a good idea.
>>
>> Two cents worth:  Until NFSv4+soft works reasonably well, I  
>> wouldn't have any problem with making "soft" a no-op for nfs4 mounts.
>
> It may be the right thing to do in the short term, but it will
> generate calls to support organizations.

As opposed to the support "call" we got today?  ;-)

> Anyone trying it will
> wonder why it doesn't work "as it used to".  That sort of thing.
> When that happens, we will be forced to do something and not at
> our choice of time.

I think our petard is hoist either way.  Better to avoid data  
corruption, by whatever means.

> Perhaps we should deny such mount attempts because at least that
> would be obvious what was happening?  Perhaps we should also
> deny NFSv4 mounting over UDP?

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: stuck/hung nfsv4 mounts
  2008-11-03 22:47               ` Chuck Lever
@ 2008-11-04 16:03                 ` Peter Staubach
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Staubach @ 2008-11-04 16:03 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Trond Myklebust, Jim Rees, Brian J. Murrell, linux-nfs

Chuck Lever wrote:
> On Nov 3, 2008, at Nov 3, 2008, 5:20 PM, Peter Staubach wrote:
>> Chuck Lever wrote:
>>> On Nov 3, 2008, at Nov 3, 2008, 12:37 PM, Trond Myklebust wrote:
>>>> On Mon, 2008-11-03 at 12:25 -0500, Jim Rees wrote:
>>>>> Trond Myklebust wrote:
>>>>>
>>>>> BTW: NFSv4 + soft == BAD BAD BAD!
>>>>>
>>>>> Maybe this combination should be prohibited.  Does it make any 
>>>>> sense given
>>>>> the stateful nature of v4?
>>>>
>>>> It might make sense if we were to fix up the granularity of the 
>>>> recovery
>>>> routines so that we are able to recover all the state associated with
>>>> just a single open owner or lock owner. Currently we'd have to recover
>>>> all the state associated with that server.
>>>>
>>>> IOW: we might be able to fix things up in the future, but right now,
>>>> NFSv4+soft is not a good idea.
>>>
>>> Two cents worth:  Until NFSv4+soft works reasonably well, I wouldn't 
>>> have any problem with making "soft" a no-op for nfs4 mounts.
>>
>> It may be the right thing to do in the short term, but it will
>> generate calls to support organizations.
>
> As opposed to the support "call" we got today?  ;-)
>

Well, at least, you have an answer today.

>> Anyone trying it will
>> wonder why it doesn't work "as it used to".  That sort of thing.
>> When that happens, we will be forced to do something and not at
>> our choice of time.
>
> I think our petard is hoist either way.  Better to avoid data 
> corruption, by whatever means.
>

Then it is probably time to fix it, sooner as opposed to later.

I don't have the resources at this time to put into doing this work,
so I am just expressing opinions.

I do know that it caused quite the flood of interest when it was
discovered that the superblock sharing stuff had broken a lot of
customers who expected to be able mount the same file system from
a server, multiple times, in different places, with different
options, and it stopped using the options beyond the first mount.

I am thinking that perhaps we can take something from this experience
and do better this next time.  This time, we know that we are making
an incompatible change and can do better.

       ps

>> Perhaps we should deny such mount attempts because at least that
>> would be obvious what was happening?  Perhaps we should also
>> deny NFSv4 mounting over UDP?
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: stuck/hung nfsv4 mounts
  2008-11-03 17:25     ` Jim Rees
  2008-11-03 17:37       ` Trond Myklebust
@ 2008-11-03 17:38       ` Benny Halevy
  1 sibling, 0 replies; 19+ messages in thread
From: Benny Halevy @ 2008-11-03 17:38 UTC (permalink / raw)
  To: Jim Rees; +Cc: Trond Myklebust, Brian J. Murrell, linux-nfs

On Nov. 03, 2008, 19:25 +0200, Jim Rees <rees@umich.edu> wrote:
> Trond Myklebust wrote:
> 
>   BTW: NFSv4 + soft == BAD BAD BAD!
> 
> Maybe this combination should be prohibited.  Does it make any sense given
> the stateful nature of v4?

>From the usage perspective it allows the application to fail rather
than stall.  If it would have been implemented similar to "intr"
and return an error to the app as if the system call had received
an interrupt but under the covers the RPC will be processed to
completion (or graceful cleanup) I think it might still make sense for nfs4.
Until then this combination should at least be warned about.

Benny

> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: stuck/hung nfsv4 mounts
       [not found]   ` <1225731544.6958.6.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  2008-11-03 17:25     ` Jim Rees
@ 2008-11-03 17:50     ` Brian J. Murrell
  2008-11-03 19:58       ` Kevin Coffman
  1 sibling, 1 reply; 19+ messages in thread
From: Brian J. Murrell @ 2008-11-03 17:50 UTC (permalink / raw)
  To: linux-nfs

On Mon, 2008-11-03 at 11:59 -0500, Trond Myklebust wrote:
> 
> BTW: NFSv4 + soft == BAD BAD BAD!

Ahh.  OK.

> You will see weird behaviour, as the
> client and the server will have divergent opinions on what happened
> if/when an RPC call fails.

I have removed the soft option from the auto map and killed the
in-progress mount.nfs4.  Seems the new one is not using soft any more:

root     22731 22730  0 12:06 ?        00:00:00 /sbin/mount.nfs4 linux:/home/share /autohome/share -s -o rw,sec=krb5i,intr,rsize=8192,wsize=8192

> Otherwise, have you checked on the state of your rpc.gssd? It looked as
> if several of those traces were waiting around RPCSEC_GSS upcalls...
I thought it was working.  I killed it and restarted it with -vvv -rrr
and this is what it said when automount re-issued the above mount:

Nov  3 12:02:15 pc rpc.gssd[21773]: rpcsec_gss: debug level is 3
Nov  3 12:06:00 pc rpc.gssd[21774]: in authgss_create_default()
Nov  3 12:06:00 pc rpc.gssd[21774]: in authgss_create()
Nov  3 12:06:00 pc rpc.gssd[21774]: authgss_create: name is 0x8ab8d88
Nov  3 12:06:00 pc rpc.gssd[21774]: authgss_create: gd->name is 0x8ab8ed0
Nov  3 12:06:00 pc rpc.gssd[21774]: in authgss_refresh()
Nov  3 12:06:00 pc rpc.gssd[21774]: struct rpc_gss_sec: 
Nov  3 12:06:00 pc rpc.gssd[21774]:      mechanism_OID: { 1 2 134 72 134 247 18 1 2 2 } 
Nov  3 12:06:00 pc rpc.gssd[21774]:      qop: 0 
Nov  3 12:06:00 pc rpc.gssd[21774]:      service: 1 
Nov  3 12:06:00 pc rpc.gssd[21774]:      cred: 0x8abb830 
Nov  3 12:06:00 pc rpc.gssd[21774]:      req_flags: 00000002 
Nov  3 12:06:00 pc rpc.gssd[21774]: in authgss_marshal()
Nov  3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_buf: encode success ((nil):0)
Nov  3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_cred: encode success (v 1, proc 1, seq 0, svc 1, ctx (nil):0)
Nov  3 12:06:00 pc rpc.gssd[21774]: in authgss_wrap()
Nov  3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_buf: encode success (0x8abbeb8:483)
Nov  3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_init_args: encode success (token 0x8abbeb8:483)
Nov  3 12:06:00 pc rpc.gssd[21774]: in authgss_validate()
Nov  3 12:06:00 pc rpc.gssd[21774]: in authgss_unwrap()
Nov  3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_buf: decode success (0x8abbe98:4)
Nov  3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_buf: decode success (0x8abc110:114)
Nov  3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_init_res decode success (ctx 0x8abbe98:4, maj 0, min 0, win 128, token 0x8abc110:114)
Nov  3 12:06:00 pc rpc.gssd[21774]: authgss_create_default: freeing name 0x8ab8d88
Nov  3 12:06:00 pc rpc.gssd[21774]: in authgss_get_private_data()
Nov  3 12:06:00 pc rpc.gssd[21774]: in authgss_free_private_data()
Nov  3 12:06:00 pc rpc.gssd[21774]: in authgss_destroy()
Nov  3 12:06:00 pc rpc.gssd[21774]: in authgss_destroy_context()
Nov  3 12:06:00 pc rpc.gssd[21774]: authgss_destroy: freeing name 0x8ab8ed0

The kdc logged:

Nov  3 12:06:00 linux krb5kdc[5006]: AS_REQ (1 etypes {1}) 10.75.22.1: ISSUE: authtime 1225731960, etypes {rep=1 tkt=16 ses=1}, nfs/pc.interlinx.bc.ca@ILINX for krbtgt/ILINX@ILINX
Nov  3 12:06:00 linux krb5kdc[5006]: TGS_REQ (1 etypes {1}) 10.75.22.1: ISSUE: authtime 1225731960, etypes {rep=1 tkt=1 ses=1}, nfs/pc.interlinx.bc.ca@ILINX for nfs/linux.interlinx.bc.ca@ILINX

in correlation to the new mount request, but the mount.nfs4 didn't
complete.

b.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: stuck/hung nfsv4 mounts
  2008-11-03 17:50     ` Brian J. Murrell
@ 2008-11-03 19:58       ` Kevin Coffman
       [not found]         ` <4d569c330811031158r26963e0w5bcf8331e0fb14b7-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Kevin Coffman @ 2008-11-03 19:58 UTC (permalink / raw)
  To: Brian J. Murrell; +Cc: linux-nfs

On Mon, Nov 3, 2008 at 12:50 PM, Brian J. Murrell <brian-SquOHqY54CVWr29BmMi2cA@public.gmane.org> wrote:
> On Mon, 2008-11-03 at 11:59 -0500, Trond Myklebust wrote:
>
>> Otherwise, have you checked on the state of your rpc.gssd? It looked as
>> if several of those traces were waiting around RPCSEC_GSS upcalls...
>
> I thought it was working.  I killed it and restarted it with -vvv -rrr
> and this is what it said when automount re-issued the above mount:
>
> Nov  3 12:02:15 pc rpc.gssd[21773]: rpcsec_gss: debug level is 3
> Nov  3 12:06:00 pc rpc.gssd[21774]: in authgss_create_default()
> Nov  3 12:06:00 pc rpc.gssd[21774]: in authgss_create()
> Nov  3 12:06:00 pc rpc.gssd[21774]: authgss_create: name is 0x8ab8d88
> Nov  3 12:06:00 pc rpc.gssd[21774]: authgss_create: gd->name is 0x8ab8ed0
> Nov  3 12:06:00 pc rpc.gssd[21774]: in authgss_refresh()
> Nov  3 12:06:00 pc rpc.gssd[21774]: struct rpc_gss_sec:
> Nov  3 12:06:00 pc rpc.gssd[21774]:      mechanism_OID: { 1 2 134 72 134 247 18 1 2 2 }
> Nov  3 12:06:00 pc rpc.gssd[21774]:      qop: 0
> Nov  3 12:06:00 pc rpc.gssd[21774]:      service: 1
> Nov  3 12:06:00 pc rpc.gssd[21774]:      cred: 0x8abb830
> Nov  3 12:06:00 pc rpc.gssd[21774]:      req_flags: 00000002
> Nov  3 12:06:00 pc rpc.gssd[21774]: in authgss_marshal()
> Nov  3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_buf: encode success ((nil):0)
> Nov  3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_cred: encode success (v 1, proc 1, seq 0, svc 1, ctx (nil):0)
> Nov  3 12:06:00 pc rpc.gssd[21774]: in authgss_wrap()
> Nov  3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_buf: encode success (0x8abbeb8:483)
> Nov  3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_init_args: encode success (token 0x8abbeb8:483)
> Nov  3 12:06:00 pc rpc.gssd[21774]: in authgss_validate()
> Nov  3 12:06:00 pc rpc.gssd[21774]: in authgss_unwrap()
> Nov  3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_buf: decode success (0x8abbe98:4)
> Nov  3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_buf: decode success (0x8abc110:114)
> Nov  3 12:06:00 pc rpc.gssd[21774]: xdr_rpc_gss_init_res decode success (ctx 0x8abbe98:4, maj 0, min 0, win 128, token 0x8abc110:114)
> Nov  3 12:06:00 pc rpc.gssd[21774]: authgss_create_default: freeing name 0x8ab8d88
> Nov  3 12:06:00 pc rpc.gssd[21774]: in authgss_get_private_data()
> Nov  3 12:06:00 pc rpc.gssd[21774]: in authgss_free_private_data()
> Nov  3 12:06:00 pc rpc.gssd[21774]: in authgss_destroy()
> Nov  3 12:06:00 pc rpc.gssd[21774]: in authgss_destroy_context()
> Nov  3 12:06:00 pc rpc.gssd[21774]: authgss_destroy: freeing name 0x8ab8ed0

These are all from librpcsecgss (from the -rrr).  I don't see an error here.

You should have gotten output from gssd itself as well.  You might
also look for any errors on the server from rpc.svcgssd.

> The kdc logged:
>
> Nov  3 12:06:00 linux krb5kdc[5006]: AS_REQ (1 etypes {1}) 10.75.22.1: ISSUE: authtime 1225731960, etypes {rep=1 tkt=16 ses=1}, nfs/pc.interlinx.bc.ca@ILINX for krbtgt/ILINX@ILINX
> Nov  3 12:06:00 linux krb5kdc[5006]: TGS_REQ (1 etypes {1}) 10.75.22.1: ISSUE: authtime 1225731960, etypes {rep=1 tkt=1 ses=1}, nfs/pc.interlinx.bc.ca@ILINX for nfs/linux.interlinx.bc.ca@ILINX

This looks fine.  (Assuming linux.interlinx.bc.ca is the NFS server
and pc.interlinx.bc.ca is the client.)


> in correlation to the new mount request, but the mount.nfs4 didn't
> complete.
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

[parent not found: <4d569c330811031158r26963e0w5bcf8331e0fb14b7-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: stuck/hung nfsv4 mounts
       [not found]         ` <4d569c330811031158r26963e0w5bcf8331e0fb14b7-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-11-03 20:09           ` Brian J. Murrell
  2008-11-03 20:28             ` Kevin Coffman
  0 siblings, 1 reply; 19+ messages in thread
From: Brian J. Murrell @ 2008-11-03 20:09 UTC (permalink / raw)
  To: linux-nfs

On Mon, 2008-11-03 at 14:58 -0500, Kevin Coffman wrote:
> 
> These are all from librpcsecgss (from the -rrr).  I don't see an error here.

Good.

> You should have gotten output from gssd itself as well.  You might
> also look for any errors on the server from rpc.svcgssd.

Nothing.

> This looks fine.  (Assuming linux.interlinx.bc.ca is the NFS server
> and pc.interlinx.bc.ca is the client.)

Indeed, they both are as you assume.

So I've restarted both the rpc.svcgssd and rpc.gssd with -rrr -vvv on
the server and client (respectively) to see what more I can see but I
can't get the upcall to trigger.  I would have thought simply issuing a
mount would do that, but it seems not.  What exactly causes the upcall
to be made and what might prevent it from being made?

b.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: stuck/hung nfsv4 mounts
  2008-11-03 20:09           ` Brian J. Murrell
@ 2008-11-03 20:28             ` Kevin Coffman
       [not found]               ` <4d569c330811031228r5bb9aefs7a970303910810e2-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Kevin Coffman @ 2008-11-03 20:28 UTC (permalink / raw)
  To: Brian J. Murrell; +Cc: linux-nfs

On Mon, Nov 3, 2008 at 3:09 PM, Brian J. Murrell <brian-SquOHqY54CVWr29BmMi2cA@public.gmane.org> wrote:
> On Mon, 2008-11-03 at 14:58 -0500, Kevin Coffman wrote:
>>
>> These are all from librpcsecgss (from the -rrr).  I don't see an error here.
>
> Good.
>
>> You should have gotten output from gssd itself as well.  You might
>> also look for any errors on the server from rpc.svcgssd.
>
> Nothing.
>
>> This looks fine.  (Assuming linux.interlinx.bc.ca is the NFS server
>> and pc.interlinx.bc.ca is the client.)
>
> Indeed, they both are as you assume.
>
> So I've restarted both the rpc.svcgssd and rpc.gssd with -rrr -vvv on
> the server and client (respectively) to see what more I can see but I
> can't get the upcall to trigger.  I would have thought simply issuing a
> mount would do that, but it seems not.  What exactly causes the upcall
> to be made and what might prevent it from being made?

There should definitely be corresponding debug output from rpc.gssd
(-vvv) if you got those librpcsecgss (-rrr) messages.  I would suggest
that you leave off the -rrr for now as it appears to be extra clutter
at this point.  It might help if you run rpc.gssd in the foreground
(-f) while debugging.  Also, make sure this is the only rpc.gssd
running.

Re: the upcall.  Any new mount using Kerberos should cause the upcall
to create a new context with the server.  Make sure the filesystem is
not already mounted.  After the mount, any new user accessing that
mounted filesystem should cause an upcall to create a context for that
user.

K.C.

^ permalink raw reply	[flat|nested] 19+ messages in thread

[parent not found: <4d569c330811031228r5bb9aefs7a970303910810e2-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: stuck/hung nfsv4 mounts
       [not found]               ` <4d569c330811031228r5bb9aefs7a970303910810e2-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-11-03 21:12                 ` Brian J. Murrell
  2008-11-03 22:33                   ` Kevin Coffman
  0 siblings, 1 reply; 19+ messages in thread
From: Brian J. Murrell @ 2008-11-03 21:12 UTC (permalink / raw)
  To: linux-nfs

On Mon, 2008-11-03 at 15:28 -0500, Kevin Coffman wrote:
> 
> There should definitely be corresponding debug output from rpc.gssd
> (-vvv) if you got those librpcsecgss (-rrr) messages.

Yes, but by my last post I had debug only on the client and not the
server but when I enabled on the server I couldn't elicit the client at
will.

> Re: the upcall.  Any new mount using Kerberos should cause the upcall
> to create a new context with the server.  Make sure the filesystem is
> not already mounted.

It was definitely not mounted the last time I tried to mount it and
produce the upcall output but it just never happened.

> After the mount, any new user accessing that
> mounted filesystem should cause an upcall to create a context for that
> user.

I seem to have since gotten an upcall and here is the corresponding
client and server debug:

Nov  3 15:17:33 pc rpc.gssd[2937]: in authgss_create_default()
Nov  3 15:17:33 pc rpc.gssd[2937]: in authgss_create()
Nov  3 15:17:33 pc rpc.gssd[2937]: authgss_create: name is 0x98fd358
Nov  3 15:17:33 pc rpc.gssd[2937]: authgss_create: gd->name is 0x98f9cc8
Nov  3 15:17:33 pc rpc.gssd[2937]: in authgss_refresh()
Nov  3 15:17:33 pc rpc.gssd[2937]: struct rpc_gss_sec: 
Nov  3 15:17:33 pc rpc.gssd[2937]:      mechanism_OID: { 1 2 134 72 134 247 18 1 2 2 } 
Nov  3 15:17:33 pc rpc.gssd[2937]:      qop: 0 
Nov  3 15:17:33 pc rpc.gssd[2937]:      service: 1 
Nov  3 15:17:33 pc rpc.gssd[2937]:      cred: 0x98fb568 
Nov  3 15:17:33 pc rpc.gssd[2937]:      req_flags: 00000002 
Nov  3 15:17:33 pc rpc.gssd[2937]: in authgss_marshal()
Nov  3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_buf: encode success ((nil):0)
Nov  3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_cred: encode success (v 1, proc 1, seq 0, svc 1, ctx (nil):0)
Nov  3 15:17:33 pc rpc.gssd[2937]: in authgss_wrap()
Nov  3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_buf: encode success (0x98fbc08:459)
Nov  3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_init_args: encode success (token 0x98fbc08:459)
Nov  3 15:17:33 pc rpc.gssd[2937]: in authgss_validate()
Nov  3 15:17:33 pc rpc.gssd[2937]: in authgss_unwrap()
Nov  3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_buf: decode success (0x98fd428:4)
Nov  3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_buf: decode success (0x990e428:114)
Nov  3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_init_res decode success (ctx 0x98fd428:4, maj 0, min 0, win 128, token 0x990e428:114)
Nov  3 15:17:33 pc rpc.gssd[2937]: authgss_create_default: freeing name 0x98fd358
Nov  3 15:17:33 pc rpc.gssd[2937]: in authgss_get_private_data()
Nov  3 15:17:33 pc rpc.gssd[2937]: in authgss_free_private_data()
Nov  3 15:17:33 pc rpc.gssd[2937]: in authgss_destroy()
Nov  3 15:17:33 pc rpc.gssd[2937]: in authgss_destroy_context()
Nov  3 15:17:33 pc rpc.gssd[2937]: authgss_destroy: freeing name 0x98f9cc8


Nov  3 15:17:33 linux krb5kdc[5006]: TGS_REQ (1 etypes {1}) 10.75.22.1: ISSUE: authtime 1225740577, etypes {rep=1 tkt=1 ses=1}, brian@ILINX for nfs/linux.interlinx.bc.ca@ILINX
Nov  3 15:17:33 linux rpc.svcgssd[13724]: leaving poll
Nov  3 15:17:33 linux rpc.svcgssd[13724]: handling null request
Nov  3 15:17:33 linux rpc.svcgssd[13724]: sname = brian@ILINX
Nov  3 15:17:33 linux rpc.svcgssd[13724]: DEBUG: serialize_krb5_ctx: lucid version!
Nov  3 15:17:33 linux rpc.svcgssd[13724]: prepare_krb5_rfc1964_buffer: serializing keys with enctype 4 and length 8
Nov  3 15:17:33 linux rpc.svcgssd[13724]: doing downcall
Nov  3 15:17:33 linux rpc.svcgssd[13724]: mech: krb5, hndl len: 4, ctx len 85, timeout: 2147483647, uid: 1001, gid: 1001, num aux grps: 13:
Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   1) 1001
Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   2) 117
Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   3) 1010
Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   4) 1011
Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   5) 2000
Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   6) 29
Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   7) 20
Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   8) 24
Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   9) 25
Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (  10) 30
Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (  11) 44
Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (  12) 46
Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (  13) 111
Nov  3 15:17:33 linux rpc.svcgssd[13724]: sending null reply
Nov  3 15:17:33 linux rpc.svcgssd[13724]: writing message: ...
Nov  3 15:17:33 linux rpc.svcgssd[13724]: finished handling null request
Nov  3 15:17:33 linux rpc.svcgssd[13724]: entering poll

And I still have an uncompleted mount.nfs4 command in my process table.

b.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: stuck/hung nfsv4 mounts
  2008-11-03 21:12                 ` Brian J. Murrell
@ 2008-11-03 22:33                   ` Kevin Coffman
       [not found]                     ` <4d569c330811031433k7ae18d4enfbda349e8f90a951-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Kevin Coffman @ 2008-11-03 22:33 UTC (permalink / raw)
  To: Brian J. Murrell; +Cc: linux-nfs

On Mon, Nov 3, 2008 at 4:12 PM, Brian J. Murrell <brian-SquOHqY54CVWr29BmMi2cA@public.gmane.org> wrote:
> On Mon, 2008-11-03 at 15:28 -0500, Kevin Coffman wrote:
>>
>> There should definitely be corresponding debug output from rpc.gssd
>> (-vvv) if you got those librpcsecgss (-rrr) messages.
>
> Yes, but by my last post I had debug only on the client and not the
> server but when I enabled on the server I couldn't elicit the client at
> will.
>
>> Re: the upcall.  Any new mount using Kerberos should cause the upcall
>> to create a new context with the server.  Make sure the filesystem is
>> not already mounted.
>
> It was definitely not mounted the last time I tried to mount it and
> produce the upcall output but it just never happened.
>
>> After the mount, any new user accessing that
>> mounted filesystem should cause an upcall to create a context for that
>> user.
>
> I seem to have since gotten an upcall and here is the corresponding
> client and server debug:
>
> Nov  3 15:17:33 pc rpc.gssd[2937]: in authgss_create_default()
> Nov  3 15:17:33 pc rpc.gssd[2937]: in authgss_create()
> Nov  3 15:17:33 pc rpc.gssd[2937]: authgss_create: name is 0x98fd358
> Nov  3 15:17:33 pc rpc.gssd[2937]: authgss_create: gd->name is 0x98f9cc8
> Nov  3 15:17:33 pc rpc.gssd[2937]: in authgss_refresh()
> Nov  3 15:17:33 pc rpc.gssd[2937]: struct rpc_gss_sec:
> Nov  3 15:17:33 pc rpc.gssd[2937]:      mechanism_OID: { 1 2 134 72 134 247 18 1 2 2 }
> Nov  3 15:17:33 pc rpc.gssd[2937]:      qop: 0
> Nov  3 15:17:33 pc rpc.gssd[2937]:      service: 1
> Nov  3 15:17:33 pc rpc.gssd[2937]:      cred: 0x98fb568
> Nov  3 15:17:33 pc rpc.gssd[2937]:      req_flags: 00000002
> Nov  3 15:17:33 pc rpc.gssd[2937]: in authgss_marshal()
> Nov  3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_buf: encode success ((nil):0)
> Nov  3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_cred: encode success (v 1, proc 1, seq 0, svc 1, ctx (nil):0)
> Nov  3 15:17:33 pc rpc.gssd[2937]: in authgss_wrap()
> Nov  3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_buf: encode success (0x98fbc08:459)
> Nov  3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_init_args: encode success (token 0x98fbc08:459)
> Nov  3 15:17:33 pc rpc.gssd[2937]: in authgss_validate()
> Nov  3 15:17:33 pc rpc.gssd[2937]: in authgss_unwrap()
> Nov  3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_buf: decode success (0x98fd428:4)
> Nov  3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_buf: decode success (0x990e428:114)
> Nov  3 15:17:33 pc rpc.gssd[2937]: xdr_rpc_gss_init_res decode success (ctx 0x98fd428:4, maj 0, min 0, win 128, token 0x990e428:114)
> Nov  3 15:17:33 pc rpc.gssd[2937]: authgss_create_default: freeing name 0x98fd358
> Nov  3 15:17:33 pc rpc.gssd[2937]: in authgss_get_private_data()
> Nov  3 15:17:33 pc rpc.gssd[2937]: in authgss_free_private_data()
> Nov  3 15:17:33 pc rpc.gssd[2937]: in authgss_destroy()
> Nov  3 15:17:33 pc rpc.gssd[2937]: in authgss_destroy_context()
> Nov  3 15:17:33 pc rpc.gssd[2937]: authgss_destroy: freeing name 0x98f9cc8

Once again, this is all from librpcsecgss.  No messages originating
from rpc.gssd itself here.

> Nov  3 15:17:33 linux krb5kdc[5006]: TGS_REQ (1 etypes {1}) 10.75.22.1: ISSUE: authtime 1225740577, etypes {rep=1 tkt=1 ses=1}, brian@ILINX for nfs/linux.interlinx.bc.ca@ILINX
> Nov  3 15:17:33 linux rpc.svcgssd[13724]: leaving poll
> Nov  3 15:17:33 linux rpc.svcgssd[13724]: handling null request
> Nov  3 15:17:33 linux rpc.svcgssd[13724]: sname = brian@ILINX
> Nov  3 15:17:33 linux rpc.svcgssd[13724]: DEBUG: serialize_krb5_ctx: lucid version!
> Nov  3 15:17:33 linux rpc.svcgssd[13724]: prepare_krb5_rfc1964_buffer: serializing keys with enctype 4 and length 8
> Nov  3 15:17:33 linux rpc.svcgssd[13724]: doing downcall
> Nov  3 15:17:33 linux rpc.svcgssd[13724]: mech: krb5, hndl len: 4, ctx len 85, timeout: 2147483647, uid: 1001, gid: 1001, num aux grps: 13:
> Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   1) 1001
> Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   2) 117
> Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   3) 1010
> Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   4) 1011
> Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   5) 2000
> Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   6) 29
> Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   7) 20
> Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   8) 24
> Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   9) 25
> Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (  10) 30
> Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (  11) 44
> Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (  12) 46
> Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (  13) 111
> Nov  3 15:17:33 linux rpc.svcgssd[13724]: sending null reply
> Nov  3 15:17:33 linux rpc.svcgssd[13724]: writing message: ...
> Nov  3 15:17:33 linux rpc.svcgssd[13724]: finished handling null request
> Nov  3 15:17:33 linux rpc.svcgssd[13724]: entering poll

This looks like the server (the svcgssd/user-land part anyway) was
happy, and replied with a context.  There were a couple of reports in
the past where the server returned error 524288 (GSS_S_NO_CONTEXT)
after getting through the user-land part successfully.  If you can get
a packet trace between the client and server during the mount /
context negotiation, check the NULL reply from the server for this
error code.

K.C.

^ permalink raw reply	[flat|nested] 19+ messages in thread

[parent not found: <4d569c330811031433k7ae18d4enfbda349e8f90a951-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: stuck/hung nfsv4 mounts
       [not found]                     ` <4d569c330811031433k7ae18d4enfbda349e8f90a951-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2008-11-03 23:45                       ` Brian J. Murrell
  2008-11-04  1:24                       ` Brian J. Murrell
  1 sibling, 0 replies; 19+ messages in thread
From: Brian J. Murrell @ 2008-11-03 23:45 UTC (permalink / raw)
  To: linux-nfs

On Mon, 2008-11-03 at 17:33 -0500, Kevin Coffman wrote:
> 
> Once again, this is all from librpcsecgss.  No messages originating
> from rpc.gssd itself here.

Not sure why.  FWIW, 
> 
> > Nov  3 15:17:33 linux krb5kdc[5006]: TGS_REQ (1 etypes {1}) 10.75.22.1: ISSUE: authtime 1225740577, etypes {rep=1 tkt=1 ses=1}, brian@ILINX for nfs/linux.interlinx.bc.ca@ILINX
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]: leaving poll
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]: handling null request
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]: sname = brian@ILINX
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]: DEBUG: serialize_krb5_ctx: lucid version!
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]: prepare_krb5_rfc1964_buffer: serializing keys with enctype 4 and length 8
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]: doing downcall
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]: mech: krb5, hndl len: 4, ctx len 85, timeout: 2147483647, uid: 1001, gid: 1001, num aux grps: 13:
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   1) 1001
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   2) 117
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   3) 1010
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   4) 1011
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   5) 2000
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   6) 29
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   7) 20
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   8) 24
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (   9) 25
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (  10) 30
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (  11) 44
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (  12) 46
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]:   (  13) 111
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]: sending null reply
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]: writing message: ...
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]: finished handling null request
> > Nov  3 15:17:33 linux rpc.svcgssd[13724]: entering poll
> 
> If you can get
> a packet trace between the client and server during the mount /
> context negotiation, check the NULL reply from the server for this
> error code.

Well, this is just it.

# mount -t nfs4 -s -o sec=krb5i,rw,intr,rsize=8192,wsize=8192 linux:/home/share /mnt/test

on the client does not result in any NFS4 traffic between the client and
the server.  In fact if I ^C the above mount and issue it again, I don't
get another gss/svcgssd upcall.  Is there some kind of caching going on?
How can I invalidate the cache?  Probably it's moot though given that
this sequence of events is not resulting in any NFS4 traffic anyway.

This was all working before last night, then all of a sudden, it
stopped.  Nothing remarkable happened at the time that
could/should/would have caused this silliness.  ~sigh~

b.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: stuck/hung nfsv4 mounts
       [not found]                     ` <4d569c330811031433k7ae18d4enfbda349e8f90a951-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2008-11-03 23:45                       ` Brian J. Murrell
@ 2008-11-04  1:24                       ` Brian J. Murrell
  2008-11-04 15:14                         ` Brian J. Murrell
  2008-11-04 17:22                         ` Kevin Coffman
  1 sibling, 2 replies; 19+ messages in thread
From: Brian J. Murrell @ 2008-11-04  1:24 UTC (permalink / raw)
  To: linux-nfs

On Mon, 2008-11-03 at 17:33 -0500, Kevin Coffman wrote:
> If you can get
> a packet trace between the client and server during the mount /
> context negotiation, check the NULL reply from the server for this
> error code.

Here is the NULL Call from client:
No.     Time            Time Delta  Source                Destination           Protocol Info
   8868 20:07:43.933888 0.043671    10.75.22.1            10.75.22.3            NFS      V4 NULL Call (Reply In 8870)

Frame 8868 (606 bytes on wire, 606 bytes captured)
Ethernet II, Src: AsustekC_e1:65:cc (00:18:f3:e1:65:cc), Dst: Intel_3f:8b:74 (00:d0:b7:3f:8b:74)
Internet Protocol, Src: 10.75.22.1 (10.75.22.1), Dst: 10.75.22.3 (10.75.22.3)
Transmission Control Protocol, Src Port: imaps (993), Dst Port: nfs (2049), Seq: 1, Ack: 1, Len: 552
Remote Procedure Call, Type:Call XID:0x48023eeb
Network File System
    [Program Version: 4]
    [V4 Procedure: NULL (0)]
    GSS Token: ...
        GSS Token Length: 483
        GSS-API Generic Security Service Application Program Interface
            OID: 1.2.840.113554.1.2.2 (KRB5 - Kerberos 5)
            krb5_blob: ...
                krb5_tok_id: KRB5_AP_REQ (0x0001)
                Kerberos AP-REQ
                    Pvno: 5
                    MSG Type: AP-REQ (14)
                    Padding: 0
                    APOptions: 20000000 (Mutual required)
                        .0.. .... .... .... .... .... .... .... = Use Session Key: Do NOT use the session key to encrypt the ticket
                        ..1. .... .... .... .... .... .... .... = Mutual required: MUTUAL authentication is REQUIRED
                    Ticket
                        Tkt-vno: 5
                        Realm: ILINX
                        Server Name (Service and Host): nfs/linux.interlinx.bc.ca
                        enc-part des-cbc-crc
                    Authenticator des-cbc-crc

Here is the NULL reply from the server:

No.     Time            Time Delta  Source                Destination           Protocol Info
   8870 20:07:44.062206 0.128081    10.75.22.3            10.75.22.1            NFS      V4 NULL Reply (Call In 8868)

Frame 8870 (262 bytes on wire, 262 bytes captured)
Ethernet II, Src: Intel_3f:8b:74 (00:d0:b7:3f:8b:74), Dst: AsustekC_e1:65:cc (00:18:f3:e1:65:cc)
Internet Protocol, Src: 10.75.22.3 (10.75.22.3), Dst: 10.75.22.1 (10.75.22.1)
Transmission Control Protocol, Src Port: nfs (2049), Dst Port: imaps (993), Seq: 1, Ack: 553, Len: 208
Remote Procedure Call, Type:Reply XID:0x48023eeb
Network File System
    [Program Version: 4]
    [V4 Procedure: NULL (0)]
    GSS Context: <DATA>
    GSS Major Status: 0
    GSS Minor Status: 0
    GSS Sequence Window: 128
    GSS Token: ...
        GSS Token Length: 114
        GSS-API Generic Security Service Application Program Interface
            OID: 1.2.840.113554.1.2.2 (KRB5 - Kerberos 5)
            krb5_blob: ...
                krb5_tok_id: KRB5_AP_REP (0x0002)
                Kerberos AP-REP
                    Pvno: 5
                    MSG Type: AP-REP (15)
                    enc-part des-cbc-crc
                        Encryption type: des-cbc-crc (1)
                        enc-part: ...

b.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: stuck/hung nfsv4 mounts
  2008-11-04  1:24                       ` Brian J. Murrell
@ 2008-11-04 15:14                         ` Brian J. Murrell
  2008-11-04 17:22                         ` Kevin Coffman
  1 sibling, 0 replies; 19+ messages in thread
From: Brian J. Murrell @ 2008-11-04 15:14 UTC (permalink / raw)
  To: linux-nfs

Hi All,

Is there any more debugging I can/you want me to do here or can/shall I
just reboot this machine?

Thanx,
b.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: stuck/hung nfsv4 mounts
  2008-11-04  1:24                       ` Brian J. Murrell
  2008-11-04 15:14                         ` Brian J. Murrell
@ 2008-11-04 17:22                         ` Kevin Coffman
  1 sibling, 0 replies; 19+ messages in thread
From: Kevin Coffman @ 2008-11-04 17:22 UTC (permalink / raw)
  To: Brian J. Murrell; +Cc: linux-nfs

On Mon, Nov 3, 2008 at 8:24 PM, Brian J. Murrell <brian-SquOHqY54CVWr29BmMi2cA@public.gmane.org> wrote:
> On Mon, 2008-11-03 at 17:33 -0500, Kevin Coffman wrote:
>> If you can get
>> a packet trace between the client and server during the mount /
>> context negotiation, check the NULL reply from the server for this
>> error code.
>
> Here is the NULL Call from client:
> No.     Time            Time Delta  Source                Destination           Protocol Info
>   8868 20:07:43.933888 0.043671    10.75.22.1            10.75.22.3            NFS      V4 NULL Call (Reply In 8870)
>
> Frame 8868 (606 bytes on wire, 606 bytes captured)
> Ethernet II, Src: AsustekC_e1:65:cc (00:18:f3:e1:65:cc), Dst: Intel_3f:8b:74 (00:d0:b7:3f:8b:74)
> Internet Protocol, Src: 10.75.22.1 (10.75.22.1), Dst: 10.75.22.3 (10.75.22.3)
> Transmission Control Protocol, Src Port: imaps (993), Dst Port: nfs (2049), Seq: 1, Ack: 1, Len: 552
> Remote Procedure Call, Type:Call XID:0x48023eeb
> Network File System
>    [Program Version: 4]
>    [V4 Procedure: NULL (0)]
>    GSS Token: ...
>        GSS Token Length: 483
>        GSS-API Generic Security Service Application Program Interface
>            OID: 1.2.840.113554.1.2.2 (KRB5 - Kerberos 5)
>            krb5_blob: ...
>                krb5_tok_id: KRB5_AP_REQ (0x0001)
>                Kerberos AP-REQ
>                    Pvno: 5
>                    MSG Type: AP-REQ (14)
>                    Padding: 0
>                    APOptions: 20000000 (Mutual required)
>                        .0.. .... .... .... .... .... .... .... = Use Session Key: Do NOT use the session key to encrypt the ticket
>                        ..1. .... .... .... .... .... .... .... = Mutual required: MUTUAL authentication is REQUIRED
>                    Ticket
>                        Tkt-vno: 5
>                        Realm: ILINX
>                        Server Name (Service and Host): nfs/linux.interlinx.bc.ca
>                        enc-part des-cbc-crc
>                    Authenticator des-cbc-crc
>
> Here is the NULL reply from the server:
>
> No.     Time            Time Delta  Source                Destination           Protocol Info
>   8870 20:07:44.062206 0.128081    10.75.22.3            10.75.22.1            NFS      V4 NULL Reply (Call In 8868)
>
> Frame 8870 (262 bytes on wire, 262 bytes captured)
> Ethernet II, Src: Intel_3f:8b:74 (00:d0:b7:3f:8b:74), Dst: AsustekC_e1:65:cc (00:18:f3:e1:65:cc)
> Internet Protocol, Src: 10.75.22.3 (10.75.22.3), Dst: 10.75.22.1 (10.75.22.1)
> Transmission Control Protocol, Src Port: nfs (2049), Dst Port: imaps (993), Seq: 1, Ack: 553, Len: 208
> Remote Procedure Call, Type:Reply XID:0x48023eeb
> Network File System
>    [Program Version: 4]
>    [V4 Procedure: NULL (0)]
>    GSS Context: <DATA>
>    GSS Major Status: 0
>    GSS Minor Status: 0
>    GSS Sequence Window: 128
>    GSS Token: ...
>        GSS Token Length: 114
>        GSS-API Generic Security Service Application Program Interface
>            OID: 1.2.840.113554.1.2.2 (KRB5 - Kerberos 5)
>            krb5_blob: ...
>                krb5_tok_id: KRB5_AP_REP (0x0002)
>                Kerberos AP-REP
>                    Pvno: 5
>                    MSG Type: AP-REP (15)
>                    enc-part des-cbc-crc
>                        Encryption type: des-cbc-crc (1)
>                        enc-part: ...
>
> b.

>From this output, it looks like the response back from the server was
a success.  So it appears to be some other issue.

Perhaps, back to the processes stuck in the kernel?  Unless Trond has
suggestions, I would reboot the client and see if it happens again
(w/o soft mounts).

K.C.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2008-11-04 17:22 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-03 15:05 stuck/hung nfsv4 mounts Brian J. Murrell
2008-11-03 16:59 ` Trond Myklebust
     [not found]   ` <1225731544.6958.6.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2008-11-03 17:25     ` Jim Rees
2008-11-03 17:37       ` Trond Myklebust
     [not found]         ` <1225733834.6958.12.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2008-11-03 21:40           ` Chuck Lever
2008-11-03 22:20             ` Peter Staubach
2008-11-03 22:47               ` Chuck Lever
2008-11-04 16:03                 ` Peter Staubach
2008-11-03 17:38       ` Benny Halevy
2008-11-03 17:50     ` Brian J. Murrell
2008-11-03 19:58       ` Kevin Coffman
     [not found]         ` <4d569c330811031158r26963e0w5bcf8331e0fb14b7-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-11-03 20:09           ` Brian J. Murrell
2008-11-03 20:28             ` Kevin Coffman
     [not found]               ` <4d569c330811031228r5bb9aefs7a970303910810e2-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-11-03 21:12                 ` Brian J. Murrell
2008-11-03 22:33                   ` Kevin Coffman
     [not found]                     ` <4d569c330811031433k7ae18d4enfbda349e8f90a951-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-11-03 23:45                       ` Brian J. Murrell
2008-11-04  1:24                       ` Brian J. Murrell
2008-11-04 15:14                         ` Brian J. Murrell
2008-11-04 17:22                         ` Kevin Coffman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.