linux-nfs.vger.kernel.org archive mirror
From: Ben Greear <greearb@candelatech.com>
To: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: 'umount -f /mnt/foo' fails if server IP is gone.
Date: Thu, 17 Oct 2013 10:35:41 -0700
Message-ID: <52601FED.6070708@candelatech.com>
In-Reply-To: <525D899F.5010604@candelatech.com>

On 10/15/2013 11:29 AM, Ben Greear wrote:
> Is 'umount -f' supposed to always work, even if the file server
> goes away?
>
> I have a user's system that just hangs forever in this case.
>
> Could be local changes we have made, but I'm curious about
> the expected behaviour before I go digging too deep...

Any input on this?  I don't mind trying to fix it, but I
would like to know how it is supposed to work.

Older kernels do not hang (we tried 3.0.x), but I'm not sure
exactly where the problem started.

The test case was to set up an NFSv3 mount, then pull the Ethernet cable
on the NFS client machine.  This system is running a 3.9.11+ kernel.

From /proc/mounts:

10.2.46.90:/nfs_export on /mnt/lf/nfs3-001 type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.2.46.90,mountvers=3,mountport=19408,mountproto=udp,srcaddr=10.2.46.91,local_lock=none,addr=10.2.46.90)
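
Two options in that line are worth calling out. Per nfs(5), `hard` makes the client retry NFS requests indefinitely, and `timeo` is given in tenths of a second, so timeo=600 is a 60-second timeout. The following is only an illustrative Python sketch for pulling those values out of an option string; `parse_mount_options` is a hypothetical helper name, not part of any NFS tooling.

```python
def parse_mount_options(opts: str) -> dict:
    """Split a comma-separated mount option string into a dict.

    Flags without a value (e.g. 'hard') map to True; values stay strings.
    """
    parsed = {}
    for item in opts.strip("()").split(","):
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


# Option string taken from the mount line above, shortened to the
# fields relevant here.
opts = parse_mount_options("(rw,relatime,vers=3,hard,proto=tcp,timeo=600,retrans=2)")

# nfs(5): timeo is expressed in tenths of a second.
timeo_seconds = int(opts["timeo"]) / 10
is_hard = opts.get("hard", False)
print(f"hard={is_hard} timeo={timeo_seconds}s retrans={opts['retrans']}")
```

With a hard mount, I/O against an unreachable server is expected to block rather than fail, which is the background for the question of what 'umount -f' should do.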

# umount /mnt/lf/nfs3-001
^C
# umount -f /mnt/lf/nfs3-001
[hangs forever it seems, certainly for a long time]
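
To reproduce this without tying up a shell, the forced unmount can be wrapped in a timeout. This is a minimal Python sketch, assuming the stuck umount is still killable (the plain umount above did respond to ^C). `run_with_timeout` and `try_force_umount` are hypothetical helper names, and the 30-second limit is arbitrary.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return its exit status, or None if it timed out.

    On timeout, subprocess.run() kills the child and waits for it before
    raising TimeoutExpired, so a child stuck in a non-killable sleep
    could still block here.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None


def try_force_umount(mountpoint, timeout_s=30):
    """Attempt 'umount -f' and report whether it returned in time."""
    status = run_with_timeout(["umount", "-f", mountpoint], timeout_s)
    if status is None:
        return f"umount -f {mountpoint}: no result after {timeout_s} s"
    return f"umount -f {mountpoint}: exit status {status}"
```

Run as root, `print(try_force_umount("/mnt/lf/nfs3-001"))` would then report either the exit status or the timeout instead of hanging the terminal.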


Here is a stack trace of hung processes, for instance:

Oct 17 10:24:18 localhost kernel: [688601.930366] SysRq : Show Blocked State
Oct 17 10:24:18 localhost kernel: [688601.931016]   task                PC stack   pid father
Oct 17 10:24:18 localhost kernel: [688601.931016] mkdir           D f1bf6700     0 16898  16831 0x00000082
Oct 17 10:24:18 localhost kernel: [688601.931016]  f070bd8c 00000046 00000282 f1bf6700 f5b55a20 c0d7e400 f5b55a20 c0d7e400
Oct 17 10:24:18 localhost kernel: [688601.931016]  c0d7e400 c0d7e400 c0d7e400 f79e9400 f5b55a20 f79e9400 f5b55a20 f58b19c0
Oct 17 10:24:18 localhost kernel: [688601.931016]  f8dc4fd0 f070bd50 f0ce9924 f070bd50 f8ec6bff f070bd94 f8dbf9f7 ee91a138
Oct 17 10:24:18 localhost kernel: [688601.931016] Call Trace:
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<f8ec6bff>] ? rpc_put_task+0xf/0x20 [sunrpc]
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<f8dbf9f7>] ? nfs_initiate_write+0xb7/0xe0 [nfs]
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c04a9f0e>] ? ktime_get_ts+0x3e/0x110
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c09cb133>] schedule+0x23/0x60
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c09cb1e6>] io_schedule+0x76/0xc0
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c051607d>] sleep_on_page+0xd/0x20
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c09c8d4d>] __wait_on_bit+0x4d/0x70
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c0516070>] ? __lock_page+0x90/0x90
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c0516301>] wait_on_page_bit+0x91/0xa0
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c0478710>] ? wake_atomic_t_function+0x50/0x50
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c05164cb>] filemap_fdatawait_range+0xcb/0x150
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c05166c7>] filemap_write_and_wait_range+0x97/0xb0
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<f8db4074>] nfs_file_fsync+0x44/0xa0 [nfs]
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<f8db4030>] ? nfs_file_fsync_commit+0xb0/0xb0 [nfs]
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c058e1f9>] vfs_fsync_range+0x59/0x70
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c058e237>] vfs_fsync+0x27/0x30
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<f8db4b0b>] nfs_file_flush+0x6b/0x90 [nfs]
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c05631a1>] filp_close+0x31/0x80
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c057ea55>] put_files_struct+0x85/0xe0
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c057eaf7>] exit_files+0x47/0x60
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c045b83c>] do_exit+0x25c/0x980
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c056a0be>] ? SyS_stat64+0x2e/0x40
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c045bf9e>] do_group_exit+0x3e/0xa0
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c045c018>] SyS_exit_group+0x18/0x20
Oct 17 10:24:18 localhost kernel: [688601.931016]  [<c09d370d>] sysenter_do_call+0x12/0x28
Oct 17 10:24:18 localhost kernel: [688601.931016] umount.nfs      D f11c4900     0 17150  17149 0x00000080
Oct 17 10:24:18 localhost kernel: [688602.225057]  f3955d00 00000082 efea0d8c f11c4900 f3955c8c c08d9f96 f104e700 c0d7e400
Oct 17 10:24:18 localhost kernel: [688602.225057]  c0d7e400 c0d7e400 c0d7e400 efea0d8c efea0c80 f79db400 f104e700 c0c3e980
Oct 17 10:24:18 localhost kernel: [688602.225057]  f3955cd0 f3955cb4 f3955e90 0000002c 0000005c 132df575 efea0d80 0000005c
Oct 17 10:24:18 localhost kernel: [688602.225057] Call Trace:
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c08d9f96>] ? __kfree_skb+0x36/0x90
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c09cb133>] schedule+0x23/0x60
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8ec6edd>] rpc_wait_bit_killable+0x2d/0x70 [sunrpc]
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c09c8d4d>] __wait_on_bit+0x4d/0x70
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8ec6eb0>] ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8ec6eb0>] ? __rpc_wait_for_completion_task+0x30/0x30 [sunrpc]
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c09c8e1b>] out_of_line_wait_on_bit+0xab/0xc0
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c0478710>] ? wake_atomic_t_function+0x50/0x50
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8ec7f9e>] __rpc_execute+0x11e/0x290 [sunrpc]
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8ebf130>] ? rpcproc_decode_null+0x10/0x10 [sunrpc]
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8ebf130>] ? rpcproc_decode_null+0x10/0x10 [sunrpc]
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c047865f>] ? wake_up_bit+0x5f/0x70
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8ec814c>] rpc_execute+0x3c/0xa0 [sunrpc]
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8ec0f09>] rpc_run_task+0x59/0x70 [sunrpc]
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8ec1022>] rpc_call_sync+0x42/0xa0 [sunrpc]
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8e0b46c>] nfs3_rpc_wrapper.clone.0+0x5c/0xa0 [nfsv3]
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8e0c0d4>] nfs3_proc_getattr+0x34/0x40 [nfsv3]
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8db7397>] __nfs_revalidate_inode+0xc7/0x140 [nfs]
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8db743f>] nfs_revalidate_inode+0x2f/0x60 [nfs]
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<f8db14a8>] nfs_weak_revalidate+0x38/0x50 [nfs]
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c056fba8>] complete_walk+0xa8/0xf0
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c0571e53>] path_lookupat+0x63/0x690
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c05724ae>] filename_lookup+0x2e/0xc0
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c05733a3>] user_path_at_empty+0x43/0x80
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c0578b9e>] ? __d_free+0x2e/0x50
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c064450c>] ? security_capable+0x1c/0x30
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c05733ff>] user_path_at+0x1f/0x30
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c05807c3>] SyS_umount+0x83/0x380
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c04d2606>] ? __audit_syscall_exit+0x1f6/0x290
Oct 17 10:24:18 localhost kernel: [688602.225057]  [<c09d370d>] sysenter_do_call+0x12/0x28
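
Read from the bottom up, the umount.nfs trace shows the process blocked in a GETATTR RPC issued while SyS_umount is still resolving the mount point path, so as far as this trace shows it has not reached the unmount itself. Frames marked '?' are addresses the kernel found on the stack but could not confirm as part of the call chain. A small Python sketch for reducing such a trace to its confirmed frames in call order follows; `reliable_frames` is a hypothetical helper, and the sample is a subset of the lines above.

```python
import re

# Matches e.g. " [<c05807c3>] SyS_umount+0x83/0x380" and
# " [<f8ec6bff>] ? rpc_put_task+0xf/0x20 [sunrpc]".
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x")


def reliable_frames(log_lines):
    """Return function names from call-trace lines, outermost caller first.

    Frames marked '?' are skipped because the kernel could not confirm
    them as part of the call chain.
    """
    names = []
    for line in log_lines:
        match = FRAME_RE.search(line)
        if match and not match.group(1):
            names.append(match.group(2))
    # The kernel prints the innermost frame first; reverse for call order.
    return list(reversed(names))


sample = [
    " [<c08d9f96>] ? __kfree_skb+0x36/0x90",
    " [<c09cb133>] schedule+0x23/0x60",
    " [<f8ec6edd>] rpc_wait_bit_killable+0x2d/0x70 [sunrpc]",
    " [<f8ec1022>] rpc_call_sync+0x42/0xa0 [sunrpc]",
    " [<f8e0c0d4>] nfs3_proc_getattr+0x34/0x40 [nfsv3]",
    " [<f8db14a8>] nfs_weak_revalidate+0x38/0x50 [nfs]",
    " [<c05807c3>] SyS_umount+0x83/0x380",
]
print(" -> ".join(reliable_frames(sample)))
```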

....

Oct 17 10:24:42 localhost kernel: [688631.186190] INFO: task mkdir:16898 blocked for more than 180 seconds.
Oct 17 10:24:42 localhost kernel: [688631.195666] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 17 10:24:42 localhost kernel: [688631.206304] mkdir           D f1bf6700     0 16898  16831 0x00000082
Oct 17 10:24:42 localhost kernel: [688631.215220]  f070bd8c 00000046 00000282 f1bf6700 f5b55a20 c0d7e400 f5b55a20 c0d7e400
Oct 17 10:24:42 localhost kernel: [688631.225933]  c0d7e400 c0d7e400 c0d7e400 f79e9400 f5b55a20 f79e9400 f5b55a20 f58b19c0
Oct 17 10:24:42 localhost kernel: [688631.236712]  f8dc4fd0 f070bd50 f0ce9924 f070bd50 f8ec6bff f070bd94 f8dbf9f7 ee91a138
Oct 17 10:24:42 localhost kernel: [688631.247550] Call Trace:
Oct 17 10:24:42 localhost kernel: [688631.252746]  [<f8ec6bff>] ? rpc_put_task+0xf/0x20 [sunrpc]
Oct 17 10:24:42 localhost kernel: [688631.261369]  [<f8dbf9f7>] ? nfs_initiate_write+0xb7/0xe0 [nfs]
Oct 17 10:24:42 localhost kernel: [688631.270065]  [<c04a9f0e>] ? ktime_get_ts+0x3e/0x110
Oct 17 10:24:42 localhost kernel: [688631.277724]  [<c09cb133>] schedule+0x23/0x60
Oct 17 10:24:42 localhost kernel: [688631.285298]  [<c09cb1e6>] io_schedule+0x76/0xc0
Oct 17 10:24:42 localhost kernel: [688631.292738]  [<c051607d>] sleep_on_page+0xd/0x20
Oct 17 10:24:42 localhost kernel: [688631.300316]  [<c09c8d4d>] __wait_on_bit+0x4d/0x70
Oct 17 10:24:42 localhost kernel: [688631.308117]  [<c0516070>] ? __lock_page+0x90/0x90
Oct 17 10:24:42 localhost kernel: [688631.315731]  [<c0516301>] wait_on_page_bit+0x91/0xa0
Oct 17 10:24:42 localhost kernel: [688631.323630]  [<c0478710>] ? wake_atomic_t_function+0x50/0x50
Oct 17 10:24:42 localhost kernel: [688631.332536]  [<c05164cb>] filemap_fdatawait_range+0xcb/0x150
Oct 17 10:24:42 localhost kernel: [688631.341221]  [<c05166c7>] filemap_write_and_wait_range+0x97/0xb0
Oct 17 10:24:42 localhost kernel: [688631.350224]  [<f8db4074>] nfs_file_fsync+0x44/0xa0 [nfs]
Oct 17 10:24:42 localhost kernel: [688631.358569]  [<f8db4030>] ? nfs_file_fsync_commit+0xb0/0xb0 [nfs]
Oct 17 10:24:42 localhost kernel: [688631.367764]  [<c058e1f9>] vfs_fsync_range+0x59/0x70
Oct 17 10:24:42 localhost kernel: [688631.375818]  [<c058e237>] vfs_fsync+0x27/0x30
Oct 17 10:24:42 localhost kernel: [688631.383346]  [<f8db4b0b>] nfs_file_flush+0x6b/0x90 [nfs]
Oct 17 10:24:42 localhost kernel: [688631.392117]  [<c05631a1>] filp_close+0x31/0x80
Oct 17 10:24:42 localhost kernel: [688631.399741]  [<c057ea55>] put_files_struct+0x85/0xe0
Oct 17 10:24:42 localhost kernel: [688631.407871]  [<c057eaf7>] exit_files+0x47/0x60
Oct 17 10:24:42 localhost kernel: [688631.415535]  [<c045b83c>] do_exit+0x25c/0x980
Oct 17 10:24:42 localhost kernel: [688631.423133]  [<c056a0be>] ? SyS_stat64+0x2e/0x40
Oct 17 10:24:42 localhost kernel: [688631.431078]  [<c045bf9e>] do_group_exit+0x3e/0xa0
Oct 17 10:24:42 localhost kernel: [688631.439103]  [<c045c018>] SyS_exit_group+0x18/0x20
Oct 17 10:24:42 localhost kernel: [688631.447169]  [<c09d370d>] sysenter_do_call+0x12/0x28
Oct 17 10:24:54 localhost kernel: [688643.517069] RPC: AUTH_GSS upcall timed out.


Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


