public inbox for linux-nfs@vger.kernel.org
From: Alan Post <adp@prgmr.com>
To: linux-nfs <linux-nfs@vger.kernel.org>
Subject: User process NFS write hang in wait_on_commit with kworker
Date: Mon, 17 Jun 2019 18:06:13 -0600
Message-ID: <20190618000613.GR4158@turtle.email>

On May 20th I reported "User process NFS write hang followed
by automount hang requiring reboot" to this list.  There I
had a process that would hang on NFS write, followed by sync
hanging, eventually leading to my need to reboot the host.

On June 4th, after upgrading to Linux 4.19.44, I reported
the issue resolved.  Since then, as I've deployed
Linux 4.19.44, the issue has come back--sort of.

I have begun once again getting sync hangs following a
hung NFS write.  The hung write has a different stack trace
than any I previously reported:

    [<0>] wait_on_commit+0x60/0x90 [nfs]
    [<0>] __nfs_commit_inode+0x146/0x1a0 [nfs]
    [<0>] nfs_file_fsync+0xa7/0x1d0 [nfs]
    [<0>] filp_close+0x25/0x70
    [<0>] put_files_struct+0x66/0xb0
    [<0>] do_exit+0x2af/0xbb0
    [<0>] do_group_exit+0x35/0xa0
    [<0>] __x64_sys_exit_group+0xf/0x10
    [<0>] do_syscall_64+0x45/0x100
    [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [<0>] 0xffffffffffffffff

And there is an attendant kworker thread:

    [<0>] wait_on_commit+0x60/0x90 [nfs]
    [<0>] __nfs_commit_inode+0x146/0x1a0 [nfs]
    [<0>] nfs_write_inode+0x5c/0x90 [nfs]
    [<0>] nfs4_write_inode+0xd/0x30 [nfsv4]
    [<0>] __writeback_single_inode+0x27a/0x320
    [<0>] writeback_sb_inodes+0x19a/0x460
    [<0>] wb_writeback+0x102/0x2f0
    [<0>] wb_workfn+0xa3/0x400
    [<0>] process_one_work+0x1e3/0x3d0
    [<0>] worker_thread+0x28/0x3c0
    [<0>] kthread+0x10e/0x130
    [<0>] ret_from_fork+0x35/0x40
    [<0>] 0xffffffffffffffff
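
For anyone who wants to check other tasks for the same symptom, here
is a minimal sketch that parses traces in the format shown above and
reports whether the innermost frame is wait_on_commit in the nfs
module.  It assumes the text was read from /proc/<pid>/stack (which
normally requires root); the helper names parse_frames and
blocked_in_commit are made up for illustration.

```python
import re

# Matches frames such as "[<0>] wait_on_commit+0x60/0x90 [nfs]".
# The trailing "[<0>] 0xffffffffffffffff" line has no symbol+offset and is skipped.
FRAME_RE = re.compile(
    r"\[<[0-9a-fx]+>\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)

def parse_frames(stack_text):
    """Return (symbol, module) pairs; module is None for core-kernel frames."""
    frames = []
    for line in stack_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group(1), match.group(2)))
    return frames

def blocked_in_commit(stack_text):
    """True if the innermost (first) frame is wait_on_commit from the nfs module."""
    frames = parse_frames(stack_text)
    return bool(frames) and frames[0] == ("wait_on_commit", "nfs")
```

Both traces above would return True from blocked_in_commit; a task
sleeping anywhere else would return False.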

Oddly enough, I can clear the problem without rebooting the host.
I use iptables to block all traffic between the NFS server and the
NFS client for long enough that any open TCP connections time out.
After that, the connection apparently reestablishes and the hung
process unblocks.
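
As a rough illustration of that workaround, the sketch below inserts
DROP rules for one server address, waits, and then deletes the same
rules.  The exact rules I used are not reproduced here: the choice of
the INPUT and OUTPUT chains, the function names, and the wait duration
are all assumptions, and running it for real requires root.

```python
import subprocess
import time

def block_rules(server_ip):
    """Rule specs (assumed) that drop traffic from and to server_ip."""
    return [
        ["INPUT", "-s", server_ip, "-j", "DROP"],
        ["OUTPUT", "-d", server_ip, "-j", "DROP"],
    ]

def iptables_cmd(action, rule):
    # action is "-I" (insert at the top of the chain) or "-D" (delete that rule).
    return ["iptables", action] + rule

def block_temporarily(server_ip, seconds,
                      run=subprocess.check_call, sleep=time.sleep):
    """Block traffic to/from server_ip for `seconds`, then always unblock."""
    inserted = []
    try:
        for rule in block_rules(server_ip):
            run(iptables_cmd("-I", rule))
            inserted.append(rule)
        sleep(seconds)
    finally:
        # Remove only the rules that were actually inserted, newest first.
        for rule in reversed(inserted):
            run(iptables_cmd("-D", rule))
```

A call would look like block_temporarily("192.0.2.10", seconds), where
192.0.2.10 is a placeholder address and seconds has to be long enough
for the stalled TCP connection to time out; I have not pinned down a
minimum value.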

I can't explain what's keeping the connection alive but apparently
stalled--requiring my manual intervention.  Do any of you have
ideas or speculation?  I'm happy to poke around in a packet capture
if the information provided isn't sufficient.

-A
-- 
Alan Post | Xen VPS hosting for the technically adept
PO Box 61688 | Sunnyvale, CA 94088-1681 | https://prgmr.com/
email: adp@prgmr.com

Thread overview: 9+ messages
2019-06-18  0:06 Alan Post [this message]
2019-06-18 15:29 ` User process NFS write hang in wait_on_commit with kworker Benjamin Coddington
2019-06-19  0:07   ` Alan Post
2019-06-19 12:38     ` Benjamin Coddington
2019-06-21 20:47       ` Alan Post
2019-06-28 18:33         ` Alan Post
2019-07-02  9:55           ` Benjamin Coddington
2019-07-03 21:32             ` Alan Post
2019-07-05 23:53               ` Tom Talpey
