From: "Tuomas Räsänen" <tuomasjjrasanen@opinsys.fi>
To: Jeff Layton <jlayton@poochiereds.net>
Cc: Veli-Matti Lintu <veli-matti.lintu@opinsys.fi>,
linux-nfs@vger.kernel.org
Subject: Re: Soft lockups on kerberised NFSv4.0 clients
Date: Wed, 3 Sep 2014 07:01:32 +0000 (UTC)
Message-ID: <927052272.63158.1409727692986.JavaMail.zimbra@opinsys.fi>
In-Reply-To: <1726881404.72983.1402308693418.JavaMail.zimbra@opinsys.fi>
----- Original Message -----
> From: "Tuomas Räsänen" <tuomasjjrasanen@opinsys.fi>
>
> ----- Original Message -----
> > From: "Jeff Layton" <jlayton@poochiereds.net>
> >
> > Ok, now that I look closer at your stack trace the problem appears to
> > be that the unlock code is waiting for the lock context's io_count to
> > drop to zero before allowing the unlock to proceed.
> >
> > That likely means that there is some outstanding I/O that isn't
> > completing, but it's possible that the problem is the CB_RECALL is
> > being ignored. This will probably require some analysis of wire captures.
>
> The lockup mechanism seems to be as follows: the process (which is always
> firefox) is killed, and it tries to unlock the file (which is always a
> mmapped sqlite3 WAL index) which still has some pending I/O going on. The
> return value of nfs_wait_bit_killable() (-ERESTARTSYS from
> fatal_signal_pending(current)) is ignored and the process just keeps looping
> because io_count seems to be stuck at 1 (I still don't know why). This
> raised a few questions:
>
> Why is the return value of nfs_wait_bit_killable() not handled? Should it be
> handled and, if so, how?
>
> Why is the whole iocounter wait not simply implemented using wait_on_bit()?
>
> I changed do_unlk() to use wait_on_bit() instead of nfs_iocounter_wait() and
> softlockups seem to have disappeared:
>
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index 284ca90..eb41b32 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -781,7 +781,11 @@ do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
> 
>  	l_ctx = nfs_get_lock_context(nfs_file_open_context(filp));
>  	if (!IS_ERR(l_ctx)) {
> -		status = nfs_iocounter_wait(&l_ctx->io_count);
> +		struct nfs_io_counter *io_count = &l_ctx->io_count;
> +		status = wait_on_bit(&io_count->flags,
> +				     NFS_IO_INPROGRESS,
> +				     nfs_wait_bit_killable,
> +				     TASK_KILLABLE);
>  		nfs_put_lock_context(l_ctx);
>  		if (status < 0)
>  			return status;
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index 2ffebf2..6b9089c 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -87,6 +87,7 @@ nfs_page_free(struct nfs_page *p)
>  static void
>  nfs_iocounter_inc(struct nfs_io_counter *c)
>  {
> +	set_bit(NFS_IO_INPROGRESS, &c->flags);
>  	atomic_inc(&c->io_count);
>  }
>
> Any thoughts? I really want to understand the issue at hand and to help
> fix it properly.
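
The looping described in the quoted message can be modeled in user space.
What follows is only a minimal sketch, not the kernel code: struct
io_counter, fatal_signal_pending_model, wait_killable_model() and both
wait_*() helpers are hypothetical stand-ins, and max_iters exists only so
that the model terminates. It assumes, as described above, that the wait
step returns -ERESTARTSYS once a fatal signal is pending and that io_count
never drops to zero.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* ERESTARTSYS is kernel-internal; define a stand-in value for user space. */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/* Hypothetical model of the lock context's I/O counter. */
struct io_counter {
	int io_count;	/* number of outstanding I/O requests */
};

/* Hypothetical model of fatal_signal_pending(current). */
static bool fatal_signal_pending_model;

/* Models one killable wait step: fails once a fatal signal is pending. */
static int wait_killable_model(void)
{
	return fatal_signal_pending_model ? -ERESTARTSYS : 0;
}

/*
 * Problematic pattern: the wait step's return value is dropped, so a
 * killed task keeps retrying for as long as io_count stays non-zero.
 * Returns the number of iterations spent spinning.
 */
static long wait_ignoring_signal(const struct io_counter *c, long max_iters)
{
	long spins = 0;

	assert(max_iters > 0);
	while (c->io_count != 0 && spins < max_iters) {
		(void)wait_killable_model();	/* error is ignored */
		spins++;
	}
	return spins;
}

/*
 * Alternative pattern: the error is propagated, so a killed task leaves
 * the loop immediately. Returns 0 when the counter is idle, the wait
 * error if interrupted, or -EAGAIN if the model's iteration bound is hit.
 */
static int wait_honoring_signal(const struct io_counter *c, long max_iters)
{
	long spins = 0;

	assert(max_iters > 0);
	while (c->io_count != 0 && spins < max_iters) {
		int ret = wait_killable_model();

		if (ret < 0)
			return ret;
		spins++;
	}
	return c->io_count == 0 ? 0 : -EAGAIN;
}
```

With io_count stuck at 1 and the signal flag set, wait_ignoring_signal()
uses up its whole iteration budget, while wait_honoring_signal() returns
-ERESTARTSYS on the first pass. The sketch says nothing about why io_count
stays at 1; it only illustrates why ignoring the error turns a stuck
counter into a soft lockup.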
The same kind of patch was proposed by David Jeffery in
http://www.spinics.net/lists/linux-nfs/msg45806.html and the discussion in
that thread answered a lot of my questions.

The proposed patch was not accepted, but another patch of David's also
fixes the softlockup symptom (as tested with jam.c):
http://www.spinics.net/lists/linux-nfs/msg45807.html

Case closed.
--
Tuomas