From: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
To: "Adamson, Andy" <William.Adamson@netapp.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH 1/1] NFSv4.1 fix a kswap nfs4_state_manger race
Date: Mon, 25 Nov 2013 21:06:44 +0000
Message-ID: <1385413604.9247.3.camel@leira.trondhjem.org>
In-Reply-To: <3A65AD2F-797E-4292-BA9C-4CF20BD075CB@netapp.com>
[-- Attachment #1: Type: text/plain, Size: 4635 bytes --]
On Mon, 2013-11-25 at 20:29 +0000, Adamson, Andy wrote:
> On Nov 25, 2013, at 3:20 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com>
> wrote:
>
> >
> > On Nov 25, 2013, at 15:10, Adamson, Andy <William.Adamson@netapp.com> wrote:
> >
> >>
> >> On Nov 25, 2013, at 2:53 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com>
> >> wrote:
> >>
> >>>
> >>> On Nov 25, 2013, at 14:27, Adamson, Andy <William.Adamson@netapp.com> wrote:
> >>>
> >>>>
> >>>> On Nov 25, 2013, at 1:33 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com>
> >>>> wrote:
> >>>>
> >>>>>
> >>>>> On Nov 25, 2013, at 13:13, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote:
> >>>>>
> >>>>>>
> >>>>>> On Nov 25, 2013, at 12:57, <andros@netapp.com> <andros@netapp.com> wrote:
> >>>>>>
> >>>>>>> From: Andy Adamson <andros@netapp.com>
> >>>>>>>
> >>>>>>> The state manager is recovering expired state and recovery OPENs are being
> >>>>>>> processed. If kswapd is pruning inodes at the same time, a deadlock can occur
> >>>>>>> when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the
> >>>>>>> resultant layoutreturn gets an error that the state manager is to handle,
> >>>>>>> causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq.
> >>>>>>>
> >>>>>>> At the same time an open is waiting for the inode deletion to complete in
> >>>>>>> __wait_on_freeing_inode.
> >>>>>>>
> >>>>>>> If the open is either the open called by the state manager, or an open from
> >>>>>>> the same open owner that is holding the NFSv4.0 sequence id which causes the
> >>>>>>> OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue,
> >>>>>>> then the state is deadlocked with kswapd.
> >>>>>>>
> >>>>>>> Do not handle LAYOUTRETURN errors when called from nfs4_evict_inode.
> >>>>>>
> >>>>>> Why are we waiting for recovery in LAYOUTRETURN at all? Layouts are automatically lost when the server reboots or when the lease is otherwise lost.
> >>>>>>
> >>>>>> IOW: Is there any reason why we need to special-case nfs4_evict_inode? Shouldn’t we just bail out on error in _all_ cases?
> >>>>>
> >>>>> BTW: Is it possible that we might have a similar problem with delegreturn? That too can be called from nfs4_evict_inode…
> >>>>
> >>>> Yes, good point. kswapd could be waiting for a delegation to return which has an error along with the same scenario with sys_open and the state manager running.
> >>>>
> >>>> With delegreturn, we most definitely want to limit 'no error handling' to the evict inode case.
> >>>
> >>> Ah… I forgot that the delegreturn in nfs4_evict_inode is asynchronous and doesn’t wait for completion, so it shouldn’t be a problem here.
> >>
> >> Except we just changed that to fix a different state manager hang:
> >>
> >> commit 4a82fd7c4e78a1b7a224f9ae8bb7e1fd95f670e0
> >> Author: Andy Adamson <andros@netapp.com>
> >> Date: Fri Nov 15 16:36:16 2013 -0500
> >>
> >> NFSv4 wait on recovery for async session errors
> >
> > Right, but that won’t prevent nfs4_evict_inode from completing,
>
> Ah - I was thinking of the synchronous handler's call to nfs4_wait_clnt_recover - so yes, no problem
>
> -->Andy
>
> > and hence the OPEN that is waiting in nfs_fhget() can also complete, and so there is no deadlock with the state manager thread.
How about something like the attached...
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
[-- Attachment #2: patch.dif --]
[-- Type: text/x-patch; name="patch.dif", Size: 604 bytes --]
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index f01e2aa53210..e040359983ce 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7599,7 +7599,14 @@ static void nfs4_layoutreturn_done(struct rpc_task *task, void *calldata)
 		return;
 
 	server = NFS_SERVER(lrp->args.inode);
-	if (nfs4_async_handle_error(task, server, NULL) == -EAGAIN) {
+	switch (task->tk_status) {
+	default:
+		task->tk_status = 0;
+	case 0:
+		break;
+	case -NFS4ERR_DELAY:
+		if (nfs4_async_handle_error(task, server, NULL) != -EAGAIN)
+			break;
 		rpc_restart_call_prepare(task);
 		return;
 	}
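
For readers following the control flow of the switch in the patch above, here is a minimal user-space sketch of the same decision logic. It is an illustration only, not kernel code: layoutreturn_done_sketch(), stub_async_handle_error() and enum lr_action are hypothetical names, and the stub simply assumes the error handler always asks for a retry. Only the value of NFS4ERR_DELAY (10008) is taken from the protocol.

```c
#include <assert.h>
#include <errno.h>

/* NFS4ERR_DELAY as numbered in the NFSv4 protocol. */
#define NFS4ERR_DELAY 10008

/* Hypothetical result type: what the completion callback decides to do. */
enum lr_action { LR_DONE, LR_RESTART };

/*
 * Hypothetical stand-in for nfs4_async_handle_error().
 * Assumption: it always requests a retry by returning -EAGAIN.
 */
static int stub_async_handle_error(int status)
{
	(void)status;
	return -EAGAIN;
}

/* Mirrors the switch in the patch, operating on a plain int status. */
static enum lr_action layoutreturn_done_sketch(int *tk_status)
{
	switch (*tk_status) {
	default:
		/* Any other error: clear it, do not wait for recovery. */
		*tk_status = 0;
		/* fall through */
	case 0:
		break;
	case -NFS4ERR_DELAY:
		/* Only a DELAY reply goes through the error handler. */
		if (stub_async_handle_error(*tk_status) != -EAGAIN)
			break;
		return LR_RESTART;	/* corresponds to rpc_restart_call_prepare() */
	}
	return LR_DONE;
}
```

Under these assumptions, a status of -NFS4ERR_DELAY yields LR_RESTART, while any other nonzero status is cleared to 0 and yields LR_DONE, so the LAYOUTRETURN never ends up waiting on state recovery.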
Thread overview: 16+ messages
2013-11-25 17:57 [PATCH 0/1] NFSv4.1 fix a kswap nfs4_state_manger race andros
2013-11-25 17:57 ` [PATCH 1/1] " andros
2013-11-25 18:13 ` Myklebust, Trond
2013-11-25 18:17 ` Adamson, Andy
2013-11-25 18:28 ` Myklebust, Trond
2013-11-25 18:31 ` Adamson, Andy
2013-11-25 18:33 ` Myklebust, Trond
2013-11-25 19:27 ` Adamson, Andy
2013-11-25 19:53 ` Myklebust, Trond
2013-11-25 20:10 ` Adamson, Andy
2013-11-25 20:20 ` Myklebust, Trond
2013-11-25 20:29 ` Adamson, Andy
2013-11-25 20:51 ` Adamson, Andy
2013-11-25 20:54 ` Adamson, Andy
2013-11-25 20:57 ` Myklebust, Trond
2013-11-25 21:06 ` Myklebust, Trond [this message]