From: Trond Myklebust <trondmy@hammerspace.com>
To: "aglo@umich.edu" <aglo@umich.edu>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
"Anna.Schumaker@netapp.com" <Anna.Schumaker@netapp.com>,
"neilb@suse.de" <neilb@suse.de>,
"schumaker.anna@gmail.com" <schumaker.anna@gmail.com>
Subject: Re: [PATCH 2/2] NFSv4: Fix a state manager thread deadlock regression
Date: Fri, 22 Sep 2023 19:05:16 +0000 [thread overview]
Message-ID: <077cb75b44afd2404629c1388a92ca61da5092b1.camel@hammerspace.com> (raw)
In-Reply-To: <CAN-5tyEvYBr-bqOeO2Umt2DVa_CkKxT8_2Zo8Q1mfa9RN9VxQg@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 3278 bytes --]
On Fri, 2023-09-22 at 13:22 -0400, Olga Kornievskaia wrote:
> On Wed, Sep 20, 2023 at 8:27 PM Trond Myklebust
> <trondmy@hammerspace.com> wrote:
> >
> > On Wed, 2023-09-20 at 15:38 -0400, Anna Schumaker wrote:
> > > Hi Trond,
> > >
> > > On Sun, Sep 17, 2023 at 7:12 PM <trondmy@kernel.org> wrote:
> > > >
> > > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > >
> > > > Commit 4dc73c679114 reintroduces the deadlock that was fixed by
> > > > commit
> > > > aeabb3c96186 ("NFSv4: Fix a NFSv4 state manager deadlock")
> > > > because
> > > > it
> > > > prevents the setup of new threads to handle reboot recovery,
> > > > while
> > > > the
> > > > older recovery thread is stuck returning delegations.
> > >
> > > I'm seeing a possible deadlock with xfstests generic/472 on NFS
> > > v4.x
> > > after applying this patch. The test itself checks for various
> > > swapfile
> > > edge cases, so it seems likely something is going on there.
> > >
> > > Let me know if you need more info
> > > Anna
> > >
> >
> > Did you turn off delegations on your server? If you don't, then
> > swap
> > will deadlock itself under various scenarios.
>
> Is there documentation somewhere that says that delegations must be
> turned off on the server if NFS over swap is enabled?
I think the question is more generally "Is there documentation for NFS
swap?"
>
> If the client can't handle delegations + swap, then shouldn't this be
> solved by (1) checking if we are in NFS over swap and then
> proactively
> setting 'dont want delegation' on open and/or (2) proactively return
> the delegation if received so that we don't get into the deadlock?
We could do that for NFSv4.1 and NFSv4.2, but the protocol will not
allow us to do that for NFSv4.0.
>
> I think this is similar to Anna's. With this patch, I'm running into
> a
> problem running against an ONTAP server using xfstests (no problems
> without the patch). During the run two stuck threads are:
> [root@unknown000c291be8aa aglo]# cat /proc/3724/stack
> [<0>] nfs4_run_state_manager+0x1c0/0x1f8 [nfsv4]
> [<0>] kthread+0x100/0x110
> [<0>] ret_from_fork+0x10/0x20
> [root@unknown000c291be8aa aglo]# cat /proc/3725/stack
> [<0>] nfs_wait_bit_killable+0x1c/0x88 [nfs]
> [<0>] nfs4_wait_clnt_recover+0xb4/0xf0 [nfsv4]
> [<0>] nfs4_client_recover_expired_lease+0x34/0x88 [nfsv4]
> [<0>] _nfs4_do_open.isra.0+0x94/0x408 [nfsv4]
> [<0>] nfs4_do_open+0x9c/0x238 [nfsv4]
> [<0>] nfs4_atomic_open+0x100/0x118 [nfsv4]
> [<0>] nfs4_file_open+0x11c/0x240 [nfsv4]
> [<0>] do_dentry_open+0x140/0x528
> [<0>] vfs_open+0x30/0x38
> [<0>] do_open+0x14c/0x360
> [<0>] path_openat+0x104/0x250
> [<0>] do_filp_open+0x84/0x138
> [<0>] file_open_name+0x134/0x190
> [<0>] __do_sys_swapoff+0x58/0x6e8
> [<0>] __arm64_sys_swapoff+0x18/0x28
> [<0>] invoke_syscall.constprop.0+0x7c/0xd0
> [<0>] do_el0_svc+0xb4/0xd0
> [<0>] el0_svc+0x50/0x228
> [<0>] el0t_64_sync_handler+0x134/0x150
> [<0>] el0t_64_sync+0x17c/0x180
Oh crap... Yes, that is a bug. Can you please apply the attached patch
on top of the original, and see if that fixes the problem?
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-fixup-NFSv4-Fix-a-state-manager-thread-deadlock-regr.patch --]
[-- Type: text/x-patch; name="0001-fixup-NFSv4-Fix-a-state-manager-thread-deadlock-regr.patch", Size: 1475 bytes --]
From ec89a837b71772feeb55cd9ece4e91900d4afa79 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@hammerspace.com>
Date: Fri, 22 Sep 2023 15:00:08 -0400
Subject: [PATCH] fixup! NFSv4: Fix a state manager thread deadlock regression
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
fs/nfs/nfs4state.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 5751a6886da4..9a5d911a7edc 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -2762,17 +2762,17 @@ static int nfs4_run_state_manager(void *ptr)
nfs4_state_manager(clp);
if (test_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state) &&
- !test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state)) {
+ !test_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state)) {
wait_var_event_interruptible(&clp->cl_state,
test_bit(NFS4CLNT_RUN_MANAGER,
&clp->cl_state));
if (!atomic_read(&cl->cl_swapper))
clear_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state);
- if (refcount_read(&clp->cl_count) > 1 && !signalled())
+ if (refcount_read(&clp->cl_count) > 1 && !signalled() &&
+ !test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state))
goto again;
/* Either no longer a swapper, or were signalled */
clear_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state);
- nfs4_clear_state_manager_bit(clp);
}
if (refcount_read(&clp->cl_count) > 1 && !signalled() &&
--
2.41.0
next prev parent reply other threads:[~2023-09-22 19:05 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-17 23:05 [PATCH 1/2] NFSv4: Fix a nfs4_state_manager() race trondmy
2023-09-17 23:05 ` [PATCH 2/2] NFSv4: Fix a state manager thread deadlock regression trondmy
2023-09-18 1:25 ` NeilBrown
2023-09-18 2:27 ` Trond Myklebust
2023-09-20 19:38 ` Anna Schumaker
2023-09-21 0:15 ` Trond Myklebust
2023-09-22 17:22 ` Olga Kornievskaia
2023-09-22 19:05 ` Trond Myklebust [this message]
2023-09-22 21:00 ` Olga Kornievskaia
2023-09-22 21:06 ` Trond Myklebust
2023-09-24 17:08 ` Trond Myklebust
2023-09-26 14:55 ` Anna Schumaker
2023-09-26 14:31 ` Olga Kornievskaia
2023-09-25 22:28 ` NeilBrown
2023-09-25 22:44 ` Trond Myklebust
2023-09-25 23:04 ` NeilBrown
2023-09-25 23:20 ` Trond Myklebust
2023-09-18 1:17 ` [PATCH 1/2] NFSv4: Fix a nfs4_state_manager() race NeilBrown
2023-09-18 2:20 ` Trond Myklebust
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=077cb75b44afd2404629c1388a92ca61da5092b1.camel@hammerspace.com \
--to=trondmy@hammerspace.com \
--cc=Anna.Schumaker@netapp.com \
--cc=aglo@umich.edu \
--cc=linux-nfs@vger.kernel.org \
--cc=neilb@suse.de \
--cc=schumaker.anna@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).