linux-nfs.vger.kernel.org archive mirror
From: Trond Myklebust <trondmy@hammerspace.com>
To: "aglo@umich.edu" <aglo@umich.edu>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"Anna.Schumaker@netapp.com" <Anna.Schumaker@netapp.com>,
	"neilb@suse.de" <neilb@suse.de>,
	"schumaker.anna@gmail.com" <schumaker.anna@gmail.com>
Subject: Re: [PATCH 2/2] NFSv4: Fix a state manager thread deadlock regression
Date: Sun, 24 Sep 2023 17:08:48 +0000
Message-ID: <29ac4c1f8017735a6d4f8e48e04172dc91d461ae.camel@hammerspace.com>
In-Reply-To: <c1c6106c3b4a6106ff706130fe551b424512dd34.camel@hammerspace.com>

On Fri, 2023-09-22 at 17:06 -0400, Trond Myklebust wrote:
> On Fri, 2023-09-22 at 17:00 -0400, Olga Kornievskaia wrote:
> > On Fri, Sep 22, 2023 at 3:05 PM Trond Myklebust wrote:
> > > 
> > > Oh crap... Yes, that is a bug. Can you please apply the attached
> > > patch on top of the original, and see if that fixes the problem?
> > 
> > I can't consistently reproduce the problem. Out of several xfstests
> > runs a couple got stuck in that state. So when I apply that patch
> > and run, I can't tell if I'm no longer hitting or if I'm just not
> > hitting the right condition.
> > 
> > Given I don't exactly know what's caused it I'm trying to find
> > something I can hit consistently. Any ideas? I mean this stack
> > trace seems to imply a recovery open but I'm not doing any server
> > reboots or connection drops.
> > 
> > > 
> 
> If I'm right about the root cause, then just turning off delegations
> on the server, setting up an NFS swap partition and then running some
> ordinary file open/close workload against the exact same server would
> probably suffice to trigger your stack trace 100% reliably.
> 
> I'll see if I can find time to test it over the weekend.

Yep... Creating a 4G empty file at /mnt/nfs/swap/swapfile, running
mkswap and then swapon on it, followed by a simple bash line of
	echo "foo" >/mnt/nfs/foobar

will immediately lead to a hang. When I look at the stack for the bash
process, I see the following dump, which matches yours:

[root@vmw-test-1 ~]# cat /proc/1120/stack 
[<0>] nfs_wait_bit_killable+0x11/0x60 [nfs]
[<0>] nfs4_wait_clnt_recover+0x54/0x90 [nfsv4]
[<0>] nfs4_client_recover_expired_lease+0x29/0x60 [nfsv4]
[<0>] nfs4_do_open+0x170/0xa90 [nfsv4]
[<0>] nfs4_atomic_open+0x94/0x100 [nfsv4]
[<0>] nfs_atomic_open+0x2d9/0x670 [nfs]
[<0>] path_openat+0x3c3/0xd40
[<0>] do_filp_open+0xb4/0x160
[<0>] do_sys_openat2+0x81/0xe0
[<0>] __x64_sys_openat+0x81/0xa0
[<0>] do_syscall_64+0x68/0xa0
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
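
To check other stuck tasks for the same signature, a minimal sketch
follows. It assumes the symptom is simply the presence of the
nfs4_wait_clnt_recover frame in the task's kernel stack, as in the dump
above. The function name stuck_in_clnt_recover is hypothetical, and
reading /proc/<pid>/stack normally requires root.

```shell
#!/bin/sh
# Hypothetical helper (not part of any patch): succeed if a kernel stack
# dump, in the format printed by /proc/<pid>/stack, contains the
# nfs4_wait_clnt_recover frame seen in the trace above.
stuck_in_clnt_recover() {
	# Reads the stack text on stdin; -q only sets the exit status.
	grep -q 'nfs4_wait_clnt_recover'
}

# Usage (as root):
#   stuck_in_clnt_recover </proc/1120/stack && echo "waiting on state manager"
```

A match only says the task is waiting for client recovery to finish; it
does not by itself show why the state manager is not making progress.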

With the fix I sent you:

[root@vmw-test-1 ~]# mount -t nfs -overs=4.2 vmw-test-2:/export /mnt/nfs
[root@vmw-test-1 ~]# mkswap /mnt/nfs/swap/swapfile 
mkswap: /mnt/nfs/swap/swapfile: warning: wiping old swap signature.
Setting up swapspace version 1, size = 4 GiB (4294963200 bytes)
no label, UUID=1360b0a3-833a-4ba7-b467-8a59d3723013
[root@vmw-test-1 ~]# swapon /mnt/nfs/swap/swapfile
[root@vmw-test-1 ~]# ps -efww | grep manage
root        1214       2  0 13:04 ?        00:00:00 [192.168.76.251-manager]
root        1216    1147  0 13:04 pts/0    00:00:00 grep --color=auto manage
[root@vmw-test-1 ~]# echo "foo" >/mnt/nfs/foobar
[root@vmw-test-1 ~]# cat /mnt/nfs/foobar
foo

So that returns behaviour to normal in my testing, and I no longer see
the hangs.
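
Since the failure mode is a hang rather than an error, a scripted
version of this check needs a time limit. A rough sketch, assuming the
coreutils timeout(1) utility is available; the function name
write_completes and the 10 second default are arbitrary illustrative
choices, not part of the patch.

```shell
#!/bin/sh
# Hypothetical check: write "foo" to a file and fail if the write does not
# finish within a time limit. The hang described above would show up here
# as a timeout (non-zero exit status).
write_completes() {
	file=$1
	limit=${2:-10}	# seconds; arbitrary default
	# SIGKILL is used because the stuck task sits in a killable wait
	# (nfs_wait_bit_killable in the trace above).
	timeout -s KILL "$limit" sh -c 'echo "foo" >"$1"' sh "$file"
}

# Usage:
#   write_completes /mnt/nfs/foobar || echo "write hung or failed"
```

The function returns non-zero both for a timeout and for an ordinary
write failure, so a failing result still needs a look at the task's
stack to tell the two apart.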

Let me send out a PATCHv2...
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com


