From: "NeilBrown" <neilb@suse.de>
To: "Trond Myklebust" <trondmy@hammerspace.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
"aglo@umich.edu" <aglo@umich.edu>,
"Anna.Schumaker@netapp.com" <Anna.Schumaker@netapp.com>,
"schumaker.anna@gmail.com" <schumaker.anna@gmail.com>
Subject: Re: [PATCH 2/2] NFSv4: Fix a state manager thread deadlock regression
Date: Tue, 26 Sep 2023 09:04:05 +1000 [thread overview]
Message-ID: <169568304501.19404.1610884104930799751@noble.neil.brown.name> (raw)
In-Reply-To: <ee6c49e086b5ee2393d2bfc375383527eff72af8.camel@hammerspace.com>
On Tue, 26 Sep 2023, Trond Myklebust wrote:
> On Tue, 2023-09-26 at 08:28 +1000, NeilBrown wrote:
> > On Sat, 23 Sep 2023, Trond Myklebust wrote:
> > > On Fri, 2023-09-22 at 13:22 -0400, Olga Kornievskaia wrote:
> > > > On Wed, Sep 20, 2023 at 8:27 PM Trond Myklebust
> > > > <trondmy@hammerspace.com> wrote:
> > > > >
> > > > > On Wed, 2023-09-20 at 15:38 -0400, Anna Schumaker wrote:
> > > > > > Hi Trond,
> > > > > >
> > > > > > On Sun, Sep 17, 2023 at 7:12 PM <trondmy@kernel.org> wrote:
> > > > > > >
> > > > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > > > > >
> > > > > > > Commit 4dc73c679114 reintroduces the deadlock that was
> > > > > > > fixed by
> > > > > > > commit
> > > > > > > aeabb3c96186 ("NFSv4: Fix a NFSv4 state manager deadlock")
> > > > > > > because
> > > > > > > it
> > > > > > > prevents the setup of new threads to handle reboot
> > > > > > > recovery,
> > > > > > > while
> > > > > > > the
> > > > > > > older recovery thread is stuck returning delegations.
> > > > > >
> > > > > > I'm seeing a possible deadlock with xfstests generic/472 on
> > > > > > NFS
> > > > > > v4.x
> > > > > > after applying this patch. The test itself checks for various
> > > > > > swapfile
> > > > > > edge cases, so it seems likely something is going on there.
> > > > > >
> > > > > > Let me know if you need more info
> > > > > > Anna
> > > > > >
> > > > >
> > > > > Did you turn off delegations on your server? If you don't, then
> > > > > swap
> > > > > will deadlock itself under various scenarios.
> > > >
> > > > Is there documentation somewhere that says that delegations must
> > > > be
> > > > turned off on the server if NFS over swap is enabled?
> > >
> > > I think the question is more generally "Is there documentation for
> > > NFS
> > > swap?"
> >
> > The main difference between using NFS for swap and for regular file
> > IO
> > is in the handling of writes, and particularly in the style of memory
> > allocation that is safe while handling a write request (or anything
> > which might block some write request, etc).
> >
> > For buffered IO, memory allocations must be GFP_NOIO or
> > PF_MEMALLOC_NOIO.
> > For swap-out, memory allocations must be GFP_MEMALLOC or PG_MEMALLOC.
> >
> > That is the primary difference - all other differences are minor.
> > This
> > difference might justify documentation suggesting that
> > /proc/sys/vm/min_free_kbytes could usefully be increased, but I don't
> > see that more is needed.
> >
> > The NOIO/MEMALLOC distinction is properly plumbed through nfs,
> > sunrpc,
> > and networking and all "just works". The problem area is that
> > kthread_create() doesn't take a gfpflags_t argument, so it uses
> > GFP_KERNEL allocations to create the new thread.
> >
> > This means that when a write cannot proceed without state management,
> > and state management requests that a threads be started, there is a
> > risk
> > of memory allocation deadlock.
> > I believe the risk is there even for buffered IO, but I'm not 100%
> > certain and in practice I don't think a deadlock has ever been
> > reported.
> > With swap-out it is fairly easy to trigger a deadlock if there is
> > heavy
> > swap-out traffic when state management is needed.
> >
> > The common pattern in the kernel when a thread might be needed to
> > support writeout is to keep the thread running permanently (rather
> > than
> > to add a gfpflags_t to kthread_create), so that is what I added to
> > the
> > nfsv4 state manager.
> >
> > However the state manager thread has a second use - returning
> > delegations. This sometimes needs to run concurrently with state
> > management, so one thread is not enough.
> >
> > What is that context for delegation return? Does it ever block
> > writes?
> > If it doesn't, would it make sense to use a work queue for returning
> > delegations - maybe system_wq?
>
> These are potentially long-lived processes because there may be lock
> recovery involved, and because of the conflict with state recovery, so
> it does not make sense to put them on a workqueue.
>
Makes sense - thanks.
Are writes blocked while the delegation returns proceeds? If not, would
it be reasonable to start a separate kthread on-demand when a return is
requested?
Thanks,
NeilBrown
next prev parent reply other threads:[~2023-09-25 23:04 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-17 23:05 [PATCH 1/2] NFSv4: Fix a nfs4_state_manager() race trondmy
2023-09-17 23:05 ` [PATCH 2/2] NFSv4: Fix a state manager thread deadlock regression trondmy
2023-09-18 1:25 ` NeilBrown
2023-09-18 2:27 ` Trond Myklebust
2023-09-20 19:38 ` Anna Schumaker
2023-09-21 0:15 ` Trond Myklebust
2023-09-22 17:22 ` Olga Kornievskaia
2023-09-22 19:05 ` Trond Myklebust
2023-09-22 21:00 ` Olga Kornievskaia
2023-09-22 21:06 ` Trond Myklebust
2023-09-24 17:08 ` Trond Myklebust
2023-09-26 14:55 ` Anna Schumaker
2023-09-26 14:31 ` Olga Kornievskaia
2023-09-25 22:28 ` NeilBrown
2023-09-25 22:44 ` Trond Myklebust
2023-09-25 23:04 ` NeilBrown [this message]
2023-09-25 23:20 ` Trond Myklebust
2023-09-18 1:17 ` [PATCH 1/2] NFSv4: Fix a nfs4_state_manager() race NeilBrown
2023-09-18 2:20 ` Trond Myklebust
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=169568304501.19404.1610884104930799751@noble.neil.brown.name \
--to=neilb@suse.de \
--cc=Anna.Schumaker@netapp.com \
--cc=aglo@umich.edu \
--cc=linux-nfs@vger.kernel.org \
--cc=schumaker.anna@gmail.com \
--cc=trondmy@hammerspace.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).