linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "NeilBrown" <neilb@suse.de>
To: "Trond Myklebust" <trondmy@hammerspace.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"aglo@umich.edu" <aglo@umich.edu>,
	"Anna.Schumaker@netapp.com" <Anna.Schumaker@netapp.com>,
	"schumaker.anna@gmail.com" <schumaker.anna@gmail.com>
Subject: Re: [PATCH 2/2] NFSv4: Fix a state manager thread deadlock regression
Date: Tue, 26 Sep 2023 09:04:05 +1000	[thread overview]
Message-ID: <169568304501.19404.1610884104930799751@noble.neil.brown.name> (raw)
In-Reply-To: <ee6c49e086b5ee2393d2bfc375383527eff72af8.camel@hammerspace.com>

On Tue, 26 Sep 2023, Trond Myklebust wrote:
> On Tue, 2023-09-26 at 08:28 +1000, NeilBrown wrote:
> > On Sat, 23 Sep 2023, Trond Myklebust wrote:
> > > On Fri, 2023-09-22 at 13:22 -0400, Olga Kornievskaia wrote:
> > > > On Wed, Sep 20, 2023 at 8:27 PM Trond Myklebust
> > > > <trondmy@hammerspace.com> wrote:
> > > > > 
> > > > > On Wed, 2023-09-20 at 15:38 -0400, Anna Schumaker wrote:
> > > > > > Hi Trond,
> > > > > > 
> > > > > > On Sun, Sep 17, 2023 at 7:12 PM <trondmy@kernel.org> wrote:
> > > > > > > 
> > > > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > > > > > 
> > > > > > > Commit 4dc73c679114 reintroduces the deadlock that was
> > > > > > > fixed by
> > > > > > > commit
> > > > > > > aeabb3c96186 ("NFSv4: Fix a NFSv4 state manager deadlock")
> > > > > > > because
> > > > > > > it
> > > > > > > prevents the setup of new threads to handle reboot
> > > > > > > recovery,
> > > > > > > while
> > > > > > > the
> > > > > > > older recovery thread is stuck returning delegations.
> > > > > > 
> > > > > > I'm seeing a possible deadlock with xfstests generic/472 on
> > > > > > NFS
> > > > > > v4.x
> > > > > > after applying this patch. The test itself checks for various
> > > > > > swapfile
> > > > > > edge cases, so it seems likely something is going on there.
> > > > > > 
> > > > > > Let me know if you need more info
> > > > > > Anna
> > > > > > 
> > > > > 
> > > > > Did you turn off delegations on your server? If you don't, then
> > > > > swap
> > > > > will deadlock itself under various scenarios.
> > > > 
> > > > Is there documentation somewhere that says that delegations must
> > > > be
> > > > turned off on the server if NFS over swap is enabled?
> > > 
> > > I think the question is more generally "Is there documentation for
> > > NFS
> > > swap?"
> > 
> > The main difference between using NFS for swap and for regular file
> > IO
> > is in the handling of writes, and particularly in the style of memory
> > allocation that is safe while handling a write request (or anything
> > which might block some write request, etc).
> > 
> > For buffered IO, memory allocations must be GFP_NOIO or
> > PF_MEMALLOC_NOIO.
> > For swap-out, memory allocations must be GFP_MEMALLOC or PG_MEMALLOC.
> > 
> > That is the primary difference - all other differences are minor. 
> > This
> > difference might justify documentation suggesting that 
> > /proc/sys/vm/min_free_kbytes could usefully be increased, but I don't
> > see that more is needed.
> > 
> > The NOIO/MEMALLOC distinction is properly plumbed through nfs,
> > sunrpc,
> > and networking and all "just works".  The problem area is that
> > kthread_create() doesn't take a gfpflags_t argument, so it uses
> > GFP_KERNEL allocations to create the new thread.
> > 
> > This means that when a write cannot proceed without state management,
> > and state management requests that a threads be started, there is a
> > risk
> > of memory allocation deadlock.
> > I believe the risk is there even for buffered IO, but I'm not 100%
> > certain and in practice I don't think a deadlock has ever been
> > reported.
> > With swap-out it is fairly easy to trigger a deadlock if there is
> > heavy
> > swap-out traffic when state management is needed.
> > 
> > The common pattern in the kernel when a thread might be needed to
> > support writeout is to keep the thread running permanently (rather
> > than
> > to add a gfpflags_t to kthread_create), so that is what I added to
> > the
> > nfsv4 state manager.
> > 
> > However the state manager thread has a second use - returning
> > delegations.  This sometimes needs to run concurrently with state
> > management, so one thread is not enough.
> > 
> > What is that context for delegation return?  Does it ever block
> > writes? 
> > If it doesn't, would it make sense to use a work queue for returning
> > delegations - maybe system_wq?
> 
> These are potentially long-lived processes because there may be lock
> recovery involved, and because of the conflict with state recovery, so
> it does not make sense to put them on a workqueue.
> 

Makes sense - thanks.
Are writes blocked while the delegation returns proceeds?  If not, would
it be reasonable to start a separate kthread on-demand when a return is
requested?

Thanks,
NeilBrown

  reply	other threads:[~2023-09-25 23:04 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-17 23:05 [PATCH 1/2] NFSv4: Fix a nfs4_state_manager() race trondmy
2023-09-17 23:05 ` [PATCH 2/2] NFSv4: Fix a state manager thread deadlock regression trondmy
2023-09-18  1:25   ` NeilBrown
2023-09-18  2:27     ` Trond Myklebust
2023-09-20 19:38   ` Anna Schumaker
2023-09-21  0:15     ` Trond Myklebust
2023-09-22 17:22       ` Olga Kornievskaia
2023-09-22 19:05         ` Trond Myklebust
2023-09-22 21:00           ` Olga Kornievskaia
2023-09-22 21:06             ` Trond Myklebust
2023-09-24 17:08               ` Trond Myklebust
2023-09-26 14:55                 ` Anna Schumaker
2023-09-26 14:31               ` Olga Kornievskaia
2023-09-25 22:28           ` NeilBrown
2023-09-25 22:44             ` Trond Myklebust
2023-09-25 23:04               ` NeilBrown [this message]
2023-09-25 23:20                 ` Trond Myklebust
2023-09-18  1:17 ` [PATCH 1/2] NFSv4: Fix a nfs4_state_manager() race NeilBrown
2023-09-18  2:20   ` Trond Myklebust

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=169568304501.19404.1610884104930799751@noble.neil.brown.name \
    --to=neilb@suse.de \
    --cc=Anna.Schumaker@netapp.com \
    --cc=aglo@umich.edu \
    --cc=linux-nfs@vger.kernel.org \
    --cc=schumaker.anna@gmail.com \
    --cc=trondmy@hammerspace.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).