linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.de>
To: Jeff Layton <jlayton@redhat.com>
Cc: linux-mm@kvack.org, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org, xfs@oss.sgi.com,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Ming Lei <ming.lei@canonical.com>,
	netdev@vger.kernel.org
Subject: Re: [PATCH/RFC 00/19] Support loop-back NFS mounts
Date: Thu, 17 Apr 2014 10:20:48 +1000	[thread overview]
Message-ID: <20140417102048.2fc8275c@notabene.brown> (raw)
In-Reply-To: <20140416104207.75b044e8@tlielax.poochiereds.net>

[-- Attachment #1: Type: text/plain, Size: 3266 bytes --]

On Wed, 16 Apr 2014 10:42:07 -0400 Jeff Layton <jlayton@redhat.com> wrote:

> On Wed, 16 Apr 2014 14:03:35 +1000
> NeilBrown <neilb@suse.de> wrote:
> 

> > Comments, criticisms, etc most welcome.
> > 
> > Thanks,
> > NeilBrown
> > 
> 
> I've only given this a once-over, but the basic concept seems a bit
> flawed. IIUC, the basic idea is to disallow allocations done in knfsd
> threads context from doing fs-based reclaim.
> 
> This seems very heavy-handed, and like it could cause problems on a
> busy NFS server. Those sorts of servers are likely to have a lot of
> data in pagecache and thus we generally want to allow them to do do
> writeback when memory is tight.
> 
> It's generally acceptable for knfsd to recurse into local filesystem
> code for writeback. What you want to avoid in this situation is reclaim
> on NFS filesystems that happen to be from knfsd on the same box.
> 
> If you really want to fix this, what may make more sense is trying to
> plumb that information down more granularly. Maybe GFP_NONETFS and/or
> PF_NETFSTRANS flags?

Hi Jeff,
 a few clarifications first:

 1/ These changes probably won't affect a "busy NFS server" at all.  The
    PF_FSTRANS flag only get set in nfsd when it sees a request from the local
    host.  Most busy NFS servers would never see that, and so would never set
    PF_FSTRANS.

 2/ Setting PF_FSTRANS does not affect where writeback is done.  Direct
    reclaim hasn't performed filesystem writeback since 3.2, it is all done
    by kswapd (I think direct reclaim still writes to swap sometimes).
    The main effects of setting PF_FSTRANS (as modified by this page set)
    are:
      - when reclaim calls ->releasepage  __GFP_FS is not set in the gfp_t arg
      - various caches like dcache, icache etc are not shrunk from
        direct reclaim
    There are other effects, but I'm less clear on exactly what they mean.

A flag specific to network filesystems might make sense, but I don't think it
would solve all the deadlocks.

A good example is the deadlock with the flush-* threads.
flush-* will lock a page, and  then call ->writepage.  If ->writepage
allocates memory it can enter reclaim, call ->releasepage on NFS, and block
waiting for a COMMIT to complete.
The COMMIT might already be running, performing fsync on that same file that
flush-* is flushing.  It locks each page in turn.  When it  gets to the page
that flush-* has locked, it will deadlock.

xfs_vm_writepage does allocate memory with __GFP_FS set
   xfs_vm_writepage -> xfs_setfilesize_trans_alloc -> xfs_trans_alloc ->
   _xfs_trans_allo

and I have had this deadlock happen.  To avoid this we need flush-* to ensure
that no memory allocation blocks on NFS.  We could set a PF_NETFSTRANS there,
but as that code really has nothing to do with networks it would seem an odd
place to put a network-fs-specific flag.

In general, if nfsd is allowed to block on local filesystem, and local
filesystem is allowed to block on NFS, then a deadlock can happen.
We would need a clear hierarchy

   __GFP_NETFS > __GFP_FS > __GFP_IO

for it to work.  I'm not sure the extra level really helps a lot and it would
be a lot of churn.


Thanks,
NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

  reply	other threads:[~2014-04-17  0:20 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-16  4:03 [PATCH/RFC 00/19] Support loop-back NFS mounts NeilBrown
2014-04-16  4:03 ` [PATCH 04/19] Make effect of PF_FSTRANS to disable __GFP_FS universal NeilBrown
2014-04-16  5:37   ` Dave Chinner
2014-04-16  6:17     ` NeilBrown
2014-04-17  1:03       ` NeilBrown
2014-04-17  4:41         ` Dave Chinner
2014-04-16  4:03 ` [PATCH 09/19] XFS: ensure xfs_file_*_read cannot deadlock in memory allocation NeilBrown
2014-04-16  6:04   ` Dave Chinner
2014-04-16  6:27     ` NeilBrown
2014-04-16  6:31     ` Dave Chinner
2014-04-16  4:03 ` [PATCH 10/19] NET: set PF_FSTRANS while holding sk_lock NeilBrown
2014-04-16  5:13   ` Eric Dumazet
2014-04-16  5:47     ` NeilBrown
2014-04-16 13:00     ` David Miller
2014-04-17  2:38       ` NeilBrown
2014-04-16  4:03 ` [PATCH 08/19] Set PF_FSTRANS while write_cache_pages calls ->writepage NeilBrown
2014-04-16  4:03 ` [PATCH 14/19] driver core: set PF_FSTRANS while holding gdp_mutex NeilBrown
2014-04-16  4:03 ` [PATCH 12/19] NET: set PF_FSTRANS while holding rtnl_lock NeilBrown
2014-04-16  4:03 ` [PATCH 07/19] nfsd and VM: use PF_LESS_THROTTLE to avoid throttle in shrink_inactive_list NeilBrown
2014-04-16  4:03 ` [PATCH 02/19] lockdep: lockdep_set_current_reclaim_state should save old value NeilBrown
2014-04-16  4:03 ` [PATCH 05/19] SUNRPC: track whether a request is coming from a loop-back interface NeilBrown
2014-04-16 14:47   ` Jeff Layton
2014-04-16 23:25     ` NeilBrown
2014-04-16  4:03 ` [PATCH 06/19] nfsd: set PF_FSTRANS for nfsd threads NeilBrown
2014-04-16  7:28   ` Peter Zijlstra
2014-04-16  4:03 ` [PATCH 11/19] FS: set PF_FSTRANS while holding mmap_sem in exec.c NeilBrown
2014-04-16  4:03 ` [PATCH 13/19] MM: set PF_FSTRANS while allocating per-cpu memory to avoid deadlock NeilBrown
2014-04-16  5:49   ` Dave Chinner
2014-04-16  6:22     ` NeilBrown
2014-04-16  6:30       ` Dave Chinner
2014-04-16  4:03 ` [PATCH 01/19] Promote current_{set, restore}_flags_nested from xfs to global NeilBrown
2014-04-16  4:03 ` [PATCH 03/19] lockdep: improve scenario messages for RECLAIM_FS errors NeilBrown
2014-04-16  7:22   ` Peter Zijlstra
2014-04-16  4:03 ` [PATCH 19/19] XFS: set PF_FSTRANS while ilock is held in xfs_free_eofblocks NeilBrown
2014-04-16  6:18   ` Dave Chinner
2014-04-16  4:03 ` [PATCH 15/19] nfsd: set PF_FSTRANS when client_mutex is held NeilBrown
2014-04-16  4:03 ` [PATCH 17/19] VFS: set PF_FSTRANS while namespace_sem " NeilBrown
2014-04-16  4:46   ` Al Viro
2014-04-16  5:52     ` NeilBrown
     [not found]       ` <20140416155230.4d02e4b9-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2014-04-16 16:37         ` Al Viro
2014-04-16  4:03 ` [PATCH 16/19] VFS: use GFP_NOFS rather than GFP_KERNEL in __d_alloc NeilBrown
2014-04-16  6:25   ` Dave Chinner
2014-04-16  6:49     ` NeilBrown
2014-04-16  9:00       ` Dave Chinner
2014-04-17  0:51         ` NeilBrown
2014-04-17  5:58           ` Dave Chinner
2014-04-16  4:03 ` [PATCH 18/19] nfsd: set PF_FSTRANS during nfsd4_do_callback_rpc NeilBrown
2014-04-16 14:42 ` [PATCH/RFC 00/19] Support loop-back NFS mounts Jeff Layton
2014-04-17  0:20   ` NeilBrown [this message]
2014-04-17  1:27     ` Dave Chinner
2014-04-17  1:50       ` NeilBrown
2014-04-17  4:23         ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140417102048.2fc8275c@notabene.brown \
    --to=neilb@suse.de \
    --cc=jlayton@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=ming.lei@canonical.com \
    --cc=mingo@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).