Re: [PATCH 3/4] NFS: avoid deadlocks with loop-back mounted NFS filesystems.


From: Anna Schumaker <Anna.Schumaker@netapp.com>
To: NeilBrown <neilb@suse.de>, Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Trond Myklebust <trond.myklebust@primarydata.com>,
	Ingo Molnar <mingo@redhat.com>
Cc: <linux-fsdevel@vger.kernel.org>, <linux-mm@kvack.org>,
	<linux-nfs@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	Jeff Layton <jeff.layton@primarydata.com>
Subject: Re: [PATCH 3/4] NFS: avoid deadlocks with loop-back mounted NFS filesystems.
Date: Tue, 16 Sep 2014 08:39:39 -0400
Message-ID: <54182F8B.8010302@Netapp.com>
In-Reply-To: <20140916053135.22257.68002.stgit@notabene.brown>

On 09/16/2014 01:31 AM, NeilBrown wrote:
> Support for loop-back mounted NFS filesystems is useful when NFS is
> used to access shared storage in a high-availability cluster.
>
> If the node running the NFS server fails, some other node can mount the
> filesystem and start providing NFS service.  If that node already had
> the filesystem NFS mounted, it will now have it loop-back mounted.
>
> nfsd can suffer a deadlock when allocating memory and entering direct
> reclaim.  While direct reclaim does not write to the NFS filesystem,
> it can send and wait for a COMMIT through nfs_release_page().

Is there anything that can be done on the nfsd side to prevent the deadlocks?

Anna

>
> This patch modifies nfs_release_page() to wait a limited time for the
> commit to complete - one second.  If the commit doesn't complete
> in this time, nfs_release_page() will fail.  This means it might now
> fail in some cases where it wouldn't before.  These cases are only
> when 'gfp' includes '__GFP_WAIT'.
>
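
For readers unfamiliar with the pattern, the bounded wait described above can be sketched in user space. This is only an analogy built on POSIX threads, not kernel code: struct fake_page, fake_commit_done() and fake_release_page() are hypothetical names, the condition variable stands in for the page wait queue, and the "killable" part of wait_on_page_bit_killable_timeout() is not modelled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a page whose PG_private bit may be set. */
struct fake_page {
	pthread_mutex_t lock;
	pthread_cond_t  cleared;
	bool            private_set;
};

void fake_page_init(struct fake_page *p, bool private_set)
{
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->cleared, NULL);
	p->private_set = private_set;
}

/* Completion side: clear the flag, then wake anyone waiting on it. */
void fake_commit_done(struct fake_page *p)
{
	pthread_mutex_lock(&p->lock);
	p->private_set = false;
	pthread_cond_broadcast(&p->cleared);
	pthread_mutex_unlock(&p->lock);
}

/*
 * Release side: wait until the flag clears or timeout_ms elapses.
 * Returns true if the page is releasable, false if the wait timed out
 * with the flag still set.
 */
bool fake_release_page(struct fake_page *p, long timeout_ms)
{
	struct timespec deadline;
	bool releasable;

	assert(timeout_ms >= 0);

	/* Absolute deadline on CLOCK_REALTIME, the default condvar clock. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec  += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&p->lock);
	while (p->private_set) {
		if (pthread_cond_timedwait(&p->cleared, &p->lock,
					   &deadline) == ETIMEDOUT)
			break;
	}
	releasable = !p->private_set;
	pthread_mutex_unlock(&p->lock);
	return releasable;
}
```

A caller that gets false back simply treats the page as not freeable for now and moves on, which is the failure case the commit message argues the callers already handle.
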
> nfs_release_page() is only called by try_to_release_page(), and that
> can only be called on an NFS page with required 'gfp' flags from
>  - page_cache_pipe_buf_steal() in splice.c
>  - shrink_page_list() in vmscan.c
>  - invalidate_inode_pages2_range() in truncate.c
>
> The first two handle failure quite safely.  The last is only called
> after ->launder_page() has been called, and that will have waited
> for the commit to finish already.
>
> So aborting if the commit takes longer than 1 second is perfectly safe.
>
> 1 second may be longer than is really necessary, but it is much
> shorter than the current maximum wait, so this is not a regression.
> Some waiting is needed to help slow down memory allocation to the
> rate that we can complete writeout of pages.
>
> In those rare cases where it is nfsd, or something that nfsd is
> waiting for, that is calling nfs_release_page(), this delay will at
> most cause a small hiccup in places where it currently deadlocks.
>
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfs/file.c  |   24 ++++++++++++++----------
>  fs/nfs/write.c |    2 ++
>  2 files changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index 524dd80d1898..8d74983417af 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -468,17 +468,21 @@ static int nfs_release_page(struct page *page, gfp_t gfp)
>  
>  	dfprintk(PAGECACHE, "NFS: release_page(%p)\n", page);
>  
> -	/* Only do I/O if gfp is a superset of GFP_KERNEL, and we're not
> -	 * doing this memory reclaim for a fs-related allocation.
> +	/* Always try to initiate a 'commit' if relevant, but only
> +	 * wait for it if __GFP_WAIT is set and the calling process is
> +	 * allowed to block.  Even then, only wait 1 second.  Waiting
> +	 * indefinitely can cause deadlocks when the NFS server is on
> +	 * this machine, and there is no particular need to wait
> +	 * extensively here.  A short wait has the benefit that
> +	 * someone else can worry about the freezer.
>  	 */
> -	if (mapping && (gfp & GFP_KERNEL) == GFP_KERNEL &&
> -	    !(current->flags & PF_FSTRANS)) {
> -		int how = FLUSH_SYNC;
> -
> -		/* Don't let kswapd deadlock waiting for OOM RPC calls */
> -		if (current_is_kswapd())
> -			how = 0;
> -		nfs_commit_inode(mapping->host, how);
> +	if (mapping) {
> +		nfs_commit_inode(mapping->host, 0);
> +		if ((gfp & __GFP_WAIT) &&
> +		    !current_is_kswapd() &&
> +		    !(current->flags & PF_FSTRANS))
> +			wait_on_page_bit_killable_timeout(page, PG_private,
> +							  HZ);
>  	}
>  	/* If PagePrivate() is set, then the page is not freeable */
>  	if (PagePrivate(page))
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index 175d5d073ccf..b5d83c7545d4 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -731,6 +731,8 @@ static void nfs_inode_remove_request(struct nfs_page *req)
>  		if (likely(!PageSwapCache(head->wb_page))) {
>  			set_page_private(head->wb_page, 0);
>  			ClearPagePrivate(head->wb_page);
> +			smp_mb__after_atomic();
> +			wake_up_page(head->wb_page, PG_private);
>  			clear_bit(PG_MAPPED, &head->wb_flags);
>  		}
>  		nfsi->npages--;
>
>
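
The condition that decides whether nfs_release_page() waits at all can be read as a small predicate. The sketch below restates it in standalone C under loud assumptions: the SK_* flag values and struct sk_task are made up for illustration and do not match the kernel's real definitions of __GFP_WAIT, PF_FSTRANS or struct task_struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for illustration; not the kernel's real ones. */
#define SK_GFP_WAIT   0x10u	/* stands in for __GFP_WAIT: caller may sleep */
#define SK_PF_FSTRANS 0x01u	/* stands in for PF_FSTRANS: in an fs transaction */

/* Hypothetical stand-in for the bits of 'current' that the check reads. */
struct sk_task {
	unsigned int flags;
	bool         is_kswapd;
};

/*
 * Mirrors the three-part test in the patched function: wait only if the
 * allocation may block, the caller is not kswapd, and the caller is not
 * in the middle of a filesystem transaction.
 */
bool sk_should_wait(unsigned int gfp, const struct sk_task *t)
{
	assert(t != NULL);
	return (gfp & SK_GFP_WAIT) &&
	       !t->is_kswapd &&
	       !(t->flags & SK_PF_FSTRANS);
}
```

Note that in the patch the commit itself is started unconditionally (with how == 0); only the wait is gated by this test.
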
