From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: xfs <linux-xfs@vger.kernel.org>,
	Dave Chinner <david@fromorbit.com>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCH] xfs: don't nest transactions when scanning for eofblocks
Date: Fri, 19 Feb 2021 08:09:53 -0500
Message-ID: <20210219130953.GB757814@bfoster>
In-Reply-To: <20210219042940.GB7193@magnolia>

On Thu, Feb 18, 2021 at 08:29:40PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Brian Foster reported a lockdep warning on xfs/167:
> 
> ============================================
> WARNING: possible recursive locking detected
> 5.11.0-rc4 #35 Tainted: G        W I
> --------------------------------------------
> fsstress/17733 is trying to acquire lock:
> ffff8e0fd1d90650 (sb_internal){++++}-{0:0}, at: xfs_free_eofblocks+0x104/0x1d0 [xfs]
> 
> but task is already holding lock:
> ffff8e0fd1d90650 (sb_internal){++++}-{0:0}, at: xfs_trans_alloc_inode+0x5f/0x160 [xfs]
> 
> stack backtrace:
> CPU: 38 PID: 17733 Comm: fsstress Tainted: G        W I       5.11.0-rc4 #35
> Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
> Call Trace:
>  dump_stack+0x8b/0xb0
>  __lock_acquire.cold+0x159/0x2ab
>  lock_acquire+0x116/0x370
>  xfs_trans_alloc+0x1ad/0x310 [xfs]
>  xfs_free_eofblocks+0x104/0x1d0 [xfs]
>  xfs_blockgc_scan_inode+0x24/0x60 [xfs]
>  xfs_inode_walk_ag+0x202/0x4b0 [xfs]
>  xfs_inode_walk+0x66/0xc0 [xfs]
>  xfs_trans_alloc+0x160/0x310 [xfs]
>  xfs_trans_alloc_inode+0x5f/0x160 [xfs]
>  xfs_alloc_file_space+0x105/0x300 [xfs]
>  xfs_file_fallocate+0x270/0x460 [xfs]
>  vfs_fallocate+0x14d/0x3d0
>  __x64_sys_fallocate+0x3e/0x70
>  do_syscall_64+0x33/0x40
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> The cause of this is the new code that spurs a scan to garbage collect
> speculative preallocations if we fail to reserve enough blocks while
> allocating a transaction.  While the warning itself is a fairly benign
> lockdep complaint, it does bring to light a potential livelock.
> 
> Specifically, when we kick off that scan, we're still holding onto the
> transaction's log reservation.  If the blockgc scan finds something to
> free, it will need its own transaction, which means that it can block on
> the log grant.  This means that if there are enough writer threads to
> take all the log reservation space with that first transaction, the
> second reservation attempts will all block on log space that cannot be
> freed, leading to a livelock.
> 

The commit log text above about a potential livelock doesn't seem
accurate. Otherwise the code looks fine to me. I don't have a preference
between this patch and the other one...

Brian

> Fix this by freeing the transaction and jumping back to xfs_trans_alloc
> like this patch in the V4 submission[1].
> 
> [1] https://lore.kernel.org/linux-xfs/161142798066.2171939.9311024588681972086.stgit@magnolia/
> 
> Fixes: a1a7d05a0576 ("xfs: flush speculative space allocations when we run out of space")
> Reported-by: Brian Foster <bfoster@redhat.com>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/xfs_trans.c |   13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index 44f72c09c203..377f3961d7ed 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -260,6 +260,7 @@ xfs_trans_alloc(
>  	struct xfs_trans	**tpp)
>  {
>  	struct xfs_trans	*tp;
> +	bool			want_retry = true;
>  	int			error;
>  
>  	/*
> @@ -267,6 +268,7 @@ xfs_trans_alloc(
>  	 * GFP_NOFS allocation context so that we avoid lockdep false positives
>  	 * by doing GFP_KERNEL allocations inside sb_start_intwrite().
>  	 */
> +retry:
>  	tp = kmem_cache_zalloc(xfs_trans_zone, GFP_KERNEL | __GFP_NOFAIL);
>  	if (!(flags & XFS_TRANS_NO_WRITECOUNT))
>  		sb_start_intwrite(mp->m_super);
> @@ -289,7 +291,9 @@ xfs_trans_alloc(
>  	tp->t_firstblock = NULLFSBLOCK;
>  
>  	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> -	if (error == -ENOSPC) {
> +	if (error == -ENOSPC && want_retry) {
> +		xfs_trans_cancel(tp);
> +
>  		/*
>  		 * We weren't able to reserve enough space for the transaction.
>  		 * Flush the other speculative space allocations to free space.
> @@ -297,8 +301,11 @@ xfs_trans_alloc(
>  		 * other locks.
>  		 */
>  		error = xfs_blockgc_free_space(mp, NULL);
> -		if (!error)
> -			error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> +		if (error)
> +			return error;
> +
> +		want_retry = false;
> +		goto retry;
>  	}
>  	if (error) {
>  		xfs_trans_cancel(tp);
> 
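
The control flow of the quoted change (cancel the transaction on the first
-ENOSPC, run the garbage collection scan with no transaction held, then
retry the allocation exactly once) can be sketched as a small userspace
model. This is only an illustration under stated assumptions, not XFS code:
struct fake_fs, fake_reserve, fake_cancel, fake_gc and fake_trans_alloc are
hypothetical names invented here, and the block counters stand in for the
real reservation machinery.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for filesystem state; not a real XFS structure. */
struct fake_fs {
	long free_blocks;	/* blocks available for reservation */
	long speculative;	/* blocks held by speculative preallocation */
	int  open_txns;		/* transactions held by the caller */
};

/* Models the block reservation step: fails when space is short. */
int fake_reserve(struct fake_fs *fs, long blocks)
{
	if (fs->free_blocks < blocks)
		return -ENOSPC;
	fs->free_blocks -= blocks;
	return 0;
}

/* Models cancelling a transaction: drops the caller's hold on it. */
void fake_cancel(struct fake_fs *fs)
{
	assert(fs->open_txns > 0);
	fs->open_txns--;
}

/*
 * Models the scan that frees speculative preallocations. In this toy
 * model it refuses to run while the caller still holds a transaction,
 * which is the nesting the patch is meant to avoid.
 */
int fake_gc(struct fake_fs *fs)
{
	if (fs->open_txns != 0)
		return -EDEADLK;
	fs->free_blocks += fs->speculative;
	fs->speculative = 0;
	return 0;
}

/* Same shape as the patched allocation path: at most one retry. */
int fake_trans_alloc(struct fake_fs *fs, long blocks)
{
	bool want_retry = true;
	int error;

retry:
	fs->open_txns++;		/* allocate a fresh transaction */
	error = fake_reserve(fs, blocks);
	if (error == -ENOSPC && want_retry) {
		fake_cancel(fs);	/* drop it before scanning */
		error = fake_gc(fs);
		if (error)
			return error;
		want_retry = false;	/* a second ENOSPC is final */
		goto retry;
	}
	if (error) {
		fake_cancel(fs);
		return error;
	}
	return 0;			/* caller now holds one transaction */
}
```

The want_retry flag bounds the loop: if the scan frees nothing useful, the
second -ENOSPC falls through to the ordinary error path instead of scanning
again.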


Thread overview: 7+ messages
2021-02-19  4:29 [PATCH] xfs: don't nest transactions when scanning for eofblocks Darrick J. Wong
2021-02-19 13:09 ` Brian Foster [this message]
2021-02-19 17:23   ` Darrick J. Wong
2021-02-19 17:23 ` [PATCH v2] " Darrick J. Wong
2021-02-19 18:12   ` Brian Foster
2021-02-20  3:44   ` Allison Henderson
2021-02-25  7:45   ` Christoph Hellwig
