public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Carlos Maiolino <cmaiolino@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH V2] Stop searching for free slots in an inode chunk when there are none
Date: Mon, 7 Aug 2017 11:19:56 -0400	[thread overview]
Message-ID: <20170807151954.GB43920@bfoster.bfoster> (raw)
In-Reply-To: <20170807101710.18734-1-cmaiolino@redhat.com>

On Mon, Aug 07, 2017 at 12:17:10PM +0200, Carlos Maiolino wrote:
> In a filesystem without finobt, the Space manager selects an AG to alloc a new
> inode, where xfs_dialloc_ag_inobt() will search the AG for the free slot chunk.
> 
> When the new inode is in the samge AG as its parent, the btree will be searched
> starting on the parent's record, and then retried from the top if no slot is
> available beyond the parent's record.
> 
> To exit this loop though, xfs_dialloc_ag_inobt(), relies on the fact that the
> btree must have a free slot available, once its callers relied on the
> agi->freecount when deciding how/where to allocate this new inode.
> 
> In the case when the agi->freecount is corrupted, showing available inodes in an
> AG, when in fact there is none, this becomes an infinite loop.
> 
> Add a way to stop the loop when a free slot is not found in the btree, making
> the function to fall into the whole AG scan which will then, be able to detect
> the corruption and shut the filesystem down.
> 
> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> ---
> 
> V2: Use searchdistance variable to stop the infinite loop instead of adding a
>     new mechanism.
> 
>  fs/xfs/libxfs/xfs_ialloc.c | 34 ++++++++++++++++++----------------
>  1 file changed, 18 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index d41ade5d293e..8e535290390d 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -1123,6 +1123,7 @@ xfs_dialloc_ag_inobt(
>  	int			error;
>  	int			offset;
>  	int			i, j;
> +	int			searchdistance = 10;
>  
>  	pag = xfs_perag_get(mp, agno);
>  
> @@ -1149,7 +1150,6 @@ xfs_dialloc_ag_inobt(
>  	if (pagno == agno) {
>  		int		doneleft;	/* done, to the left */
>  		int		doneright;	/* done, to the right */
> -		int		searchdistance = 10;
>  
>  		error = xfs_inobt_lookup(cur, pagino, XFS_LOOKUP_LE, &i);
>  		if (error)
> @@ -1210,10 +1210,10 @@ xfs_dialloc_ag_inobt(
>  		/*
>  		 * Loop until we find an inode chunk with a free inode.
>  		 */
> -		while (!doneleft || !doneright) {
> +		while (--searchdistance > 0 && (!doneleft || !doneright)) {
>  			int	useleft;  /* using left inode chunk this time */
>  
> -			if (!--searchdistance) {
> +			if (searchdistance <= 0) {

If the loop terminates when searchdistance <= 0, when would we ever hit
this?

>  				/*
>  				 * Not in range - save last search
>  				 * location and allocate a new inode
> @@ -1268,19 +1268,21 @@ xfs_dialloc_ag_inobt(
>  				goto error1;
>  		}
>  
> -		/*
> -		 * We've reached the end of the btree. because
> -		 * we are only searching a small chunk of the
> -		 * btree each search, there is obviously free
> -		 * inodes closer to the parent inode than we
> -		 * are now. restart the search again.
> -		 */
> -		pag->pagl_pagino = NULLAGINO;
> -		pag->pagl_leftrec = NULLAGINO;
> -		pag->pagl_rightrec = NULLAGINO;
> -		xfs_btree_del_cursor(tcur, XFS_BTREE_NOERROR);
> -		xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> -		goto restart_pagno;
> +		if (searchdistance > 0) {
> +			/*
> +			 * We've reached the end of the btree. because
> +			 * we are only searching a small chunk of the
> +			 * btree each search, there is obviously free
> +			 * inodes closer to the parent inode than we
> +			 * are now. restart the search again.
> +			 */
> +			pag->pagl_pagino = NULLAGINO;
> +			pag->pagl_leftrec = NULLAGINO;
> +			pag->pagl_rightrec = NULLAGINO;
> +			xfs_btree_del_cursor(tcur, XFS_BTREE_NOERROR);
> +			xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> +			goto restart_pagno;
> +		}

Hmm.. so if we have a tree with just a handful of records (say 3, for
example), we'd end up running through the tree searchdistance times
before correctly identifying the state and moving on (to the full
search). We avoid the livelock, but fwiw, that still seems like
buggy/brittle code to me. That's just my .02. If we want to use this
approach, we should at least add a comment somewhere that explains
how/why we handle this situation as such (i.e., that we require/rely on
searchdistance to deplete in the worst case of a corruption).

Also note that this has potential for performance side effects in the
common (non-corruption) case. That may or may not ultimately be a
problem, but the fallback to the optimized search is the brute force AG
scan. Given that, I think that case at least has to be considered and
noted in the commit log.

It looks to me that it shouldn't be a major problem because it only
affects the situation where the cached search "wraps" to the outside of
the tree, and that probably doesn't happen often with a search distance
of 10 records and a large tree. I am a bit curious where the
searchdistance of 10 comes from though (we fit many more records in a
single inobt leaf block)..?

Brian

>  	}
>  
>  	/*
> -- 
> 2.13.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2017-08-07 15:20 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-07 10:17 [PATCH V2] Stop searching for free slots in an inode chunk when there are none Carlos Maiolino
2017-08-07 15:19 ` Brian Foster [this message]
2017-08-10 11:56   ` Dave Chinner
2017-08-10 12:28     ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170807151954.GB43920@bfoster.bfoster \
    --to=bfoster@redhat.com \
    --cc=cmaiolino@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox