public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Omar Sandoval <osandov@osandov.com>
Cc: linux-xfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 1/2] xfs: fix realtime file data space leak
Date: Mon, 2 Dec 2019 17:56:45 -0800	[thread overview]
Message-ID: <20191203015645.GG7335@magnolia> (raw)
In-Reply-To: <fe86a7464d77f770736404a9fbfcdfbb04b59826.1574799066.git.osandov@fb.com>

On Tue, Nov 26, 2019 at 12:13:28PM -0800, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Realtime files in XFS allocate extents in rextsize units. However, the
> written/unwritten state of those extents is still tracked in blocksize
> units. Therefore, a realtime file can be split up into written and
> unwritten extents that are not necessarily aligned to the realtime
> extent size. __xfs_bunmapi() has some logic to handle these various
> corner cases. Consider how it handles the following case:
> 
> 1. The last extent is unwritten.
> 2. The last extent is smaller than the realtime extent size.
> 3. startblock of the last extent is not aligned to the realtime extent
>    size, but startblock + blockcount is.
> 
> In this case, __xfs_bunmapi() calls xfs_bmap_add_extent_unwritten_real()
> to set the second-to-last extent to unwritten. This should merge the
> last and second-to-last extents, so __xfs_bunmapi() moves on to the
> second-to-last extent.
> 
> However, if the size of the last and second-to-last extents combined is
> greater than MAXEXTLEN, xfs_bmap_add_extent_unwritten_real() does not
> merge the two extents. When that happens, __xfs_bunmapi() skips past the
> last extent without unmapping it, thus leaking the space.
> 
> Fix it by only unwriting the minimum amount needed to align the last
> extent to the realtime extent size, which is guaranteed to merge with
> the last extent.
> 
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c | 25 ++++++++++++++-----------
>  1 file changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 02469d59c787..6f8791a1e460 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -5376,16 +5376,17 @@ __xfs_bunmapi(
>  		}
>  		div_u64_rem(del.br_startblock, mp->m_sb.sb_rextsize, &mod);
>  		if (mod) {
> +			xfs_extlen_t off = mp->m_sb.sb_rextsize - mod;
> +
>  			/*
>  			 * Realtime extent is lined up at the end but not
>  			 * at the front.  We'll get rid of full extents if
>  			 * we can.
>  			 */
> -			mod = mp->m_sb.sb_rextsize - mod;
> -			if (del.br_blockcount > mod) {
> -				del.br_blockcount -= mod;
> -				del.br_startoff += mod;
> -				del.br_startblock += mod;
> +			if (del.br_blockcount > off) {
> +				del.br_blockcount -= off;
> +				del.br_startoff += off;
> +				del.br_startblock += off;

Ok, so we make this change so that we no longer change @mod once it's
set by the div64 operation...

>  			} else if (del.br_startoff == start &&
>  				   (del.br_state == XFS_EXT_UNWRITTEN ||
>  				    tp->t_blk_res == 0)) {
> @@ -5403,6 +5404,7 @@ __xfs_bunmapi(
>  				continue;
>  			} else if (del.br_state == XFS_EXT_UNWRITTEN) {
>  				struct xfs_bmbt_irec	prev;
> +				xfs_fileoff_t		unwrite_start;
>  
>  				/*
>  				 * This one is already unwritten.
> @@ -5416,12 +5418,13 @@ __xfs_bunmapi(
>  				ASSERT(!isnullstartblock(prev.br_startblock));
>  				ASSERT(del.br_startblock ==
>  				       prev.br_startblock + prev.br_blockcount);
> -				if (prev.br_startoff < start) {
> -					mod = start - prev.br_startoff;
> -					prev.br_blockcount -= mod;
> -					prev.br_startblock += mod;
> -					prev.br_startoff = start;
> -				}

...and here, we have a @del extent that is unwritten and a @prev extent
that is written.  We aim to trick xfs_bmap_add_extent_unwritten_real
into extending @del towards startoff==0 and returning with @icur
pointing at @del (not @prev) so that the next time we go around the loop
we see an rtextsize-aligned @del and simply unmap it...

> +				unwrite_start = max3(start,
> +						     del.br_startoff - mod,
> +						     prev.br_startoff);

...however, if @prev is too long to convert+combine with @del, the
conversion routine converts @prev to unwritten and returns with @icur
pointing to @prev, not @del.  That's how we leak @del.

This patch fixes that by capping the conversion to the start of the
rtext alignment, which means that we can always merge with @del and
always return with @icur pointing at @del.  Ok, that's exactly what the
commit message says.

It was /really/ helpful to be able to use the test case to walk through
exactly what this patch is trying to fix.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> +				mod = unwrite_start - prev.br_startoff;
> +				prev.br_startoff = unwrite_start;
> +				prev.br_startblock += mod;
> +				prev.br_blockcount -= mod;
>  				prev.br_state = XFS_EXT_UNWRITTEN;
>  				error = xfs_bmap_add_extent_unwritten_real(tp,
>  						ip, whichfork, &icur, &cur,
> -- 
> 2.24.0
> 

  reply	other threads:[~2019-12-03  1:56 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-26 20:13 [PATCH 0/2] xfs: fixes for realtime file truncation Omar Sandoval
2019-11-26 20:13 ` [PATCH 1/2] xfs: fix realtime file data space leak Omar Sandoval
2019-12-03  1:56   ` Darrick J. Wong [this message]
2019-11-26 20:13 ` [PATCH 2/2] xfs: don't check for AG deadlock for realtime files in bunmapi Omar Sandoval
2019-11-27  0:36   ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191203015645.GG7335@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=kernel-team@fb.com \
    --cc=linux-xfs@vger.kernel.org \
    --cc=osandov@osandov.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox