From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Omar Sandoval <osandov@osandov.com>
Cc: linux-xfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 1/2] xfs: fix realtime file data space leak
Date: Mon, 2 Dec 2019 17:56:45 -0800 [thread overview]
Message-ID: <20191203015645.GG7335@magnolia> (raw)
In-Reply-To: <fe86a7464d77f770736404a9fbfcdfbb04b59826.1574799066.git.osandov@fb.com>
On Tue, Nov 26, 2019 at 12:13:28PM -0800, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
>
> Realtime files in XFS allocate extents in rextsize units. However, the
> written/unwritten state of those extents is still tracked in blocksize
> units. Therefore, a realtime file can be split up into written and
> unwritten extents that are not necessarily aligned to the realtime
> extent size. __xfs_bunmapi() has some logic to handle these various
> corner cases. Consider how it handles the following case:
>
> 1. The last extent is unwritten.
> 2. The last extent is smaller than the realtime extent size.
> 3. startblock of the last extent is not aligned to the realtime extent
> size, but startblock + blockcount is.
>
> In this case, __xfs_bunmapi() calls xfs_bmap_add_extent_unwritten_real()
> to set the second-to-last extent to unwritten. This should merge the
> last and second-to-last extents, so __xfs_bunmapi() moves on to the
> second-to-last extent.
>
> However, if the size of the last and second-to-last extents combined is
> greater than MAXEXTLEN, xfs_bmap_add_extent_unwritten_real() does not
> merge the two extents. When that happens, __xfs_bunmapi() skips past the
> last extent without unmapping it, thus leaking the space.
>
> Fix it by only unwriting the minimum amount needed to align the last
> extent to the realtime extent size, which is guaranteed to merge with
> the last extent.
>
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
> fs/xfs/libxfs/xfs_bmap.c | 25 ++++++++++++++-----------
> 1 file changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 02469d59c787..6f8791a1e460 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -5376,16 +5376,17 @@ __xfs_bunmapi(
> }
> div_u64_rem(del.br_startblock, mp->m_sb.sb_rextsize, &mod);
> if (mod) {
> + xfs_extlen_t off = mp->m_sb.sb_rextsize - mod;
> +
> /*
> * Realtime extent is lined up at the end but not
> * at the front. We'll get rid of full extents if
> * we can.
> */
> - mod = mp->m_sb.sb_rextsize - mod;
> - if (del.br_blockcount > mod) {
> - del.br_blockcount -= mod;
> - del.br_startoff += mod;
> - del.br_startblock += mod;
> + if (del.br_blockcount > off) {
> + del.br_blockcount -= off;
> + del.br_startoff += off;
> + del.br_startblock += off;
Ok, so we make this change so that we no longer change @mod once it's
set by the div64 operation...
> } else if (del.br_startoff == start &&
> (del.br_state == XFS_EXT_UNWRITTEN ||
> tp->t_blk_res == 0)) {
> @@ -5403,6 +5404,7 @@ __xfs_bunmapi(
> continue;
> } else if (del.br_state == XFS_EXT_UNWRITTEN) {
> struct xfs_bmbt_irec prev;
> + xfs_fileoff_t unwrite_start;
>
> /*
> * This one is already unwritten.
> @@ -5416,12 +5418,13 @@ __xfs_bunmapi(
> ASSERT(!isnullstartblock(prev.br_startblock));
> ASSERT(del.br_startblock ==
> prev.br_startblock + prev.br_blockcount);
> - if (prev.br_startoff < start) {
> - mod = start - prev.br_startoff;
> - prev.br_blockcount -= mod;
> - prev.br_startblock += mod;
> - prev.br_startoff = start;
> - }
...and here, we have a @del extent that is unwritten and a @prev extent
that is written. We aim to trick xfs_bmap_add_extent_unwritten_real
into extending @del towards startoff==0 and returning with @icur
pointing at @del (not @prev) so that the next time we go around the loop
we see an rtextsize-aligned @del and simply unmap it...
> + unwrite_start = max3(start,
> + del.br_startoff - mod,
> + prev.br_startoff);
...however, if @prev is too long to convert+combine with @del, the
conversion routine converts @prev to unwritten and returns with @icur
pointing to @prev, not @del. That's how we leak @del.
This patch fixes that by capping the conversion to the start of the
rtext alignment, which means that we can always merge with @del and
always return with @icur pointing at @del. Ok, that's exactly what the
commit message says.
It was /really/ helpful to be able to use the test case to walk through
exactly what this patch is trying to fix.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
--D
> + mod = unwrite_start - prev.br_startoff;
> + prev.br_startoff = unwrite_start;
> + prev.br_startblock += mod;
> + prev.br_blockcount -= mod;
> prev.br_state = XFS_EXT_UNWRITTEN;
> error = xfs_bmap_add_extent_unwritten_real(tp,
> ip, whichfork, &icur, &cur,
> --
> 2.24.0
>
next prev parent reply other threads:[~2019-12-03 1:56 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-11-26 20:13 [PATCH 0/2] xfs: fixes for realtime file truncation Omar Sandoval
2019-11-26 20:13 ` [PATCH 1/2] xfs: fix realtime file data space leak Omar Sandoval
2019-12-03 1:56 ` Darrick J. Wong [this message]
2019-11-26 20:13 ` [PATCH 2/2] xfs: don't check for AG deadlock for realtime files in bunmapi Omar Sandoval
2019-11-27 0:36 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191203015645.GG7335@magnolia \
--to=darrick.wong@oracle.com \
--cc=kernel-team@fb.com \
--cc=linux-xfs@vger.kernel.org \
--cc=osandov@osandov.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.