public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 03/14] libxfs: support unmapping reflink blocks
Date: Wed, 1 Jul 2015 11:26:32 +1000
Message-ID: <20150701012632.GR22807@dastard>
In-Reply-To: <20150625233930.4992.88802.stgit@birch.djwong.org>

On Thu, Jun 25, 2015 at 04:39:30PM -0700, Darrick J. Wong wrote:
> When we're unmapping blocks from a file, we need to decrease refcounts
> in the btree and only free blocks if they refcount is 1.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c          |    5 +
>  fs/xfs/libxfs/xfs_reflink_btree.c |  140 +++++++++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_reflink_btree.h |    4 +
>  3 files changed, 147 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 057fa9a..3f5e8da 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -45,6 +45,7 @@
>  #include "xfs_symlink.h"
>  #include "xfs_attr_leaf.h"
>  #include "xfs_filestream.h"
> +#include "xfs_reflink_btree.h"
>  
>  
>  kmem_zone_t		*xfs_bmap_free_item_zone;
> @@ -4984,8 +4985,8 @@ xfs_bmap_del_extent(
>  	 * If we need to, add to list of extents to delete.
>  	 */
>  	if (do_fx)
> -		xfs_bmap_add_free(mp, flist, del->br_startblock,
> -				  del->br_blockcount, ip->i_ino);
> +		xfs_reflink_bmap_add_free(mp, flist, del->br_startblock,
> +					  del->br_blockcount, ip->i_ino, tp);

I think this is the wrong abstraction. I think the code should look
like this:

	if (do_fx) {
		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
			error = xfs_reflink_del_extent(mp, tp, flist,
						del->br_startblock,
						del->br_blockcount, ip->i_ino);
			if (error)
				goto done;
		} else {
			xfs_bmap_add_free(mp, flist, del->br_startblock,
					  del->br_blockcount, ip->i_ino);
		}
	}

Because what we are doing is deleting an extent from the reflink
btree, not adding a freed extent to the "to-be-freed" list.


> diff --git a/fs/xfs/libxfs/xfs_reflink_btree.c b/fs/xfs/libxfs/xfs_reflink_btree.c
> index 380ed72..f40ba1f 100644
> --- a/fs/xfs/libxfs/xfs_reflink_btree.c
> +++ b/fs/xfs/libxfs/xfs_reflink_btree.c

Again, xfs_reflink.c

> @@ -935,3 +935,143 @@ error0:
>  	xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
>  	return error;
>  }
> +
> +/**
> + * xfs_reflink_bmap_add_free() - release a range of blocks
> + *
> + * @mp: XFS mount object
> + * @flist: List of blocks to be freed at the end of the transaction
> + * @fsbno: First fs block of the range to release
> + * @len: Length of range
> + * @owner: owner of the extent
> + * @tp: transaction that goes with the free operation
> + */
> +int
> +xfs_reflink_bmap_add_free(
> +	struct xfs_mount	*mp,		/* mount point structure */
> +	xfs_bmap_free_t		*flist,		/* list of extents */
> +	xfs_fsblock_t		fsbno,		/* fs block number of extent */
> +	xfs_filblks_t		fslen,		/* length of extent */
> +	uint64_t		owner,		/* extent owner */
> +	struct xfs_trans	*tp)		/* transaction */
> +{
> +	struct xfs_btree_cur	*cur;
> +	int			error;
> +	struct xfs_buf		*agbp;
> +	xfs_agnumber_t		agno;		/* allocation group number */
> +	xfs_agblock_t		agbno;		/* ag start of range to free */
> +	xfs_agblock_t		agbend;		/* ag end of range to free */
> +	xfs_extlen_t		aglen;		/* ag length of range to free */
> +	int			i, have;
> +	xfs_agblock_t		lbno;		/* rlextent start */
> +	xfs_extlen_t		llen;		/* rlextent length */
> +	xfs_nlink_t		lnr;		/* rlextent refcount */
> +	xfs_agblock_t		bno;		/* rlext block # in loop */
> +	xfs_extlen_t		len;		/* rlext length in loop */
> +	unsigned long long	blocks_freed;
> +	xfs_fsblock_t		range_fsb;
> +
> +	if (!xfs_sb_version_hasreflink(&mp->m_sb)) {
> +		xfs_bmap_add_free(mp, flist, fsbno, fslen, owner);
> +		return 0;
> +	}

That can be dropped.

> +
> +	agno = XFS_FSB_TO_AGNO(mp, fsbno);
> +	agbno = XFS_FSB_TO_AGBNO(mp, fsbno);
> +	CHECK_AG_NUMBER(mp, agno);
> +	ASSERT(fslen < mp->m_sb.sb_agblocks);
> +	CHECK_AG_EXTENT(mp, agbno, fslen);

These extent lengths have already been checked. If they are invalid,
then the extent deletion would have errored out with corruption
long before we get here.

> +	aglen = fslen;
> +
> +	/*
> +	 * Drop reference counts in the reflink tree.
> +	 */
> +	error = xfs_alloc_read_agf(mp, tp, agno, 0, &agbp);
> +	if (error)
> +		return error;
> +
> +	/*
> +	 * Grab a rl btree cursor.
> +	 */
> +	cur = xfs_reflinkbt_init_cursor(mp, tp, agbp, agno);
> +	bno = agbno;
> +	len = aglen;
> +	agbend = agbno + aglen - 1;
> +	blocks_freed = 0;
> +
> +	/*
> +	 * Account for a left extent that partially covers our range.
> +	 */
> +	error = xfs_reflink_lookup_le(cur, bno, &have);
> +	if (error)
> +		goto error0;
> +	if (have) {
> +		error = xfs_reflink_get_rec(cur, &lbno, &llen, &lnr, &i);
> +		if (error)
> +			goto error0;
> +		XFS_WANT_CORRUPTED_RLEXT_GOTO(mp, i, lbno, llen, lnr, error0);
> +		if (lbno + llen > bno) {
> +			blocks_freed += min(len, lbno + llen - bno);
> +			bno += blocks_freed;
> +			len -= blocks_freed;
> +		}
> +	}

So we unconditionally look up the reflink btree on extent free to
see if we need to free it, even if the inode has not been reflinked?
Doesn't this add a lot of overhead to the extent freeing?

Indeed, why not just mark inodes that have been reflinked (i.e. have
shared extents) with an on-disk flag so that we know if we need to
do reflink btree work or not? That way the code fragment above could
just check an inode flag rather than always calling into this
function for reflink-enabled filesystems....
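
To make the inode flag idea concrete, here is a rough userspace sketch
of the decision I have in mind. Everything in it is made up for
illustration (the demo_* names, the flag bit, the struct layout); none
of it is existing XFS code, and the real check would sit in the
xfs_bmap_del_extent() fragment above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical on-disk inode flag: set once the inode gains shared extents. */
#define DEMO_DIFLAG_REFLINK	(1u << 0)

/* Hypothetical, heavily reduced stand-in for an in-core inode. */
struct demo_inode {
	uint64_t	ino;
	uint32_t	flags;
};

enum demo_free_path {
	DEMO_FREE_DIRECT,		/* straight onto the to-be-freed list */
	DEMO_FREE_VIA_REFLINK_BTREE,	/* must consult the reflink btree */
};

/* True only if this inode has ever been marked as sharing extents. */
static inline bool
demo_inode_is_reflinked(
	const struct demo_inode	*ip)
{
	return (ip->flags & DEMO_DIFLAG_REFLINK) != 0;
}

/*
 * Pick the extent-free path. The btree lookup is only paid for when the
 * filesystem has the feature AND this particular inode is flagged.
 */
enum demo_free_path
demo_choose_free_path(
	bool			fs_has_reflink,
	const struct demo_inode	*ip)
{
	if (fs_has_reflink && demo_inode_is_reflinked(ip))
		return DEMO_FREE_VIA_REFLINK_BTREE;
	return DEMO_FREE_DIRECT;
}
```

With something like that, an unflagged inode on a reflink-enabled
filesystem frees extents exactly as it does today, without touching
the btree.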

> +	while (len > 0) {
> +		/*
> +		 * Go find the next rlext.
> +		 */
> +		range_fsb = XFS_AGB_TO_FSB(mp, agno, bno);
> +		error = xfs_btree_increment(cur, 0, &have);
> +		if (error)
> +			goto error0;
> +		if (!have) {
> +			/*
> +			 * There's no right rlextent, so free bno to the end.
> +			 */
> +			lbno = bno + len;
> +			llen = 0;
> +		} else {
> +			/*
> +			 * Find the next rlextent.
> +			 */
> +			error = xfs_reflink_get_rec(cur, &lbno, &llen,
> +					&lnr, &i);
> +			if (error)
> +				goto error0;
> +			XFS_WANT_CORRUPTED_RLEXT_GOTO(mp, i, lbno, llen, lnr,
> +						      error0);
> +			if (lbno >= bno + len) {
> +				lbno = bno + len;
> +				llen = 0;
> +			}
> +		}
> +
> +		/*
> +		 * Free everything up to the start of the rlextent and
> +		 * account for still-mapped blocks.
> +		 */
> +		if (lbno - bno > 0) {
> +			xfs_bmap_add_free(mp, flist, range_fsb, lbno - bno,
> +					  owner);
> +			len -= lbno - bno;
> +			bno += lbno - bno;
> +		}
> +		llen = min(llen, agbend + 1 - lbno);
> +		blocks_freed += llen;
> +		len -= llen;
> +		bno += llen;
> +	}
> +
> +	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> +
> +	error = xfs_reflinkbt_adjust_refcount(mp, tp, agbp, agno, agbno, aglen,
> +					      -1);

Hmmm - we just walked the btree to determine what extents to
free, and now we are going to walk the btree again to drop the
reference counts on shared extents? So every extent that gets freed
does two walks of the reflink btree regardless of whether it has
shared blocks or not?
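
For what it's worth, a single pass could do both jobs. Below is a
small userspace model of that, not a proposed implementation: the
"btree" is just a sorted array, the demo_* names and structs are
invented, and I'm assuming that only shared extents have records, that
gaps between records are singly-owned blocks, and that records have
already been split so none straddles the range boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical refcount record: one shared extent within an AG. */
struct demo_rl_rec {
	uint32_t	start;
	uint32_t	len;
	uint32_t	refcount;
};

/* Hypothetical entry on the to-be-freed list. */
struct demo_free_ext {
	uint32_t	start;
	uint32_t	len;
};

/*
 * Walk the sorted records once over [bno, bno + len). Gaps between
 * records are unshared, so they go to the free list; each record inside
 * the range has its refcount dropped in the same pass.
 *
 * Returns the number of free extents found. Only the first out_cap of
 * them are stored in out[]. A record that drops to refcount 1 is left
 * in place here; a real btree would presumably delete it.
 */
size_t
demo_unmap_one_pass(
	struct demo_rl_rec	*recs,
	size_t			nrecs,
	uint32_t		bno,
	uint32_t		len,
	struct demo_free_ext	*out,
	size_t			out_cap)
{
	uint32_t		end = bno + len;
	uint32_t		cur = bno;
	size_t			nfree = 0;
	size_t			i;

	for (i = 0; i < nrecs; i++) {
		struct demo_rl_rec	*r = &recs[i];

		if (r->start + r->len <= cur)
			continue;	/* entirely to the left of the range */
		if (r->start >= end)
			break;		/* entirely to the right, we're done */

		/* Stated assumption: records never straddle the range. */
		assert(r->start >= cur && r->start + r->len <= end);

		if (r->start > cur) {
			/* Unshared gap before this record: free it. */
			if (nfree < out_cap) {
				out[nfree].start = cur;
				out[nfree].len = r->start - cur;
			}
			nfree++;
		}

		r->refcount--;		/* drop our reference, same pass */
		cur = r->start + r->len;
	}

	if (cur < end) {
		/* Unshared tail after the last record in range. */
		if (nfree < out_cap) {
			out[nfree].start = cur;
			out[nfree].len = end - cur;
		}
		nfree++;
	}
	return nfree;
}
```

The point is only that the gap detection and the refcount decrement
visit the same records in the same order, so there is no obvious need
for a second walk.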

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
