public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: david@fromorbit.com, darrick.wong@oracle.com
Cc: xfs@oss.sgi.com
Subject: [PATCH 45/51] xfs_repair: process reverse-mapping data into refcount data
Date: Tue, 06 Oct 2015 22:10:05 -0700	[thread overview]
Message-ID: <20151007051005.1504.89142.stgit@birch.djwong.org> (raw)
In-Reply-To: <20151007050513.1504.28089.stgit@birch.djwong.org>

Take all the reverse-mapping data we've acquired and use it to generate
reference count data.  This data is used in phase 5 to rebuild the
refcount btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/phase4.c |   27 ++++++
 repair/rmap.c   |  236 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/rmap.h   |    2 
 3 files changed, 263 insertions(+), 2 deletions(-)


diff --git a/repair/phase4.c b/repair/phase4.c
index 98aab35..0be8579 100644
--- a/repair/phase4.c
+++ b/repair/phase4.c
@@ -183,6 +183,21 @@ _("%s while checking reverse-mappings"),
 }
 
 static void
+compute_ag_refcounts(
+	work_queue_t	*wq,
+	xfs_agnumber_t	agno,
+	void		*arg)
+{
+	int		error;
+
+	error = compute_refcounts(wq->mp, agno);
+	if (error)
+		do_error(
+_("%s while computing reference count records.\n"),
+			 strerror(-error));
+}
+
+static void
 process_rmap_data(
 	struct xfs_mount	*mp)
 {
@@ -196,6 +211,14 @@ process_rmap_data(
 	for (i = 0; i < mp->m_sb.sb_agcount; i++)
 		queue_work(&wq, check_rmap_btrees, i, NULL);
 	destroy_work_queue(&wq);
+
+	if (!xfs_sb_version_hasreflink(&mp->m_sb))
+		return;
+
+	create_work_queue(&wq, mp, libxfs_nproc());
+	for (i = 0; i < mp->m_sb.sb_agcount; i++)
+		queue_work(&wq, compute_ag_refcounts, i, NULL);
+	destroy_work_queue(&wq);
 }
 
 void
@@ -349,7 +372,9 @@ phase4(xfs_mount_t *mp)
 
 	/*
 	 * Process all the reverse-mapping data that we collected.  This
-	 * involves checking the rmap data against the btree.
+	 * involves checking the rmap data against the btree, computing
+	 * reference counts based on the rmap data, and checking the counts
+	 * against the refcount btree.
 	 */
 	process_rmap_data(mp);
 
diff --git a/repair/rmap.c b/repair/rmap.c
index 47fdabc..8818913 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -40,6 +40,7 @@ struct xfs_ag_rmap {
 	struct xfs_slab	*ar_raw_rmaps;		/* unmerged rmaps */
 	int		ar_flcount;		/* agfl entries from leftover */
 						/* agbt allocations */
+	struct xfs_slab	*ar_refcount_items;	/* refcount items, p4-5 */
 };
 
 static struct xfs_ag_rmap *ag_rmaps;
@@ -83,7 +84,8 @@ bool
 needs_rmap_work(
 	struct xfs_mount	*mp)
 {
-	return xfs_sb_version_hasrmapbt(&mp->m_sb);
+	return xfs_sb_version_hasreflink(&mp->m_sb) ||
+	       xfs_sb_version_hasrmapbt(&mp->m_sb);
 }
 
 /**
@@ -116,6 +118,11 @@ _("Insufficient memory while allocating reverse mapping slabs."));
 		if (error)
 			do_error(
 _("Insufficient memory while allocating raw metadata reverse mapping slabs."));
+		error = init_slab(&ag_rmaps[i].ar_refcount_items,
+				  sizeof(struct xfs_refcount_irec));
+		if (error)
+			do_error(
+_("Insufficient memory while allocating refcount item slabs."));
 	}
 }
 
@@ -136,6 +143,7 @@ free_rmaps(
 	for (i = 0; i < mp->m_sb.sb_agcount; i++) {
 		free_slab(&ag_rmaps[i].ar_rmaps);
 		free_slab(&ag_rmaps[i].ar_raw_rmaps);
+		free_slab(&ag_rmaps[i].ar_refcount_items);
 	}
 	free(ag_rmaps);
 	ag_rmaps = NULL;
@@ -555,6 +563,232 @@ dump_rmap(
 # define dump_rmap(m, a, r)
 #endif
 
+/*
+ * Rebuilding the Reference Count & Reverse Mapping Btrees
+ *
+ * The reference count (refcnt) and reverse mapping (rmap) btrees are rebuilt
+ * during phase 5, like all other AG btrees.  Therefore, reverse mappings must
+ * be processed into reference counts at the end of phase 4, and the rmaps must
+ * be recorded during phase 4.  There is a need to access the rmaps in physical
+ * block order, but no particular need for random access, so the slab.c code
+ * provides a big logical array (consisting of smaller slabs) and some inorder
+ * iterator functions.
+ *
+ * Once we've recorded all the reverse mappings, we're ready to translate the
+ * rmaps into refcount entries.  Imagine the rmap entries as rectangles
+ * representing extents of physical blocks, and that the rectangles can be laid
+ * down to allow them to overlap each other; then we know that we must emit
+ * a refcnt btree entry wherever the amount of overlap changes, i.e. the
+ * emission stimulus is level-triggered:
+ *
+ *                 -    ---
+ *       --      ----- ----   ---        ------
+ * --   ----     ----------- ----     ---------
+ * -------------------------------- -----------
+ * ^ ^  ^^ ^^    ^ ^^ ^^^  ^^^^  ^ ^^ ^  ^     ^
+ * 2 1  23 21    3 43 234  2123  1 01 2  3     0
+ *
+ * For our purposes, a rmap is a tuple (startblock, len, fileoff, owner).
+ *
+ * Note that in the actual refcnt btree we don't store the refcount < 2 cases
+ * because the bnobt tells us which blocks are free; single-use blocks aren't
+ * recorded in the bnobt or the refcntbt.  If the rmapbt supports storing
+ * multiple entries covering a given block we could theoretically dispense with
+ * the refcntbt and simply count rmaps, but that's inefficient in the (hot)
+ * write path, so we'll take the cost of the extra tree to save time.  Also
+ * there's no guarantee that rmap will be enabled.
+ *
+ * Given an array of rmaps sorted by physical block number, a starting physical
+ * block (sp), a bag to hold rmaps that cover sp, and the next physical
+ * block where the level changes (np), we can reconstruct the refcount
+ * btree as follows:
+ *
+ * While there are still unprocessed rmaps in the array,
+ *  - Set sp to the physical block (pblk) of the next unprocessed rmap.
+ *  - Add to the bag all rmaps in the array where startblock == sp.
+ *  - Set np to the physical block where the bag size will change.
+ *    This is the minimum of (the pblk of the next unprocessed rmap) and
+ *    (startblock + len of each rmap in the bag).
+ *  - Record the bag size as old_bag_size.
+ *
+ *  - While the bag isn't empty,
+ *     - Remove from the bag all rmaps where startblock + len == np.
+ *     - Add to the bag all rmaps in the array where startblock == np.
+ *     - If the bag size isn't old_bag_size, store the refcount entry
+ *       (sp, np - sp, bag_size) in the refcnt btree.
+ *     - If the bag is empty, break out of the inner loop.
+ *     - Set old_bag_size to the bag size
+ *     - Set sp = np.
+ *     - Set np to the physical block where the bag size will change.
+ *       This is the minimum of (the pblk of the next unprocessed rmap) and
+ *       (startblock + len of each rmap in the bag).
+ *
+ * An implementation detail is that because this processing happens during
+ * phase 4, the refcount entries are stored in an array so that phase 5 can
+ * load them into the refcount btree.  The rmaps can be loaded directly into
+ * the rmap btree during phase 5 as well.
+ */
+
+/*
+ * Emit a refcount object for refcntbt reconstruction during phase 5.
+ */
+#define REFCOUNT_CLAMP(nr)	((nr) > MAXREFCOUNT ? MAXREFCOUNT : (nr))
+static void
+refcount_emit(
+	struct xfs_mount		*mp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		len,
+	size_t			nr_rmaps)
+{
+	struct xfs_refcount_irec	rlrec;
+	int			error;
+	struct xfs_slab		*rlslab;
+
+	rlslab = ag_rmaps[agno].ar_refcount_items;
+	ASSERT(nr_rmaps > 0);
+
+	dbg_printf("REFL: agno=%u pblk=%u, len=%u -> refcount=%zu\n",
+		agno, agbno, len, nr_rmaps);
+	rlrec.rc_startblock = agbno;
+	rlrec.rc_blockcount = len;
+	rlrec.rc_refcount = REFCOUNT_CLAMP(nr_rmaps);
+	error = slab_add(rlslab, &rlrec);
+	if (error)
+		do_error(
+_("Insufficient memory while recreating refcount tree."));
+}
+#undef REFCOUNT_CLAMP
+
+/**
+ * compute_refcounts() - Transform a pile of physical block mapping
+ *			 observations into refcount data for eventual
+ *			 rebuilding of the btrees.
+ *
+ * @mp: XFS mount object.
+ * @agno: AG number.
+ */
+#define RMAP_END(r)	((r)->rm_startblock + XFS_RMAP_LEN((r)->rm_blockcount))
+int
+compute_refcounts(
+	struct xfs_mount		*mp,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_bag		*stack_top = NULL;
+	struct xfs_slab		*rmaps;
+	struct xfs_slab_cursor	*rmaps_cur;
+	struct xfs_rmap_irec	*array_cur;
+	struct xfs_rmap_irec	*rmap;
+	xfs_agblock_t		sbno;	/* first bno of this rmap set */
+	xfs_agblock_t		cbno;	/* first bno of this refcount set */
+	xfs_agblock_t		nbno;	/* next bno where rmap set changes */
+	size_t			n, idx;
+	size_t			old_stack_nr;
+	int			error;
+
+	if (!xfs_sb_version_hasreflink(&mp->m_sb))
+		return 0;
+
+	rmaps = ag_rmaps[agno].ar_rmaps;
+
+	error = init_slab_cursor(rmaps, rmap_compare, &rmaps_cur);
+	if (error)
+		return error;
+
+	error = init_bag(&stack_top);
+	if (error)
+		goto err;
+
+	/* While there are rmaps to be processed... */
+	n = 0;
+	while (n < slab_count(rmaps)) {
+		array_cur = peek_slab_cursor(rmaps_cur);
+		sbno = cbno = array_cur->rm_startblock;
+		/* Push all rmaps with pblk == sbno onto the stack */
+		for (;
+		     array_cur && array_cur->rm_startblock == sbno;
+		     array_cur = peek_slab_cursor(rmaps_cur)) {
+			advance_slab_cursor(rmaps_cur); n++;
+			dump_rmap("push0", agno, array_cur);
+			error = bag_add(stack_top, array_cur);
+			if (error)
+				goto err;
+		}
+
+		/* Set nbno to the bno of the next refcount change */
+		if (n < slab_count(rmaps))
+			nbno = array_cur->rm_startblock;
+		else
+			nbno = NULLAGBLOCK;
+		foreach_bag_ptr(stack_top, idx, rmap) {
+			nbno = min(nbno, RMAP_END(rmap));
+		}
+
+		/* Emit reverse mappings, if needed */
+		ASSERT(nbno > sbno);
+		old_stack_nr = bag_count(stack_top);
+
+		/* While stack isn't empty... */
+		while (bag_count(stack_top)) {
+			/* Pop all rmaps that end at nbno */
+			foreach_bag_ptr_reverse(stack_top, idx, rmap) {
+				if (RMAP_END(rmap) != nbno)
+					continue;
+				dump_rmap("pop", agno, rmap);
+				error = bag_remove(stack_top, idx);
+				if (error)
+					goto err;
+			}
+
+			/* Push array items that start at nbno */
+			for (;
+			     array_cur && array_cur->rm_startblock == nbno;
+			     array_cur = peek_slab_cursor(rmaps_cur)) {
+				advance_slab_cursor(rmaps_cur); n++;
+				dump_rmap("push1", agno, array_cur);
+				error = bag_add(stack_top, array_cur);
+				if (error)
+					goto err;
+			}
+
+			/* Emit refcount if necessary */
+			ASSERT(nbno > cbno);
+			if (bag_count(stack_top) != old_stack_nr) {
+				if (old_stack_nr > 1) {
+					refcount_emit(mp, agno, cbno,
+						      nbno - cbno,
+						      old_stack_nr);
+				}
+				cbno = nbno;
+			}
+
+			/* Stack empty, go find the next rmap */
+			if (bag_count(stack_top) == 0)
+				break;
+			old_stack_nr = bag_count(stack_top);
+			sbno = nbno;
+
+			/* Set nbno to the bno of the next refcount change */
+			if (n < slab_count(rmaps))
+				nbno = array_cur->rm_startblock;
+			else
+				nbno = NULLAGBLOCK;
+			foreach_bag_ptr(stack_top, idx, rmap) {
+				nbno = min(nbno, RMAP_END(rmap));
+			}
+
+			/* Emit reverse mappings, if needed */
+			ASSERT(nbno > sbno);
+		}
+	}
+err:
+	free_bag(&stack_top);
+	free_slab_cursor(&rmaps_cur);
+
+	return error;
+}
+#undef RMAP_END
+
 /**
  * rmap_record_count() -- Return the number of rmap objects for an AG.
  *
diff --git a/repair/rmap.h b/repair/rmap.h
index 0b4e73b..13df5d6 100644
--- a/repair/rmap.h
+++ b/repair/rmap.h
@@ -39,6 +39,8 @@ extern int init_rmap_cursor(xfs_agnumber_t, struct xfs_slab_cursor **);
 extern void rmap_avoid_check(void);
 extern int check_rmaps(struct xfs_mount *, xfs_agnumber_t);
 
+extern int compute_refcounts(struct xfs_mount *, xfs_agnumber_t);
+
 extern void fix_freelist(struct xfs_mount *, xfs_agnumber_t, bool);
 extern void rmap_store_agflcount(struct xfs_mount *, xfs_agnumber_t, int);
 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  parent reply	other threads:[~2015-10-07  5:10 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-07  5:05 [RFCv3 00/51] xfsprogs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
2015-10-07  5:05 ` [PATCH 01/51] libxcmd: provide a common function to report command runtimes Darrick J. Wong
2015-10-13 17:48   ` Christoph Hellwig
2015-10-13 22:39     ` Darrick J. Wong
2015-10-14  5:35   ` [PATCH v2 " Darrick J. Wong
2015-10-14  7:31     ` Christoph Hellwig
2015-10-07  5:05 ` [PATCH 02/51] libxfs: add reflink and dedupe ioctls Darrick J. Wong
2015-10-07  5:05 ` [PATCH 03/51] xfs_io: support reflink and dedupe of file ranges Darrick J. Wong
2015-10-14  5:36   ` [PATCH v2 " Darrick J. Wong
2015-11-09  7:54     ` Christoph Hellwig
2015-11-09 18:33       ` Darrick J. Wong
2015-11-09 18:57         ` Darrick J. Wong
2015-11-09 21:35           ` Dave Chinner
2015-11-10  6:27             ` Darrick J. Wong
2015-10-07  5:05 ` [PATCH 04/51] xfs_io: unshare blocks via fallocate Darrick J. Wong
2015-10-07  5:05 ` [PATCH 05/51] xfs_db: enable blocktrash for checksummed filesystems Darrick J. Wong
2015-10-07  5:05 ` [PATCH 06/51] xfs_db: trash the block at the top of the cursor stack Darrick J. Wong
2015-10-07  5:05 ` [PATCH 07/51] xfs_db: enable blockget for v5 filesystems Darrick J. Wong
2015-10-14 17:08   ` Christoph Hellwig
2015-10-14 18:20     ` Darrick J. Wong
2015-10-14 18:23       ` Christoph Hellwig
2015-10-14 19:52         ` Darrick J. Wong
2015-10-14 21:26         ` Dave Chinner
2015-10-07  5:06 ` [PATCH 08/51] libxfs: reorder xfs_bmap_add_free args Darrick J. Wong
2015-10-07  5:06 ` [PATCH 09/51] libxfs: add the reverse-mapping btree Darrick J. Wong
2015-10-07  5:06 ` [PATCH 10/51] libxfs: resync xfs_prealloc_blocks with the kernel Darrick J. Wong
2015-10-07  5:06 ` [PATCH 11/51] xfs: rmap btree transaction reservations Darrick J. Wong
2015-10-07  5:06 ` [PATCH 12/51] xfs: rmap btree requires more reserved free space Darrick J. Wong
2015-10-07  5:06 ` [PATCH 13/51] libxfs: propagate a bunch of case changes to mkfs and repair Darrick J. Wong
2015-10-07  5:06 ` [PATCH 14/51] libxfs: fix min freelist length calculation Darrick J. Wong
2015-10-07  5:06 ` [PATCH 15/51] libxfs: add the RMAP CRC to the xfs_magics list Darrick J. Wong
2015-10-07  5:06 ` [PATCH 16/51] libxfs: enhance rmapbt definition to support reflink Darrick J. Wong
2015-10-07  5:07 ` [PATCH 17/51] libxfs: refactor short btree block verification Darrick J. Wong
2015-10-07  5:07 ` [PATCH 18/51] xfs: don't update rmapbt when fixing agfl Darrick J. Wong
2015-10-07  5:07 ` [PATCH 19/51] libxfs: implement XFS_IOC_SWAPEXT when rmap btree is enabled Darrick J. Wong
2015-10-07  5:07 ` [PATCH 20/51] xfs_db: display rmap btree contents Darrick J. Wong
2015-10-07  5:07 ` [PATCH 21/51] xfs_dump: display enhanced rmapbt fields Darrick J. Wong
2015-10-07  5:07 ` [PATCH 22/51] xfs_db: check rmapbt Darrick J. Wong
2015-10-07  5:07 ` [PATCH 23/51] xfs_db: copy the rmap btree Darrick J. Wong
2015-10-07  5:07 ` [PATCH 24/51] xfs_growfs: report rmapbt presence Darrick J. Wong
2015-10-07  5:07 ` [PATCH 25/51] xfs_repair: use rmap btree data to check block types Darrick J. Wong
2015-10-07  5:08 ` [PATCH 26/51] xfs_repair: mask off length appropriately Darrick J. Wong
2015-10-07  5:08 ` [PATCH 27/51] xfs_repair: fix fino_bno calculation when rmapbt is enabled Darrick J. Wong
2015-10-07  5:08 ` [PATCH 28/51] xfs_repair: create a slab API for allocating arrays in large chunks Darrick J. Wong
2015-10-07  5:08 ` [PATCH 29/51] xfs_repair: collect reverse-mapping data for refcount/rmap tree rebuilding Darrick J. Wong
2015-10-07  5:08 ` [PATCH 30/51] xfs_repair: record and merge raw rmap data Darrick J. Wong
2015-10-07  5:08 ` [PATCH 31/51] xfs_repair: add inode bmbt block rmaps Darrick J. Wong
2015-10-07  5:08 ` [PATCH 32/51] xfs_repair: add fixed-location per-AG rmaps Darrick J. Wong
2015-10-21 21:08   ` Darrick J. Wong
2015-10-07  5:08 ` [PATCH 33/51] xfs_repair: check existing rmapbt entries against observed rmaps Darrick J. Wong
2015-10-07  5:08 ` [PATCH 34/51] xfs_repair: rebuild reverse-mapping btree Darrick J. Wong
2015-10-07  5:09 ` [PATCH 35/51] xfs_repair: add per-AG btree blocks to rmap data and add to rmapbt Darrick J. Wong
2015-10-07  5:09 ` [PATCH 36/51] mkfs.xfs: Create rmapbt filesystems Darrick J. Wong
2015-10-07  5:09 ` [PATCH 37/51] xfs_mkfs: initialize extra fields during mkfs Darrick J. Wong
2015-10-07  5:09 ` [PATCH 38/51] libxfs: add support for refcount btrees Darrick J. Wong
2015-10-07  5:09 ` [PATCH 39/51] xfs_db: dump refcount btree data Darrick J. Wong
2015-10-07  5:09 ` [PATCH 40/51] xfs_db: add support for checking the refcount btree Darrick J. Wong
2015-10-07  5:09 ` [PATCH 41/51] xfs_db: metadump should copy the refcount btree too Darrick J. Wong
2015-10-07  5:09 ` [PATCH 42/51] xfs_growfs: report the presence of the reflink feature Darrick J. Wong
2015-10-07  5:09 ` [PATCH 43/51] xfs_repair: check the existing refcount btree Darrick J. Wong
2015-10-07  5:09 ` [PATCH 44/51] xfs_repair: handle multiple owners of data blocks Darrick J. Wong
2015-10-07  5:10 ` Darrick J. Wong [this message]
2015-10-07  5:10 ` [PATCH 46/51] xfs_repair: record reflink inode state Darrick J. Wong
2015-10-07  5:10 ` [PATCH 47/51] xfs_repair: fix inode reflink flags Darrick J. Wong
2015-10-07  5:10 ` [PATCH 48/51] xfs_repair: check the refcount btree against our observed reference counts when -n Darrick J. Wong
2015-10-07  5:10 ` [PATCH 49/51] xfs_repair: rebuild the refcount btree Darrick J. Wong
2015-10-07  5:10 ` [PATCH 50/51] mkfs.xfs: format reflink enabled filesystems Darrick J. Wong
2015-10-07  5:10 ` [PATCH 51/51] mkfs: hack around not having enough log blocks Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151007051005.1504.89142.stgit@birch.djwong.org \
    --to=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox