* [PATCH v2 01/12] libxfs: validate metadata LSNs against log on v5 superblocks
2015-09-11 18:55 [PATCH v2 00/12] xfsprogs: format the log correctly on v5 supers Brian Foster
@ 2015-09-11 18:55 ` Brian Foster
2015-09-11 18:55 ` [PATCH v2 02/12] libxfs: track largest metadata LSN in use via verifiers Brian Foster
` (10 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Brian Foster @ 2015-09-11 18:55 UTC (permalink / raw)
To: xfs
Backport the associated kernel commit. Replace the
xfs_detect_invalid_lsn() helper with a stub.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
libxfs/libxfs_priv.h | 2 ++
libxfs/util.c | 7 +++++++
libxfs/xfs_alloc.c | 9 +++++++--
libxfs/xfs_attr_leaf.c | 2 ++
libxfs/xfs_btree.c | 14 ++++++++++++--
libxfs/xfs_da_btree.c | 3 +++
libxfs/xfs_dir2_block.c | 1 +
libxfs/xfs_dir2_data.c | 1 +
libxfs/xfs_dir2_leaf.c | 1 +
libxfs/xfs_dir2_node.c | 1 +
libxfs/xfs_ialloc.c | 8 ++++++--
libxfs/xfs_sb.c | 8 ++++++++
libxfs/xfs_symlink_remote.c | 3 +++
13 files changed, 54 insertions(+), 6 deletions(-)
diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index 22f2d53..59cb6c3 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -505,4 +505,6 @@ void xfs_verifier_error(struct xfs_buf *bp);
/* xfs_rtalloc.c */
int libxfs_rtfree_extent(struct xfs_trans *, xfs_rtblock_t, xfs_extlen_t);
+extern void xfs_detect_invalid_lsn(struct xfs_mount *, xfs_lsn_t);
+
#endif /* __LIBXFS_INTERNAL_XFS_H__ */
diff --git a/libxfs/util.c b/libxfs/util.c
index c9f9175..43338b8 100644
--- a/libxfs/util.c
+++ b/libxfs/util.c
@@ -729,3 +729,10 @@ xfs_verifier_error(
bp->b_error == -EFSBADCRC ? "CRC error" : "corruption",
bp->b_bn, BBTOB(bp->b_length));
}
+
+void
+xfs_detect_invalid_lsn(
+ struct xfs_mount *mp,
+ xfs_lsn_t lsn)
+{
+}
diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 4f3008a..f1f2791 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -478,6 +478,8 @@ xfs_agfl_verify(
be32_to_cpu(agfl->agfl_bno[i]) >= mp->m_sb.sb_agblocks)
return false;
}
+
+ xfs_detect_invalid_lsn(mp, be64_to_cpu(XFS_BUF_TO_AGFL(bp)->agfl_lsn));
return true;
}
@@ -2255,9 +2257,12 @@ xfs_agf_verify(
{
struct xfs_agf *agf = XFS_BUF_TO_AGF(bp);
- if (xfs_sb_version_hascrc(&mp->m_sb) &&
- !uuid_equal(&agf->agf_uuid, &mp->m_sb.sb_meta_uuid))
+ if (xfs_sb_version_hascrc(&mp->m_sb)) {
+ if (!uuid_equal(&agf->agf_uuid, &mp->m_sb.sb_meta_uuid))
return false;
+ xfs_detect_invalid_lsn(mp,
+ be64_to_cpu(XFS_BUF_TO_AGF(bp)->agf_lsn));
+ }
if (!(agf->agf_magicnum == cpu_to_be32(XFS_AGF_MAGIC) &&
XFS_AGF_GOOD_VERSION(be32_to_cpu(agf->agf_versionnum)) &&
diff --git a/libxfs/xfs_attr_leaf.c b/libxfs/xfs_attr_leaf.c
index cc25068..aca3cf8 100644
--- a/libxfs/xfs_attr_leaf.c
+++ b/libxfs/xfs_attr_leaf.c
@@ -262,6 +262,8 @@ xfs_attr3_leaf_verify(
return false;
if (be64_to_cpu(hdr3->info.blkno) != bp->b_bn)
return false;
+
+ xfs_detect_invalid_lsn(mp, be64_to_cpu(hdr3->info.lsn));
} else {
if (ichdr.magic != XFS_ATTR_LEAF_MAGIC)
return false;
diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index a16ae7d..b2b2db5 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -240,8 +240,13 @@ bool
xfs_btree_lblock_verify_crc(
struct xfs_buf *bp)
{
- if (xfs_sb_version_hascrc(&bp->b_target->bt_mount->m_sb))
+ struct xfs_btree_block *block = XFS_BUF_TO_BLOCK(bp);
+ struct xfs_mount *mp = bp->b_target->bt_mount;
+
+ if (xfs_sb_version_hascrc(&bp->b_target->bt_mount->m_sb)) {
+ xfs_detect_invalid_lsn(mp, be64_to_cpu(block->bb_u.l.bb_lsn));
return xfs_buf_verify_cksum(bp, XFS_BTREE_LBLOCK_CRC_OFF);
+ }
return true;
}
@@ -272,8 +277,13 @@ bool
xfs_btree_sblock_verify_crc(
struct xfs_buf *bp)
{
- if (xfs_sb_version_hascrc(&bp->b_target->bt_mount->m_sb))
+ struct xfs_btree_block *block = XFS_BUF_TO_BLOCK(bp);
+ struct xfs_mount *mp = bp->b_target->bt_mount;
+
+ if (xfs_sb_version_hascrc(&bp->b_target->bt_mount->m_sb)) {
+ xfs_detect_invalid_lsn(mp, be64_to_cpu(block->bb_u.s.bb_lsn));
return xfs_buf_verify_cksum(bp, XFS_BTREE_SBLOCK_CRC_OFF);
+ }
return true;
}
diff --git a/libxfs/xfs_da_btree.c b/libxfs/xfs_da_btree.c
index 289dc1e..beb02ed 100644
--- a/libxfs/xfs_da_btree.c
+++ b/libxfs/xfs_da_btree.c
@@ -146,6 +146,8 @@ xfs_da3_node_verify(
return false;
if (be64_to_cpu(hdr3->info.blkno) != bp->b_bn)
return false;
+
+ xfs_detect_invalid_lsn(mp, be64_to_cpu(hdr3->info.lsn));
} else {
if (ichdr.magic != XFS_DA_NODE_MAGIC)
return false;
@@ -318,6 +320,7 @@ xfs_da3_node_create(
if (xfs_sb_version_hascrc(&mp->m_sb)) {
struct xfs_da3_node_hdr *hdr3 = bp->b_addr;
+ memset(hdr3, 0, sizeof(struct xfs_da3_node_hdr));
ichdr.magic = XFS_DA3_NODE_MAGIC;
hdr3->info.blkno = cpu_to_be64(bp->b_bn);
hdr3->info.owner = cpu_to_be64(args->dp->i_ino);
diff --git a/libxfs/xfs_dir2_block.c b/libxfs/xfs_dir2_block.c
index 489f301..c886d70 100644
--- a/libxfs/xfs_dir2_block.c
+++ b/libxfs/xfs_dir2_block.c
@@ -68,6 +68,7 @@ xfs_dir3_block_verify(
return false;
if (be64_to_cpu(hdr3->blkno) != bp->b_bn)
return false;
+ xfs_detect_invalid_lsn(mp, be64_to_cpu(hdr3->lsn));
} else {
if (hdr3->magic != cpu_to_be32(XFS_DIR2_BLOCK_MAGIC))
return false;
diff --git a/libxfs/xfs_dir2_data.c b/libxfs/xfs_dir2_data.c
index 0c9f529..b68c7ab 100644
--- a/libxfs/xfs_dir2_data.c
+++ b/libxfs/xfs_dir2_data.c
@@ -222,6 +222,7 @@ xfs_dir3_data_verify(
return false;
if (be64_to_cpu(hdr3->blkno) != bp->b_bn)
return false;
+ xfs_detect_invalid_lsn(mp, be64_to_cpu(hdr3->lsn));
} else {
if (hdr3->magic != cpu_to_be32(XFS_DIR2_DATA_MAGIC))
return false;
diff --git a/libxfs/xfs_dir2_leaf.c b/libxfs/xfs_dir2_leaf.c
index 80d03b3..e2dac37 100644
--- a/libxfs/xfs_dir2_leaf.c
+++ b/libxfs/xfs_dir2_leaf.c
@@ -162,6 +162,7 @@ xfs_dir3_leaf_verify(
return false;
if (be64_to_cpu(leaf3->info.blkno) != bp->b_bn)
return false;
+ xfs_detect_invalid_lsn(mp, be64_to_cpu(leaf3->info.lsn));
} else {
if (leaf->hdr.info.magic != cpu_to_be16(magic))
return false;
diff --git a/libxfs/xfs_dir2_node.c b/libxfs/xfs_dir2_node.c
index 0514cea..5328859 100644
--- a/libxfs/xfs_dir2_node.c
+++ b/libxfs/xfs_dir2_node.c
@@ -95,6 +95,7 @@ xfs_dir3_free_verify(
return false;
if (be64_to_cpu(hdr3->blkno) != bp->b_bn)
return false;
+ xfs_detect_invalid_lsn(mp, be64_to_cpu(hdr3->lsn));
} else {
if (hdr->magic != cpu_to_be32(XFS_DIR2_FREE_MAGIC))
return false;
diff --git a/libxfs/xfs_ialloc.c b/libxfs/xfs_ialloc.c
index 93bfaea..a9b0e4a 100644
--- a/libxfs/xfs_ialloc.c
+++ b/libxfs/xfs_ialloc.c
@@ -2495,9 +2495,13 @@ xfs_agi_verify(
struct xfs_mount *mp = bp->b_target->bt_mount;
struct xfs_agi *agi = XFS_BUF_TO_AGI(bp);
- if (xfs_sb_version_hascrc(&mp->m_sb) &&
- !uuid_equal(&agi->agi_uuid, &mp->m_sb.sb_meta_uuid))
+ if (xfs_sb_version_hascrc(&mp->m_sb)) {
+ if (!uuid_equal(&agi->agi_uuid, &mp->m_sb.sb_meta_uuid))
return false;
+ xfs_detect_invalid_lsn(mp,
+ be64_to_cpu(XFS_BUF_TO_AGI(bp)->agi_lsn));
+ }
+
/*
* Validate the magic number of the agi block.
*/
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index f944a58..5fe892d 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -161,6 +161,14 @@ xfs_mount_validate_sb(
"Filesystem can not be safely mounted by this kernel.");
return -EINVAL;
}
+ } else if (xfs_sb_version_hascrc(sbp)) {
+ /*
+ * We can't read verify the sb LSN because the read verifier is
+ * called before the log is allocated and processed. We know the
+ * log is set up before write verifier (!check_version) calls,
+ * so just check it here.
+ */
+ xfs_detect_invalid_lsn(mp, sbp->sb_lsn);
}
if (xfs_sb_version_has_pquotino(sbp)) {
diff --git a/libxfs/xfs_symlink_remote.c b/libxfs/xfs_symlink_remote.c
index 7d46d9e..364329a 100644
--- a/libxfs/xfs_symlink_remote.c
+++ b/libxfs/xfs_symlink_remote.c
@@ -57,6 +57,7 @@ xfs_symlink_hdr_set(
if (!xfs_sb_version_hascrc(&mp->m_sb))
return 0;
+ memset(dsl, 0, sizeof(struct xfs_dsymlink_hdr));
dsl->sl_magic = cpu_to_be32(XFS_SYMLINK_MAGIC);
dsl->sl_offset = cpu_to_be32(offset);
dsl->sl_bytes = cpu_to_be32(size);
@@ -114,6 +115,8 @@ xfs_symlink_verify(
if (dsl->sl_owner == 0)
return false;
+ xfs_detect_invalid_lsn(mp, be64_to_cpu(dsl->sl_lsn));
+
return true;
}
--
2.1.0
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 23+ messages in thread* [PATCH v2 02/12] libxfs: track largest metadata LSN in use via verifiers
2015-09-11 18:55 [PATCH v2 00/12] xfsprogs: format the log correctly on v5 supers Brian Foster
2015-09-11 18:55 ` [PATCH v2 01/12] libxfs: validate metadata LSNs against log on v5 superblocks Brian Foster
@ 2015-09-11 18:55 ` Brian Foster
2015-09-23 3:44 ` Dave Chinner
2015-09-11 18:55 ` [PATCH v2 03/12] libxfs: don't hardcode cycle 1 into unmount op header Brian Foster
` (9 subsequent siblings)
11 siblings, 1 reply; 23+ messages in thread
From: Brian Foster @ 2015-09-11 18:55 UTC (permalink / raw)
To: xfs
The LSN validation helper is called in the I/O verifier codepath for
metadata that embed a last-modification LSN. While the codepath exists,
this is not used in userspace as in the kernel because the former
doesn't have an active log.
xfs_repair does need to check the validity of the LSN metadata with
respect to the on-disk log, however. Use the LSN validation mechanism to
track the largest LSN that has been seen. Export the value so repair can
use it once it has processed the entire filesystem. Note that the helper
continues to always return true to preserve existing behavior.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
include/libxfs.h | 1 +
libxfs/util.c | 31 +++++++++++++++++++++++++++++++
2 files changed, 32 insertions(+)
diff --git a/include/libxfs.h b/include/libxfs.h
index b1604e2..cc06fc6 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -134,6 +134,7 @@ typedef struct {
#define LIBXFS_DIRECT 0x0020 /* can use direct I/O, not buffered */
extern char *progname;
+extern xfs_lsn_t libxfs_max_lsn;
extern int libxfs_init (libxfs_init_t *);
extern void libxfs_destroy (void);
extern int libxfs_device_to_fd (dev_t);
diff --git a/libxfs/util.c b/libxfs/util.c
index 43338b8..236575a 100644
--- a/libxfs/util.c
+++ b/libxfs/util.c
@@ -730,9 +730,40 @@ xfs_verifier_error(
bp->b_bn, BBTOB(bp->b_length));
}
+/*
+ * This is called from I/O verifiers on v5 superblock filesystems. In the
+ * kernel, it validates the metadata LSN parameter against the current LSN of
+ * the active log. We don't have an active log in userspace so this kind of
+ * validation is not required.
+ *
+ * xfs_repair piggybacks off this mechanism to help track the largest metadata
+ * LSN in use on a filesystem. Keep a record of the largest LSN seen such that
+ * repair can validate it against the state of the log.
+ */
+xfs_lsn_t libxfs_max_lsn = 0;
+pthread_mutex_t libxfs_max_lsn_lock = PTHREAD_MUTEX_INITIALIZER;
+
void
xfs_detect_invalid_lsn(
struct xfs_mount *mp,
xfs_lsn_t lsn)
{
+ int cycle = CYCLE_LSN(lsn);
+ int block = BLOCK_LSN(lsn);
+ int max_cycle;
+ int max_block;
+
+ if (lsn == NULLCOMMITLSN)
+ return;
+
+ pthread_mutex_lock(&libxfs_max_lsn_lock);
+
+ max_cycle = CYCLE_LSN(libxfs_max_lsn);
+ max_block = BLOCK_LSN(libxfs_max_lsn);
+
+ if ((cycle > max_cycle) ||
+ (cycle == max_cycle && block > max_block))
+ libxfs_max_lsn = lsn;
+
+ pthread_mutex_unlock(&libxfs_max_lsn_lock);
}
--
2.1.0
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 23+ messages in thread* Re: [PATCH v2 02/12] libxfs: track largest metadata LSN in use via verifiers
2015-09-11 18:55 ` [PATCH v2 02/12] libxfs: track largest metadata LSN in use via verifiers Brian Foster
@ 2015-09-23 3:44 ` Dave Chinner
2015-09-23 13:18 ` Brian Foster
0 siblings, 1 reply; 23+ messages in thread
From: Dave Chinner @ 2015-09-23 3:44 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs
On Fri, Sep 11, 2015 at 02:55:32PM -0400, Brian Foster wrote:
> The LSN validation helper is called in the I/O verifier codepath for
> metadata that embed a last-modification LSN. While the codepath exists,
> this is not used in userspace as in the kernel because the former
> doesn't have an active log.
>
> xfs_repair does need to check the validity of the LSN metadata with
> respect to the on-disk log, however. Use the LSN validation mechanism to
> track the largest LSN that has been seen. Export the value so repair can
> use it once it has processed the entire filesystem. Note that the helper
> continues to always return true to preserve existing behavior.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
....
> +xfs_lsn_t libxfs_max_lsn = 0;
> +pthread_mutex_t libxfs_max_lsn_lock = PTHREAD_MUTEX_INITIALIZER;
> +
> void
> xfs_detect_invalid_lsn(
> struct xfs_mount *mp,
> xfs_lsn_t lsn)
> {
> + int cycle = CYCLE_LSN(lsn);
> + int block = BLOCK_LSN(lsn);
> + int max_cycle;
> + int max_block;
> +
> + if (lsn == NULLCOMMITLSN)
> + return;
> +
> + pthread_mutex_lock(&libxfs_max_lsn_lock);
> +
> + max_cycle = CYCLE_LSN(libxfs_max_lsn);
> + max_block = BLOCK_LSN(libxfs_max_lsn);
> +
> + if ((cycle > max_cycle) ||
> + (cycle == max_cycle && block > max_block))
> + libxfs_max_lsn = lsn;
> +
> + pthread_mutex_unlock(&libxfs_max_lsn_lock);
This will have the same lock contention problems that the kernel
code would have had - my repair scalability tests regularly reach
over 1GB/s of metadata being prefetched through tens of threads, so
this is going have a significant impact on performance in those
tests....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: [PATCH v2 02/12] libxfs: track largest metadata LSN in use via verifiers
2015-09-23 3:44 ` Dave Chinner
@ 2015-09-23 13:18 ` Brian Foster
2015-09-23 22:36 ` Dave Chinner
0 siblings, 1 reply; 23+ messages in thread
From: Brian Foster @ 2015-09-23 13:18 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Wed, Sep 23, 2015 at 01:44:06PM +1000, Dave Chinner wrote:
> On Fri, Sep 11, 2015 at 02:55:32PM -0400, Brian Foster wrote:
> > The LSN validation helper is called in the I/O verifier codepath for
> > metadata that embed a last-modification LSN. While the codepath exists,
> > this is not used in userspace as in the kernel because the former
> > doesn't have an active log.
> >
> > xfs_repair does need to check the validity of the LSN metadata with
> > respect to the on-disk log, however. Use the LSN validation mechanism to
> > track the largest LSN that has been seen. Export the value so repair can
> > use it once it has processed the entire filesystem. Note that the helper
> > continues to always return true to preserve existing behavior.
> >
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> ....
> > +xfs_lsn_t libxfs_max_lsn = 0;
> > +pthread_mutex_t libxfs_max_lsn_lock = PTHREAD_MUTEX_INITIALIZER;
> > +
> > void
> > xfs_detect_invalid_lsn(
> > struct xfs_mount *mp,
> > xfs_lsn_t lsn)
> > {
> > + int cycle = CYCLE_LSN(lsn);
> > + int block = BLOCK_LSN(lsn);
> > + int max_cycle;
> > + int max_block;
> > +
> > + if (lsn == NULLCOMMITLSN)
> > + return;
> > +
> > + pthread_mutex_lock(&libxfs_max_lsn_lock);
> > +
> > + max_cycle = CYCLE_LSN(libxfs_max_lsn);
> > + max_block = BLOCK_LSN(libxfs_max_lsn);
> > +
> > + if ((cycle > max_cycle) ||
> > + (cycle == max_cycle && block > max_block))
> > + libxfs_max_lsn = lsn;
> > +
> > + pthread_mutex_unlock(&libxfs_max_lsn_lock);
>
> This will have the same lock contention problems that the kernel
> code would have had - my repair scalablity tests regularly reach
> over 1GB/s of metadata being prefetched through tens of threads, so
> this is going have a significant impact on performance in those
> tests....
>
Hmm, Ok. I initially didn't have locking here (by accident) but
reproduced some breakage for which the exact details escape me. I
suspect it was a test and set race.
I suppose we could try doing something like what we plan to do on the
kernel side in terms of a racy check followed by the locked check so the
lock contention goes away once we find the max lsn. The context here is
different, however, in that we're using a 64-bit LSN rather than
independently updated cycle and block fields. It could also take a while
to stabilize the max lsn depending on the fs. (There are some details on
the kernel side I'd like to get worked out first as well).
Perhaps we could try to make this smarter... in the case where either
the log has been zeroed or we've identified an LSN beyond the current
log state, we only really need to track the max cycle value since a log
format is imminent at that point. In the case where the fs is
consistent, we could seed the starting value with the current log lsn,
or track per-ag max LSNs without locking and compare them at at the end,
etc.
I'll have to think about this some more and see what's effective. I'd
also like to quantify the effect the current locking has on performance
if possible. Can you provide a brief description of your typical repair
test that you would expect this to hurt? E.g., a large fs, many AGs,
populated with fs_mark and repaired with many threads..? Any special
storage configuration? Thanks.
Brian
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: [PATCH v2 02/12] libxfs: track largest metadata LSN in use via verifiers
2015-09-23 13:18 ` Brian Foster
@ 2015-09-23 22:36 ` Dave Chinner
2015-10-01 20:38 ` Brian Foster
0 siblings, 1 reply; 23+ messages in thread
From: Dave Chinner @ 2015-09-23 22:36 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs
On Wed, Sep 23, 2015 at 09:18:31AM -0400, Brian Foster wrote:
> On Wed, Sep 23, 2015 at 01:44:06PM +1000, Dave Chinner wrote:
> > On Fri, Sep 11, 2015 at 02:55:32PM -0400, Brian Foster wrote:
> > > +
> > > + pthread_mutex_lock(&libxfs_max_lsn_lock);
> > > +
> > > + max_cycle = CYCLE_LSN(libxfs_max_lsn);
> > > + max_block = BLOCK_LSN(libxfs_max_lsn);
> > > +
> > > + if ((cycle > max_cycle) ||
> > > + (cycle == max_cycle && block > max_block))
> > > + libxfs_max_lsn = lsn;
Actually, we have XFS_LSN_CMP(lsn1, lsn2) for this. i.e.
if (XFS_LSN_CMP(lsn, libxfs_max_lsn) > 0)
libxfs_max_lsn = lsn;
> > > +
> > > + pthread_mutex_unlock(&libxfs_max_lsn_lock);
> >
> > This will have the same lock contention problems that the kernel
> > code would have had - my repair scalablity tests regularly reach
> > over 1GB/s of metadata being prefetched through tens of threads, so
> > this is going have a significant impact on performance in those
> > tests....
> >
>
> Hmm, Ok. I initially didn't have locking here (by accident) but
> reproduced some breakage for which the exact details escape me. I
> suspect it was a test and set race.
>
> I suppose we could try doing something like what we plan to do on the
> kernel side in terms of a racy check followed by the locked check so the
> lock contention goes away once we find the max lsn. The context here is
> different, however, in that we're using a 64-bit LSN rather than
> independently updated cycle and block fields. It could also take a while
> to stabilize the max lsn depending on the fs. (There are some details on
> the kernel side I'd like to get worked out first as well).
It still has to work on 32 bit machines, where 64 bit operations are
not atomic....
> Perhaps we could try to make this smarter... in the case where either
> the log has been zeroed or we've identified an LSN beyond the current
> log state, we only really need to track the max cycle value since a log
> format is imminent at that point. In the case where the fs is
> consistent, we could seed the starting value with the current log lsn,
or track per-ag max LSNs without locking and compare them at the end,
> etc.
Yes, we should seed the initial "max lsn" with whatever we find in
the log, regardless of whether the fs is consistent or not...
Also, per-ag max lsn tracking would break the lock up - that will
bring scope down to the point where contention won't be a problem...
> I'll have to think about this some more and see what's effective. I'd
> also like to quantify the effect the current locking has on performance
> if possible. Can you provide a brief description of your typical repair
> test that you would expect this to hurt? E.g., a large fs, many AGs,
> populated with fs_mark and repaired with many threads..? Any special
> storage configuration? Thanks.
Just my usual 500TB fs_mark test...
$ cat ~/tests/fsmark-50-test-xfs.sh
#!/bin/bash
QUOTA=
MKFSOPTS=
NFILES=100000
DEV=/dev/vdc
LOGBSIZE=256k
while [ $# -gt 0 ]; do
case "$1" in
-q) QUOTA="uquota,gquota,pquota" ;;
-N) NFILES=$2 ; shift ;;
-d) DEV=$2 ; shift ;;
-l) LOGBSIZE=$2; shift ;;
--) shift ; break ;;
esac
shift
done
MKFSOPTS="$MKFSOPTS $*"
echo QUOTA=$QUOTA
echo MKFSOPTS=$MKFSOPTS
echo DEV=$DEV
sudo umount /mnt/scratch > /dev/null 2>&1
sudo mkfs.xfs -f $MKFSOPTS $DEV
sudo mount -o nobarrier,logbsize=$LOGBSIZE,$QUOTA $DEV /mnt/scratch
sudo chmod 777 /mnt/scratch
cd /home/dave/src/fs_mark-3.3/
sudo sh -c "echo 1 > /proc/sys/fs/xfs/stats_clear"
time ./fs_mark -D 10000 -S0 -n $NFILES -s 0 -L 32 \
-d /mnt/scratch/0 -d /mnt/scratch/1 \
-d /mnt/scratch/2 -d /mnt/scratch/3 \
-d /mnt/scratch/4 -d /mnt/scratch/5 \
-d /mnt/scratch/6 -d /mnt/scratch/7 \
-d /mnt/scratch/8 -d /mnt/scratch/9 \
-d /mnt/scratch/10 -d /mnt/scratch/11 \
-d /mnt/scratch/12 -d /mnt/scratch/13 \
-d /mnt/scratch/14 -d /mnt/scratch/15 \
| tee >(stats --trim-outliers | tail -1 1>&2)
sync
echo Repair
sudo umount /mnt/scratch
time sudo xfs_repair -o bhash=100101 -v -v -t 1 $DEV
time sudo mount -o nobarrier,logbsize=$LOGBSIZE,$QUOTA $DEV /mnt/scratch
echo bulkstat files
time (
sudo ~/src/xfstests-dev/src/bstat -q /mnt/scratch 1024 | wc -l
)
echo walking files
~/tests/walk-scratch.sh
echo removing files
for f in /mnt/scratch/* ; do time rm -rf $f & done
wait
sudo umount /mnt/scratch
$
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: [PATCH v2 02/12] libxfs: track largest metadata LSN in use via verifiers
2015-09-23 22:36 ` Dave Chinner
@ 2015-10-01 20:38 ` Brian Foster
2015-10-02 2:16 ` Dave Chinner
0 siblings, 1 reply; 23+ messages in thread
From: Brian Foster @ 2015-10-01 20:38 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Thu, Sep 24, 2015 at 08:36:25AM +1000, Dave Chinner wrote:
> On Wed, Sep 23, 2015 at 09:18:31AM -0400, Brian Foster wrote:
> > On Wed, Sep 23, 2015 at 01:44:06PM +1000, Dave Chinner wrote:
> > > On Fri, Sep 11, 2015 at 02:55:32PM -0400, Brian Foster wrote:
> > > > +
> > > > + pthread_mutex_lock(&libxfs_max_lsn_lock);
> > > > +
> > > > + max_cycle = CYCLE_LSN(libxfs_max_lsn);
> > > > + max_block = BLOCK_LSN(libxfs_max_lsn);
> > > > +
> > > > + if ((cycle > max_cycle) ||
> > > > + (cycle == max_cycle && block > max_block))
> > > > + libxfs_max_lsn = lsn;
>
> Actually, we have XFS_LSN_CMP(lsn1, lsn2) for this. i.e.
>
> if (XFS_LSN_CMP(lsn, libxfs_max_lsn) > 0)
> libxfs_max_lsn = lsn;
>
> > > > +
> > > > + pthread_mutex_unlock(&libxfs_max_lsn_lock);
> > >
> > > This will have the same lock contention problems that the kernel
> > > code would have had - my repair scalablity tests regularly reach
> > > over 1GB/s of metadata being prefetched through tens of threads, so
> > > this is going have a significant impact on performance in those
> > > tests....
> > >
> >
...
>
> > I'll have to think about this some more and see what's effective. I'd
> > also like to quantify the effect the current locking has on performance
> > if possible. Can you provide a brief description of your typical repair
> > test that you would expect this to hurt? E.g., a large fs, many AGs,
> > populated with fs_mark and repaired with many threads..? Any special
> > storage configuration? Thanks.
>
> Just my usual 500TB fs_mark test...
>
Thanks for the test information and sample results. I wasn't able to get
close enough to the base numbers you mentioned on IRC with the spinning
rust storage I have available. Instead, I tried running something
similar using a large ramdisk as a backing store. I have a 500T sparse
file formatted with XFS and populated with ~25m inodes that uses roughly
~16GB of the backing store (leaving another 16GB of usable RAM for the
server).
I run xfs_repair[1] against that 500TB fs and see spikes of throughput up
over 2GB/s and get repair result reports like the following:
Phase Start End Duration
Phase 1: 10/01 13:03:44 10/01 13:03:45 1 second
Phase 2: 10/01 13:03:45 10/01 13:03:46 1 second
Phase 3: 10/01 13:03:46 10/01 13:05:01 1 minute, 15 seconds
Phase 4: 10/01 13:05:01 10/01 13:05:14 13 seconds
Phase 5: 10/01 13:05:14 10/01 13:05:15 1 second
Phase 6: 10/01 13:05:15 10/01 13:05:50 35 seconds
Phase 7: 10/01 13:05:50 10/01 13:05:50
The numbers don't change that much on repeated runs and if I do a quick
and dirty average of the duration of phases 3, 4 and 6 and compare with
results from the for-next branch, the runtime degradation is on the
order of tenths of a second. Here's a for-next (e.g., no max lsn
tracking) run for reference:
Phase Start End Duration
Phase 1: 10/01 13:19:53 10/01 13:19:53
Phase 2: 10/01 13:19:53 10/01 13:19:56 3 seconds
Phase 3: 10/01 13:19:56 10/01 13:21:11 1 minute, 15 seconds
Phase 4: 10/01 13:21:11 10/01 13:21:22 11 seconds
Phase 5: 10/01 13:21:22 10/01 13:21:23 1 second
Phase 6: 10/01 13:21:23 10/01 13:21:57 34 seconds
Phase 7: 10/01 13:21:57 10/01 13:21:57
So I'm not seeing much difference here with the max lsn tracking as it
is implemented in this series. Out of curiosity, I ran a v3.2.2
xfs_repair binary that happened to be installed on this host, got a much
faster result than even the current master, and via perf diff discovered
that the biggest difference between the runs was actual CRC calculation.
Based on that, I ran the same crc=0 test against the current code with
the following results:
Phase Start End Duration
Phase 1: 10/01 13:53:49 10/01 13:53:49
Phase 2: 10/01 13:53:49 10/01 13:53:50 1 second
Phase 3: 10/01 13:53:50 10/01 13:54:52 1 minute, 2 seconds
Phase 4: 10/01 13:54:52 10/01 13:55:01 9 seconds
Phase 5: 10/01 13:55:01 10/01 13:55:01
Phase 6: 10/01 13:55:01 10/01 13:55:35 34 seconds
Phase 7: 10/01 13:55:35 10/01 13:55:35
... so that knocks off another 15s or so from the test. Note that the
lsn lock is irrelevant in the crc=0 case as there are no metadata LSNs,
thus no verification occurs.
All in all, I can't really reproduce any tangible degradation due to the
maxlsn lock and I don't really want to prematurely optimize it if it's
not a contention point in practice. Thoughts? If you get a chance, care
to give this code a quick run under your xfs_repair test environment? If
you can reproduce something there, I can continue to try and figure out
what might be different in my test.
Brian
[1] xfs_repair -o bhash=100101 -v -v -t 1 -f <file>
> $ cat ~/tests/fsmark-50-test-xfs.sh
> #!/bin/bash
>
> QUOTA=
> MKFSOPTS=
> NFILES=100000
> DEV=/dev/vdc
> LOGBSIZE=256k
>
> while [ $# -gt 0 ]; do
> case "$1" in
> -q) QUOTA="uquota,gquota,pquota" ;;
> -N) NFILES=$2 ; shift ;;
> -d) DEV=$2 ; shift ;;
> -l) LOGBSIZE=$2; shift ;;
> --) shift ; break ;;
> esac
> shift
> done
> MKFSOPTS="$MKFSOPTS $*"
>
> echo QUOTA=$QUOTA
> echo MKFSOPTS=$MKFSOPTS
> echo DEV=$DEV
>
> sudo umount /mnt/scratch > /dev/null 2>&1
> sudo mkfs.xfs -f $MKFSOPTS $DEV
> sudo mount -o nobarrier,logbsize=$LOGBSIZE,$QUOTA $DEV /mnt/scratch
> sudo chmod 777 /mnt/scratch
> cd /home/dave/src/fs_mark-3.3/
> sudo sh -c "echo 1 > /proc/sys/fs/xfs/stats_clear"
> time ./fs_mark -D 10000 -S0 -n $NFILES -s 0 -L 32 \
> -d /mnt/scratch/0 -d /mnt/scratch/1 \
> -d /mnt/scratch/2 -d /mnt/scratch/3 \
> -d /mnt/scratch/4 -d /mnt/scratch/5 \
> -d /mnt/scratch/6 -d /mnt/scratch/7 \
> -d /mnt/scratch/8 -d /mnt/scratch/9 \
> -d /mnt/scratch/10 -d /mnt/scratch/11 \
> -d /mnt/scratch/12 -d /mnt/scratch/13 \
> -d /mnt/scratch/14 -d /mnt/scratch/15 \
> | tee >(stats --trim-outliers | tail -1 1>&2)
> sync
>
> echo Repair
> sudo umount /mnt/scratch
> time sudo xfs_repair -o bhash=100101 -v -v -t 1 $DEV
> time sudo mount -o nobarrier,logbsize=$LOGBSIZE,$QUOTA $DEV /mnt/scratch
>
> echo bulkstat files
>
> time (
> sudo ~/src/xfstests-dev/src/bstat -q /mnt/scratch 1024 | wc -l
> )
>
> echo walking files
> ~/tests/walk-scratch.sh
>
> echo removing files
> for f in /mnt/scratch/* ; do time rm -rf $f & done
> wait
>
> sudo umount /mnt/scratch
> $
>
> --
> Dave Chinner
> david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 02/12] libxfs: track largest metadata LSN in use via verifiers
2015-10-01 20:38 ` Brian Foster
@ 2015-10-02 2:16 ` Dave Chinner
2015-10-02 11:33 ` Brian Foster
0 siblings, 1 reply; 23+ messages in thread
From: Dave Chinner @ 2015-10-02 2:16 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs
On Thu, Oct 01, 2015 at 04:38:51PM -0400, Brian Foster wrote:
> On Thu, Sep 24, 2015 at 08:36:25AM +1000, Dave Chinner wrote:
> > On Wed, Sep 23, 2015 at 09:18:31AM -0400, Brian Foster wrote:
> > > > This will have the same lock contention problems that the kernel
> > > > code would have had - my repair scalablity tests regularly reach
> > > > over 1GB/s of metadata being prefetched through tens of threads, so
> > > > this is going have a significant impact on performance in those
> > > > tests....
> ...
> >
> > > I'll have to think about this some more and see what's effective. I'd
> > > also like to quantify the effect the current locking has on performance
> > > if possible. Can you provide a brief description of your typical repair
> > > test that you would expect this to hurt? E.g., a large fs, many AGs,
> > > populated with fs_mark and repaired with many threads..? Any special
> > > storage configuration? Thanks.
> >
> > Just my usual 500TB fs_mark test...
>
> Thanks for the test information and sample results. I wasn't able to get
> close enough to the base numbers you mentioned on IRC with the spinning
> rust storage I have available. Instead, I tried running something
> similar using a large ramdisk as a backing store. I have a 500T sparse
> file formatted with XFS and populated with ~25m inodes that uses roughly
> ~16GB of the backing store (leaving another 16GB of usable RAM for the
> server).
Keep in mind when doing this sort of testing that a ramdisk does not
perform like normal storage - it does synchronous IO via memcpy()
rather than async IO via DMA, and hence has a very different CPU
cache footprint, concurrency and latency profile to normal IO.
I'm not saying your numbers are invalid, just making sure that you
remember that ramdisks, while fast, may not give results that are
representative of normal storage behaviour...
> I run xfs_repair[1] against that 500TB fs and see spikes of throughput up
> over 2GB/s and get repair result reports like the following:
<snip>
> So I'm not seeing much difference here with the max lsn tracking as it
> is implemented in this series.
Yup, same here. min/max spread between the 4.2.0 debian unstable
package and a locally built 4.2.0+lsn tracking is the same. Both
varied between 2m55s and 3m5s, all used roughly the same user and
system CPU. So we'll commit it as is ;)
> Out of curiosity, I ran a v3.2.2
> xfs_repair binary that happened to be installed on this host, got a much
> faster result than even the current master, and via perf diff discovered
> that the biggest difference between the runs was actual CRC calculation.
I think you might have drawn the incorrect conclusion from the
profile data...
> Based on that, I ran the same crc=0 test against the current code with
> the following results:
>
> Phase Start End Duration
> Phase 1: 10/01 13:53:49 10/01 13:53:49
> Phase 2: 10/01 13:53:49 10/01 13:53:50 1 second
> Phase 3: 10/01 13:53:50 10/01 13:54:52 1 minute, 2 seconds
> Phase 4: 10/01 13:54:52 10/01 13:55:01 9 seconds
> Phase 5: 10/01 13:55:01 10/01 13:55:01
> Phase 6: 10/01 13:55:01 10/01 13:55:35 34 seconds
> Phase 7: 10/01 13:55:35 10/01 13:55:35
>
> ... so that knocks off another 15s or so from the test. Note that the
> lsn lock is irrelevant in the crc=0 case as there are no metadata LSNs,
> thus no verification occurs.
My results:
fsmark creation
mkfs (4.2.0) files/s (avg) wall time repair
defaults 203209 4m57s 2m55s-3m05s
-m crc=0 188837 5m06s 2m07s-2m10s
On first glance, I've just replicated your results. But the PCP
monitoring I run all the time tell a different story. Both
xfs_repair runs are io bound during phase 3 and 4, but the crc=0 run
does a *lot less IO*.
The reason for this will be obvious when I tell you: inodes are 512
bytes when CRCs are enabled, 256 bytes when they aren't. Hence
there's twice as much metadata to be read when CRCs are enabled, and
when it is read at a fixed rate, it takes twice as long to read.
The trap here is that perf profiles tend to present
"percentage of CPU used by a function" rather than absolute CPU
consumed by the function across a test run. That's why you might
think that the CPU consumed by CRCs is responsible for the
difference in performance - it's the only really obvious difference
in the profile. What I think you'll find if you look deeper is that
the time spent in memcpy() deep in the ramdisk IO path has doubled
and that's where most of the additional time comes from.
Apples to apples comparison at the IO level on my test rig:
fsmark creation
mkfs files/s (avg) wall time repair
defaults 203209 4m57s 2m55s-3m05s
-m crc=0 188837 5m06s 2m07s-2m10s
-m crc=0 -i size=512 194331 5m00s 3m00s-3m10s
Yup, when doing IO-equivalent testing, CRCs have no measurable
wall time impact on xfs_repair runtime.
Performance analysis is tricky stuff.... ;)
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 02/12] libxfs: track largest metadata LSN in use via verifiers
2015-10-02 2:16 ` Dave Chinner
@ 2015-10-02 11:33 ` Brian Foster
0 siblings, 0 replies; 23+ messages in thread
From: Brian Foster @ 2015-10-02 11:33 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Fri, Oct 02, 2015 at 12:16:34PM +1000, Dave Chinner wrote:
> On Thu, Oct 01, 2015 at 04:38:51PM -0400, Brian Foster wrote:
> > On Thu, Sep 24, 2015 at 08:36:25AM +1000, Dave Chinner wrote:
> > > On Wed, Sep 23, 2015 at 09:18:31AM -0400, Brian Foster wrote:
> > > > > This will have the same lock contention problems that the kernel
> > > > > code would have had - my repair scalability tests regularly reach
> > > > > over 1GB/s of metadata being prefetched through tens of threads, so
> > > > > this is going to have a significant impact on performance in those
> > > > > tests....
> > ...
> > >
> > > > I'll have to think about this some more and see what's effective. I'd
> > > > also like to quantify the effect the current locking has on performance
> > > > if possible. Can you provide a brief description of your typical repair
> > > > test that you would expect this to hurt? E.g., a large fs, many AGs,
> > > > populated with fs_mark and repaired with many threads..? Any special
> > > > storage configuration? Thanks.
> > >
> > > Just my usual 500TB fs_mark test...
> >
> > Thanks for the test information and sample results. I wasn't able to get
> > close enough to the base numbers you mentioned on IRC with the spinning
> > rust storage I have available. Instead, I tried running something
> > similar using a large ramdisk as a backing store. I have a 500T sparse
> > file formatted with XFS and populated with ~25m inodes that uses roughly
> > ~16GB of the backing store (leaving another 16GB of usable RAM for the
> > server).
>
> Keep in mind when doing this sort of testing that a ramdisk does not
> perform like normal storage - it does synchronous IO via memcpy()
> rather than async IO via DMA, and hence has a very different CPU
> cache footprint, concurrency and latency profile to normal IO.
>
> I'm not saying your numbers are invalid, just making sure that you
> remember that ramdisks, while fast, may not give results that are
> representative of normal storage behaviour...
>
Yeah, good point. I kind of just got here as a last resort. I'm going to
see if I can get something set up to more easily test this kind of thing
in the future.
> > I run xfs_repair[1] against that 500TB fs and see spikes of throughput up
> > over 2GB/s and get repair result reports like the following:
>
> <snip>
>
> > So I'm not seeing much difference here with the max lsn tracking as it
> > is implemented in this series.
>
> Yup, same here. min/max spread between the 4.2.0 debian unstable
> package and a locally built 4.2.0+lsn tracking is the same. Both
> varied between 2m55s and 3m5s, all used roughly the same user and
> system CPU. So we'll commit it as is ;)
>
Ok, thanks for verifying that. I still have to rebase this code once the
kernel side is reviewed and make a couple minor updates as well.
> > Out of curiosity, I ran a v3.2.2
> > xfs_repair binary that happened to be installed on this host, got a much
> > faster result than even the current master, and via perf diff discovered
> > that the biggest difference between the runs was actual CRC calculation.
>
> I think you might have drawn the incorrect conclusion from the
> profile data...
>
> > Based on that, I ran the same crc=0 test against the current code with
> > the following results:
> >
> > Phase Start End Duration
> > Phase 1: 10/01 13:53:49 10/01 13:53:49
> > Phase 2: 10/01 13:53:49 10/01 13:53:50 1 second
> > Phase 3: 10/01 13:53:50 10/01 13:54:52 1 minute, 2 seconds
> > Phase 4: 10/01 13:54:52 10/01 13:55:01 9 seconds
> > Phase 5: 10/01 13:55:01 10/01 13:55:01
> > Phase 6: 10/01 13:55:01 10/01 13:55:35 34 seconds
> > Phase 7: 10/01 13:55:35 10/01 13:55:35
> >
> > ... so that knocks off another 15s or so from the test. Note that the
> > lsn lock is irrelevant in the crc=0 case as there are no metadata LSNs,
> > thus no verification occurs.
>
> My results:
> fsmark creation
> mkfs (4.2.0) files/s (avg) wall time repair
> defaults 203209 4m57s 2m55s-3m05s
> -m crc=0 188837 5m06s 2m07s-2m10s
>
> On first glance, I've just replicated your results. But the PCP
> monitoring I run all the time tells a different story. Both
> xfs_repair runs are io bound during phase 3 and 4, but the crc=0 run
> does a *lot less IO*.
>
> The reason for this will be obvious when I tell you: inodes are 512
> bytes when CRCs are enabled, 256 bytes when they aren't. Hence
> there's twice as much metadata to be read when CRCs are enabled, and
> when it is read at a fixed rate, it takes twice as long to read.
>
Ah, right. I probably should have formatted the crc=0 fs with isize=512.
> The trap here is that perf profiles tend to present
> "percentage of CPU used by a function" rather than absolute CPU
> consumed by the function across a test run. That's why you might
> think that the CPU consumed by CRCs is responsible for the
> difference in performance - it's the only really obvious difference
> in the profile. What I think you'll find if you look deeper is that
> the time spent in memcpy() deep in the ramdisk IO path has doubled
> and that's where most of the additional time comes from.
>
> Apples to apples comparison at the IO level on my test rig:
>
> fsmark creation
> mkfs files/s (avg) wall time repair
> defaults 203209 4m57s 2m55s-3m05s
> -m crc=0 188837 5m06s 2m07s-2m10s
> -m crc=0 -i size=512 194331 5m00s 3m00s-3m10s
>
> Yup, when doing IO-equivalent testing, CRCs have no measurable
> wall time impact on xfs_repair runtime.
>
Ok, that minor range shift definitely seems much more reasonable a
result. I thought the 1m or so difference I saw (on spinning storage)
seemed a bit much, enough to try and figure out what was going on there.
I was thinking a regression at first, but the crc stuff jumped out at
the top of the cpu profile so I left it at that and forgot about the
isize change. Thanks for the analysis...
Brian
> Performance analysis is tricky stuff.... ;)
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v2 03/12] libxfs: don't hardcode cycle 1 into unmount op header
2015-09-11 18:55 [PATCH v2 00/12] xfsprogs: format the log correctly on v5 supers Brian Foster
2015-09-11 18:55 ` [PATCH v2 01/12] libxfs: validate metadata LSNs against log on v5 superblocks Brian Foster
2015-09-11 18:55 ` [PATCH v2 02/12] libxfs: track largest metadata LSN in use via verifiers Brian Foster
@ 2015-09-11 18:55 ` Brian Foster
2015-09-23 3:48 ` Dave Chinner
2015-09-11 18:55 ` [PATCH v2 04/12] libxfs: pass lsn param to log clear and record header logging helpers Brian Foster
` (8 subsequent siblings)
11 siblings, 1 reply; 23+ messages in thread
From: Brian Foster @ 2015-09-11 18:55 UTC (permalink / raw)
To: xfs
The libxfs helper to write a log record after zeroing the log fills much
of the record header and unmount record with dummy data. It also
hardcodes the cycle number into the transaction oh_tid field as the
kernel expects to find the cycle stamped at the top of each block and
the original oh_tid value packed into h_cycle_data of the record header.
The log clearing code requires the ability to format the log to an
arbitrary cycle number to fix v5 superblock log recovery ordering
problems. As a result, the unmount record helper must not hardcode a
cycle of 1.
Fix up libxfs_log_header() to pack the unmount record appropriately, as
is already done for extra blocks that might exist beyond the record. Use
h_cycle_data for the original 32-bit word of the log record data block
and stamp the cycle number in its place. This allows unmount_record() to
work for arbitrary cycle numbers and libxfs_log_header() to pack a cycle
value that matches the lsn used in the record header. Note that this
patch does not change behavior as the lsn is still hardcoded to (1:0).
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
libxfs/rdwr.c | 23 ++++++++++++++++-------
1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index bc77699..ef18749 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -122,7 +122,7 @@ static void unmount_record(void *p)
} magic = { XLOG_UNMOUNT_TYPE, 0, 0 };
memset(p, 0, BBSIZE);
- op->oh_tid = cpu_to_be32(1);
+ op->oh_tid = cpu_to_be32(0xb0c0d0d0);
op->oh_len = cpu_to_be32(sizeof(magic));
op->oh_clientid = XFS_LOG;
op->oh_flags = XLOG_UNMOUNT_TRANS;
@@ -188,10 +188,6 @@ libxfs_log_header(
len = ((version == 2) && sunit) ? BTOBB(sunit) : 1;
- /* note that oh_tid actually contains the cycle number
- * and the tid is stored in h_cycle_data[0] - that's the
- * way things end up on disk.
- */
memset(p, 0, BBSIZE);
head->h_magicno = cpu_to_be32(XLOG_HEADER_MAGIC_NUM);
head->h_cycle = cpu_to_be32(1);
@@ -203,7 +199,6 @@ libxfs_log_header(
head->h_crc = cpu_to_le32(0);
head->h_prev_block = cpu_to_be32(-1);
head->h_num_logops = cpu_to_be32(1);
- head->h_cycle_data[0] = cpu_to_be32(0xb0c0d0d0);
head->h_fmt = cpu_to_be32(fmt);
head->h_size = cpu_to_be32(XLOG_HEADER_CYCLE_SIZE);
@@ -212,11 +207,25 @@ libxfs_log_header(
memcpy(&head->h_fs_uuid, fs_uuid, sizeof(uuid_t));
- len = MAX(len, 2);
p = nextfunc(p, BBSIZE, private);
unmount_record(p);
+ /*
+ * The kernel expects to see either a log record header magic or the LSN
+ * cycle at the top of every log block (for example, see
+ * xlog_[un]pack_data() and xlog_get_cycle()). Pack the unmount record
+ * block appropriately here.
+ */
cycle_lsn = CYCLE_LSN_DISK(head->h_lsn);
+ head->h_cycle_data[0] = *(__be32 *)p;
+ *(__be32 *)p = cycle_lsn;
+
+ /*
+ * Now zero any remaining blocks in the record and stamp with the cycle.
+ * Note that we don't need to swap into h_cycle_data because it has
+ * already been initialized to zero.
+ */
+ len = MAX(len, 2);
for (i = 2; i < len; i++) {
p = nextfunc(p, BBSIZE, private);
memset(p, 0, BBSIZE);
--
2.1.0
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 23+ messages in thread* Re: [PATCH v2 03/12] libxfs: don't hardcode cycle 1 into unmount op header
2015-09-11 18:55 ` [PATCH v2 03/12] libxfs: don't hardcode cycle 1 into unmount op header Brian Foster
@ 2015-09-23 3:48 ` Dave Chinner
2015-09-23 13:22 ` Brian Foster
0 siblings, 1 reply; 23+ messages in thread
From: Dave Chinner @ 2015-09-23 3:48 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs
On Fri, Sep 11, 2015 at 02:55:33PM -0400, Brian Foster wrote:
> The libxfs helper to write a log record after zeroing the log fills much
> of the record header and unmount record with dummy data. It also
> hardcodes the cycle number into the transaction oh_tid field as the
> kernel expects to find the cycle stamped at the top of each block and
> the original oh_tid value packed into h_cycle_data of the record header.
>
> The log clearing code requires the ability to format the log to an
> arbitrary cycle number to fix v5 superblock log recovery ordering
> problems. As a result, the unmount record helper must not hardcode a
> cycle of 1.
>
> Fix up libxfs_log_header() to pack the unmount record appropriately, as
> is already done for extra blocks that might exist beyond the record. Use
> h_cycle_data for the original 32-bit word of the log record data block
> and stamp the cycle number in its place. This allows unmount_record() to
> work for arbitrary cycle numbers and libxfs_log_header() to pack a cycle
> value that matches the lsn used in the record header. Note that this
> patch does not change behavior as the lsn is still hardcoded to (1:0).
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> libxfs/rdwr.c | 23 ++++++++++++++++-------
> 1 file changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
> index bc77699..ef18749 100644
> --- a/libxfs/rdwr.c
> +++ b/libxfs/rdwr.c
> @@ -122,7 +122,7 @@ static void unmount_record(void *p)
> } magic = { XLOG_UNMOUNT_TYPE, 0, 0 };
>
> memset(p, 0, BBSIZE);
> - op->oh_tid = cpu_to_be32(1);
> + op->oh_tid = cpu_to_be32(0xb0c0d0d0);
> op->oh_len = cpu_to_be32(sizeof(magic));
> op->oh_clientid = XFS_LOG;
> op->oh_flags = XLOG_UNMOUNT_TRANS;
> @@ -188,10 +188,6 @@ libxfs_log_header(
>
> len = ((version == 2) && sunit) ? BTOBB(sunit) : 1;
>
> - /* note that oh_tid actually contains the cycle number
> - * and the tid is stored in h_cycle_data[0] - that's the
> - * way things end up on disk.
> - */
This note needs to be hoisted up to the setting of op->oh_tid to
explain the magic number being used...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: [PATCH v2 03/12] libxfs: don't hardcode cycle 1 into unmount op header
2015-09-23 3:48 ` Dave Chinner
@ 2015-09-23 13:22 ` Brian Foster
2015-09-24 0:37 ` Dave Chinner
0 siblings, 1 reply; 23+ messages in thread
From: Brian Foster @ 2015-09-23 13:22 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Wed, Sep 23, 2015 at 01:48:44PM +1000, Dave Chinner wrote:
> On Fri, Sep 11, 2015 at 02:55:33PM -0400, Brian Foster wrote:
> > The libxfs helper to write a log record after zeroing the log fills much
> > of the record header and unmount record with dummy data. It also
> > hardcodes the cycle number into the transaction oh_tid field as the
> > kernel expects to find the cycle stamped at the top of each block and
> > the original oh_tid value packed into h_cycle_data of the record header.
> >
> > The log clearing code requires the ability to format the log to an
> > arbitrary cycle number to fix v5 superblock log recovery ordering
> > problems. As a result, the unmount record helper must not hardcode a
> > cycle of 1.
> >
> > Fix up libxfs_log_header() to pack the unmount record appropriately, as
> > is already done for extra blocks that might exist beyond the record. Use
> > h_cycle_data for the original 32-bit word of the log record data block
> > and stamp the cycle number in its place. This allows unmount_record() to
> > work for arbitrary cycle numbers and libxfs_log_header() to pack a cycle
> > value that matches the lsn used in the record header. Note that this
> > patch does not change behavior as the lsn is still hardcoded to (1:0).
> >
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> > libxfs/rdwr.c | 23 ++++++++++++++++-------
> > 1 file changed, 16 insertions(+), 7 deletions(-)
> >
> > diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
> > index bc77699..ef18749 100644
> > --- a/libxfs/rdwr.c
> > +++ b/libxfs/rdwr.c
> > @@ -122,7 +122,7 @@ static void unmount_record(void *p)
> > } magic = { XLOG_UNMOUNT_TYPE, 0, 0 };
> >
> > memset(p, 0, BBSIZE);
> > - op->oh_tid = cpu_to_be32(1);
> > + op->oh_tid = cpu_to_be32(0xb0c0d0d0);
> > op->oh_len = cpu_to_be32(sizeof(magic));
> > op->oh_clientid = XFS_LOG;
> > op->oh_flags = XLOG_UNMOUNT_TRANS;
> > @@ -188,10 +188,6 @@ libxfs_log_header(
> >
> > len = ((version == 2) && sunit) ? BTOBB(sunit) : 1;
> >
> > - /* note that oh_tid actually contains the cycle number
> > - * and the tid is stored in h_cycle_data[0] - that's the
> > - * way things end up on disk.
> > - */
>
> This note needs to be hoisted up to the setting of op->oh_tid to
> explain the magic number being used...
>
This requires that I understand what the magic number being used
actually means. Any ideas? ;)
I just assumed it was a dummy tid value. The comment removed above just
explains why that value is being put where it is (cycle value in oh_tid
and tid value in h_cycle_data) as the original code implicitly
implements the cycle data packing. That is undone by this patch. The
packing is now done explicitly (with its own comments) in the caller and
thus the original comment is irrelevant.
Brian
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: [PATCH v2 03/12] libxfs: don't hardcode cycle 1 into unmount op header
2015-09-23 13:22 ` Brian Foster
@ 2015-09-24 0:37 ` Dave Chinner
2015-09-24 13:00 ` Brian Foster
0 siblings, 1 reply; 23+ messages in thread
From: Dave Chinner @ 2015-09-24 0:37 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs
On Wed, Sep 23, 2015 at 09:22:00AM -0400, Brian Foster wrote:
> On Wed, Sep 23, 2015 at 01:48:44PM +1000, Dave Chinner wrote:
> > On Fri, Sep 11, 2015 at 02:55:33PM -0400, Brian Foster wrote:
> > > The libxfs helper to write a log record after zeroing the log fills much
> > > of the record header and unmount record with dummy data. It also
> > > hardcodes the cycle number into the transaction oh_tid field as the
> > > kernel expects to find the cycle stamped at the top of each block and
> > > the original oh_tid value packed into h_cycle_data of the record header.
> > >
> > > The log clearing code requires the ability to format the log to an
> > > arbitrary cycle number to fix v5 superblock log recovery ordering
> > > problems. As a result, the unmount record helper must not hardcode a
> > > cycle of 1.
> > >
> > > Fix up libxfs_log_header() to pack the unmount record appropriately, as
> > > is already done for extra blocks that might exist beyond the record. Use
> > > h_cycle_data for the original 32-bit word of the log record data block
> > > and stamp the cycle number in its place. This allows unmount_record() to
> > > work for arbitrary cycle numbers and libxfs_log_header() to pack a cycle
> > > value that matches the lsn used in the record header. Note that this
> > > patch does not change behavior as the lsn is still hardcoded to (1:0).
> > >
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > > libxfs/rdwr.c | 23 ++++++++++++++++-------
> > > 1 file changed, 16 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
> > > index bc77699..ef18749 100644
> > > --- a/libxfs/rdwr.c
> > > +++ b/libxfs/rdwr.c
> > > @@ -122,7 +122,7 @@ static void unmount_record(void *p)
> > > } magic = { XLOG_UNMOUNT_TYPE, 0, 0 };
> > >
> > > memset(p, 0, BBSIZE);
> > > - op->oh_tid = cpu_to_be32(1);
> > > + op->oh_tid = cpu_to_be32(0xb0c0d0d0);
> > > op->oh_len = cpu_to_be32(sizeof(magic));
> > > op->oh_clientid = XFS_LOG;
> > > op->oh_flags = XLOG_UNMOUNT_TRANS;
> > > @@ -188,10 +188,6 @@ libxfs_log_header(
> > >
> > > len = ((version == 2) && sunit) ? BTOBB(sunit) : 1;
> > >
> > > - /* note that oh_tid actually contains the cycle number
> > > - * and the tid is stored in h_cycle_data[0] - that's the
> > > - * way things end up on disk.
> > > - */
> >
> > This note needs to be hoisted up to the setting of op->oh_tid to
> > explain the magic number being used...
> >
>
> This requires that I understand what the magic number being used
> actually means. Any ideas? ;)
When this gets unpacked by log recovery, the head->h_cycle_data[0]
value gets written back over the op->oh_tid of the unmount record,
and so we see an unmount record with a transaction ID of 0xb0c0d0d0.
On seeing that magic number in an unmount record we know it was
written by userspace and whatever was in the log is now ancient
history.
> I just assumed it was a dummy tid value.
It's a canary (sort of).
*ding*
*ding*
*ding*
Obscure Magic Number Reference Achievement Unlocked!
[BC: ancient history. Dodo: a dead canary (sort of). ;) ]
> The comment removed above just
> explains why that value is being put where it is (cycle value in oh_tid
> and tid value in h_cycle_data) as the original code implicitly
> implements the cycle data packing. That is undone by this patch. The
> packing is now done explicitly (with its own comments) in the caller and
> thus the original comment is irrelevant.
Fair enough - I didn't connect the two bits properly.
Hmmm - this code does not CRC the unmount record on disk and we
don't validate the unmount record CRC in the kernel as valid in the
kernel before we use it, because it never gets to the unpack stage;
we just look to see if ophdr->oh_flags contains XLOG_UNMOUNT_TRANS
and then we use it...
If we are writing a new lsn into it now, should we be CRCing this
unmount record?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: [PATCH v2 03/12] libxfs: don't hardcode cycle 1 into unmount op header
2015-09-24 0:37 ` Dave Chinner
@ 2015-09-24 13:00 ` Brian Foster
0 siblings, 0 replies; 23+ messages in thread
From: Brian Foster @ 2015-09-24 13:00 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Thu, Sep 24, 2015 at 10:37:35AM +1000, Dave Chinner wrote:
> On Wed, Sep 23, 2015 at 09:22:00AM -0400, Brian Foster wrote:
> > On Wed, Sep 23, 2015 at 01:48:44PM +1000, Dave Chinner wrote:
> > > On Fri, Sep 11, 2015 at 02:55:33PM -0400, Brian Foster wrote:
...
> > > > --- a/libxfs/rdwr.c
> > > > +++ b/libxfs/rdwr.c
> > > > @@ -122,7 +122,7 @@ static void unmount_record(void *p)
> > > > } magic = { XLOG_UNMOUNT_TYPE, 0, 0 };
> > > >
> > > > memset(p, 0, BBSIZE);
> > > > - op->oh_tid = cpu_to_be32(1);
> > > > + op->oh_tid = cpu_to_be32(0xb0c0d0d0);
> > > > op->oh_len = cpu_to_be32(sizeof(magic));
> > > > op->oh_clientid = XFS_LOG;
> > > > op->oh_flags = XLOG_UNMOUNT_TRANS;
> > > > @@ -188,10 +188,6 @@ libxfs_log_header(
> > > >
> > > > len = ((version == 2) && sunit) ? BTOBB(sunit) : 1;
> > > >
> > > > - /* note that oh_tid actually contains the cycle number
> > > > - * and the tid is stored in h_cycle_data[0] - that's the
> > > > - * way things end up on disk.
> > > > - */
> > >
> > > This note needs to be hoisted up to the setting of op->oh_tid to
> > > explain the magic number being used...
> > >
> >
> > This requires that I understand what the magic number being used
> > actually means. Any ideas? ;)
>
> When this gets unpacked by log recovery, the head->h_cycle_data[0]
> value gets written back over the op->oh_tid of the unmount record,
> and so we see an unmount record with a transaction ID of 0xb0c0d0d0.
> On seeing that magic number in an unmount record we know it was
> written by userspace and whatever was in the log is now ancient
> history.
>
> > I just assumed it was a dummy tid value.
>
> It's a canary (sort of).
>
> *ding*
> *ding*
> *ding*
>
> Obscure Magic Number Reference Acheivement Unlocked!
>
> [BC: ancient history. Dodo: a dead canary (sort of). ;) ]
>
Hah, I figured it had to be some kind of obscure reference. ;)
> > The comment removed above just
> > explains why that value is being put where it is (cycle value in oh_tid
> > and tid value in h_cycle_data) as the original code implicitly
> > implements the cycle data packing. That is undone by this patch. The
> > packing is now done explicitly (with its own comments) in the caller and
> > thus the original comment is irrelevant.
>
> Fair enough - I didn't connect the two bits properly.
>
> Hmmm - this code does not CRC the unmount record on disk and we
> don't validate the unmount record CRC in the kernel as valid in the
> kernel before we use it, because it never gets to the unpack stage;
> we just look to see if ophdr->oh_flags contains XLOG_UNMOUNT_TRANS
> and then we use it...
>
> If we are writing a new lsn into it now, should we be CRCing this
> unmount record?
>
Good question. I'm not so sure it's that important to crc the record in
this particular context. We're reformatting the log and clearing any
real data, after all.
That said, when skimming back through the kernel code I had a faint
recollection of why I started all of this log recovery work in the first
place. ;) The previous EFI/EFD logging fixes, umount hang fixes and
these LSN fixes all spilled out from the inability to effectively test
torn log write detection in an effort to support XFS on pmem. Recall our
previous discussion here:
http://oss.sgi.com/pipermail/xfs/2015-July/042415.html
I believe I made progress on that work with respect to the
aforementioned discussion, but it was still very much in flux and all of
this stuff got in the way of starting to test what I had at the time.
The short of it is that due to that work, we'll end up doing crc
verification of the head of the log before the real log recovery
actually starts. I don't quite recall whether this incorporated crc
verification of the unmount record in the clean log case (I suspect
not), but I can add that as a TODO item to look into once I get back to
that work. If there's a good enough reason to do that in the kernel, we
could certainly revisit the userspace log formatting code to set a crc.
Brian
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v2 04/12] libxfs: pass lsn param to log clear and record header logging helpers
2015-09-11 18:55 [PATCH v2 00/12] xfsprogs: format the log correctly on v5 supers Brian Foster
` (2 preceding siblings ...)
2015-09-11 18:55 ` [PATCH v2 03/12] libxfs: don't hardcode cycle 1 into unmount op header Brian Foster
@ 2015-09-11 18:55 ` Brian Foster
2015-09-11 18:55 ` [PATCH v2 05/12] libxfs: add ability to clear log to arbitrary log cycle Brian Foster
` (7 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Brian Foster @ 2015-09-11 18:55 UTC (permalink / raw)
To: xfs
In preparation to support the ability to format the log with an
arbitrary cycle number, the log clear and record logging helpers must be
updated to receive the desired cycle and LSN values as parameters.
Update libxfs_log_clear() to receive the desired cycle number to format
the log with. Define a preprocessor directive to represent the currently
hardcoded case of cycle 1. Update libxfs_log_header() to receive the lsn
and tail_lsn of the record to write. Use a NULL value LSN to represent
the currently hardcoded behavior.
All callers are updated to use the current default values. As such, this
patch does not change behavior in any way.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
copy/xfs_copy.c | 4 ++--
db/sb.c | 2 +-
include/libxfs.h | 12 ++++++++----
libxfs/rdwr.c | 26 ++++++++++++++++++++------
mkfs/xfs_mkfs.c | 2 +-
repair/phase2.c | 2 +-
6 files changed, 33 insertions(+), 15 deletions(-)
diff --git a/copy/xfs_copy.c b/copy/xfs_copy.c
index 2f4f5cb..949be5f 100644
--- a/copy/xfs_copy.c
+++ b/copy/xfs_copy.c
@@ -1212,8 +1212,8 @@ write_log_header(int fd, wbuf *buf, xfs_mount_t *mp)
offset = libxfs_log_header(p, &buf->owner->uuid,
xfs_sb_version_haslogv2(&mp->m_sb) ? 2 : 1,
- mp->m_sb.sb_logsunit, XLOG_FMT,
- next_log_chunk, buf);
+ mp->m_sb.sb_logsunit, XLOG_FMT, NULLCOMMITLSN,
+ NULLCOMMITLSN, next_log_chunk, buf);
do_write(buf->owner);
return roundup(logstart + offset, buf->length);
diff --git a/db/sb.c b/db/sb.c
index 598e787..560e653 100644
--- a/db/sb.c
+++ b/db/sb.c
@@ -278,7 +278,7 @@ sb_logzero(uuid_t *uuidp)
(xfs_extlen_t)XFS_FSB_TO_BB(mp, mp->m_sb.sb_logblocks),
uuidp,
xfs_sb_version_haslogv2(&mp->m_sb) ? 2 : 1,
- mp->m_sb.sb_logsunit, XLOG_FMT)) {
+ mp->m_sb.sb_logsunit, XLOG_FMT, XLOG_INIT_CYCLE)) {
dbprintf(_("ERROR: cannot clear the log\n"));
return 0;
}
diff --git a/include/libxfs.h b/include/libxfs.h
index cc06fc6..6c87934 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -149,10 +149,14 @@ extern int platform_nproc(void);
/* check or write log footer: specify device, log size in blocks & uuid */
typedef char *(libxfs_get_block_t)(char *, int, void *);
-extern int libxfs_log_clear (struct xfs_buftarg *, xfs_daddr_t, uint,
- uuid_t *, int, int, int);
-extern int libxfs_log_header (char *, uuid_t *, int, int, int,
- libxfs_get_block_t *, void *);
+/*
+ * Helpers to clear the log to a particular log cycle.
+ */
+#define XLOG_INIT_CYCLE 1
+extern int libxfs_log_clear(struct xfs_buftarg *, xfs_daddr_t, uint,
+ uuid_t *, int, int, int, int);
+extern int libxfs_log_header(char *, uuid_t *, int, int, int, xfs_lsn_t,
+ xfs_lsn_t, libxfs_get_block_t *, void *);
/* Shared utility routines */
diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index ef18749..c64dba0 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -149,14 +149,21 @@ libxfs_log_clear(
uuid_t *fs_uuid,
int version,
int sunit,
- int fmt)
+ int fmt,
+ int cycle)
{
xfs_buf_t *bp;
int len;
+ xfs_lsn_t lsn;
if (!btp->dev || !fs_uuid)
return -EINVAL;
+ if (cycle != XLOG_INIT_CYCLE)
+ return -EINVAL;
+
+ lsn = xlog_assign_lsn(cycle, 0);
+
/* first zero the log */
libxfs_device_zero(btp, start, length);
@@ -164,8 +171,8 @@ libxfs_log_clear(
len = ((version == 2) && sunit) ? BTOBB(sunit) : 2;
len = MAX(len, 2);
bp = libxfs_getbufr(btp, start, len);
- libxfs_log_header(XFS_BUF_PTR(bp),
- fs_uuid, version, sunit, fmt, next, bp);
+ libxfs_log_header(XFS_BUF_PTR(bp), fs_uuid, version, sunit, fmt, lsn,
+ lsn, next, bp);
bp->b_flags |= LIBXFS_B_DIRTY;
libxfs_putbufr(bp);
return 0;
@@ -178,6 +185,8 @@ libxfs_log_header(
int version,
int sunit,
int fmt,
+ xfs_lsn_t lsn,
+ xfs_lsn_t tail_lsn,
libxfs_get_block_t *nextfunc,
void *private)
{
@@ -186,11 +195,16 @@ libxfs_log_header(
__be32 cycle_lsn;
int i, len;
+ if (lsn == NULLCOMMITLSN)
+ lsn = xlog_assign_lsn(XLOG_INIT_CYCLE, 0);
+ if (tail_lsn == NULLCOMMITLSN)
+ tail_lsn = lsn;
+
len = ((version == 2) && sunit) ? BTOBB(sunit) : 1;
memset(p, 0, BBSIZE);
head->h_magicno = cpu_to_be32(XLOG_HEADER_MAGIC_NUM);
- head->h_cycle = cpu_to_be32(1);
+ head->h_cycle = cpu_to_be32(CYCLE_LSN(lsn));
head->h_version = cpu_to_be32(version);
if (len != 1)
head->h_len = cpu_to_be32(sunit - BBSIZE);
@@ -202,8 +216,8 @@ libxfs_log_header(
head->h_fmt = cpu_to_be32(fmt);
head->h_size = cpu_to_be32(XLOG_HEADER_CYCLE_SIZE);
- head->h_lsn = cpu_to_be64(xlog_assign_lsn(1, 0));
- head->h_tail_lsn = cpu_to_be64(xlog_assign_lsn(1, 0));
+ head->h_lsn = cpu_to_be64(lsn);
+ head->h_tail_lsn = cpu_to_be64(tail_lsn);
memcpy(&head->h_fs_uuid, fs_uuid, sizeof(uuid_t));
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index d993fc0..238d400 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2670,7 +2670,7 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
libxfs_log_clear(mp->m_logdev_targp,
XFS_FSB_TO_DADDR(mp, logstart),
(xfs_extlen_t)XFS_FSB_TO_BB(mp, logblocks),
- &sbp->sb_uuid, logversion, lsunit, XLOG_FMT);
+ &sbp->sb_uuid, logversion, lsunit, XLOG_FMT, XLOG_INIT_CYCLE);
mp = libxfs_mount(mp, sbp, xi.ddev, xi.logdev, xi.rtdev, 0);
if (mp == NULL) {
diff --git a/repair/phase2.c b/repair/phase2.c
index 7e264c4..0673a0c 100644
--- a/repair/phase2.c
+++ b/repair/phase2.c
@@ -98,7 +98,7 @@ zero_log(xfs_mount_t *mp)
(xfs_extlen_t)XFS_FSB_TO_BB(mp, mp->m_sb.sb_logblocks),
&mp->m_sb.sb_uuid,
xfs_sb_version_haslogv2(&mp->m_sb) ? 2 : 1,
- mp->m_sb.sb_logsunit, XLOG_FMT);
+ mp->m_sb.sb_logsunit, XLOG_FMT, XLOG_INIT_CYCLE);
}
/*
--
2.1.0
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 23+ messages in thread* [PATCH v2 05/12] libxfs: add ability to clear log to arbitrary log cycle
2015-09-11 18:55 [PATCH v2 00/12] xfsprogs: format the log correctly on v5 supers Brian Foster
` (3 preceding siblings ...)
2015-09-11 18:55 ` [PATCH v2 04/12] libxfs: pass lsn param to log clear and record header logging helpers Brian Foster
@ 2015-09-11 18:55 ` Brian Foster
2015-09-11 18:55 ` [PATCH v2 06/12] libxlog: pull struct xlog out of xlog_is_dirty() Brian Foster
` (6 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Brian Foster @ 2015-09-11 18:55 UTC (permalink / raw)
To: xfs
The libxfs_log_clear() helper currently zeroes the log and writes a
single log record such that the kernel code detects the log has been
zeroed and mounts successfully. This is not sufficient for v5
filesystems, which must have the log cleared to an LSN that is
guaranteed to be ahead of any LSN that has been previously stamped into
on-disk metadata.
Update libxfs_log_clear() to support the ability to format the log to an
arbitrary cycle number. First, the log is physically zeroed. A log
record is written to the first block of the log with the desired lsn and
refers to the tail_lsn as the last record of the previous cycle. The
rest of the log is filled with log records of the previous cycle. This
causes the kernel to set the current LSN to start of the desired cycle
number at mount time.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
libxfs/rdwr.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 61 insertions(+), 6 deletions(-)
diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index c64dba0..d87baec 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -155,26 +155,81 @@ libxfs_log_clear(
xfs_buf_t *bp;
int len;
xfs_lsn_t lsn;
+ xfs_lsn_t tail_lsn;
+ xfs_daddr_t blk;
+ xfs_daddr_t end_blk;
if (!btp->dev || !fs_uuid)
return -EINVAL;
- if (cycle != XLOG_INIT_CYCLE)
+ if (cycle < XLOG_INIT_CYCLE)
return -EINVAL;
- lsn = xlog_assign_lsn(cycle, 0);
-
/* first zero the log */
libxfs_device_zero(btp, start, length);
- /* then write a log record header */
+ /*
+ * Initialize the log record length and LSNs. XLOG_INIT_CYCLE is a
+ * special reset case where we only write a single record where the lsn
+ * and tail_lsn match. Otherwise, the record lsn starts at block 0 of
+ * the specified cycle and points tail_lsn at the last record of the
+ * previous cycle.
+ */
len = ((version == 2) && sunit) ? BTOBB(sunit) : 2;
len = MAX(len, 2);
+ lsn = xlog_assign_lsn(cycle, 0);
+ if (cycle == XLOG_INIT_CYCLE)
+ tail_lsn = lsn;
+ else
+ tail_lsn = xlog_assign_lsn(cycle - 1, length - len);
+
+ /* write out the first log record */
bp = libxfs_getbufr(btp, start, len);
- libxfs_log_header(XFS_BUF_PTR(bp), fs_uuid, version, sunit, fmt, lsn,
- lsn, next, bp);
+ libxfs_log_header(XFS_BUF_PTR(bp), fs_uuid, version, sunit, fmt,
+ lsn, tail_lsn, next, bp);
bp->b_flags |= LIBXFS_B_DIRTY;
libxfs_putbufr(bp);
+
+ /*
+ * There's nothing else to do if this is a log reset. The kernel detects
+ * the rest of the log is zeroed and starts at cycle 1.
+ */
+ if (cycle == XLOG_INIT_CYCLE)
+ return 0;
+
+ /*
+ * Otherwise, fill everything beyond the initial record with records of
+ * the previous cycle so the kernel head/tail detection works correctly.
+ *
+ * We don't particularly care about the record size or content here.
+ * It's only important that the headers are in place such that the
+ * kernel finds 1.) a clean log and 2.) the correct current cycle value.
+ * Therefore, bump up the record size to the max to use larger I/Os and
+ * improve performance.
+ */
+ cycle--;
+ blk = start + len;
+ end_blk = start + length;
+
+ len = min(end_blk - blk, BTOBB(BDSTRAT_SIZE));
+ while (blk < end_blk) {
+ lsn = xlog_assign_lsn(cycle, blk - start);
+ tail_lsn = xlog_assign_lsn(cycle, blk - start - len);
+
+ bp = libxfs_getbufr(btp, blk, len);
+ /*
+ * Note: pass the full buffer length as the sunit to initialize
+ * the entire buffer.
+ */
+ libxfs_log_header(XFS_BUF_PTR(bp), fs_uuid, version, BBTOB(len),
+ fmt, lsn, tail_lsn, next, bp);
+ bp->b_flags |= LIBXFS_B_DIRTY;
+ libxfs_putbufr(bp);
+
+ len = min(end_blk - blk, BTOBB(BDSTRAT_SIZE));
+ blk += len;
+ }
+
return 0;
}
--
2.1.0
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 23+ messages in thread* [PATCH v2 06/12] libxlog: pull struct xlog out of xlog_is_dirty()
2015-09-11 18:55 [PATCH v2 00/12] xfsprogs: format the log correctly on v5 supers Brian Foster
` (4 preceding siblings ...)
2015-09-11 18:55 ` [PATCH v2 05/12] libxfs: add ability to clear log to arbitrary log cycle Brian Foster
@ 2015-09-11 18:55 ` Brian Foster
2015-09-11 18:55 ` [PATCH v2 07/12] xfs_repair: track log state throughout all recovery phases Brian Foster
` (5 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Brian Foster @ 2015-09-11 18:55 UTC (permalink / raw)
To: xfs
The xlog_is_dirty() helper is called in various places to acquire the
current log head, tail and to determine whether the log is dirty. Some
callers will require additional information to deal with formatting the
log, such as the current LSN. xlog_is_dirty() already acquires this
information through existing sub-helpers, but it is not available to
callers as the xlog structure is allocated on the local stack.
Update xlog_is_dirty() to receive the xlog structure as a parameter and
pass it along such that additional information about the log is
available to callers. Update the existing callers to allocate the xlog
structure on the stack.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
db/metadump.c | 5 +++--
db/sb.c | 3 ++-
include/libxlog.h | 3 ++-
libxlog/util.c | 37 +++++++++++++++++++------------------
4 files changed, 26 insertions(+), 22 deletions(-)
diff --git a/db/metadump.c b/db/metadump.c
index af96e12..129670e 100644
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -2512,7 +2512,8 @@ copy_sb_inodes(void)
static int
copy_log(void)
{
- int dirty;
+ struct xlog log;
+ int dirty;
if (show_progress)
print_progress("Copying log");
@@ -2530,7 +2531,7 @@ copy_log(void)
if (!obfuscate && !zero_stale_data)
goto done;
- dirty = xlog_is_dirty(mp, &x, 0);
+ dirty = xlog_is_dirty(mp, &log, &x, 0);
switch (dirty) {
case 0:
diff --git a/db/sb.c b/db/sb.c
index 560e653..9c585ca 100644
--- a/db/sb.c
+++ b/db/sb.c
@@ -229,6 +229,7 @@ int xlog_recover_do_trans(struct xlog *log, xlog_recover_t *t, int p)
int
sb_logcheck(void)
{
+ struct xlog log;
int dirty;
if (mp->m_sb.sb_logstart) {
@@ -247,7 +248,7 @@ sb_logcheck(void)
libxfs_buftarg_init(mp, x.ddev, x.logdev, x.rtdev);
- dirty = xlog_is_dirty(mp, &x, 0);
+ dirty = xlog_is_dirty(mp, &log, &x, 0);
if (dirty == -1) {
dbprintf(_("ERROR: cannot find log head/tail, run xfs_repair\n"));
return 0;
diff --git a/include/libxlog.h b/include/libxlog.h
index 05b16e8..6c6e83a 100644
--- a/include/libxlog.h
+++ b/include/libxlog.h
@@ -84,7 +84,8 @@ extern int print_record_header;
extern libxfs_init_t x;
-extern int xlog_is_dirty(xfs_mount_t *mp, libxfs_init_t *x, int verbose);
+extern int xlog_is_dirty(struct xfs_mount *, struct xlog *, libxfs_init_t *,
+ int);
extern struct xfs_buf *xlog_get_bp(struct xlog *, int);
extern void xlog_put_bp(struct xfs_buf *);
extern int xlog_bread(struct xlog *log, xfs_daddr_t blk_no, int nbblks,
diff --git a/libxlog/util.c b/libxlog/util.c
index c6b8f2a..72b8b18 100644
--- a/libxlog/util.c
+++ b/libxlog/util.c
@@ -29,15 +29,15 @@ libxfs_init_t x;
*/
int
xlog_is_dirty(
- xfs_mount_t *mp,
- libxfs_init_t *x,
- int verbose)
+ struct xfs_mount *mp,
+ struct xlog *log,
+ libxfs_init_t *x,
+ int verbose)
{
- int error;
- struct xlog log;
- xfs_daddr_t head_blk, tail_blk;
+ int error;
+ xfs_daddr_t head_blk, tail_blk;
- memset(&log, 0, sizeof(log));
+ memset(log, 0, sizeof(*log));
/* We (re-)init members of libxfs_init_t here? really? */
x->logBBsize = XFS_FSB_TO_BB(mp, mp->m_sb.sb_logblocks);
@@ -46,23 +46,24 @@ xlog_is_dirty(
if (xfs_sb_version_hassector(&mp->m_sb))
x->lbsize <<= (mp->m_sb.sb_logsectlog - BBSHIFT);
- log.l_dev = mp->m_logdev_targp;
- log.l_logBBsize = x->logBBsize;
- log.l_logBBstart = x->logBBstart;
- log.l_sectBBsize = BTOBB(x->lbsize);
- log.l_mp = mp;
+ log->l_dev = mp->m_logdev_targp;
+ log->l_logBBsize = x->logBBsize;
+ log->l_logBBstart = x->logBBstart;
+ log->l_sectBBsize = BTOBB(x->lbsize);
+ log->l_mp = mp;
if (xfs_sb_version_hassector(&mp->m_sb)) {
- log.l_sectbb_log = mp->m_sb.sb_logsectlog - BBSHIFT;
- ASSERT(log.l_sectbb_log <= mp->m_sectbb_log);
+ log->l_sectbb_log = mp->m_sb.sb_logsectlog - BBSHIFT;
+ ASSERT(log->l_sectbb_log <= mp->m_sectbb_log);
/* for larger sector sizes, must have v2 or external log */
- ASSERT(log.l_sectbb_log == 0 ||
- log.l_logBBstart == 0 ||
+ ASSERT(log->l_sectbb_log == 0 ||
+ log->l_logBBstart == 0 ||
xfs_sb_version_haslogv2(&mp->m_sb));
ASSERT(mp->m_sb.sb_logsectlog >= BBSHIFT);
}
- log.l_sectbb_mask = (1 << log.l_sectbb_log) - 1;
+ log->l_sectbb_mask = (1 << log->l_sectbb_log) - 1;
- if ((error = xlog_find_tail(&log, &head_blk, &tail_blk))) {
+ error = xlog_find_tail(log, &head_blk, &tail_blk);
+ if (error) {
xlog_warn(_("%s: cannot find log head/tail "
"(xlog_find_tail=%d)\n"),
__func__, error);
--
2.1.0
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 23+ messages in thread* [PATCH v2 07/12] xfs_repair: track log state throughout all recovery phases
2015-09-11 18:55 [PATCH v2 00/12] xfsprogs: format the log correctly on v5 supers Brian Foster
` (5 preceding siblings ...)
2015-09-11 18:55 ` [PATCH v2 06/12] libxlog: pull struct xlog out of xlog_is_dirty() Brian Foster
@ 2015-09-11 18:55 ` Brian Foster
2015-09-11 18:55 ` [PATCH v2 08/12] xfs_repair: process the log in no_modify mode Brian Foster
` (4 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Brian Foster @ 2015-09-11 18:55 UTC (permalink / raw)
To: xfs
xfs_repair examines and clears the log in phase 2. Phase 2 acquires the
log state in local data structures that are lost upon phase exit. v5
filesystems require that the log is formatted with a higher cycle number
after the fs is repaired. This requires assessment of the log state to
determine whether a reformat is necessary.
Rather than duplicate the log processing code, update phase 2 to
populate a globally available log data structure. Add a log pointer to
xfs_mount, as exists in kernel space, that repair uses to store a
reference to the log that is available to various phases. Note that this
patch simply plumbs through the global log data structure and does not
change behavior in any way.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
include/xfs_mount.h | 6 ++++++
repair/phase2.c | 34 +++++++++++++++++++---------------
repair/xfs_repair.c | 8 +++++++-
3 files changed, 32 insertions(+), 16 deletions(-)
diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index ed897a2..5ec6866 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -98,6 +98,12 @@ typedef struct xfs_mount {
int qi_dqperchunk;
} *m_quotainfo;
+ /*
+ * xlog is defined in libxlog and thus is not initialized by libxfs. This
+ * allows an application to initialize and store a reference to the log
+ * if warranted.
+ */
+ struct xlog *m_log;
} xfs_mount_t;
/*
diff --git a/repair/phase2.c b/repair/phase2.c
index 0673a0c..11504e3 100644
--- a/repair/phase2.c
+++ b/repair/phase2.c
@@ -39,33 +39,33 @@ static void
zero_log(xfs_mount_t *mp)
{
int error;
- struct xlog log;
xfs_daddr_t head_blk, tail_blk;
+ struct xlog *log = mp->m_log;
- memset(&log, 0, sizeof(log));
+ memset(log, 0, sizeof(struct xlog));
x.logBBsize = XFS_FSB_TO_BB(mp, mp->m_sb.sb_logblocks);
x.logBBstart = XFS_FSB_TO_DADDR(mp, mp->m_sb.sb_logstart);
x.lbsize = BBSIZE;
if (xfs_sb_version_hassector(&mp->m_sb))
x.lbsize <<= (mp->m_sb.sb_logsectlog - BBSHIFT);
- log.l_dev = mp->m_logdev_targp;
- log.l_logBBsize = x.logBBsize;
- log.l_logBBstart = x.logBBstart;
- log.l_sectBBsize = BTOBB(x.lbsize);
- log.l_mp = mp;
+ log->l_dev = mp->m_logdev_targp;
+ log->l_logBBsize = x.logBBsize;
+ log->l_logBBstart = x.logBBstart;
+ log->l_sectBBsize = BTOBB(x.lbsize);
+ log->l_mp = mp;
if (xfs_sb_version_hassector(&mp->m_sb)) {
- log.l_sectbb_log = mp->m_sb.sb_logsectlog - BBSHIFT;
- ASSERT(log.l_sectbb_log <= mp->m_sectbb_log);
+ log->l_sectbb_log = mp->m_sb.sb_logsectlog - BBSHIFT;
+ ASSERT(log->l_sectbb_log <= mp->m_sectbb_log);
/* for larger sector sizes, must have v2 or external log */
- ASSERT(log.l_sectbb_log == 0 ||
- log.l_logBBstart == 0 ||
+ ASSERT(log->l_sectbb_log == 0 ||
+ log->l_logBBstart == 0 ||
xfs_sb_version_haslogv2(&mp->m_sb));
ASSERT(mp->m_sb.sb_logsectlog >= BBSHIFT);
}
- log.l_sectbb_mask = (1 << log.l_sectbb_log) - 1;
+ log->l_sectbb_mask = (1 << log->l_sectbb_log) - 1;
- if ((error = xlog_find_tail(&log, &head_blk, &tail_blk))) {
+ if ((error = xlog_find_tail(log, &head_blk, &tail_blk))) {
do_warn(_("zero_log: cannot find log head/tail "
"(xlog_find_tail=%d), zeroing it anyway\n"),
error);
@@ -93,12 +93,16 @@ zero_log(xfs_mount_t *mp)
}
}
- libxfs_log_clear(log.l_dev,
- XFS_FSB_TO_DADDR(mp, mp->m_sb.sb_logstart),
+ libxfs_log_clear(log->l_dev, XFS_FSB_TO_DADDR(mp, mp->m_sb.sb_logstart),
(xfs_extlen_t)XFS_FSB_TO_BB(mp, mp->m_sb.sb_logblocks),
&mp->m_sb.sb_uuid,
xfs_sb_version_haslogv2(&mp->m_sb) ? 2 : 1,
mp->m_sb.sb_logsunit, XLOG_FMT, XLOG_INIT_CYCLE);
+
+ /* update the log data structure with new state */
+ error = xlog_find_tail(log, &head_blk, &tail_blk);
+ if (error || head_blk != tail_blk)
+ do_error(_("failed to clear log"));
}
/*
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 85a012b..0e80124 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -539,6 +539,7 @@ main(int argc, char **argv)
xfs_dsb_t *dsb;
xfs_buf_t *sbp;
xfs_mount_t xfs_m;
+ struct xlog log = {0};
char *msgbuf;
struct xfs_sb psb;
int rval;
@@ -620,7 +621,11 @@ main(int argc, char **argv)
}
}
- /* prepare the mount structure */
+ /*
+ * Prepare the mount structure. Point the log reference to our local
+ * copy so it's available to the various phases. The log bits are
+ * initialized in phase 2.
+ */
memset(&xfs_m, 0, sizeof(xfs_mount_t));
mp = libxfs_mount(&xfs_m, &psb, x.ddev, x.logdev, x.rtdev, 0);
@@ -630,6 +635,7 @@ main(int argc, char **argv)
progname);
exit(1);
}
+ mp->m_log = &log;
/*
* set XFS-independent status vars from the mount/sb structure
--
2.1.0
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 23+ messages in thread* [PATCH v2 08/12] xfs_repair: process the log in no_modify mode
2015-09-11 18:55 [PATCH v2 00/12] xfsprogs: format the log correctly on v5 supers Brian Foster
` (6 preceding siblings ...)
2015-09-11 18:55 ` [PATCH v2 07/12] xfs_repair: track log state throughout all recovery phases Brian Foster
@ 2015-09-11 18:55 ` Brian Foster
2015-09-11 18:55 ` [PATCH v2 09/12] xfs_repair: format the log with forward cycle number on v5 supers Brian Foster
` (3 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Brian Foster @ 2015-09-11 18:55 UTC (permalink / raw)
To: xfs
xfs_repair does not zero the log in no_modify mode. In doing so, it also
skips the function that scans the log, locates the head/tail blocks and
sets the current LSN. Now that the log state is used beyond phase 2, the
log scan must occur regardless of whether no_modify mode is enabled or
not.
Update phase 2 to always execute the log scanning code. Push down the
no_modify checks into the log clearing helper such that the log is still
not modified in no_modify mode.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
repair/phase2.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/repair/phase2.c b/repair/phase2.c
index 11504e3..72132ce 100644
--- a/repair/phase2.c
+++ b/repair/phase2.c
@@ -75,7 +75,7 @@ zero_log(xfs_mount_t *mp)
_("zero_log: head block %" PRId64 " tail block %" PRId64 "\n"),
head_blk, tail_blk);
}
- if (head_blk != tail_blk) {
+ if (!no_modify && head_blk != tail_blk) {
if (zap_log) {
do_warn(_(
"ALERT: The filesystem has valuable metadata changes in a log which is being\n"
@@ -93,6 +93,9 @@ zero_log(xfs_mount_t *mp)
}
}
+ if (no_modify)
+ return;
+
libxfs_log_clear(log->l_dev, XFS_FSB_TO_DADDR(mp, mp->m_sb.sb_logstart),
(xfs_extlen_t)XFS_FSB_TO_BB(mp, mp->m_sb.sb_logblocks),
&mp->m_sb.sb_uuid,
@@ -136,10 +139,8 @@ phase2(
do_log(_("Phase 2 - using internal log\n"));
/* Zero log if applicable */
- if (!no_modify) {
- do_log(_(" - zero log...\n"));
- zero_log(mp);
- }
+ do_log(_(" - zero log...\n"));
+ zero_log(mp);
do_log(_(" - scan filesystem freespace and inode maps...\n"));
--
2.1.0
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 23+ messages in thread* [PATCH v2 09/12] xfs_repair: format the log with forward cycle number on v5 supers
2015-09-11 18:55 [PATCH v2 00/12] xfsprogs: format the log correctly on v5 supers Brian Foster
` (7 preceding siblings ...)
2015-09-11 18:55 ` [PATCH v2 08/12] xfs_repair: process the log in no_modify mode Brian Foster
@ 2015-09-11 18:55 ` Brian Foster
2015-09-11 18:55 ` [PATCH v2 10/12] xfs_repair: don't clear the log by default Brian Foster
` (2 subsequent siblings)
11 siblings, 0 replies; 23+ messages in thread
From: Brian Foster @ 2015-09-11 18:55 UTC (permalink / raw)
To: xfs
v5 filesystems use the current LSN and the last modified LSN stored
within fs metadata to provide correct log recovery ordering. xfs_repair
historically clears the log in phase 2. This resets to the current LSN
of the filesystem to the initial cycle, as if the fs was just created.
This is problematic because the filesystem LSN is now behind many
pre-existing metadata structures on-disk until either the current
filesystem LSN catches up or those particular data structures are
modified and written out. If a filesystem crash occurs in the meantime,
log recovery can incorrectly skip log items and cause filesystem
corruption.
Update xfs_repair to check the maximum metadata LSN value against the
current log state once the filesystem has been processed. If the maximum
LSN exceeds the current LSN with respect to the log, reformat the log
with a cycle number that exceeds that of the maximum LSN.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
repair/xfs_repair.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 68 insertions(+), 2 deletions(-)
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 0e80124..8285d9d 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -531,6 +531,63 @@ _("sb realtime summary inode %" PRIu64 " %sinconsistent with calculated value %u
}
+/*
+ * v5 superblock metadata track the LSN of last modification and thus require
+ * that the current LSN is always moving forward. The current LSN is reset if
+ * the log has been cleared, which puts the log behind parts of the filesystem
+ * on-disk and can disrupt log recovery.
+ *
+ * We have tracked the maximum LSN of every piece of metadata that has been read
+ * in via the read verifiers. Compare the max LSN with the log and if the log is
+ * behind, bump the cycle number and reformat the log.
+ */
+static void
+format_log_max_lsn(
+ struct xfs_mount *mp)
+{
+ struct xlog *log = mp->m_log;
+ int max_cycle;
+ int max_block;
+ int new_cycle;
+ xfs_daddr_t logstart;
+ xfs_daddr_t logblocks;
+ int logversion;
+
+ if (!xfs_sb_version_hascrc(&mp->m_sb))
+ return;
+
+ /*
+ * If the log is ahead of the highest metadata LSN we've seen, we're
+ * safe and there's nothing to do.
+ */
+ max_cycle = CYCLE_LSN(libxfs_max_lsn);
+ max_block = BLOCK_LSN(libxfs_max_lsn);
+ if (max_cycle < log->l_curr_cycle ||
+ (max_cycle == log->l_curr_cycle && max_block < log->l_curr_block))
+ return;
+
+ /*
+ * Going to the next cycle should be sufficient but we bump by a few
+ * counts to help cover any metadata LSNs we could have missed.
+ */
+ new_cycle = max_cycle + 3;
+ logstart = XFS_FSB_TO_DADDR(mp, mp->m_sb.sb_logstart);
+ logblocks = XFS_FSB_TO_BB(mp, mp->m_sb.sb_logblocks);
+ logversion = xfs_sb_version_haslogv2(&mp->m_sb) ? 2 : 1;
+
+ do_warn(_("Maximum metadata LSN (%d:%d) is ahead of log (%d:%d).\n"),
+ max_cycle, max_block, log->l_curr_cycle, log->l_curr_block);
+
+ if (no_modify) {
+ do_warn(_("Would format log to cycle %d.\n"), new_cycle);
+ return;
+ }
+
+ do_warn(_("Format log to cycle %d.\n"), new_cycle);
+ libxfs_log_clear(log->l_dev, logstart, logblocks, &mp->m_sb.sb_uuid,
+ logversion, mp->m_sb.sb_logsunit, XLOG_FMT, new_cycle);
+}
+
int
main(int argc, char **argv)
{
@@ -896,6 +953,12 @@ _("Warning: project quota information would be cleared.\n"
stop_progress_rpt();
if (no_modify) {
+ /*
+ * Warn if the current LSN is problematic and the log requires a
+ * reformat.
+ */
+ format_log_max_lsn(mp);
+
do_log(
_("No modify flag set, skipping filesystem flush and exiting.\n"));
if (verbose)
@@ -931,11 +994,14 @@ _("Note - stripe unit (%d) and width (%d) were copied from a backup superblock.\
libxfs_writebuf(sbp, 0);
/*
- * Done, flush all cached buffers and inodes.
+ * Done. Flush all cached buffers and inodes first to ensure all
+ * verifiers are run (where we discover the max metadata LSN), reformat
+ * the log if necessary and unmount.
*/
libxfs_bcache_flush();
-
+ format_log_max_lsn(mp);
libxfs_umount(mp);
+
if (x.rtdev)
libxfs_device_close(x.rtdev);
if (x.logdev && x.logdev != x.ddev)
--
2.1.0
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 23+ messages in thread* [PATCH v2 10/12] xfs_repair: don't clear the log by default
2015-09-11 18:55 [PATCH v2 00/12] xfsprogs: format the log correctly on v5 supers Brian Foster
` (8 preceding siblings ...)
2015-09-11 18:55 ` [PATCH v2 09/12] xfs_repair: format the log with forward cycle number on v5 supers Brian Foster
@ 2015-09-11 18:55 ` Brian Foster
2015-09-11 18:55 ` [PATCH v2 11/12] xfs_db: do not reset current lsn from uuid command on v5 supers Brian Foster
2015-09-11 18:55 ` [PATCH v2 12/12] db/metadump: bump lsn when log is cleared " Brian Foster
11 siblings, 0 replies; 23+ messages in thread
From: Brian Foster @ 2015-09-11 18:55 UTC (permalink / raw)
To: xfs
xfs_repair currently clears the log regardless of whether it is
corrupted, clean or contains data. This is traditionally harmless but
now causes log recovery problems on v5 filesystems. v5 filesystems
expect a clean log to always have an LSN out ahead of the maximum last
modification LSN stamped on any bit of metadata throughout the fs. If
this is not the case, repair must reformat the log with a larger cycle
number after fs processing is complete.
Given that unconditional log clearing actually introduces a filesystem
inconsistency on v5 superblocks (that repair must subsequently recover
from) and provides no tangible benefit for v4 filesystems that otherwise
have a clean and covered log, it is more appropriate behavior to not
clear the log by default.
Update xfs_repair to always and only clear the log when the -L parameter
is specified. Retain the existing logic to require -L or otherwise exit
if the log appears to contain data. Adopt similar behavior if the log
appears to be corrupted.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
repair/phase2.c | 60 ++++++++++++++++++++++++++++++++++++++-------------------
1 file changed, 40 insertions(+), 20 deletions(-)
diff --git a/repair/phase2.c b/repair/phase2.c
index 72132ce..fe7ed2b 100644
--- a/repair/phase2.c
+++ b/repair/phase2.c
@@ -36,11 +36,13 @@ int xlog_recover_do_trans(struct xlog *log, xlog_recover_t *t, int p)
}
static void
-zero_log(xfs_mount_t *mp)
+zero_log(
+ struct xfs_mount *mp)
{
- int error;
- xfs_daddr_t head_blk, tail_blk;
- struct xlog *log = mp->m_log;
+ int error;
+ xfs_daddr_t head_blk;
+ xfs_daddr_t tail_blk;
+ struct xlog *log = mp->m_log;
memset(log, 0, sizeof(struct xlog));
x.logBBsize = XFS_FSB_TO_BB(mp, mp->m_sb.sb_logblocks);
@@ -65,10 +67,22 @@ zero_log(xfs_mount_t *mp)
}
log->l_sectbb_mask = (1 << log->l_sectbb_log) - 1;
- if ((error = xlog_find_tail(log, &head_blk, &tail_blk))) {
- do_warn(_("zero_log: cannot find log head/tail "
- "(xlog_find_tail=%d), zeroing it anyway\n"),
+ /*
+ * Find the log head and tail and alert the user to the situation if the
+ * log appears corrupted or contains data. In either case, we do not
+ * proceed past this point unless the user explicitly requests to zap
+ * the log.
+ */
+ error = xlog_find_tail(log, &head_blk, &tail_blk);
+ if (error) {
+ do_warn(
+ _("zero_log: cannot find log head/tail (xlog_find_tail=%d)\n"),
error);
+ if (!no_modify && !zap_log)
+ do_error(_(
+"ERROR: The log head and/or tail cannot be discovered. Attempt to mount the\n"
+"filesystem to replay the log or use the -L option to destroy the log and\n"
+"attempt a repair.\n"));
} else {
if (verbose) {
do_warn(
@@ -93,19 +107,25 @@ zero_log(xfs_mount_t *mp)
}
}
- if (no_modify)
- return;
-
- libxfs_log_clear(log->l_dev, XFS_FSB_TO_DADDR(mp, mp->m_sb.sb_logstart),
- (xfs_extlen_t)XFS_FSB_TO_BB(mp, mp->m_sb.sb_logblocks),
- &mp->m_sb.sb_uuid,
- xfs_sb_version_haslogv2(&mp->m_sb) ? 2 : 1,
- mp->m_sb.sb_logsunit, XLOG_FMT, XLOG_INIT_CYCLE);
-
- /* update the log data structure with new state */
- error = xlog_find_tail(log, &head_blk, &tail_blk);
- if (error || head_blk != tail_blk)
- do_error(_("failed to clear log"));
+ /*
+ * Only clear the log when explicitly requested. Doing so is unnecessary
+ * unless something is wrong. Further, this resets the current LSN of
+ * the filesystem and creates more work for repair of v5 superblock
+ * filesystems.
+ */
+ if (!no_modify && zap_log) {
+ libxfs_log_clear(log->l_dev,
+ XFS_FSB_TO_DADDR(mp, mp->m_sb.sb_logstart),
+ (xfs_extlen_t)XFS_FSB_TO_BB(mp, mp->m_sb.sb_logblocks),
+ &mp->m_sb.sb_uuid,
+ xfs_sb_version_haslogv2(&mp->m_sb) ? 2 : 1,
+ mp->m_sb.sb_logsunit, XLOG_FMT, XLOG_INIT_CYCLE);
+
+ /* update the log data structure with new state */
+ error = xlog_find_tail(log, &head_blk, &tail_blk);
+ if (error || head_blk != tail_blk)
+ do_error(_("failed to clear log"));
+ }
}
/*
--
2.1.0
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 23+ messages in thread* [PATCH v2 11/12] xfs_db: do not reset current lsn from uuid command on v5 supers
2015-09-11 18:55 [PATCH v2 00/12] xfsprogs: format the log correctly on v5 supers Brian Foster
` (9 preceding siblings ...)
2015-09-11 18:55 ` [PATCH v2 10/12] xfs_repair: don't clear the log by default Brian Foster
@ 2015-09-11 18:55 ` Brian Foster
2015-09-11 18:55 ` [PATCH v2 12/12] db/metadump: bump lsn when log is cleared " Brian Foster
11 siblings, 0 replies; 23+ messages in thread
From: Brian Foster @ 2015-09-11 18:55 UTC (permalink / raw)
To: xfs
The xfs_db uuid command modifies the uuid of the filesystem. As part of
this process, it is required to clear the log and format it with records
that use the new uuid. It currently resets the log to cycle 1, which is
not safe for v5 superblocks.
Update the uuid log clearing implementation to bump the current cycle
when the log is formatted on v5 superblock filesystems. This ensures
that the current LSN remains ahead of metadata LSNs on the subsequent
mount. Since the log is checked and cleared across a couple different
functions, also add a new global xlog structure, associate it with the
preexisting global mount structure and reference it to get and use the
current log cycle.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
db/init.c | 25 ++++++++++++++-----------
db/sb.c | 19 +++++++++++++++----
2 files changed, 29 insertions(+), 15 deletions(-)
diff --git a/db/init.c b/db/init.c
index 9537a38..c0472c8 100644
--- a/db/init.c
+++ b/db/init.c
@@ -17,6 +17,7 @@
*/
#include "libxfs.h"
+#include "libxlog.h"
#include <signal.h>
#include "command.h"
#include "init.h"
@@ -28,17 +29,18 @@
#include "malloc.h"
#include "type.h"
-static char **cmdline;
-static int ncmdline;
-char *fsdevice;
-int blkbb;
-int exitcode;
-int expert_mode;
-int force;
-xfs_mount_t xmount;
-xfs_mount_t *mp;
-libxfs_init_t x;
-xfs_agnumber_t cur_agno = NULLAGNUMBER;
+static char **cmdline;
+static int ncmdline;
+char *fsdevice;
+int blkbb;
+int exitcode;
+int expert_mode;
+int force;
+struct xfs_mount xmount;
+struct xfs_mount *mp;
+struct xlog xlog;
+libxfs_init_t x;
+xfs_agnumber_t cur_agno = NULLAGNUMBER;
static void
usage(void)
@@ -154,6 +156,7 @@ init(
progname, fsdevice);
exit(1);
}
+ mp->m_log = &xlog;
blkbb = 1 << mp->m_blkbb_log;
/*
diff --git a/db/sb.c b/db/sb.c
index 9c585ca..30c622d 100644
--- a/db/sb.c
+++ b/db/sb.c
@@ -229,7 +229,6 @@ int xlog_recover_do_trans(struct xlog *log, xlog_recover_t *t, int p)
int
sb_logcheck(void)
{
- struct xlog log;
int dirty;
if (mp->m_sb.sb_logstart) {
@@ -248,7 +247,7 @@ sb_logcheck(void)
libxfs_buftarg_init(mp, x.ddev, x.logdev, x.rtdev);
- dirty = xlog_is_dirty(mp, &log, &x, 0);
+ dirty = xlog_is_dirty(mp, mp->m_log, &x, 0);
if (dirty == -1) {
dbprintf(_("ERROR: cannot find log head/tail, run xfs_repair\n"));
return 0;
@@ -269,20 +268,32 @@ sb_logcheck(void)
static int
sb_logzero(uuid_t *uuidp)
{
+ int cycle = XLOG_INIT_CYCLE;
+ int error;
+
if (!sb_logcheck())
return 0;
+ /*
+ * The log must always move forward on v5 superblocks. Bump it to the
+ * next cycle.
+ */
+ if (xfs_sb_version_hascrc(&mp->m_sb))
+ cycle = mp->m_log->l_curr_cycle + 1;
+
dbprintf(_("Clearing log and setting UUID\n"));
- if (libxfs_log_clear(mp->m_logdev_targp,
+ error = libxfs_log_clear(mp->m_logdev_targp,
XFS_FSB_TO_DADDR(mp, mp->m_sb.sb_logstart),
(xfs_extlen_t)XFS_FSB_TO_BB(mp, mp->m_sb.sb_logblocks),
uuidp,
xfs_sb_version_haslogv2(&mp->m_sb) ? 2 : 1,
- mp->m_sb.sb_logsunit, XLOG_FMT, XLOG_INIT_CYCLE)) {
+ mp->m_sb.sb_logsunit, XLOG_FMT, cycle);
+ if (error) {
dbprintf(_("ERROR: cannot clear the log\n"));
return 0;
}
+
return 1;
}
--
2.1.0
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 23+ messages in thread* [PATCH v2 12/12] db/metadump: bump lsn when log is cleared on v5 supers
2015-09-11 18:55 [PATCH v2 00/12] xfsprogs: format the log correctly on v5 supers Brian Foster
` (10 preceding siblings ...)
2015-09-11 18:55 ` [PATCH v2 11/12] xfs_db: do not reset current lsn from uuid command on v5 supers Brian Foster
@ 2015-09-11 18:55 ` Brian Foster
11 siblings, 0 replies; 23+ messages in thread
From: Brian Foster @ 2015-09-11 18:55 UTC (permalink / raw)
To: xfs
xfs_metadump handles the log in different ways depending on the mode of
operation. If the log is dirty or obfuscation and stale data zeroing are
disabled, the log is copied as is. In all other scenarios, the log is
explicitly zeroed. This is incorrect for version 5 superblocks where the
current LSN is always expected to be ahead of all fs metadata.
Update metadump to use libxfs_log_clear() to format the log with an
elevated LSN rather than zero the log and reset the current LSN.
Metadump does not use buffers for the dump target, instead using a
cursor implementation to access the log via a single memory buffer.
Therefore, update libxfs_log_clear() to receive an optional memory buffer
pointer for the log (mutually exclusive with the buftarg parameter).
If the pointer is provided, the log format is written out to this
buffer. Otherwise, fall back to the original behavior and access the log
through buftarg buffers.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
db/metadump.c | 16 +++++++++++--
db/sb.c | 2 +-
include/libxfs.h | 4 ++--
libxfs/rdwr.c | 68 +++++++++++++++++++++++++++++++++++++++--------------
mkfs/xfs_mkfs.c | 2 +-
repair/phase2.c | 2 +-
repair/xfs_repair.c | 5 ++--
7 files changed, 72 insertions(+), 27 deletions(-)
diff --git a/db/metadump.c b/db/metadump.c
index 129670e..56b733e 100644
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -2514,6 +2514,10 @@ copy_log(void)
{
struct xlog log;
int dirty;
+ xfs_daddr_t logstart;
+ int logblocks;
+ int logversion;
+ int cycle = XLOG_INIT_CYCLE;
if (show_progress)
print_progress("Copying log");
@@ -2538,8 +2542,16 @@ copy_log(void)
/* clear out a clean log */
if (show_progress)
print_progress("Zeroing clean log");
- memset(iocur_top->data, 0,
- mp->m_sb.sb_logblocks * mp->m_sb.sb_blocksize);
+
+ logstart = XFS_FSB_TO_DADDR(mp, mp->m_sb.sb_logstart);
+ logblocks = XFS_FSB_TO_BB(mp, mp->m_sb.sb_logblocks);
+ logversion = xfs_sb_version_haslogv2(&mp->m_sb) ? 2 : 1;
+ if (xfs_sb_version_hascrc(&mp->m_sb))
+ cycle = log.l_curr_cycle + 1;
+
+ libxfs_log_clear(NULL, iocur_top->data, logstart, logblocks,
+ &mp->m_sb.sb_uuid, logversion,
+ mp->m_sb.sb_logsunit, XLOG_FMT, cycle);
break;
case 1:
/* keep the dirty log */
diff --git a/db/sb.c b/db/sb.c
index 30c622d..17d446c 100644
--- a/db/sb.c
+++ b/db/sb.c
@@ -283,7 +283,7 @@ sb_logzero(uuid_t *uuidp)
dbprintf(_("Clearing log and setting UUID\n"));
- error = libxfs_log_clear(mp->m_logdev_targp,
+ error = libxfs_log_clear(mp->m_logdev_targp, NULL,
XFS_FSB_TO_DADDR(mp, mp->m_sb.sb_logstart),
(xfs_extlen_t)XFS_FSB_TO_BB(mp, mp->m_sb.sb_logblocks),
uuidp,
diff --git a/include/libxfs.h b/include/libxfs.h
index 6c87934..f733c36 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -153,8 +153,8 @@ typedef char *(libxfs_get_block_t)(char *, int, void *);
* Helpers to clear the log to a particular log cycle.
*/
#define XLOG_INIT_CYCLE 1
-extern int libxfs_log_clear(struct xfs_buftarg *, xfs_daddr_t, uint,
- uuid_t *, int, int, int, int);
+extern int libxfs_log_clear(struct xfs_buftarg *, char *, xfs_daddr_t,
+ uint, uuid_t *, int, int, int, int);
extern int libxfs_log_header(char *, uuid_t *, int, int, int, xfs_lsn_t,
xfs_lsn_t, libxfs_get_block_t *, void *);
diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index d87baec..a4cc3f4 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -132,41 +132,57 @@ static void unmount_record(void *p)
memcpy((char *)p + sizeof(xlog_op_header_t), &magic, sizeof(magic));
}
-static char *next(char *ptr, int offset, void *private)
+static char *next(
+ char *ptr,
+ int offset,
+ void *private)
{
- xfs_buf_t *buf = (xfs_buf_t *)private;
+ struct xfs_buf *buf = (struct xfs_buf *)private;
- if (XFS_BUF_COUNT(buf) < (int)(ptr - XFS_BUF_PTR(buf)) + offset)
+ if (buf &&
+ (XFS_BUF_COUNT(buf) < (int)(ptr - XFS_BUF_PTR(buf)) + offset))
abort();
+
return ptr + offset;
}
+/*
+ * Format the log. The caller provides either a buftarg which is used to access
+ * the log via buffers or a direct pointer to a buffer that encapsulates the
+ * entire log.
+ */
int
libxfs_log_clear(
struct xfs_buftarg *btp,
+ char *dptr,
xfs_daddr_t start,
- uint length,
+ uint length, /* basic blocks */
uuid_t *fs_uuid,
int version,
- int sunit,
+ int sunit, /* bytes */
int fmt,
int cycle)
{
- xfs_buf_t *bp;
+ struct xfs_buf *bp = NULL;
int len;
xfs_lsn_t lsn;
xfs_lsn_t tail_lsn;
xfs_daddr_t blk;
xfs_daddr_t end_blk;
+ char *ptr;
- if (!btp->dev || !fs_uuid)
+ if (((btp && dptr) || (!btp && !dptr)) ||
+ (btp && !btp->dev) || !fs_uuid)
return -EINVAL;
if (cycle < XLOG_INIT_CYCLE)
return -EINVAL;
/* first zero the log */
- libxfs_device_zero(btp, start, length);
+ if (btp)
+ libxfs_device_zero(btp, start, length);
+ else
+ memset(dptr, 0, BBTOB(length));
/*
* Initialize the log record length and LSNs. XLOG_INIT_CYCLE is a
@@ -184,11 +200,17 @@ libxfs_log_clear(
tail_lsn = xlog_assign_lsn(cycle - 1, length - len);
/* write out the first log record */
- bp = libxfs_getbufr(btp, start, len);
- libxfs_log_header(XFS_BUF_PTR(bp), fs_uuid, version, sunit, fmt,
- lsn, tail_lsn, next, bp);
- bp->b_flags |= LIBXFS_B_DIRTY;
- libxfs_putbufr(bp);
+ ptr = dptr;
+ if (btp) {
+ bp = libxfs_getbufr(btp, start, len);
+ ptr = XFS_BUF_PTR(bp);
+ }
+ libxfs_log_header(ptr, fs_uuid, version, sunit, fmt, lsn, tail_lsn,
+ next, bp);
+ if (bp) {
+ bp->b_flags |= LIBXFS_B_DIRTY;
+ libxfs_putbufr(bp);
+ }
/*
* There's nothing else to do if this is a log reset. The kernel detects
@@ -209,6 +231,8 @@ libxfs_log_clear(
*/
cycle--;
blk = start + len;
+ if (dptr)
+ dptr += BBTOB(len);
end_blk = start + length;
len = min(end_blk - blk, BTOBB(BDSTRAT_SIZE));
@@ -216,18 +240,26 @@ libxfs_log_clear(
lsn = xlog_assign_lsn(cycle, blk - start);
tail_lsn = xlog_assign_lsn(cycle, blk - start - len);
- bp = libxfs_getbufr(btp, blk, len);
+ ptr = dptr;
+ if (btp) {
+ bp = libxfs_getbufr(btp, blk, len);
+ ptr = XFS_BUF_PTR(bp);
+ }
/*
* Note: pass the full buffer length as the sunit to initialize
* the entire buffer.
*/
- libxfs_log_header(XFS_BUF_PTR(bp), fs_uuid, version, BBTOB(len),
- fmt, lsn, tail_lsn, next, bp);
- bp->b_flags |= LIBXFS_B_DIRTY;
- libxfs_putbufr(bp);
+ libxfs_log_header(ptr, fs_uuid, version, BBTOB(len), fmt, lsn,
+ tail_lsn, next, bp);
+ if (bp) {
+ bp->b_flags |= LIBXFS_B_DIRTY;
+ libxfs_putbufr(bp);
+ }
len = min(end_blk - blk, BTOBB(BDSTRAT_SIZE));
blk += len;
+ if (dptr)
+ dptr += BBTOB(len);
}
return 0;
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 238d400..5f939b5 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2667,7 +2667,7 @@ _("size %s specified for log subvolume is too large, maximum is %lld blocks\n"),
/*
* Zero the log....
*/
- libxfs_log_clear(mp->m_logdev_targp,
+ libxfs_log_clear(mp->m_logdev_targp, NULL,
XFS_FSB_TO_DADDR(mp, logstart),
(xfs_extlen_t)XFS_FSB_TO_BB(mp, logblocks),
&sbp->sb_uuid, logversion, lsunit, XLOG_FMT, XLOG_INIT_CYCLE);
diff --git a/repair/phase2.c b/repair/phase2.c
index fe7ed2b..e26e2fc 100644
--- a/repair/phase2.c
+++ b/repair/phase2.c
@@ -114,7 +114,7 @@ zero_log(
* filesystems.
*/
if (!no_modify && zap_log) {
- libxfs_log_clear(log->l_dev,
+ libxfs_log_clear(log->l_dev, NULL,
XFS_FSB_TO_DADDR(mp, mp->m_sb.sb_logstart),
(xfs_extlen_t)XFS_FSB_TO_BB(mp, mp->m_sb.sb_logblocks),
&mp->m_sb.sb_uuid,
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 8285d9d..bed2ff5 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -584,8 +584,9 @@ format_log_max_lsn(
}
do_warn(_("Format log to cycle %d.\n"), new_cycle);
- libxfs_log_clear(log->l_dev, logstart, logblocks, &mp->m_sb.sb_uuid,
- logversion, mp->m_sb.sb_logsunit, XLOG_FMT, new_cycle);
+ libxfs_log_clear(log->l_dev, NULL, logstart, logblocks,
+ &mp->m_sb.sb_uuid, logversion, mp->m_sb.sb_logsunit,
+ XLOG_FMT, new_cycle);
}
int
--
2.1.0
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 23+ messages in thread