public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC v2 0/2] xfs: detect and warn about invalid metadata lsns
@ 2015-09-11 18:53 Brian Foster
  2015-09-11 18:53 ` [RFC v2 1/2] xfs: create a daily warning mechanism Brian Foster
  2015-09-11 18:53 ` [RFC v2 2/2] xfs: validate metadata LSNs against log on v5 superblocks Brian Foster
  0 siblings, 2 replies; 5+ messages in thread
From: Brian Foster @ 2015-09-11 18:53 UTC (permalink / raw)
  To: xfs

Hi all,

Here's a second rfc pass at the kernel side of invalid metadata LSN
detection for v5 supers. This is an rfc simply because my understanding
is that the userspace code should probably go first so that the tools
for repair are available before the kernel starts detecting and warning
about this problem.

The primary change in this version is how the problem is handled once
detected. I've been a bit undecided on whether a shutdown is really
warranted because if the fs doesn't crash, it will eventually
self-correct. On the flipside, a WARN_ON_ONCE() is not sufficient
because it could suppress warnings on separate filesystems once one
filesystem has encountered the issue and fired a warning.

Therefore, this version uses something of a happy medium. The filesystem
is not shutdown but a mount flag is used to warn about the problem at
least once per-mount. Any instances thereafter are ratelimited to a once
per 24 hour period.

Brian

rfcv2:
- Refactored lsn validation and warning code into separate helpers.
- Updated warning mechanism to warn at least once per fs (ratelimited
  thereafter).
- No longer shutdown the fs on invalid metadata lsn detection.
rfcv1: http://oss.sgi.com/pipermail/xfs/2015-August/043396.html

Brian Foster (2):
  xfs: create a daily warning mechanism
  xfs: validate metadata LSNs against log on v5 superblocks

 fs/xfs/libxfs/xfs_alloc.c          |  9 +++++--
 fs/xfs/libxfs/xfs_attr_leaf.c      |  2 ++
 fs/xfs/libxfs/xfs_btree.c          | 14 +++++++++--
 fs/xfs/libxfs/xfs_da_btree.c       |  3 +++
 fs/xfs/libxfs/xfs_dir2_block.c     |  1 +
 fs/xfs/libxfs/xfs_dir2_data.c      |  1 +
 fs/xfs/libxfs/xfs_dir2_leaf.c      |  1 +
 fs/xfs/libxfs/xfs_dir2_node.c      |  1 +
 fs/xfs/libxfs/xfs_ialloc.c         |  8 +++++--
 fs/xfs/libxfs/xfs_sb.c             |  8 +++++++
 fs/xfs/libxfs/xfs_symlink_remote.c |  3 +++
 fs/xfs/xfs_log.c                   |  1 -
 fs/xfs/xfs_log_priv.h              | 24 +++++++++++++++++++
 fs/xfs/xfs_log_recover.c           | 15 +++++++++++-
 fs/xfs/xfs_message.h               | 18 ++++++++++----
 fs/xfs/xfs_mount.c                 | 49 ++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_mount.h                 |  3 +++
 17 files changed, 149 insertions(+), 12 deletions(-)

-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RFC v2 1/2] xfs: create a daily warning mechanism
  2015-09-11 18:53 [RFC v2 0/2] xfs: detect and warn about invalid metadata lsns Brian Foster
@ 2015-09-11 18:53 ` Brian Foster
  2015-09-11 18:53 ` [RFC v2 2/2] xfs: validate metadata LSNs against log on v5 superblocks Brian Foster
  1 sibling, 0 replies; 5+ messages in thread
From: Brian Foster @ 2015-09-11 18:53 UTC (permalink / raw)
  To: xfs

The warning mechanism in XFS currently uses the default ratelimit time
interval and burst parameters. The default parameters are intended to
avoid flooding the logs and whatnot due to messages within fairly short
timeframes (e.g., 10 messages within 5s). The forthcoming invalid
metadata LSN detection must provide a consistent, but not incessant
warning to the user that a repair is required to reformat the log.

Update the core ratelimit mechanism to allow customized parameters,
continue to pass the ratelimit defaults for existing users, and define a
new xfs_warn_daily() mechanism to fire a message on a 24 hour interval.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_message.h | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_message.h b/fs/xfs/xfs_message.h
index 8540115..9f1a036 100644
--- a/fs/xfs/xfs_message.h
+++ b/fs/xfs/xfs_message.h
@@ -30,15 +30,22 @@ void xfs_debug(const struct xfs_mount *mp, const char *fmt, ...)
 }
 #endif
 
-#define xfs_printk_ratelimited(func, dev, fmt, ...)		\
+#define __xfs_printk_ratelimited(interval, burst, func, dev, fmt, ...)	\
 do {									\
-	static DEFINE_RATELIMIT_STATE(_rs,				\
-				      DEFAULT_RATELIMIT_INTERVAL,	\
-				      DEFAULT_RATELIMIT_BURST);		\
+	static DEFINE_RATELIMIT_STATE(_rs, interval, burst);		\
 	if (__ratelimit(&_rs))						\
 		func(dev, fmt, ##__VA_ARGS__);			\
 } while (0)
 
+#define xfs_printk_ratelimited(func, dev, fmt, ...)		\
+	__xfs_printk_ratelimited(DEFAULT_RATELIMIT_INTERVAL,	\
+				 DEFAULT_RATELIMIT_BURST,	\
+				 func, dev, fmt, ##__VA_ARGS__)
+
+#define xfs_printk_daily(func, dev, fmt, ...)			\
+	__xfs_printk_ratelimited(86400 * HZ, 1, func, dev, fmt,	\
+				 ##__VA_ARGS__)
+
 #define xfs_emerg_ratelimited(dev, fmt, ...)				\
 	xfs_printk_ratelimited(xfs_emerg, dev, fmt, ##__VA_ARGS__)
 #define xfs_alert_ratelimited(dev, fmt, ...)				\
@@ -56,6 +63,9 @@ do {									\
 #define xfs_debug_ratelimited(dev, fmt, ...)				\
 	xfs_printk_ratelimited(xfs_debug, dev, fmt, ##__VA_ARGS__)
 
+#define xfs_warn_daily(dev, fmt, ...)					\
+	xfs_printk_daily(xfs_warn, dev, fmt, ##__VA_ARGS__)
+
 extern void assfail(char *expr, char *f, int l);
 extern void asswarn(char *expr, char *f, int l);
 
-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [RFC v2 2/2] xfs: validate metadata LSNs against log on v5 superblocks
  2015-09-11 18:53 [RFC v2 0/2] xfs: detect and warn about invalid metadata lsns Brian Foster
  2015-09-11 18:53 ` [RFC v2 1/2] xfs: create a daily warning mechanism Brian Foster
@ 2015-09-11 18:53 ` Brian Foster
  2015-09-22  8:07   ` Dave Chinner
  1 sibling, 1 reply; 5+ messages in thread
From: Brian Foster @ 2015-09-11 18:53 UTC (permalink / raw)
  To: xfs

Since the onset of v5 superblocks, the LSN of the last modification has
been included in a variety of on-disk data structures. This LSN is used
to provide log recovery ordering guarantees (e.g., to ensure an older
log recovery item is not replayed over a newer target data structure).

While this works correctly from the point a filesystem is formatted and
mounted, userspace tools have some problematic behaviors that defeat
this mechanism. For example, xfs_repair historically zeroes out the log
unconditionally (regardless of whether corruption is detected). If this
occurs, the LSN of the filesystem is reset and the log is now in a
problematic state with respect to on-disk metadata structures that might
have a larger LSN. Until either the log catches up to the highest
previously used metadata LSN or each affected data structure is modified
and written out without incident (which resets the metadata LSN), log
recovery is susceptible to filesystem corruption.

This problem is ultimately addressed and repaired in the associated
userspace tools. The kernel is still responsible to detect the problem
and notify the user that something is wrong. Check the superblock LSN at
mount time and notify the user and fail the mount if it is invalid.
>From that point on, simply warn the user if an invalid metadata LSN is
detected from the various metadata I/O verifiers. A new xfs_mount flag
is added to guarantee that a warning fires at least once for each
affected filesystem mount. From that point forward, further instances of
the problem are rate limited to a once per 24 hour period.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c          |  9 +++++--
 fs/xfs/libxfs/xfs_attr_leaf.c      |  2 ++
 fs/xfs/libxfs/xfs_btree.c          | 14 +++++++++--
 fs/xfs/libxfs/xfs_da_btree.c       |  3 +++
 fs/xfs/libxfs/xfs_dir2_block.c     |  1 +
 fs/xfs/libxfs/xfs_dir2_data.c      |  1 +
 fs/xfs/libxfs/xfs_dir2_leaf.c      |  1 +
 fs/xfs/libxfs/xfs_dir2_node.c      |  1 +
 fs/xfs/libxfs/xfs_ialloc.c         |  8 +++++--
 fs/xfs/libxfs/xfs_sb.c             |  8 +++++++
 fs/xfs/libxfs/xfs_symlink_remote.c |  3 +++
 fs/xfs/xfs_log.c                   |  1 -
 fs/xfs/xfs_log_priv.h              | 24 +++++++++++++++++++
 fs/xfs/xfs_log_recover.c           | 15 +++++++++++-
 fs/xfs/xfs_mount.c                 | 49 ++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_mount.h                 |  3 +++
 16 files changed, 135 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index ffad7f2..cb26016 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -482,6 +482,8 @@ xfs_agfl_verify(
 		    be32_to_cpu(agfl->agfl_bno[i]) >= mp->m_sb.sb_agblocks)
 			return false;
 	}
+
+	xfs_detect_invalid_lsn(mp, be64_to_cpu(XFS_BUF_TO_AGFL(bp)->agfl_lsn));
 	return true;
 }
 
@@ -2259,9 +2261,12 @@ xfs_agf_verify(
  {
 	struct xfs_agf	*agf = XFS_BUF_TO_AGF(bp);
 
-	if (xfs_sb_version_hascrc(&mp->m_sb) &&
-	    !uuid_equal(&agf->agf_uuid, &mp->m_sb.sb_meta_uuid))
+	if (xfs_sb_version_hascrc(&mp->m_sb)) {
+		if (!uuid_equal(&agf->agf_uuid, &mp->m_sb.sb_meta_uuid))
 			return false;
+		xfs_detect_invalid_lsn(mp,
+				be64_to_cpu(XFS_BUF_TO_AGF(bp)->agf_lsn));
+	}
 
 	if (!(agf->agf_magicnum == cpu_to_be32(XFS_AGF_MAGIC) &&
 	      XFS_AGF_GOOD_VERSION(be32_to_cpu(agf->agf_versionnum)) &&
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index 33df52d..643ba73 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -266,6 +266,8 @@ xfs_attr3_leaf_verify(
 			return false;
 		if (be64_to_cpu(hdr3->info.blkno) != bp->b_bn)
 			return false;
+
+		xfs_detect_invalid_lsn(mp, be64_to_cpu(hdr3->info.lsn));
 	} else {
 		if (ichdr.magic != XFS_ATTR_LEAF_MAGIC)
 			return false;
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index f7d7ee7..c2c5765 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -243,8 +243,13 @@ bool
 xfs_btree_lblock_verify_crc(
 	struct xfs_buf		*bp)
 {
-	if (xfs_sb_version_hascrc(&bp->b_target->bt_mount->m_sb))
+	struct xfs_btree_block	*block = XFS_BUF_TO_BLOCK(bp);
+	struct xfs_mount	*mp = bp->b_target->bt_mount;
+
+	if (xfs_sb_version_hascrc(&bp->b_target->bt_mount->m_sb)) {
+		xfs_detect_invalid_lsn(mp, be64_to_cpu(block->bb_u.l.bb_lsn));
 		return xfs_buf_verify_cksum(bp, XFS_BTREE_LBLOCK_CRC_OFF);
+	}
 
 	return true;
 }
@@ -275,8 +280,13 @@ bool
 xfs_btree_sblock_verify_crc(
 	struct xfs_buf		*bp)
 {
-	if (xfs_sb_version_hascrc(&bp->b_target->bt_mount->m_sb))
+	struct xfs_btree_block  *block = XFS_BUF_TO_BLOCK(bp);
+	struct xfs_mount	*mp = bp->b_target->bt_mount;
+
+	if (xfs_sb_version_hascrc(&bp->b_target->bt_mount->m_sb)) {
+		xfs_detect_invalid_lsn(mp, be64_to_cpu(block->bb_u.s.bb_lsn));
 		return xfs_buf_verify_cksum(bp, XFS_BTREE_SBLOCK_CRC_OFF);
+	}
 
 	return true;
 }
diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
index be43248..2ea7982 100644
--- a/fs/xfs/libxfs/xfs_da_btree.c
+++ b/fs/xfs/libxfs/xfs_da_btree.c
@@ -150,6 +150,8 @@ xfs_da3_node_verify(
 			return false;
 		if (be64_to_cpu(hdr3->info.blkno) != bp->b_bn)
 			return false;
+
+		xfs_detect_invalid_lsn(mp, be64_to_cpu(hdr3->info.lsn));
 	} else {
 		if (ichdr.magic != XFS_DA_NODE_MAGIC)
 			return false;
@@ -322,6 +324,7 @@ xfs_da3_node_create(
 	if (xfs_sb_version_hascrc(&mp->m_sb)) {
 		struct xfs_da3_node_hdr *hdr3 = bp->b_addr;
 
+		memset(hdr3, 0, sizeof(struct xfs_da3_node_hdr));
 		ichdr.magic = XFS_DA3_NODE_MAGIC;
 		hdr3->info.blkno = cpu_to_be64(bp->b_bn);
 		hdr3->info.owner = cpu_to_be64(args->dp->i_ino);
diff --git a/fs/xfs/libxfs/xfs_dir2_block.c b/fs/xfs/libxfs/xfs_dir2_block.c
index 4778d1d..9cd553f 100644
--- a/fs/xfs/libxfs/xfs_dir2_block.c
+++ b/fs/xfs/libxfs/xfs_dir2_block.c
@@ -71,6 +71,7 @@ xfs_dir3_block_verify(
 			return false;
 		if (be64_to_cpu(hdr3->blkno) != bp->b_bn)
 			return false;
+		xfs_detect_invalid_lsn(mp, be64_to_cpu(hdr3->lsn));
 	} else {
 		if (hdr3->magic != cpu_to_be32(XFS_DIR2_BLOCK_MAGIC))
 			return false;
diff --git a/fs/xfs/libxfs/xfs_dir2_data.c b/fs/xfs/libxfs/xfs_dir2_data.c
index 824131e..d9c6d6e 100644
--- a/fs/xfs/libxfs/xfs_dir2_data.c
+++ b/fs/xfs/libxfs/xfs_dir2_data.c
@@ -224,6 +224,7 @@ xfs_dir3_data_verify(
 			return false;
 		if (be64_to_cpu(hdr3->blkno) != bp->b_bn)
 			return false;
+		xfs_detect_invalid_lsn(mp, be64_to_cpu(hdr3->lsn));
 	} else {
 		if (hdr3->magic != cpu_to_be32(XFS_DIR2_DATA_MAGIC))
 			return false;
diff --git a/fs/xfs/libxfs/xfs_dir2_leaf.c b/fs/xfs/libxfs/xfs_dir2_leaf.c
index f300240..896de9e 100644
--- a/fs/xfs/libxfs/xfs_dir2_leaf.c
+++ b/fs/xfs/libxfs/xfs_dir2_leaf.c
@@ -164,6 +164,7 @@ xfs_dir3_leaf_verify(
 			return false;
 		if (be64_to_cpu(leaf3->info.blkno) != bp->b_bn)
 			return false;
+		xfs_detect_invalid_lsn(mp, be64_to_cpu(leaf3->info.lsn));
 	} else {
 		if (leaf->hdr.info.magic != cpu_to_be16(magic))
 			return false;
diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
index cc28e92..c0bfeef 100644
--- a/fs/xfs/libxfs/xfs_dir2_node.c
+++ b/fs/xfs/libxfs/xfs_dir2_node.c
@@ -97,6 +97,7 @@ xfs_dir3_free_verify(
 			return false;
 		if (be64_to_cpu(hdr3->blkno) != bp->b_bn)
 			return false;
+		xfs_detect_invalid_lsn(mp, be64_to_cpu(hdr3->lsn));
 	} else {
 		if (hdr->magic != cpu_to_be32(XFS_DIR2_FREE_MAGIC))
 			return false;
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 54deb2d..e91343b 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -2500,9 +2500,13 @@ xfs_agi_verify(
 	struct xfs_mount *mp = bp->b_target->bt_mount;
 	struct xfs_agi	*agi = XFS_BUF_TO_AGI(bp);
 
-	if (xfs_sb_version_hascrc(&mp->m_sb) &&
-	    !uuid_equal(&agi->agi_uuid, &mp->m_sb.sb_meta_uuid))
+	if (xfs_sb_version_hascrc(&mp->m_sb)) {
+		if (!uuid_equal(&agi->agi_uuid, &mp->m_sb.sb_meta_uuid))
 			return false;
+		xfs_detect_invalid_lsn(mp,
+				be64_to_cpu(XFS_BUF_TO_AGI(bp)->agi_lsn));
+	}
+
 	/*
 	 * Validate the magic number of the agi block.
 	 */
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 4742514..badf587 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -163,6 +163,14 @@ xfs_mount_validate_sb(
 "Filesystem can not be safely mounted by this kernel.");
 			return -EINVAL;
 		}
+	} else if (xfs_sb_version_hascrc(sbp)) {
+		/*
+		 * We can't read verify the sb LSN because the read verifier is
+		 * called before the log is allocated and processed. We know the
+		 * log is set up before write verifier (!check_version) calls,
+		 * so just check it here.
+		 */
+		xfs_detect_invalid_lsn(mp, sbp->sb_lsn);
 	}
 
 	if (xfs_sb_version_has_pquotino(sbp)) {
diff --git a/fs/xfs/libxfs/xfs_symlink_remote.c b/fs/xfs/libxfs/xfs_symlink_remote.c
index 8f8af05..aa6b3c1 100644
--- a/fs/xfs/libxfs/xfs_symlink_remote.c
+++ b/fs/xfs/libxfs/xfs_symlink_remote.c
@@ -60,6 +60,7 @@ xfs_symlink_hdr_set(
 	if (!xfs_sb_version_hascrc(&mp->m_sb))
 		return 0;
 
+	memset(dsl, 0, sizeof(struct xfs_dsymlink_hdr));
 	dsl->sl_magic = cpu_to_be32(XFS_SYMLINK_MAGIC);
 	dsl->sl_offset = cpu_to_be32(offset);
 	dsl->sl_bytes = cpu_to_be32(size);
@@ -117,6 +118,8 @@ xfs_symlink_verify(
 	if (dsl->sl_owner == 0)
 		return false;
 
+	xfs_detect_invalid_lsn(mp, be64_to_cpu(dsl->sl_lsn));
+
 	return true;
 }
 
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index aaadee0..a34476b 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -4022,4 +4022,3 @@ xlog_iclogs_empty(
 	} while (iclog != log->l_iclog);
 	return 1;
 }
-
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 950f3f9..c0acc71 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -560,4 +560,28 @@ static inline void xlog_wait(wait_queue_head_t *wq, spinlock_t *lock)
 	remove_wait_queue(wq, &wait);
 }
 
+/*
+ * The LSN is valid so long as it is behind the current LSN. If it isn't, this
+ * means that the next log record that includes this metadata could have a
+ * smaller LSN. In turn, this means that the modification in the log would not
+ * replay.
+ */
+static inline bool
+xlog_valid_lsn(
+	struct xlog	*log,
+	xfs_lsn_t	lsn)
+{
+	int		cycle = CYCLE_LSN(lsn);
+	int		block = BLOCK_LSN(lsn);
+	bool		valid = true;
+
+	spin_lock(&log->l_icloglock);
+	if ((cycle > log->l_curr_cycle) ||
+	    (cycle == log->l_curr_cycle && block > log->l_curr_block))
+		valid = false;
+	spin_unlock(&log->l_icloglock);
+
+	return valid;
+}
+
 #endif	/* __XFS_LOG_PRIV_H__ */
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 512a094..c6ea06a 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -4609,9 +4609,22 @@ xlog_recover(
 	int		error;
 
 	/* find the tail of the log */
-	if ((error = xlog_find_tail(log, &head_blk, &tail_blk)))
+	error = xlog_find_tail(log, &head_blk, &tail_blk);
+	if (error)
 		return error;
 
+	/*
+	 * The superblock was read before the log was available and thus the LSN
+	 * could not be verified. Check the superblock LSN against the current
+	 * LSN now that it's known.
+	 */
+	if (xfs_sb_version_hascrc(&log->l_mp->m_sb) &&
+	    !xlog_valid_lsn(log, log->l_mp->m_sb.sb_lsn)) {
+		xfs_warn(log->l_mp,
+	"Invalid superblock LSN (ahead of log). Please run xfs_repair.");
+		return -EINVAL;
+	}
+
 	if (tail_blk != head_blk) {
 		/* There used to be a comment here:
 		 *
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index bf92e0c..0acfea8 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -41,6 +41,7 @@
 #include "xfs_trace.h"
 #include "xfs_icache.h"
 #include "xfs_sysfs.h"
+#include "xfs_log_priv.h"
 
 
 static DEFINE_MUTEX(xfs_uuid_table_mutex);
@@ -1301,3 +1302,51 @@ xfs_dev_is_read_only(
 	}
 	return 0;
 }
+
+/*
+ * Verify that an LSN stamped into a piece of metadata is valid. This is
+ * intended for use in read verifiers on v5 superblocks.
+ */
+void
+xfs_detect_invalid_lsn(
+	struct xfs_mount	*mp,
+	xfs_lsn_t		lsn)
+{
+	struct xlog		*log = mp->m_log;
+	int			cycle = CYCLE_LSN(lsn);
+	int			block = BLOCK_LSN(lsn);
+	const char 		*fmt;
+
+	/*
+	 * 'norecovery' mode skips mount-time log processing and unconditionally
+	 * resets the LSN.
+	 */
+	if (mp->m_flags & XFS_MOUNT_NORECOVERY)
+		return;
+
+	if (xlog_valid_lsn(mp->m_log, lsn))
+		return;
+
+	/*
+	 * Warn at least once for each filesystem susceptible to this problem
+	 * and once per day thereafter.
+	 *
+	 * XXX: Specify the minimum xfs_repair version required to resolve?
+	 * Older versions will silently reintroduce the problem.
+	 *
+	 * We should eventually convert this condition into a hard error (via
+	 * verifier failure). Defer that behavior for a few release cycles or so
+	 * until the userspace fixes have had the opportunity to propogate
+	 * throughout the various distributions.
+	 */
+	fmt = "WARNING: Metadata has LSN (%d:%d) ahead of current LSN (%d:%d). "
+"This may result in filesystem failure. Please unmount and run xfs_repair to resolve.";
+	if (!mp->m_warned_bad_lsn) {
+		mp->m_warned_bad_lsn = true;
+		xfs_warn(mp, fmt, cycle, block, log->l_curr_cycle,
+			 log->l_curr_block);
+	} else {
+		xfs_warn_daily(mp, fmt, cycle, block, log->l_curr_cycle,
+			       log->l_curr_block);
+	}
+}
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 7999e91..572ebfc 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -127,6 +127,7 @@ typedef struct xfs_mount {
 	int64_t			m_low_space[XFS_LOWSP_MAX];
 						/* low free space thresholds */
 	struct xfs_kobj		m_kobj;
+	bool			m_warned_bad_lsn;/* bad metadata lsn detected */
 
 	struct workqueue_struct *m_buf_workqueue;
 	struct workqueue_struct	*m_data_workqueue;
@@ -336,4 +337,6 @@ extern int	xfs_dev_is_read_only(struct xfs_mount *, char *);
 
 extern void	xfs_set_low_space_thresholds(struct xfs_mount *);
 
+extern void	xfs_detect_invalid_lsn(struct xfs_mount *, xfs_lsn_t);
+
 #endif	/* __XFS_MOUNT_H__ */
-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC v2 2/2] xfs: validate metadata LSNs against log on v5 superblocks
  2015-09-11 18:53 ` [RFC v2 2/2] xfs: validate metadata LSNs against log on v5 superblocks Brian Foster
@ 2015-09-22  8:07   ` Dave Chinner
  2015-09-22 14:21     ` Brian Foster
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2015-09-22  8:07 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Fri, Sep 11, 2015 at 02:53:55PM -0400, Brian Foster wrote:
> Since the onset of v5 superblocks, the LSN of the last modification has
> been included in a variety of on-disk data structures. This LSN is used
> to provide log recovery ordering guarantees (e.g., to ensure an older
> log recovery item is not replayed over a newer target data structure).
> 
> While this works correctly from the point a filesystem is formatted and
> mounted, userspace tools have some problematic behaviors that defeat
> this mechanism. For example, xfs_repair historically zeroes out the log
> unconditionally (regardless of whether corruption is detected). If this
> occurs, the LSN of the filesystem is reset and the log is now in a
> problematic state with respect to on-disk metadata structures that might
> have a larger LSN. Until either the log catches up to the highest
> previously used metadata LSN or each affected data structure is modified
> and written out without incident (which resets the metadata LSN), log
> recovery is susceptible to filesystem corruption.
> 
> This problem is ultimately addressed and repaired in the associated
> userspace tools. The kernel is still responsible to detect the problem
> and notify the user that something is wrong. Check the superblock LSN at
> mount time and notify the user and fail the mount if it is invalid.
> From that point on, simply warn the user if an invalid metadata LSN is
> detected from the various metadata I/O verifiers. A new xfs_mount flag
> is added to guarantee that a warning fires at least once for each
> affected filesystem mount. From that point forward, further instances of
> the problem are rate limited to a once per 24 hour period.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c          |  9 +++++--
>  fs/xfs/libxfs/xfs_attr_leaf.c      |  2 ++
>  fs/xfs/libxfs/xfs_btree.c          | 14 +++++++++--
>  fs/xfs/libxfs/xfs_da_btree.c       |  3 +++
>  fs/xfs/libxfs/xfs_dir2_block.c     |  1 +
>  fs/xfs/libxfs/xfs_dir2_data.c      |  1 +
>  fs/xfs/libxfs/xfs_dir2_leaf.c      |  1 +
>  fs/xfs/libxfs/xfs_dir2_node.c      |  1 +
>  fs/xfs/libxfs/xfs_ialloc.c         |  8 +++++--
>  fs/xfs/libxfs/xfs_sb.c             |  8 +++++++
>  fs/xfs/libxfs/xfs_symlink_remote.c |  3 +++
>  fs/xfs/xfs_log.c                   |  1 -
>  fs/xfs/xfs_log_priv.h              | 24 +++++++++++++++++++
>  fs/xfs/xfs_log_recover.c           | 15 +++++++++++-
>  fs/xfs/xfs_mount.c                 | 49 ++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_mount.h                 |  3 +++
>  16 files changed, 135 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index ffad7f2..cb26016 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -482,6 +482,8 @@ xfs_agfl_verify(
>  		    be32_to_cpu(agfl->agfl_bno[i]) >= mp->m_sb.sb_agblocks)
>  			return false;
>  	}
> +
> +	xfs_detect_invalid_lsn(mp, be64_to_cpu(XFS_BUF_TO_AGFL(bp)->agfl_lsn));
>  	return true;

	xfs_log_check_lsn(mp, <lsn>);

> +/*
> + * The LSN is valid so long as it is behind the current LSN. If it isn't, this
> + * means that the next log record that includes this metadata could have a
> + * smaller LSN. In turn, this means that the modification in the log would not
> + * replay.
> + */
> +static inline bool
> +xlog_valid_lsn(
> +	struct xlog	*log,
> +	xfs_lsn_t	lsn)
> +{
> +	int		cycle = CYCLE_LSN(lsn);
> +	int		block = BLOCK_LSN(lsn);
> +	bool		valid = true;
> +
> +	spin_lock(&log->l_icloglock);
> +	if ((cycle > log->l_curr_cycle) ||
> +	    (cycle == log->l_curr_cycle && block > log->l_curr_block))
> +		valid = false;
> +	spin_unlock(&log->l_icloglock);
> +
> +	return valid;
> +}

We do not want to take the l_icloglock in metadata IO
context. It's a global fs lock, and it's a hot lock, too, so we
do not want to be taking this during every metadata IO completion.

I think, at minimum, we need to sample the current values and do an
unlocked check, then if the result is suspect we need to do an
accurate locked check. This means that we will almost never take the
l_icloglock here. e.g:

{
	int cycle = ACCESS_ONCE(log->l_curr_cycle);
	int block = ACCESS_ONCE(log->l_curr_block);

	if (CYCLE_LSN(lsn) > cycle ||
	    (CYCLE_LSN(lsn) == cycle && BLOCK_LSN(lsn) > block)) {
		/* suspect, take icloglock to check again */
		bool valid = true;

		....
		return valid;
	}
	return true;
}


> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -41,6 +41,7 @@
>  #include "xfs_trace.h"
>  #include "xfs_icache.h"
>  #include "xfs_sysfs.h"
> +#include "xfs_log_priv.h"
>  
>  
>  static DEFINE_MUTEX(xfs_uuid_table_mutex);
> @@ -1301,3 +1302,51 @@ xfs_dev_is_read_only(
>  	}
>  	return 0;
>  }
> +
> +/*
> + * Verify that an LSN stamped into a piece of metadata is valid. This is
> + * intended for use in read verifiers on v5 superblocks.
> + */
> +void
> +xfs_detect_invalid_lsn(
> +	struct xfs_mount	*mp,
> +	xfs_lsn_t		lsn)
> +{

This shoul dbe called xfs_log_check_lsn() and be located in
xfs_log.c.

> +	struct xlog		*log = mp->m_log;
> +	int			cycle = CYCLE_LSN(lsn);
> +	int			block = BLOCK_LSN(lsn);
> +	const char 		*fmt;
> +
> +	/*
> +	 * 'norecovery' mode skips mount-time log processing and unconditionally
> +	 * resets the LSN.
> +	 */
> +	if (mp->m_flags & XFS_MOUNT_NORECOVERY)
> +		return;
> +
> +	if (xlog_valid_lsn(mp->m_log, lsn))
> +		return;
> +
> +	/*
> +	 * Warn at least once for each filesystem susceptible to this problem
> +	 * and once per day thereafter.
> +	 *
> +	 * XXX: Specify the minimum xfs_repair version required to resolve?
> +	 * Older versions will silently reintroduce the problem.
> +	 *
> +	 * We should eventually convert this condition into a hard error (via
> +	 * verifier failure). Defer that behavior for a few release cycles or so
> +	 * until the userspace fixes have had the opportunity to propogate
> +	 * throughout the various distributions.
> +	 */
> +	fmt = "WARNING: Metadata has LSN (%d:%d) ahead of current LSN (%d:%d). "
> +"This may result in filesystem failure. Please unmount and run xfs_repair to resolve.";
> +	if (!mp->m_warned_bad_lsn) {
> +		mp->m_warned_bad_lsn = true;
> +		xfs_warn(mp, fmt, cycle, block, log->l_curr_cycle,
> +			 log->l_curr_block);
> +	} else {
> +		xfs_warn_daily(mp, fmt, cycle, block, log->l_curr_cycle,
> +			       log->l_curr_block);
> +	}

That's really ugly. Rate limited prints should always occur at least
once for each ratelimit key, and now i've seen this I can say the
first patch needs rework.

that is, each xfs_warn_daily() caller needs it's own ratelimit key,
rather than sharing the global ratelimit key that all other XFS
messages share.

e.g.:
	xfs_warn_daily(mp, mp->m_log->l_lsn_rlstate,
"Corruption warning: Metadata has LSN (%d:%d) ahead of current LSN (%d:%d). "
"Please unmount and run xfs_repair (>= v4.3) to resolve.".
			cycle, block, log->l_curr_cycle, log->l_curr_block);

But really, thinking on this further (the "corruption warning" bit)
I'm starting to think we should just shut the filesystem down when
we hit this and force the user to fix it immediately. if we don't,
then who knows whether we are making things worse or not...

> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -127,6 +127,7 @@ typedef struct xfs_mount {
>  	int64_t			m_low_space[XFS_LOWSP_MAX];
>  						/* low free space thresholds */
>  	struct xfs_kobj		m_kobj;
> +	bool			m_warned_bad_lsn;/* bad metadata lsn detected */

That's where I'd put the ratelimit key (or in the struct xlog).

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC v2 2/2] xfs: validate metadata LSNs against log on v5 superblocks
  2015-09-22  8:07   ` Dave Chinner
@ 2015-09-22 14:21     ` Brian Foster
  0 siblings, 0 replies; 5+ messages in thread
From: Brian Foster @ 2015-09-22 14:21 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, Sep 22, 2015 at 06:07:25PM +1000, Dave Chinner wrote:
> On Fri, Sep 11, 2015 at 02:53:55PM -0400, Brian Foster wrote:
> > Since the onset of v5 superblocks, the LSN of the last modification has
> > been included in a variety of on-disk data structures. This LSN is used
> > to provide log recovery ordering guarantees (e.g., to ensure an older
> > log recovery item is not replayed over a newer target data structure).
> > 
> > While this works correctly from the point a filesystem is formatted and
> > mounted, userspace tools have some problematic behaviors that defeat
> > this mechanism. For example, xfs_repair historically zeroes out the log
> > unconditionally (regardless of whether corruption is detected). If this
> > occurs, the LSN of the filesystem is reset and the log is now in a
> > problematic state with respect to on-disk metadata structures that might
> > have a larger LSN. Until either the log catches up to the highest
> > previously used metadata LSN or each affected data structure is modified
> > and written out without incident (which resets the metadata LSN), log
> > recovery is susceptible to filesystem corruption.
> > 
> > This problem is ultimately addressed and repaired in the associated
> > userspace tools. The kernel is still responsible to detect the problem
> > and notify the user that something is wrong. Check the superblock LSN at
> > mount time and notify the user and fail the mount if it is invalid.
> > From that point on, simply warn the user if an invalid metadata LSN is
> > detected from the various metadata I/O verifiers. A new xfs_mount flag
> > is added to guarantee that a warning fires at least once for each
> > affected filesystem mount. From that point forward, further instances of
> > the problem are rate limited to a once per 24 hour period.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_alloc.c          |  9 +++++--
> >  fs/xfs/libxfs/xfs_attr_leaf.c      |  2 ++
> >  fs/xfs/libxfs/xfs_btree.c          | 14 +++++++++--
> >  fs/xfs/libxfs/xfs_da_btree.c       |  3 +++
> >  fs/xfs/libxfs/xfs_dir2_block.c     |  1 +
> >  fs/xfs/libxfs/xfs_dir2_data.c      |  1 +
> >  fs/xfs/libxfs/xfs_dir2_leaf.c      |  1 +
> >  fs/xfs/libxfs/xfs_dir2_node.c      |  1 +
> >  fs/xfs/libxfs/xfs_ialloc.c         |  8 +++++--
> >  fs/xfs/libxfs/xfs_sb.c             |  8 +++++++
> >  fs/xfs/libxfs/xfs_symlink_remote.c |  3 +++
> >  fs/xfs/xfs_log.c                   |  1 -
> >  fs/xfs/xfs_log_priv.h              | 24 +++++++++++++++++++
> >  fs/xfs/xfs_log_recover.c           | 15 +++++++++++-
> >  fs/xfs/xfs_mount.c                 | 49 ++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/xfs_mount.h                 |  3 +++
> >  16 files changed, 135 insertions(+), 8 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index ffad7f2..cb26016 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -482,6 +482,8 @@ xfs_agfl_verify(
> >  		    be32_to_cpu(agfl->agfl_bno[i]) >= mp->m_sb.sb_agblocks)
> >  			return false;
> >  	}
> > +
> > +	xfs_detect_invalid_lsn(mp, be64_to_cpu(XFS_BUF_TO_AGFL(bp)->agfl_lsn));
> >  	return true;
> 
> 	xfs_log_check_lsn(mp, <lsn>);
> 

Ok.

> > +/*
> > + * The LSN is valid so long as it is behind the current LSN. If it isn't, this
> > + * means that the next log record that includes this metadata could have a
> > + * smaller LSN. In turn, this means that the modification in the log would not
> > + * replay.
> > + */
> > +static inline bool
> > +xlog_valid_lsn(
> > +	struct xlog	*log,
> > +	xfs_lsn_t	lsn)
> > +{
> > +	int		cycle = CYCLE_LSN(lsn);
> > +	int		block = BLOCK_LSN(lsn);
> > +	bool		valid = true;
> > +
> > +	spin_lock(&log->l_icloglock);
> > +	if ((cycle > log->l_curr_cycle) ||
> > +	    (cycle == log->l_curr_cycle && block > log->l_curr_block))
> > +		valid = false;
> > +	spin_unlock(&log->l_icloglock);
> > +
> > +	return valid;
> > +}
> 
> We do not want to take the l_icloglock in metadata IO
> context. It's a global fs lock, and it's a hot lock, too, so we
> do not want to be taking this during every metadata IO completion.
> 
> I think, at minimum, we need to sample the current values and do an
> unlocked check, then if the result is suspect we need to do an
> accurate locked check. This means that we will almost never take the
> l_icloglock here. e.g:
> 
> {
> 	int cycle = ACCESS_ONCE(log->l_curr_cycle);
> 	int block = ACCESS_ONCE(log->l_curr_block);
> 
> 	if (CYCLE_LSN(lsn) > cycle ||
> 	    (CYCLE_LSN(lsn) == cycle && BLOCK_LSN(lsn) > block)) {
> 		/* suspect, take icloglock to check again */
> 		bool valid = true;
> 
> 		....
> 		return valid;
> 	}
> 	return true;
> }
> 

Good point. In this case, I may just lift the lock out of the helper and
allow this to be the responsibility of the caller, but I'll see how it
looks.

> 
> > --- a/fs/xfs/xfs_mount.c
> > +++ b/fs/xfs/xfs_mount.c
> > @@ -41,6 +41,7 @@
> >  #include "xfs_trace.h"
> >  #include "xfs_icache.h"
> >  #include "xfs_sysfs.h"
> > +#include "xfs_log_priv.h"
> >  
> >  
> >  static DEFINE_MUTEX(xfs_uuid_table_mutex);
> > @@ -1301,3 +1302,51 @@ xfs_dev_is_read_only(
> >  	}
> >  	return 0;
> >  }
> > +
> > +/*
> > + * Verify that an LSN stamped into a piece of metadata is valid. This is
> > + * intended for use in read verifiers on v5 superblocks.
> > + */
> > +void
> > +xfs_detect_invalid_lsn(
> > +	struct xfs_mount	*mp,
> > +	xfs_lsn_t		lsn)
> > +{
> 
> This shoul dbe called xfs_log_check_lsn() and be located in
> xfs_log.c.
> 

Ok.

> > +	struct xlog		*log = mp->m_log;
> > +	int			cycle = CYCLE_LSN(lsn);
> > +	int			block = BLOCK_LSN(lsn);
> > +	const char 		*fmt;
> > +
> > +	/*
> > +	 * 'norecovery' mode skips mount-time log processing and unconditionally
> > +	 * resets the LSN.
> > +	 */
> > +	if (mp->m_flags & XFS_MOUNT_NORECOVERY)
> > +		return;
> > +
> > +	if (xlog_valid_lsn(mp->m_log, lsn))
> > +		return;
> > +
> > +	/*
> > +	 * Warn at least once for each filesystem susceptible to this problem
> > +	 * and once per day thereafter.
> > +	 *
> > +	 * XXX: Specify the minimum xfs_repair version required to resolve?
> > +	 * Older versions will silently reintroduce the problem.
> > +	 *
> > +	 * We should eventually convert this condition into a hard error (via
> > +	 * verifier failure). Defer that behavior for a few release cycles or so
> > +	 * until the userspace fixes have had the opportunity to propogate
> > +	 * throughout the various distributions.
> > +	 */
> > +	fmt = "WARNING: Metadata has LSN (%d:%d) ahead of current LSN (%d:%d). "
> > +"This may result in filesystem failure. Please unmount and run xfs_repair to resolve.";
> > +	if (!mp->m_warned_bad_lsn) {
> > +		mp->m_warned_bad_lsn = true;
> > +		xfs_warn(mp, fmt, cycle, block, log->l_curr_cycle,
> > +			 log->l_curr_block);
> > +	} else {
> > +		xfs_warn_daily(mp, fmt, cycle, block, log->l_curr_cycle,
> > +			       log->l_curr_block);
> > +	}
> 
> That's really ugly. Rate limited prints should always occur at least
> once for each ratelimit key, and now i've seen this I can say the
> first patch needs rework.
> 

This is definitely more ugly than the shutdown approach. As mentioned,
I've been waffling back and forth on how to handle this condition. I had
already sent the shutdown version, so I figured I would put together
what I thought would be necessary to handle this with a warning for
comparison.

> that is, each xfs_warn_daily() caller needs it's own ratelimit key,
> rather than sharing the global ratelimit key that all other XFS
> messages share.
> 

What do you mean by a key? What is the purpose?

On further thought... I think I see what you mean. Each mount should
report in the ratelimited timeframe independently. Or in other words,
the rate limit of one mount should not affect message delivery on
another.

> e.g.:
> 	xfs_warn_daily(mp, mp->m_log->l_lsn_rlstate,
> "Corruption warning: Metadata has LSN (%d:%d) ahead of current LSN (%d:%d). "
> "Please unmount and run xfs_repair (>= v4.3) to resolve.".
> 			cycle, block, log->l_curr_cycle, log->l_curr_block);
> 
> But really, thinking on this further (the "corruption warning" bit)
> I'm starting to think we should just shut the filesystem down when
> we hit this and force the user to fix it immediately. if we don't,
> then who knows whether we are making things worse or not...
> 

I was originally thinking the shutdown might not be ideal because we
induce the very situation we're warning about (forcing a log recovery).
Since this verifies on metadata reads, however, we should in theory
avoid the problem before it could ever be introduced. My follow on
question to that was whether a shutdown is appropriate for something
that is somewhat minor and self-corrective..?

Given the ugliness of the multiple warnings, new mount flag, additional
complication of the ratelimit mechanism, etc., I might just go back to
the shutdown and call it a day. ;) I think the subtle point raised above
about calling this a corruption is a sufficient argument. While this
might be minor and self-correcting, it is still technically a corruption
of sorts and should be identified and repaired appropriately. Thanks for
the feedback.

Brian

> > --- a/fs/xfs/xfs_mount.h
> > +++ b/fs/xfs/xfs_mount.h
> > @@ -127,6 +127,7 @@ typedef struct xfs_mount {
> >  	int64_t			m_low_space[XFS_LOWSP_MAX];
> >  						/* low free space thresholds */
> >  	struct xfs_kobj		m_kobj;
> > +	bool			m_warned_bad_lsn;/* bad metadata lsn detected */
> 
> That's where I'd put the ratelimit key (or in the struct xlog).
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-09-22 14:21 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-11 18:53 [RFC v2 0/2] xfs: detect and warn about invalid metadata lsns Brian Foster
2015-09-11 18:53 ` [RFC v2 1/2] xfs: create a daily warning mechanism Brian Foster
2015-09-11 18:53 ` [RFC v2 2/2] xfs: validate metadata LSNs against log on v5 superblocks Brian Foster
2015-09-22  8:07   ` Dave Chinner
2015-09-22 14:21     ` Brian Foster

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox