* [RFC v2 0/2] xfs: detect and warn about invalid metadata lsns @ 2015-09-11 18:53 Brian Foster 2015-09-11 18:53 ` [RFC v2 1/2] xfs: create a daily warning mechanism Brian Foster 2015-09-11 18:53 ` [RFC v2 2/2] xfs: validate metadata LSNs against log on v5 superblocks Brian Foster 0 siblings, 2 replies; 5+ messages in thread From: Brian Foster @ 2015-09-11 18:53 UTC (permalink / raw) To: xfs Hi all, Here's a second rfc pass at the kernel side of invalid metadata LSN detection for v5 supers. This is an rfc simply because my understanding is that the userspace code should probably go first so that the tools for repair are available before the kernel starts detecting and warning about this problem. The primary change in this version is how the problem is handled once detected. I've been a bit undecided on whether a shutdown is really warranted because if the fs doesn't crash, it will eventually self-correct. On the flipside, a WARN_ON_ONCE() is not sufficient because it could suppress warnings on separate filesystems once one filesystem has encountered the issue and fired a warning. Therefore, this version uses something of a happy medium. The filesystem is not shutdown but a mount flag is used to warn about the problem at least once per-mount. Any instances thereafter are ratelimited to a once per 24 hour period. Brian rfcv2: - Refactored lsn validation and warning code into separate helpers. - Updated warning mechanism to warn at least once per fs (ratelimited thereafter). - No longer shutdown the fs on invalid metadata lsn detection. rfcv1: http://oss.sgi.com/pipermail/xfs/2015-August/043396.html Brian Foster (2): xfs: create a daily warning mechanism xfs: validate metadata LSNs against log on v5 superblocks fs/xfs/libxfs/xfs_alloc.c | 9 +++++-- fs/xfs/libxfs/xfs_attr_leaf.c | 2 ++ fs/xfs/libxfs/xfs_btree.c | 14 +++++++++-- fs/xfs/libxfs/xfs_da_btree.c | 3 +++ fs/xfs/libxfs/xfs_dir2_block.c | 1 + fs/xfs/libxfs/xfs_dir2_data.c | 1 + fs/xfs/libxfs/xfs_dir2_leaf.c | 1 + fs/xfs/libxfs/xfs_dir2_node.c | 1 + fs/xfs/libxfs/xfs_ialloc.c | 8 +++++-- fs/xfs/libxfs/xfs_sb.c | 8 +++++++ fs/xfs/libxfs/xfs_symlink_remote.c | 3 +++ fs/xfs/xfs_log.c | 1 - fs/xfs/xfs_log_priv.h | 24 +++++++++++++++++++ fs/xfs/xfs_log_recover.c | 15 +++++++++++- fs/xfs/xfs_message.h | 18 ++++++++++---- fs/xfs/xfs_mount.c | 49 ++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_mount.h | 3 +++ 17 files changed, 149 insertions(+), 12 deletions(-) -- 2.1.0 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 5+ messages in thread
* [RFC v2 1/2] xfs: create a daily warning mechanism 2015-09-11 18:53 [RFC v2 0/2] xfs: detect and warn about invalid metadata lsns Brian Foster @ 2015-09-11 18:53 ` Brian Foster 2015-09-11 18:53 ` [RFC v2 2/2] xfs: validate metadata LSNs against log on v5 superblocks Brian Foster 1 sibling, 0 replies; 5+ messages in thread From: Brian Foster @ 2015-09-11 18:53 UTC (permalink / raw) To: xfs The warning mechanism in XFS currently uses the default ratelimit time interval and burst parameters. The default parameters are intended to avoid flooding the logs and whatnot due to messages within fairly short timeframes (e.g., 10 messages within 5s). The forthcoming invalid metadata LSN detection must provide a consistent, but not incessant warning to the user that a repair is required to reformat the log. Update the core ratelimit mechanism to allow customized parameters, continue to pass the ratelimit defaults for existing users, and define a new xfs_warn_daily() mechanism to fire a message on a 24 hour interval. Signed-off-by: Brian Foster <bfoster@redhat.com> --- fs/xfs/xfs_message.h | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/fs/xfs/xfs_message.h b/fs/xfs/xfs_message.h index 8540115..9f1a036 100644 --- a/fs/xfs/xfs_message.h +++ b/fs/xfs/xfs_message.h @@ -30,15 +30,22 @@ void xfs_debug(const struct xfs_mount *mp, const char *fmt, ...) } #endif -#define xfs_printk_ratelimited(func, dev, fmt, ...) \ +#define __xfs_printk_ratelimited(interval, burst, func, dev, fmt, ...) \ do { \ - static DEFINE_RATELIMIT_STATE(_rs, \ - DEFAULT_RATELIMIT_INTERVAL, \ - DEFAULT_RATELIMIT_BURST); \ + static DEFINE_RATELIMIT_STATE(_rs, interval, burst); \ if (__ratelimit(&_rs)) \ func(dev, fmt, ##__VA_ARGS__); \ } while (0) +#define xfs_printk_ratelimited(func, dev, fmt, ...) \ + __xfs_printk_ratelimited(DEFAULT_RATELIMIT_INTERVAL, \ + DEFAULT_RATELIMIT_BURST, \ + func, dev, fmt, ##__VA_ARGS__) + +#define xfs_printk_daily(func, dev, fmt, ...) \ + __xfs_printk_ratelimited(86400 * HZ, 1, func, dev, fmt, \ + ##__VA_ARGS__) + #define xfs_emerg_ratelimited(dev, fmt, ...) \ xfs_printk_ratelimited(xfs_emerg, dev, fmt, ##__VA_ARGS__) #define xfs_alert_ratelimited(dev, fmt, ...) \ @@ -56,6 +63,9 @@ do { \ #define xfs_debug_ratelimited(dev, fmt, ...) \ xfs_printk_ratelimited(xfs_debug, dev, fmt, ##__VA_ARGS__) +#define xfs_warn_daily(dev, fmt, ...) \ + xfs_printk_daily(xfs_warn, dev, fmt, ##__VA_ARGS__) + extern void assfail(char *expr, char *f, int l); extern void asswarn(char *expr, char *f, int l); -- 2.1.0 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 5+ messages in thread
* [RFC v2 2/2] xfs: validate metadata LSNs against log on v5 superblocks 2015-09-11 18:53 [RFC v2 0/2] xfs: detect and warn about invalid metadata lsns Brian Foster 2015-09-11 18:53 ` [RFC v2 1/2] xfs: create a daily warning mechanism Brian Foster @ 2015-09-11 18:53 ` Brian Foster 2015-09-22 8:07 ` Dave Chinner 1 sibling, 1 reply; 5+ messages in thread From: Brian Foster @ 2015-09-11 18:53 UTC (permalink / raw) To: xfs Since the onset of v5 superblocks, the LSN of the last modification has been included in a variety of on-disk data structures. This LSN is used to provide log recovery ordering guarantees (e.g., to ensure an older log recovery item is not replayed over a newer target data structure). While this works correctly from the point a filesystem is formatted and mounted, userspace tools have some problematic behaviors that defeat this mechanism. For example, xfs_repair historically zeroes out the log unconditionally (regardless of whether corruption is detected). If this occurs, the LSN of the filesystem is reset and the log is now in a problematic state with respect to on-disk metadata structures that might have a larger LSN. Until either the log catches up to the highest previously used metadata LSN or each affected data structure is modified and written out without incident (which resets the metadata LSN), log recovery is susceptible to filesystem corruption. This problem is ultimately addressed and repaired in the associated userspace tools. The kernel is still responsible to detect the problem and notify the user that something is wrong. Check the superblock LSN at mount time and notify the user and fail the mount if it is invalid. >From that point on, simply warn the user if an invalid metadata LSN is detected from the various metadata I/O verifiers. A new xfs_mount flag is added to guarantee that a warning fires at least once for each affected filesystem mount. From that point forward, further instances of the problem are rate limited to a once per 24 hour period. Signed-off-by: Brian Foster <bfoster@redhat.com> --- fs/xfs/libxfs/xfs_alloc.c | 9 +++++-- fs/xfs/libxfs/xfs_attr_leaf.c | 2 ++ fs/xfs/libxfs/xfs_btree.c | 14 +++++++++-- fs/xfs/libxfs/xfs_da_btree.c | 3 +++ fs/xfs/libxfs/xfs_dir2_block.c | 1 + fs/xfs/libxfs/xfs_dir2_data.c | 1 + fs/xfs/libxfs/xfs_dir2_leaf.c | 1 + fs/xfs/libxfs/xfs_dir2_node.c | 1 + fs/xfs/libxfs/xfs_ialloc.c | 8 +++++-- fs/xfs/libxfs/xfs_sb.c | 8 +++++++ fs/xfs/libxfs/xfs_symlink_remote.c | 3 +++ fs/xfs/xfs_log.c | 1 - fs/xfs/xfs_log_priv.h | 24 +++++++++++++++++++ fs/xfs/xfs_log_recover.c | 15 +++++++++++- fs/xfs/xfs_mount.c | 49 ++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_mount.h | 3 +++ 16 files changed, 135 insertions(+), 8 deletions(-) diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index ffad7f2..cb26016 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -482,6 +482,8 @@ xfs_agfl_verify( be32_to_cpu(agfl->agfl_bno[i]) >= mp->m_sb.sb_agblocks) return false; } + + xfs_detect_invalid_lsn(mp, be64_to_cpu(XFS_BUF_TO_AGFL(bp)->agfl_lsn)); return true; } @@ -2259,9 +2261,12 @@ xfs_agf_verify( { struct xfs_agf *agf = XFS_BUF_TO_AGF(bp); - if (xfs_sb_version_hascrc(&mp->m_sb) && - !uuid_equal(&agf->agf_uuid, &mp->m_sb.sb_meta_uuid)) + if (xfs_sb_version_hascrc(&mp->m_sb)) { + if (!uuid_equal(&agf->agf_uuid, &mp->m_sb.sb_meta_uuid)) return false; + xfs_detect_invalid_lsn(mp, + be64_to_cpu(XFS_BUF_TO_AGF(bp)->agf_lsn)); + } if (!(agf->agf_magicnum == cpu_to_be32(XFS_AGF_MAGIC) && XFS_AGF_GOOD_VERSION(be32_to_cpu(agf->agf_versionnum)) && diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c index 33df52d..643ba73 100644 --- a/fs/xfs/libxfs/xfs_attr_leaf.c +++ b/fs/xfs/libxfs/xfs_attr_leaf.c @@ -266,6 +266,8 @@ xfs_attr3_leaf_verify( return false; if (be64_to_cpu(hdr3->info.blkno) != bp->b_bn) return false; + + xfs_detect_invalid_lsn(mp, be64_to_cpu(hdr3->info.lsn)); } else { if (ichdr.magic != XFS_ATTR_LEAF_MAGIC) return false; diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c index f7d7ee7..c2c5765 100644 --- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -243,8 +243,13 @@ bool xfs_btree_lblock_verify_crc( struct xfs_buf *bp) { - if (xfs_sb_version_hascrc(&bp->b_target->bt_mount->m_sb)) + struct xfs_btree_block *block = XFS_BUF_TO_BLOCK(bp); + struct xfs_mount *mp = bp->b_target->bt_mount; + + if (xfs_sb_version_hascrc(&bp->b_target->bt_mount->m_sb)) { + xfs_detect_invalid_lsn(mp, be64_to_cpu(block->bb_u.l.bb_lsn)); return xfs_buf_verify_cksum(bp, XFS_BTREE_LBLOCK_CRC_OFF); + } return true; } @@ -275,8 +280,13 @@ bool xfs_btree_sblock_verify_crc( struct xfs_buf *bp) { - if (xfs_sb_version_hascrc(&bp->b_target->bt_mount->m_sb)) + struct xfs_btree_block *block = XFS_BUF_TO_BLOCK(bp); + struct xfs_mount *mp = bp->b_target->bt_mount; + + if (xfs_sb_version_hascrc(&bp->b_target->bt_mount->m_sb)) { + xfs_detect_invalid_lsn(mp, be64_to_cpu(block->bb_u.s.bb_lsn)); return xfs_buf_verify_cksum(bp, XFS_BTREE_SBLOCK_CRC_OFF); + } return true; } diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c index be43248..2ea7982 100644 --- a/fs/xfs/libxfs/xfs_da_btree.c +++ b/fs/xfs/libxfs/xfs_da_btree.c @@ -150,6 +150,8 @@ xfs_da3_node_verify( return false; if (be64_to_cpu(hdr3->info.blkno) != bp->b_bn) return false; + + xfs_detect_invalid_lsn(mp, be64_to_cpu(hdr3->info.lsn)); } else { if (ichdr.magic != XFS_DA_NODE_MAGIC) return false; @@ -322,6 +324,7 @@ xfs_da3_node_create( if (xfs_sb_version_hascrc(&mp->m_sb)) { struct xfs_da3_node_hdr *hdr3 = bp->b_addr; + memset(hdr3, 0, sizeof(struct xfs_da3_node_hdr)); ichdr.magic = XFS_DA3_NODE_MAGIC; hdr3->info.blkno = cpu_to_be64(bp->b_bn); hdr3->info.owner = cpu_to_be64(args->dp->i_ino); diff --git a/fs/xfs/libxfs/xfs_dir2_block.c b/fs/xfs/libxfs/xfs_dir2_block.c index 4778d1d..9cd553f 100644 --- a/fs/xfs/libxfs/xfs_dir2_block.c +++ b/fs/xfs/libxfs/xfs_dir2_block.c @@ -71,6 +71,7 @@ xfs_dir3_block_verify( return false; if (be64_to_cpu(hdr3->blkno) != bp->b_bn) return false; + xfs_detect_invalid_lsn(mp, be64_to_cpu(hdr3->lsn)); } else { if (hdr3->magic != cpu_to_be32(XFS_DIR2_BLOCK_MAGIC)) return false; diff --git a/fs/xfs/libxfs/xfs_dir2_data.c b/fs/xfs/libxfs/xfs_dir2_data.c index 824131e..d9c6d6e 100644 --- a/fs/xfs/libxfs/xfs_dir2_data.c +++ b/fs/xfs/libxfs/xfs_dir2_data.c @@ -224,6 +224,7 @@ xfs_dir3_data_verify( return false; if (be64_to_cpu(hdr3->blkno) != bp->b_bn) return false; + xfs_detect_invalid_lsn(mp, be64_to_cpu(hdr3->lsn)); } else { if (hdr3->magic != cpu_to_be32(XFS_DIR2_DATA_MAGIC)) return false; diff --git a/fs/xfs/libxfs/xfs_dir2_leaf.c b/fs/xfs/libxfs/xfs_dir2_leaf.c index f300240..896de9e 100644 --- a/fs/xfs/libxfs/xfs_dir2_leaf.c +++ b/fs/xfs/libxfs/xfs_dir2_leaf.c @@ -164,6 +164,7 @@ xfs_dir3_leaf_verify( return false; if (be64_to_cpu(leaf3->info.blkno) != bp->b_bn) return false; + xfs_detect_invalid_lsn(mp, be64_to_cpu(leaf3->info.lsn)); } else { if (leaf->hdr.info.magic != cpu_to_be16(magic)) return false; diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c index cc28e92..c0bfeef 100644 --- a/fs/xfs/libxfs/xfs_dir2_node.c +++ b/fs/xfs/libxfs/xfs_dir2_node.c @@ -97,6 +97,7 @@ xfs_dir3_free_verify( return false; if (be64_to_cpu(hdr3->blkno) != bp->b_bn) return false; + xfs_detect_invalid_lsn(mp, be64_to_cpu(hdr3->lsn)); } else { if (hdr->magic != cpu_to_be32(XFS_DIR2_FREE_MAGIC)) return false; diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c index 54deb2d..e91343b 100644 --- a/fs/xfs/libxfs/xfs_ialloc.c +++ b/fs/xfs/libxfs/xfs_ialloc.c @@ -2500,9 +2500,13 @@ xfs_agi_verify( struct xfs_mount *mp = bp->b_target->bt_mount; struct xfs_agi *agi = XFS_BUF_TO_AGI(bp); - if (xfs_sb_version_hascrc(&mp->m_sb) && - !uuid_equal(&agi->agi_uuid, &mp->m_sb.sb_meta_uuid)) + if (xfs_sb_version_hascrc(&mp->m_sb)) { + if (!uuid_equal(&agi->agi_uuid, &mp->m_sb.sb_meta_uuid)) return false; + xfs_detect_invalid_lsn(mp, + be64_to_cpu(XFS_BUF_TO_AGI(bp)->agi_lsn)); + } + /* * Validate the magic number of the agi block. */ diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index 4742514..badf587 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -163,6 +163,14 @@ xfs_mount_validate_sb( "Filesystem can not be safely mounted by this kernel."); return -EINVAL; } + } else if (xfs_sb_version_hascrc(sbp)) { + /* + * We can't read verify the sb LSN because the read verifier is + * called before the log is allocated and processed. We know the + * log is set up before write verifier (!check_version) calls, + * so just check it here. + */ + xfs_detect_invalid_lsn(mp, sbp->sb_lsn); } if (xfs_sb_version_has_pquotino(sbp)) { diff --git a/fs/xfs/libxfs/xfs_symlink_remote.c b/fs/xfs/libxfs/xfs_symlink_remote.c index 8f8af05..aa6b3c1 100644 --- a/fs/xfs/libxfs/xfs_symlink_remote.c +++ b/fs/xfs/libxfs/xfs_symlink_remote.c @@ -60,6 +60,7 @@ xfs_symlink_hdr_set( if (!xfs_sb_version_hascrc(&mp->m_sb)) return 0; + memset(dsl, 0, sizeof(struct xfs_dsymlink_hdr)); dsl->sl_magic = cpu_to_be32(XFS_SYMLINK_MAGIC); dsl->sl_offset = cpu_to_be32(offset); dsl->sl_bytes = cpu_to_be32(size); @@ -117,6 +118,8 @@ xfs_symlink_verify( if (dsl->sl_owner == 0) return false; + xfs_detect_invalid_lsn(mp, be64_to_cpu(dsl->sl_lsn)); + return true; } diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index aaadee0..a34476b 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -4022,4 +4022,3 @@ xlog_iclogs_empty( } while (iclog != log->l_iclog); return 1; } - diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index 950f3f9..c0acc71 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -560,4 +560,28 @@ static inline void xlog_wait(wait_queue_head_t *wq, spinlock_t *lock) remove_wait_queue(wq, &wait); } +/* + * The LSN is valid so long as it is behind the current LSN. If it isn't, this + * means that the next log record that includes this metadata could have a + * smaller LSN. In turn, this means that the modification in the log would not + * replay. + */ +static inline bool +xlog_valid_lsn( + struct xlog *log, + xfs_lsn_t lsn) +{ + int cycle = CYCLE_LSN(lsn); + int block = BLOCK_LSN(lsn); + bool valid = true; + + spin_lock(&log->l_icloglock); + if ((cycle > log->l_curr_cycle) || + (cycle == log->l_curr_cycle && block > log->l_curr_block)) + valid = false; + spin_unlock(&log->l_icloglock); + + return valid; +} + #endif /* __XFS_LOG_PRIV_H__ */ diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 512a094..c6ea06a 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -4609,9 +4609,22 @@ xlog_recover( int error; /* find the tail of the log */ - if ((error = xlog_find_tail(log, &head_blk, &tail_blk))) + error = xlog_find_tail(log, &head_blk, &tail_blk); + if (error) return error; + /* + * The superblock was read before the log was available and thus the LSN + * could not be verified. Check the superblock LSN against the current + * LSN now that it's known. + */ + if (xfs_sb_version_hascrc(&log->l_mp->m_sb) && + !xlog_valid_lsn(log, log->l_mp->m_sb.sb_lsn)) { + xfs_warn(log->l_mp, + "Invalid superblock LSN (ahead of log). Please run xfs_repair."); + return -EINVAL; + } + if (tail_blk != head_blk) { /* There used to be a comment here: * diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index bf92e0c..0acfea8 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -41,6 +41,7 @@ #include "xfs_trace.h" #include "xfs_icache.h" #include "xfs_sysfs.h" +#include "xfs_log_priv.h" static DEFINE_MUTEX(xfs_uuid_table_mutex); @@ -1301,3 +1302,51 @@ xfs_dev_is_read_only( } return 0; } + +/* + * Verify that an LSN stamped into a piece of metadata is valid. This is + * intended for use in read verifiers on v5 superblocks. + */ +void +xfs_detect_invalid_lsn( + struct xfs_mount *mp, + xfs_lsn_t lsn) +{ + struct xlog *log = mp->m_log; + int cycle = CYCLE_LSN(lsn); + int block = BLOCK_LSN(lsn); + const char *fmt; + + /* + * 'norecovery' mode skips mount-time log processing and unconditionally + * resets the LSN. + */ + if (mp->m_flags & XFS_MOUNT_NORECOVERY) + return; + + if (xlog_valid_lsn(mp->m_log, lsn)) + return; + + /* + * Warn at least once for each filesystem susceptible to this problem + * and once per day thereafter. + * + * XXX: Specify the minimum xfs_repair version required to resolve? + * Older versions will silently reintroduce the problem. + * + * We should eventually convert this condition into a hard error (via + * verifier failure). Defer that behavior for a few release cycles or so + * until the userspace fixes have had the opportunity to propogate + * throughout the various distributions. + */ + fmt = "WARNING: Metadata has LSN (%d:%d) ahead of current LSN (%d:%d). " +"This may result in filesystem failure. Please unmount and run xfs_repair to resolve."; + if (!mp->m_warned_bad_lsn) { + mp->m_warned_bad_lsn = true; + xfs_warn(mp, fmt, cycle, block, log->l_curr_cycle, + log->l_curr_block); + } else { + xfs_warn_daily(mp, fmt, cycle, block, log->l_curr_cycle, + log->l_curr_block); + } +} diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 7999e91..572ebfc 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -127,6 +127,7 @@ typedef struct xfs_mount { int64_t m_low_space[XFS_LOWSP_MAX]; /* low free space thresholds */ struct xfs_kobj m_kobj; + bool m_warned_bad_lsn;/* bad metadata lsn detected */ struct workqueue_struct *m_buf_workqueue; struct workqueue_struct *m_data_workqueue; @@ -336,4 +337,6 @@ extern int xfs_dev_is_read_only(struct xfs_mount *, char *); extern void xfs_set_low_space_thresholds(struct xfs_mount *); +extern void xfs_detect_invalid_lsn(struct xfs_mount *, xfs_lsn_t); + #endif /* __XFS_MOUNT_H__ */ -- 2.1.0 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [RFC v2 2/2] xfs: validate metadata LSNs against log on v5 superblocks 2015-09-11 18:53 ` [RFC v2 2/2] xfs: validate metadata LSNs against log on v5 superblocks Brian Foster @ 2015-09-22 8:07 ` Dave Chinner 2015-09-22 14:21 ` Brian Foster 0 siblings, 1 reply; 5+ messages in thread From: Dave Chinner @ 2015-09-22 8:07 UTC (permalink / raw) To: Brian Foster; +Cc: xfs On Fri, Sep 11, 2015 at 02:53:55PM -0400, Brian Foster wrote: > Since the onset of v5 superblocks, the LSN of the last modification has > been included in a variety of on-disk data structures. This LSN is used > to provide log recovery ordering guarantees (e.g., to ensure an older > log recovery item is not replayed over a newer target data structure). > > While this works correctly from the point a filesystem is formatted and > mounted, userspace tools have some problematic behaviors that defeat > this mechanism. For example, xfs_repair historically zeroes out the log > unconditionally (regardless of whether corruption is detected). If this > occurs, the LSN of the filesystem is reset and the log is now in a > problematic state with respect to on-disk metadata structures that might > have a larger LSN. Until either the log catches up to the highest > previously used metadata LSN or each affected data structure is modified > and written out without incident (which resets the metadata LSN), log > recovery is susceptible to filesystem corruption. > > This problem is ultimately addressed and repaired in the associated > userspace tools. The kernel is still responsible to detect the problem > and notify the user that something is wrong. Check the superblock LSN at > mount time and notify the user and fail the mount if it is invalid. > From that point on, simply warn the user if an invalid metadata LSN is > detected from the various metadata I/O verifiers. A new xfs_mount flag > is added to guarantee that a warning fires at least once for each > affected filesystem mount. From that point forward, further instances of > the problem are rate limited to a once per 24 hour period. > > Signed-off-by: Brian Foster <bfoster@redhat.com> > --- > fs/xfs/libxfs/xfs_alloc.c | 9 +++++-- > fs/xfs/libxfs/xfs_attr_leaf.c | 2 ++ > fs/xfs/libxfs/xfs_btree.c | 14 +++++++++-- > fs/xfs/libxfs/xfs_da_btree.c | 3 +++ > fs/xfs/libxfs/xfs_dir2_block.c | 1 + > fs/xfs/libxfs/xfs_dir2_data.c | 1 + > fs/xfs/libxfs/xfs_dir2_leaf.c | 1 + > fs/xfs/libxfs/xfs_dir2_node.c | 1 + > fs/xfs/libxfs/xfs_ialloc.c | 8 +++++-- > fs/xfs/libxfs/xfs_sb.c | 8 +++++++ > fs/xfs/libxfs/xfs_symlink_remote.c | 3 +++ > fs/xfs/xfs_log.c | 1 - > fs/xfs/xfs_log_priv.h | 24 +++++++++++++++++++ > fs/xfs/xfs_log_recover.c | 15 +++++++++++- > fs/xfs/xfs_mount.c | 49 ++++++++++++++++++++++++++++++++++++++ > fs/xfs/xfs_mount.h | 3 +++ > 16 files changed, 135 insertions(+), 8 deletions(-) > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > index ffad7f2..cb26016 100644 > --- a/fs/xfs/libxfs/xfs_alloc.c > +++ b/fs/xfs/libxfs/xfs_alloc.c > @@ -482,6 +482,8 @@ xfs_agfl_verify( > be32_to_cpu(agfl->agfl_bno[i]) >= mp->m_sb.sb_agblocks) > return false; > } > + > + xfs_detect_invalid_lsn(mp, be64_to_cpu(XFS_BUF_TO_AGFL(bp)->agfl_lsn)); > return true; xfs_log_check_lsn(mp, <lsn>); > +/* > + * The LSN is valid so long as it is behind the current LSN. If it isn't, this > + * means that the next log record that includes this metadata could have a > + * smaller LSN. In turn, this means that the modification in the log would not > + * replay. > + */ > +static inline bool > +xlog_valid_lsn( > + struct xlog *log, > + xfs_lsn_t lsn) > +{ > + int cycle = CYCLE_LSN(lsn); > + int block = BLOCK_LSN(lsn); > + bool valid = true; > + > + spin_lock(&log->l_icloglock); > + if ((cycle > log->l_curr_cycle) || > + (cycle == log->l_curr_cycle && block > log->l_curr_block)) > + valid = false; > + spin_unlock(&log->l_icloglock); > + > + return valid; > +} We do not want to take the l_icloglock in metadata IO context. It's a global fs lock, and it's a hot lock, too, so we do not want to be taking this during every metadata IO completion. I think, at minimum, we need to sample the current values and do an unlocked check, then if the result is suspect we need to do an accurate locked check. This means that we will almost never take the l_icloglock here. e.g: { int cycle = ACCESS_ONCE(log->l_curr_cycle); int block = ACCESS_ONCE(log->l_curr_block); if (CYCLE_LSN(lsn) > cycle || (CYCLE_LSN(lsn) == cycle && BLOCK_LSN(lsn) > block)) { /* suspect, take icloglock to check again */ bool valid = true; .... return valid; } return true; } > --- a/fs/xfs/xfs_mount.c > +++ b/fs/xfs/xfs_mount.c > @@ -41,6 +41,7 @@ > #include "xfs_trace.h" > #include "xfs_icache.h" > #include "xfs_sysfs.h" > +#include "xfs_log_priv.h" > > > static DEFINE_MUTEX(xfs_uuid_table_mutex); > @@ -1301,3 +1302,51 @@ xfs_dev_is_read_only( > } > return 0; > } > + > +/* > + * Verify that an LSN stamped into a piece of metadata is valid. This is > + * intended for use in read verifiers on v5 superblocks. > + */ > +void > +xfs_detect_invalid_lsn( > + struct xfs_mount *mp, > + xfs_lsn_t lsn) > +{ This shoul dbe called xfs_log_check_lsn() and be located in xfs_log.c. > + struct xlog *log = mp->m_log; > + int cycle = CYCLE_LSN(lsn); > + int block = BLOCK_LSN(lsn); > + const char *fmt; > + > + /* > + * 'norecovery' mode skips mount-time log processing and unconditionally > + * resets the LSN. > + */ > + if (mp->m_flags & XFS_MOUNT_NORECOVERY) > + return; > + > + if (xlog_valid_lsn(mp->m_log, lsn)) > + return; > + > + /* > + * Warn at least once for each filesystem susceptible to this problem > + * and once per day thereafter. > + * > + * XXX: Specify the minimum xfs_repair version required to resolve? > + * Older versions will silently reintroduce the problem. > + * > + * We should eventually convert this condition into a hard error (via > + * verifier failure). Defer that behavior for a few release cycles or so > + * until the userspace fixes have had the opportunity to propogate > + * throughout the various distributions. > + */ > + fmt = "WARNING: Metadata has LSN (%d:%d) ahead of current LSN (%d:%d). " > +"This may result in filesystem failure. Please unmount and run xfs_repair to resolve."; > + if (!mp->m_warned_bad_lsn) { > + mp->m_warned_bad_lsn = true; > + xfs_warn(mp, fmt, cycle, block, log->l_curr_cycle, > + log->l_curr_block); > + } else { > + xfs_warn_daily(mp, fmt, cycle, block, log->l_curr_cycle, > + log->l_curr_block); > + } That's really ugly. Rate limited prints should always occur at least once for each ratelimit key, and now i've seen this I can say the first patch needs rework. that is, each xfs_warn_daily() caller needs it's own ratelimit key, rather than sharing the global ratelimit key that all other XFS messages share. e.g.: xfs_warn_daily(mp, mp->m_log->l_lsn_rlstate, "Corruption warning: Metadata has LSN (%d:%d) ahead of current LSN (%d:%d). " "Please unmount and run xfs_repair (>= v4.3) to resolve.". cycle, block, log->l_curr_cycle, log->l_curr_block); But really, thinking on this further (the "corruption warning" bit) I'm starting to think we should just shut the filesystem down when we hit this and force the user to fix it immediately. if we don't, then who knows whether we are making things worse or not... > --- a/fs/xfs/xfs_mount.h > +++ b/fs/xfs/xfs_mount.h > @@ -127,6 +127,7 @@ typedef struct xfs_mount { > int64_t m_low_space[XFS_LOWSP_MAX]; > /* low free space thresholds */ > struct xfs_kobj m_kobj; > + bool m_warned_bad_lsn;/* bad metadata lsn detected */ That's where I'd put the ratelimit key (or in the struct xlog). Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC v2 2/2] xfs: validate metadata LSNs against log on v5 superblocks 2015-09-22 8:07 ` Dave Chinner @ 2015-09-22 14:21 ` Brian Foster 0 siblings, 0 replies; 5+ messages in thread From: Brian Foster @ 2015-09-22 14:21 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On Tue, Sep 22, 2015 at 06:07:25PM +1000, Dave Chinner wrote: > On Fri, Sep 11, 2015 at 02:53:55PM -0400, Brian Foster wrote: > > Since the onset of v5 superblocks, the LSN of the last modification has > > been included in a variety of on-disk data structures. This LSN is used > > to provide log recovery ordering guarantees (e.g., to ensure an older > > log recovery item is not replayed over a newer target data structure). > > > > While this works correctly from the point a filesystem is formatted and > > mounted, userspace tools have some problematic behaviors that defeat > > this mechanism. For example, xfs_repair historically zeroes out the log > > unconditionally (regardless of whether corruption is detected). If this > > occurs, the LSN of the filesystem is reset and the log is now in a > > problematic state with respect to on-disk metadata structures that might > > have a larger LSN. Until either the log catches up to the highest > > previously used metadata LSN or each affected data structure is modified > > and written out without incident (which resets the metadata LSN), log > > recovery is susceptible to filesystem corruption. > > > > This problem is ultimately addressed and repaired in the associated > > userspace tools. The kernel is still responsible to detect the problem > > and notify the user that something is wrong. Check the superblock LSN at > > mount time and notify the user and fail the mount if it is invalid. > > From that point on, simply warn the user if an invalid metadata LSN is > > detected from the various metadata I/O verifiers. A new xfs_mount flag > > is added to guarantee that a warning fires at least once for each > > affected filesystem mount. From that point forward, further instances of > > the problem are rate limited to a once per 24 hour period. > > > > Signed-off-by: Brian Foster <bfoster@redhat.com> > > --- > > fs/xfs/libxfs/xfs_alloc.c | 9 +++++-- > > fs/xfs/libxfs/xfs_attr_leaf.c | 2 ++ > > fs/xfs/libxfs/xfs_btree.c | 14 +++++++++-- > > fs/xfs/libxfs/xfs_da_btree.c | 3 +++ > > fs/xfs/libxfs/xfs_dir2_block.c | 1 + > > fs/xfs/libxfs/xfs_dir2_data.c | 1 + > > fs/xfs/libxfs/xfs_dir2_leaf.c | 1 + > > fs/xfs/libxfs/xfs_dir2_node.c | 1 + > > fs/xfs/libxfs/xfs_ialloc.c | 8 +++++-- > > fs/xfs/libxfs/xfs_sb.c | 8 +++++++ > > fs/xfs/libxfs/xfs_symlink_remote.c | 3 +++ > > fs/xfs/xfs_log.c | 1 - > > fs/xfs/xfs_log_priv.h | 24 +++++++++++++++++++ > > fs/xfs/xfs_log_recover.c | 15 +++++++++++- > > fs/xfs/xfs_mount.c | 49 ++++++++++++++++++++++++++++++++++++++ > > fs/xfs/xfs_mount.h | 3 +++ > > 16 files changed, 135 insertions(+), 8 deletions(-) > > > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > > index ffad7f2..cb26016 100644 > > --- a/fs/xfs/libxfs/xfs_alloc.c > > +++ b/fs/xfs/libxfs/xfs_alloc.c > > @@ -482,6 +482,8 @@ xfs_agfl_verify( > > be32_to_cpu(agfl->agfl_bno[i]) >= mp->m_sb.sb_agblocks) > > return false; > > } > > + > > + xfs_detect_invalid_lsn(mp, be64_to_cpu(XFS_BUF_TO_AGFL(bp)->agfl_lsn)); > > return true; > > xfs_log_check_lsn(mp, <lsn>); > Ok. > > +/* > > + * The LSN is valid so long as it is behind the current LSN. If it isn't, this > > + * means that the next log record that includes this metadata could have a > > + * smaller LSN. In turn, this means that the modification in the log would not > > + * replay. > > + */ > > +static inline bool > > +xlog_valid_lsn( > > + struct xlog *log, > > + xfs_lsn_t lsn) > > +{ > > + int cycle = CYCLE_LSN(lsn); > > + int block = BLOCK_LSN(lsn); > > + bool valid = true; > > + > > + spin_lock(&log->l_icloglock); > > + if ((cycle > log->l_curr_cycle) || > > + (cycle == log->l_curr_cycle && block > log->l_curr_block)) > > + valid = false; > > + spin_unlock(&log->l_icloglock); > > + > > + return valid; > > +} > > We do not want to take the l_icloglock in metadata IO > context. It's a global fs lock, and it's a hot lock, too, so we > do not want to be taking this during every metadata IO completion. > > I think, at minimum, we need to sample the current values and do an > unlocked check, then if the result is suspect we need to do an > accurate locked check. This means that we will almost never take the > l_icloglock here. e.g: > > { > int cycle = ACCESS_ONCE(log->l_curr_cycle); > int block = ACCESS_ONCE(log->l_curr_block); > > if (CYCLE_LSN(lsn) > cycle || > (CYCLE_LSN(lsn) == cycle && BLOCK_LSN(lsn) > block)) { > /* suspect, take icloglock to check again */ > bool valid = true; > > .... > return valid; > } > return true; > } > Good point. In this case, I may just lift the lock out of the helper and allow this to be the responsibility of the caller, but I'll see how it looks. > > > --- a/fs/xfs/xfs_mount.c > > +++ b/fs/xfs/xfs_mount.c > > @@ -41,6 +41,7 @@ > > #include "xfs_trace.h" > > #include "xfs_icache.h" > > #include "xfs_sysfs.h" > > +#include "xfs_log_priv.h" > > > > > > static DEFINE_MUTEX(xfs_uuid_table_mutex); > > @@ -1301,3 +1302,51 @@ xfs_dev_is_read_only( > > } > > return 0; > > } > > + > > +/* > > + * Verify that an LSN stamped into a piece of metadata is valid. This is > > + * intended for use in read verifiers on v5 superblocks. > > + */ > > +void > > +xfs_detect_invalid_lsn( > > + struct xfs_mount *mp, > > + xfs_lsn_t lsn) > > +{ > > This shoul dbe called xfs_log_check_lsn() and be located in > xfs_log.c. > Ok. > > + struct xlog *log = mp->m_log; > > + int cycle = CYCLE_LSN(lsn); > > + int block = BLOCK_LSN(lsn); > > + const char *fmt; > > + > > + /* > > + * 'norecovery' mode skips mount-time log processing and unconditionally > > + * resets the LSN. > > + */ > > + if (mp->m_flags & XFS_MOUNT_NORECOVERY) > > + return; > > + > > + if (xlog_valid_lsn(mp->m_log, lsn)) > > + return; > > + > > + /* > > + * Warn at least once for each filesystem susceptible to this problem > > + * and once per day thereafter. > > + * > > + * XXX: Specify the minimum xfs_repair version required to resolve? > > + * Older versions will silently reintroduce the problem. > > + * > > + * We should eventually convert this condition into a hard error (via > > + * verifier failure). Defer that behavior for a few release cycles or so > > + * until the userspace fixes have had the opportunity to propogate > > + * throughout the various distributions. > > + */ > > + fmt = "WARNING: Metadata has LSN (%d:%d) ahead of current LSN (%d:%d). " > > +"This may result in filesystem failure. Please unmount and run xfs_repair to resolve."; > > + if (!mp->m_warned_bad_lsn) { > > + mp->m_warned_bad_lsn = true; > > + xfs_warn(mp, fmt, cycle, block, log->l_curr_cycle, > > + log->l_curr_block); > > + } else { > > + xfs_warn_daily(mp, fmt, cycle, block, log->l_curr_cycle, > > + log->l_curr_block); > > + } > > That's really ugly. Rate limited prints should always occur at least > once for each ratelimit key, and now i've seen this I can say the > first patch needs rework. > This is definitely more ugly than the shutdown approach. As mentioned, I've been waffling back and forth on how to handle this condition. I had already sent the shutdown version, so I figured I would put together what I thought would be necessary to handle this with a warning for comparison. > that is, each xfs_warn_daily() caller needs it's own ratelimit key, > rather than sharing the global ratelimit key that all other XFS > messages share. > What do you mean by a key? What is the purpose? On further thought... I think I see what you mean. Each mount should report in the ratelimited timeframe independently. Or in other words, the rate limit of one mount should not affect message delivery on another. > e.g.: > xfs_warn_daily(mp, mp->m_log->l_lsn_rlstate, > "Corruption warning: Metadata has LSN (%d:%d) ahead of current LSN (%d:%d). " > "Please unmount and run xfs_repair (>= v4.3) to resolve.". > cycle, block, log->l_curr_cycle, log->l_curr_block); > > But really, thinking on this further (the "corruption warning" bit) > I'm starting to think we should just shut the filesystem down when > we hit this and force the user to fix it immediately. if we don't, > then who knows whether we are making things worse or not... > I was originally thinking the shutdown might not be ideal because we induce the very situation we're warning about (forcing a log recovery). Since this verifies on metadata reads, however, we should in theory avoid the problem before it could ever be introduced. My follow on question to that was whether a shutdown is appropriate for something that is somewhat minor and self-corrective..? Given the ugliness of the multiple warnings, new mount flag, additional complication of the ratelimit mechanism, etc., I might just go back to the shutdown and call it a day. ;) I think the subtle point raised above about calling this a corruption is a sufficient argument. While this might be minor and self-correcting, it is still technically a corruption of sorts and should be identified and repaired appropriately. Thanks for the feedback. Brian > > --- a/fs/xfs/xfs_mount.h > > +++ b/fs/xfs/xfs_mount.h > > @@ -127,6 +127,7 @@ typedef struct xfs_mount { > > int64_t m_low_space[XFS_LOWSP_MAX]; > > /* low free space thresholds */ > > struct xfs_kobj m_kobj; > > + bool m_warned_bad_lsn;/* bad metadata lsn detected */ > > That's where I'd put the ratelimit key (or in the struct xlog). > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2015-09-22 14:21 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-09-11 18:53 [RFC v2 0/2] xfs: detect and warn about invalid metadata lsns Brian Foster 2015-09-11 18:53 ` [RFC v2 1/2] xfs: create a daily warning mechanism Brian Foster 2015-09-11 18:53 ` [RFC v2 2/2] xfs: validate metadata LSNs against log on v5 superblocks Brian Foster 2015-09-22 8:07 ` Dave Chinner 2015-09-22 14:21 ` Brian Foster
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox