* [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount
@ 2021-04-20 11:08 Gao Xiang
2021-04-20 11:08 ` [PATCH v2 2/2] xfs: turn on lazysbcount unconditionally Gao Xiang
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Gao Xiang @ 2021-04-20 11:08 UTC (permalink / raw)
To: linux-xfs
Cc: Darrick J. Wong, Dave Chinner, Gao Xiang, Zorro Lang,
Carlos Maiolino
There are many paths which could trigger xfs_log_sb(), e.g.
xfs_bmap_add_attrfork()
-> xfs_log_sb()
, which overrides on-disk fdblocks by in-core per-CPU fdblocks.
However, for !lazysbcount cases, on-disk fdblocks is actually updated
by xfs_trans_apply_sb_deltas(), and generally it isn't equal to
in-core per-CPU fdblocks due to xfs_reserve_blocks() or whatever,
see the comment in xfs_unmountfs().
It could be observed by the following steps reported by Zorro:
1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev
2. mount $dev $mnt
3. fsstress -d $mnt -p 100 -n 1000 (maybe need more or less io load)
4. umount $mnt
5. xfs_repair -n $dev
yet due to commit f46e5a174655 ("xfs: fold sbcount quiesce logging
into log covering"), xfs_sync_sb() will also be triggered if log
covering is needed and !lazysbcount when xfs_unmountfs(), so hard
to reproduce on kernel 5.12+ for clean unmount.
on-disk sb_icount and sb_ifree are also updated in
xfs_trans_apply_sb_deltas() for !lazysbcount cases, however, which
are always equal to per-CPU counters, so only fdblocks matters.
After this patch, I've seen no strange so far on older kernels
for the testcase above without lazysbcount.
Reported-by: Zorro Lang <zlang@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
---
changes since v1:
- update commit message.
fs/xfs/libxfs/xfs_sb.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 60e6d255e5e2..423dada3f64c 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -928,7 +928,13 @@ xfs_log_sb(
mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree);
- mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
+ if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) {
+ struct xfs_dsb *dsb = bp->b_addr;
+
+ mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks);
+ } else {
+ mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
+ }
xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF);
--
2.27.0
^ permalink raw reply related [flat|nested] 17+ messages in thread* [PATCH v2 2/2] xfs: turn on lazysbcount unconditionally 2021-04-20 11:08 [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount Gao Xiang @ 2021-04-20 11:08 ` Gao Xiang 2021-04-20 16:22 ` Darrick J. Wong 2021-04-20 17:42 ` [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount Darrick J. Wong 2021-04-20 21:25 ` Dave Chinner 2 siblings, 1 reply; 17+ messages in thread From: Gao Xiang @ 2021-04-20 11:08 UTC (permalink / raw) To: linux-xfs; +Cc: Darrick J. Wong, Dave Chinner, Gao Xiang As Dave mentioned [1], "/me is now wondering why we even bother with !lazy-count anymore. We've updated the agr btree block accounting unconditionally since lazy-count was added, and scrub will always report a mismatch in counts if they exist regardless of lazy-count. So why don't we just start ignoring the on-disk value and always use lazy-count based updates? " Therefore, turn on lazy sb counters if it's still disabled at the mount time, or at remount_rw if fs was mounted as read-only. xfs_initialize_perag_data() is reused here since no need to scan agf/agi once more again. After this patch, we could get rid of this whole set of subtle conditional behaviours in the codebase. [1] https://lore.kernel.org/r/20210417223201.GU63242@dread.disaster.area Signed-off-by: Gao Xiang <hsiangkao@redhat.com> --- Enabling lazysbcount is only addressed in this patch, I'll send out a seperated following patch to cleanup all unused conditions later. Also tr_sb is reused here since only agf is modified for each ag, and before lazysbcount sb feature is enabled (m_update_sb = true), agf_btreeblks field shouldn't matter for such AGs. fs/xfs/libxfs/xfs_format.h | 6 +++ fs/xfs/libxfs/xfs_sb.c | 93 +++++++++++++++++++++++++++++++++++--- fs/xfs/xfs_mount.c | 2 +- fs/xfs/xfs_super.c | 5 ++ 4 files changed, 98 insertions(+), 8 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 76e2461b9e66..9081d7876d66 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -385,6 +385,12 @@ static inline bool xfs_sb_version_haslazysbcount(struct xfs_sb *sbp) (sbp->sb_features2 & XFS_SB_VERSION2_LAZYSBCOUNTBIT)); } +static inline void xfs_sb_version_addlazysbcount(struct xfs_sb *sbp) +{ + sbp->sb_versionnum |= XFS_SB_VERSION_MOREBITSBIT; + sbp->sb_features2 |= XFS_SB_VERSION2_LAZYSBCOUNTBIT; +} + static inline bool xfs_sb_version_hasattr2(struct xfs_sb *sbp) { return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) || diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index 423dada3f64c..6353e0d4cab1 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -18,6 +18,7 @@ #include "xfs_trace.h" #include "xfs_trans.h" #include "xfs_buf_item.h" +#include "xfs_btree.h" #include "xfs_bmap_btree.h" #include "xfs_alloc_btree.h" #include "xfs_log.h" @@ -841,6 +842,55 @@ xfs_sb_mount_common( mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp); } +static int +xfs_fixup_agf_btreeblks( + struct xfs_mount *mp, + struct xfs_trans *tp, + struct xfs_buf *agfbp, + xfs_agnumber_t agno) +{ + struct xfs_btree_cur *cur; + struct xfs_perag *pag = agfbp->b_pag; + struct xfs_agf *agf = agfbp->b_addr; + xfs_agblock_t btreeblks, blocks; + int error; + + cur = xfs_allocbt_init_cursor(mp, tp, agfbp, agno, XFS_BTNUM_BNO); + error = xfs_btree_count_blocks(cur, &blocks); + if (error) + goto err; + xfs_btree_del_cursor(cur, error); + btreeblks = blocks - 1; + + cur = xfs_allocbt_init_cursor(mp, tp, agfbp, agno, XFS_BTNUM_CNT); + error = xfs_btree_count_blocks(cur, &blocks); + if (error) + goto err; + xfs_btree_del_cursor(cur, error); + btreeblks += blocks - 1; + + /* + * although rmapbt doesn't exist in v4 fses, but it'd be better + * to turn it as a generic helper. + */ + if (xfs_sb_version_hasrmapbt(&mp->m_sb)) { + cur = xfs_rmapbt_init_cursor(mp, tp, agfbp, agno); + error = xfs_btree_count_blocks(cur, &blocks); + if (error) + goto err; + xfs_btree_del_cursor(cur, error); + btreeblks += blocks - 1; + } + + agf->agf_btreeblks = cpu_to_be32(btreeblks); + pag->pagf_btreeblks = btreeblks; + xfs_alloc_log_agf(tp, agfbp, XFS_AGF_BTREEBLKS); + return 0; +err: + xfs_btree_del_cursor(cur, error); + return error; +} + /* * xfs_initialize_perag_data * @@ -864,27 +914,51 @@ xfs_initialize_perag_data( uint64_t btree = 0; uint64_t fdblocks; int error = 0; + bool conv = !(mp->m_flags & XFS_MOUNT_RDONLY) && + !xfs_sb_version_haslazysbcount(sbp); + + if (conv) + xfs_warn(mp, "enabling lazy-counters..."); for (index = 0; index < agcount; index++) { + struct xfs_trans *tp = NULL; + struct xfs_buf *agfbp; + + if (conv) { + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_sb, + 0, 0, 0, &tp); + if (error) + return error; + } + /* - * read the agf, then the agi. This gets us + * read the agi, then the agf. This gets us * all the information we need and populates the * per-ag structures for us. */ - error = xfs_alloc_pagf_init(mp, NULL, index, 0); - if (error) + error = xfs_ialloc_pagi_init(mp, tp, index); + if (error) { +err_out: + if (tp) + xfs_trans_cancel(tp); return error; + } - error = xfs_ialloc_pagi_init(mp, NULL, index); + error = xfs_alloc_read_agf(mp, tp, index, 0, &agfbp); if (error) - return error; - pag = xfs_perag_get(mp, index); + goto err_out; + pag = agfbp->b_pag; ifree += pag->pagi_freecount; ialloc += pag->pagi_count; bfree += pag->pagf_freeblks; bfreelst += pag->pagf_flcount; + if (tp) { + error = xfs_fixup_agf_btreeblks(mp, tp, agfbp, index); + xfs_trans_commit(tp); + } else { + xfs_buf_relse(agfbp); + } btree += pag->pagf_btreeblks; - xfs_perag_put(pag); } fdblocks = bfree + bfreelst + btree; @@ -900,6 +974,11 @@ xfs_initialize_perag_data( goto out; } + if (conv) { + xfs_sb_version_addlazysbcount(sbp); + mp->m_update_sb = true; + xfs_warn(mp, "lazy-counters has been enabled."); + } /* Overwrite incore superblock counters with just-read data */ spin_lock(&mp->m_sb_lock); sbp->sb_ifree = ifree; diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index cb1e2c4702c3..b3b13acd45d6 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -626,7 +626,7 @@ xfs_check_summary_counts( * superblock to be correct and we don't need to do anything here. * Otherwise, recalculate the summary counters. */ - if ((!xfs_sb_version_haslazysbcount(&mp->m_sb) || + if ((xfs_sb_version_haslazysbcount(&mp->m_sb) && XFS_LAST_UNMOUNT_WAS_CLEAN(mp)) && !xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS)) return 0; diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index a2dab05332ac..16197a890c15 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1678,6 +1678,11 @@ xfs_remount_rw( } mp->m_flags &= ~XFS_MOUNT_RDONLY; + if (!xfs_sb_version_haslazysbcount(sbp)) { + error = xfs_initialize_perag_data(mp, sbp->sb_agcount); + if (error) + return error; + } /* * If this is the first remount to writeable state we might have some -- 2.27.0 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] xfs: turn on lazysbcount unconditionally 2021-04-20 11:08 ` [PATCH v2 2/2] xfs: turn on lazysbcount unconditionally Gao Xiang @ 2021-04-20 16:22 ` Darrick J. Wong 2021-04-20 20:00 ` Gao Xiang 0 siblings, 1 reply; 17+ messages in thread From: Darrick J. Wong @ 2021-04-20 16:22 UTC (permalink / raw) To: Gao Xiang; +Cc: linux-xfs, Dave Chinner On Tue, Apr 20, 2021 at 07:08:55PM +0800, Gao Xiang wrote: > As Dave mentioned [1], "/me is now wondering why we even bother > with !lazy-count anymore. > > We've updated the agr btree block accounting unconditionally since > lazy-count was added, and scrub will always report a mismatch in > counts if they exist regardless of lazy-count. So why don't we just > start ignoring the on-disk value and always use lazy-count based > updates? " > > Therefore, turn on lazy sb counters if it's still disabled at the > mount time, or at remount_rw if fs was mounted as read-only. > xfs_initialize_perag_data() is reused here since no need to scan > agf/agi once more again. > > After this patch, we could get rid of this whole set of subtle > conditional behaviours in the codebase. > > [1] https://lore.kernel.org/r/20210417223201.GU63242@dread.disaster.area > Signed-off-by: Gao Xiang <hsiangkao@redhat.com> > --- > Enabling lazysbcount is only addressed in this patch, I'll send > out a seperated following patch to cleanup all unused conditions > later. > > Also tr_sb is reused here since only agf is modified for each ag, > and before lazysbcount sb feature is enabled (m_update_sb = true), > agf_btreeblks field shouldn't matter for such AGs. > > fs/xfs/libxfs/xfs_format.h | 6 +++ > fs/xfs/libxfs/xfs_sb.c | 93 +++++++++++++++++++++++++++++++++++--- > fs/xfs/xfs_mount.c | 2 +- > fs/xfs/xfs_super.c | 5 ++ > 4 files changed, 98 insertions(+), 8 deletions(-) > > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h > index 76e2461b9e66..9081d7876d66 100644 > --- a/fs/xfs/libxfs/xfs_format.h > +++ b/fs/xfs/libxfs/xfs_format.h > @@ -385,6 +385,12 @@ static inline bool xfs_sb_version_haslazysbcount(struct xfs_sb *sbp) > (sbp->sb_features2 & XFS_SB_VERSION2_LAZYSBCOUNTBIT)); > } > > +static inline void xfs_sb_version_addlazysbcount(struct xfs_sb *sbp) > +{ > + sbp->sb_versionnum |= XFS_SB_VERSION_MOREBITSBIT; > + sbp->sb_features2 |= XFS_SB_VERSION2_LAZYSBCOUNTBIT; > +} > + > static inline bool xfs_sb_version_hasattr2(struct xfs_sb *sbp) > { > return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) || > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c > index 423dada3f64c..6353e0d4cab1 100644 > --- a/fs/xfs/libxfs/xfs_sb.c > +++ b/fs/xfs/libxfs/xfs_sb.c > @@ -18,6 +18,7 @@ > #include "xfs_trace.h" > #include "xfs_trans.h" > #include "xfs_buf_item.h" > +#include "xfs_btree.h" > #include "xfs_bmap_btree.h" > #include "xfs_alloc_btree.h" > #include "xfs_log.h" > @@ -841,6 +842,55 @@ xfs_sb_mount_common( > mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp); > } > > +static int > +xfs_fixup_agf_btreeblks( > + struct xfs_mount *mp, > + struct xfs_trans *tp, > + struct xfs_buf *agfbp, > + xfs_agnumber_t agno) > +{ > + struct xfs_btree_cur *cur; > + struct xfs_perag *pag = agfbp->b_pag; > + struct xfs_agf *agf = agfbp->b_addr; > + xfs_agblock_t btreeblks, blocks; > + int error; > + > + cur = xfs_allocbt_init_cursor(mp, tp, agfbp, agno, XFS_BTNUM_BNO); > + error = xfs_btree_count_blocks(cur, &blocks); > + if (error) > + goto err; > + xfs_btree_del_cursor(cur, error); > + btreeblks = blocks - 1; > + > + cur = xfs_allocbt_init_cursor(mp, tp, agfbp, agno, XFS_BTNUM_CNT); > + error = xfs_btree_count_blocks(cur, &blocks); > + if (error) > + goto err; > + xfs_btree_del_cursor(cur, error); > + btreeblks += blocks - 1; > + > + /* > + * although rmapbt doesn't exist in v4 fses, but it'd be better > + * to turn it as a generic helper. > + */ > + if (xfs_sb_version_hasrmapbt(&mp->m_sb)) { > + cur = xfs_rmapbt_init_cursor(mp, tp, agfbp, agno); > + error = xfs_btree_count_blocks(cur, &blocks); > + if (error) > + goto err; > + xfs_btree_del_cursor(cur, error); > + btreeblks += blocks - 1; > + } > + > + agf->agf_btreeblks = cpu_to_be32(btreeblks); > + pag->pagf_btreeblks = btreeblks; > + xfs_alloc_log_agf(tp, agfbp, XFS_AGF_BTREEBLKS); > + return 0; > +err: > + xfs_btree_del_cursor(cur, error); > + return error; > +} > + > /* > * xfs_initialize_perag_data > * > @@ -864,27 +914,51 @@ xfs_initialize_perag_data( > uint64_t btree = 0; > uint64_t fdblocks; > int error = 0; > + bool conv = !(mp->m_flags & XFS_MOUNT_RDONLY) && > + !xfs_sb_version_haslazysbcount(sbp); > + > + if (conv) > + xfs_warn(mp, "enabling lazy-counters..."); > > for (index = 0; index < agcount; index++) { > + struct xfs_trans *tp = NULL; > + struct xfs_buf *agfbp; > + > + if (conv) { > + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_sb, > + 0, 0, 0, &tp); > + if (error) > + return error; > + } > + > /* > - * read the agf, then the agi. This gets us > + * read the agi, then the agf. This gets us > * all the information we need and populates the > * per-ag structures for us. > */ > - error = xfs_alloc_pagf_init(mp, NULL, index, 0); > - if (error) > + error = xfs_ialloc_pagi_init(mp, tp, index); > + if (error) { > +err_out: > + if (tp) > + xfs_trans_cancel(tp); > return error; > + } > > - error = xfs_ialloc_pagi_init(mp, NULL, index); > + error = xfs_alloc_read_agf(mp, tp, index, 0, &agfbp); > if (error) > - return error; > - pag = xfs_perag_get(mp, index); > + goto err_out; > + pag = agfbp->b_pag; > ifree += pag->pagi_freecount; > ialloc += pag->pagi_count; > bfree += pag->pagf_freeblks; > bfreelst += pag->pagf_flcount; > + if (tp) { > + error = xfs_fixup_agf_btreeblks(mp, tp, agfbp, index); Lazysbcount upgrades should be done from a separate function, not mixed in with perag initialization. Also, why is it necessary to walk all the space btrees to set agf_btreeblks? > + xfs_trans_commit(tp); > + } else { > + xfs_buf_relse(agfbp); > + } > btree += pag->pagf_btreeblks; > - xfs_perag_put(pag); > } > fdblocks = bfree + bfreelst + btree; > > @@ -900,6 +974,11 @@ xfs_initialize_perag_data( > goto out; > } > > + if (conv) { > + xfs_sb_version_addlazysbcount(sbp); > + mp->m_update_sb = true; > + xfs_warn(mp, "lazy-counters has been enabled."); But we don't log the sb update? As far as the feature upgrade goes, is it necessary to bwrite the primary super to disk (and then log the change)[1] to prevent a truly ancient kernel that doesn't support lazysbcount from trying to recover the log and ending up with an unsupported feature set? [1] https://lore.kernel.org/linux-xfs/161723934343.3149451.16679733325094950568.stgit@magnolia/ > + } > /* Overwrite incore superblock counters with just-read data */ > spin_lock(&mp->m_sb_lock); > sbp->sb_ifree = ifree; > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c > index cb1e2c4702c3..b3b13acd45d6 100644 > --- a/fs/xfs/xfs_mount.c > +++ b/fs/xfs/xfs_mount.c > @@ -626,7 +626,7 @@ xfs_check_summary_counts( > * superblock to be correct and we don't need to do anything here. > * Otherwise, recalculate the summary counters. > */ > - if ((!xfs_sb_version_haslazysbcount(&mp->m_sb) || > + if ((xfs_sb_version_haslazysbcount(&mp->m_sb) && Not clear why the logic here inverts? --D > XFS_LAST_UNMOUNT_WAS_CLEAN(mp)) && > !xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS)) > return 0; > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > index a2dab05332ac..16197a890c15 100644 > --- a/fs/xfs/xfs_super.c > +++ b/fs/xfs/xfs_super.c > @@ -1678,6 +1678,11 @@ xfs_remount_rw( > } > > mp->m_flags &= ~XFS_MOUNT_RDONLY; > + if (!xfs_sb_version_haslazysbcount(sbp)) { > + error = xfs_initialize_perag_data(mp, sbp->sb_agcount); > + if (error) > + return error; > + } > > /* > * If this is the first remount to writeable state we might have some > -- > 2.27.0 > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] xfs: turn on lazysbcount unconditionally 2021-04-20 16:22 ` Darrick J. Wong @ 2021-04-20 20:00 ` Gao Xiang 2021-04-22 0:01 ` Darrick J. Wong 0 siblings, 1 reply; 17+ messages in thread From: Gao Xiang @ 2021-04-20 20:00 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs, Dave Chinner Hi Darrick, On Tue, Apr 20, 2021 at 09:22:50AM -0700, Darrick J. Wong wrote: > On Tue, Apr 20, 2021 at 07:08:55PM +0800, Gao Xiang wrote: > > As Dave mentioned [1], "/me is now wondering why we even bother > > with !lazy-count anymore. > > > > We've updated the agr btree block accounting unconditionally since > > lazy-count was added, and scrub will always report a mismatch in > > counts if they exist regardless of lazy-count. So why don't we just > > start ignoring the on-disk value and always use lazy-count based > > updates? " > > > > Therefore, turn on lazy sb counters if it's still disabled at the > > mount time, or at remount_rw if fs was mounted as read-only. > > xfs_initialize_perag_data() is reused here since no need to scan > > agf/agi once more again. > > > > After this patch, we could get rid of this whole set of subtle > > conditional behaviours in the codebase. > > > > [1] https://lore.kernel.org/r/20210417223201.GU63242@dread.disaster.area > > Signed-off-by: Gao Xiang <hsiangkao@redhat.com> > > --- > > Enabling lazysbcount is only addressed in this patch, I'll send > > out a seperated following patch to cleanup all unused conditions > > later. > > > > Also tr_sb is reused here since only agf is modified for each ag, > > and before lazysbcount sb feature is enabled (m_update_sb = true), > > agf_btreeblks field shouldn't matter for such AGs. > > > > fs/xfs/libxfs/xfs_format.h | 6 +++ > > fs/xfs/libxfs/xfs_sb.c | 93 +++++++++++++++++++++++++++++++++++--- > > fs/xfs/xfs_mount.c | 2 +- > > fs/xfs/xfs_super.c | 5 ++ > > 4 files changed, 98 insertions(+), 8 deletions(-) > > > > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h > > index 76e2461b9e66..9081d7876d66 100644 > > --- a/fs/xfs/libxfs/xfs_format.h > > +++ b/fs/xfs/libxfs/xfs_format.h > > @@ -385,6 +385,12 @@ static inline bool xfs_sb_version_haslazysbcount(struct xfs_sb *sbp) > > (sbp->sb_features2 & XFS_SB_VERSION2_LAZYSBCOUNTBIT)); > > } > > > > +static inline void xfs_sb_version_addlazysbcount(struct xfs_sb *sbp) > > +{ > > + sbp->sb_versionnum |= XFS_SB_VERSION_MOREBITSBIT; > > + sbp->sb_features2 |= XFS_SB_VERSION2_LAZYSBCOUNTBIT; > > +} > > + > > static inline bool xfs_sb_version_hasattr2(struct xfs_sb *sbp) > > { > > return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) || > > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c > > index 423dada3f64c..6353e0d4cab1 100644 > > --- a/fs/xfs/libxfs/xfs_sb.c > > +++ b/fs/xfs/libxfs/xfs_sb.c > > @@ -18,6 +18,7 @@ > > #include "xfs_trace.h" > > #include "xfs_trans.h" > > #include "xfs_buf_item.h" > > +#include "xfs_btree.h" > > #include "xfs_bmap_btree.h" > > #include "xfs_alloc_btree.h" > > #include "xfs_log.h" > > @@ -841,6 +842,55 @@ xfs_sb_mount_common( > > mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp); > > } > > > > +static int > > +xfs_fixup_agf_btreeblks( > > + struct xfs_mount *mp, > > + struct xfs_trans *tp, > > + struct xfs_buf *agfbp, > > + xfs_agnumber_t agno) > > +{ > > + struct xfs_btree_cur *cur; > > + struct xfs_perag *pag = agfbp->b_pag; > > + struct xfs_agf *agf = agfbp->b_addr; > > + xfs_agblock_t btreeblks, blocks; > > + int error; > > + > > + cur = xfs_allocbt_init_cursor(mp, tp, agfbp, agno, XFS_BTNUM_BNO); > > + error = xfs_btree_count_blocks(cur, &blocks); > > + if (error) > > + goto err; > > + xfs_btree_del_cursor(cur, error); > > + btreeblks = blocks - 1; > > + > > + cur = xfs_allocbt_init_cursor(mp, tp, agfbp, agno, XFS_BTNUM_CNT); > > + error = xfs_btree_count_blocks(cur, &blocks); > > + if (error) > > + goto err; > > + xfs_btree_del_cursor(cur, error); > > + btreeblks += blocks - 1; > > + > > + /* > > + * although rmapbt doesn't exist in v4 fses, but it'd be better > > + * to turn it as a generic helper. > > + */ > > + if (xfs_sb_version_hasrmapbt(&mp->m_sb)) { > > + cur = xfs_rmapbt_init_cursor(mp, tp, agfbp, agno); > > + error = xfs_btree_count_blocks(cur, &blocks); > > + if (error) > > + goto err; > > + xfs_btree_del_cursor(cur, error); > > + btreeblks += blocks - 1; > > + } > > + > > + agf->agf_btreeblks = cpu_to_be32(btreeblks); > > + pag->pagf_btreeblks = btreeblks; > > + xfs_alloc_log_agf(tp, agfbp, XFS_AGF_BTREEBLKS); > > + return 0; > > +err: > > + xfs_btree_del_cursor(cur, error); > > + return error; > > +} > > + > > /* > > * xfs_initialize_perag_data > > * > > @@ -864,27 +914,51 @@ xfs_initialize_perag_data( > > uint64_t btree = 0; > > uint64_t fdblocks; > > int error = 0; > > + bool conv = !(mp->m_flags & XFS_MOUNT_RDONLY) && > > + !xfs_sb_version_haslazysbcount(sbp); > > + > > + if (conv) > > + xfs_warn(mp, "enabling lazy-counters..."); > > > > for (index = 0; index < agcount; index++) { > > + struct xfs_trans *tp = NULL; > > + struct xfs_buf *agfbp; > > + > > + if (conv) { > > + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_sb, > > + 0, 0, 0, &tp); > > + if (error) > > + return error; > > + } > > + > > /* > > - * read the agf, then the agi. This gets us > > + * read the agi, then the agf. This gets us > > * all the information we need and populates the > > * per-ag structures for us. > > */ > > - error = xfs_alloc_pagf_init(mp, NULL, index, 0); > > - if (error) > > + error = xfs_ialloc_pagi_init(mp, tp, index); > > + if (error) { > > +err_out: > > + if (tp) > > + xfs_trans_cancel(tp); > > return error; > > + } > > > > - error = xfs_ialloc_pagi_init(mp, NULL, index); > > + error = xfs_alloc_read_agf(mp, tp, index, 0, &agfbp); > > if (error) > > - return error; > > - pag = xfs_perag_get(mp, index); > > + goto err_out; > > + pag = agfbp->b_pag; > > ifree += pag->pagi_freecount; > > ialloc += pag->pagi_count; > > bfree += pag->pagf_freeblks; > > bfreelst += pag->pagf_flcount; > > + if (tp) { > > + error = xfs_fixup_agf_btreeblks(mp, tp, agfbp, index); > > Lazysbcount upgrades should be done from a separate function, not mixed > in with perag initialization. I've seen some previous discussion about multiple AG total scan time cost. Yeah, if another extra scan really accepts here, I could update instead. > Also, why is it necessary to walk all the space btrees to set agf_btreeblks? If my understanding is correct, I think because without lazysbcount, even pagf_btreeblks is updated unconditionally now, but that counter is unreliable for quite ancient kernels which don't have such update logic. Kindly correct me if I'm wrong here. > > > + xfs_trans_commit(tp); > > + } else { > > + xfs_buf_relse(agfbp); > > + } > > btree += pag->pagf_btreeblks; > > - xfs_perag_put(pag); > > } > > fdblocks = bfree + bfreelst + btree; > > > > @@ -900,6 +974,11 @@ xfs_initialize_perag_data( > > goto out; > > } > > > > + if (conv) { > > + xfs_sb_version_addlazysbcount(sbp); > > + mp->m_update_sb = true; > > + xfs_warn(mp, "lazy-counters has been enabled."); > > But we don't log the sb update? > > As far as the feature upgrade goes, is it necessary to bwrite the > primary super to disk (and then log the change)[1] to prevent a truly > ancient kernel that doesn't support lazysbcount from trying to recover > the log and ending up with an unsupported feature set? Not quite sure if it does harm to ancient kernels with such unsupported feature. may I ask for more details? :) but yeah, if any issues here, I should follow 1) bwrite sb block first; 2) log sb > > [1] https://lore.kernel.org/linux-xfs/161723934343.3149451.16679733325094950568.stgit@magnolia/ > > > + } > > /* Overwrite incore superblock counters with just-read data */ > > spin_lock(&mp->m_sb_lock); > > sbp->sb_ifree = ifree; > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c > > index cb1e2c4702c3..b3b13acd45d6 100644 > > --- a/fs/xfs/xfs_mount.c > > +++ b/fs/xfs/xfs_mount.c > > @@ -626,7 +626,7 @@ xfs_check_summary_counts( > > * superblock to be correct and we don't need to do anything here. > > * Otherwise, recalculate the summary counters. > > */ > > - if ((!xfs_sb_version_haslazysbcount(&mp->m_sb) || > > + if ((xfs_sb_version_haslazysbcount(&mp->m_sb) && > > Not clear why the logic here inverts? .. thus xfs_initialize_perag_data() below can be called then. Thanks, Gao Xiang > > --D > > > XFS_LAST_UNMOUNT_WAS_CLEAN(mp)) && > > !xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS)) > > return 0; > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > > index a2dab05332ac..16197a890c15 100644 > > --- a/fs/xfs/xfs_super.c > > +++ b/fs/xfs/xfs_super.c > > @@ -1678,6 +1678,11 @@ xfs_remount_rw( > > } > > > > mp->m_flags &= ~XFS_MOUNT_RDONLY; > > + if (!xfs_sb_version_haslazysbcount(sbp)) { > > + error = xfs_initialize_perag_data(mp, sbp->sb_agcount); > > + if (error) > > + return error; > > + } > > > > /* > > * If this is the first remount to writeable state we might have some > > -- > > 2.27.0 > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] xfs: turn on lazysbcount unconditionally 2021-04-20 20:00 ` Gao Xiang @ 2021-04-22 0:01 ` Darrick J. Wong 2021-04-22 1:51 ` Gao Xiang 0 siblings, 1 reply; 17+ messages in thread From: Darrick J. Wong @ 2021-04-22 0:01 UTC (permalink / raw) To: Gao Xiang; +Cc: linux-xfs, Dave Chinner On Wed, Apr 21, 2021 at 04:00:29AM +0800, Gao Xiang wrote: > Hi Darrick, > > On Tue, Apr 20, 2021 at 09:22:50AM -0700, Darrick J. Wong wrote: > > On Tue, Apr 20, 2021 at 07:08:55PM +0800, Gao Xiang wrote: > > > As Dave mentioned [1], "/me is now wondering why we even bother > > > with !lazy-count anymore. > > > > > > We've updated the agr btree block accounting unconditionally since > > > lazy-count was added, and scrub will always report a mismatch in > > > counts if they exist regardless of lazy-count. So why don't we just > > > start ignoring the on-disk value and always use lazy-count based > > > updates? " > > > > > > Therefore, turn on lazy sb counters if it's still disabled at the > > > mount time, or at remount_rw if fs was mounted as read-only. > > > xfs_initialize_perag_data() is reused here since no need to scan > > > agf/agi once more again. > > > > > > After this patch, we could get rid of this whole set of subtle > > > conditional behaviours in the codebase. > > > > > > [1] https://lore.kernel.org/r/20210417223201.GU63242@dread.disaster.area > > > Signed-off-by: Gao Xiang <hsiangkao@redhat.com> > > > --- > > > Enabling lazysbcount is only addressed in this patch, I'll send > > > out a seperated following patch to cleanup all unused conditions > > > later. > > > > > > Also tr_sb is reused here since only agf is modified for each ag, > > > and before lazysbcount sb feature is enabled (m_update_sb = true), > > > agf_btreeblks field shouldn't matter for such AGs. > > > > > > fs/xfs/libxfs/xfs_format.h | 6 +++ > > > fs/xfs/libxfs/xfs_sb.c | 93 +++++++++++++++++++++++++++++++++++--- > > > fs/xfs/xfs_mount.c | 2 +- > > > fs/xfs/xfs_super.c | 5 ++ > > > 4 files changed, 98 insertions(+), 8 deletions(-) > > > > > > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h > > > index 76e2461b9e66..9081d7876d66 100644 > > > --- a/fs/xfs/libxfs/xfs_format.h > > > +++ b/fs/xfs/libxfs/xfs_format.h > > > @@ -385,6 +385,12 @@ static inline bool xfs_sb_version_haslazysbcount(struct xfs_sb *sbp) > > > (sbp->sb_features2 & XFS_SB_VERSION2_LAZYSBCOUNTBIT)); > > > } > > > > > > +static inline void xfs_sb_version_addlazysbcount(struct xfs_sb *sbp) > > > +{ > > > + sbp->sb_versionnum |= XFS_SB_VERSION_MOREBITSBIT; > > > + sbp->sb_features2 |= XFS_SB_VERSION2_LAZYSBCOUNTBIT; > > > +} > > > + > > > static inline bool xfs_sb_version_hasattr2(struct xfs_sb *sbp) > > > { > > > return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) || > > > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c > > > index 423dada3f64c..6353e0d4cab1 100644 > > > --- a/fs/xfs/libxfs/xfs_sb.c > > > +++ b/fs/xfs/libxfs/xfs_sb.c > > > @@ -18,6 +18,7 @@ > > > #include "xfs_trace.h" > > > #include "xfs_trans.h" > > > #include "xfs_buf_item.h" > > > +#include "xfs_btree.h" > > > #include "xfs_bmap_btree.h" > > > #include "xfs_alloc_btree.h" > > > #include "xfs_log.h" > > > @@ -841,6 +842,55 @@ xfs_sb_mount_common( > > > mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp); > > > } > > > > > > +static int > > > +xfs_fixup_agf_btreeblks( > > > + struct xfs_mount *mp, > > > + struct xfs_trans *tp, > > > + struct xfs_buf *agfbp, > > > + xfs_agnumber_t agno) > > > +{ > > > + struct xfs_btree_cur *cur; > > > + struct xfs_perag *pag = agfbp->b_pag; > > > + struct xfs_agf *agf = agfbp->b_addr; > > > + xfs_agblock_t btreeblks, blocks; > > > + int error; > > > + > > > + cur = xfs_allocbt_init_cursor(mp, tp, agfbp, agno, XFS_BTNUM_BNO); > > > + error = xfs_btree_count_blocks(cur, &blocks); > > > + if (error) > > > + goto err; > > > + xfs_btree_del_cursor(cur, error); > > > + btreeblks = blocks - 1; > > > + > > > + cur = xfs_allocbt_init_cursor(mp, tp, agfbp, agno, XFS_BTNUM_CNT); > > > + error = xfs_btree_count_blocks(cur, &blocks); > > > + if (error) > > > + goto err; > > > + xfs_btree_del_cursor(cur, error); > > > + btreeblks += blocks - 1; > > > + > > > + /* > > > + * although rmapbt doesn't exist in v4 fses, but it'd be better > > > + * to turn it as a generic helper. > > > + */ > > > + if (xfs_sb_version_hasrmapbt(&mp->m_sb)) { > > > + cur = xfs_rmapbt_init_cursor(mp, tp, agfbp, agno); > > > + error = xfs_btree_count_blocks(cur, &blocks); > > > + if (error) > > > + goto err; > > > + xfs_btree_del_cursor(cur, error); > > > + btreeblks += blocks - 1; > > > + } > > > + > > > + agf->agf_btreeblks = cpu_to_be32(btreeblks); > > > + pag->pagf_btreeblks = btreeblks; > > > + xfs_alloc_log_agf(tp, agfbp, XFS_AGF_BTREEBLKS); > > > + return 0; > > > +err: > > > + xfs_btree_del_cursor(cur, error); > > > + return error; > > > +} > > > + > > > /* > > > * xfs_initialize_perag_data > > > * > > > @@ -864,27 +914,51 @@ xfs_initialize_perag_data( > > > uint64_t btree = 0; > > > uint64_t fdblocks; > > > int error = 0; > > > + bool conv = !(mp->m_flags & XFS_MOUNT_RDONLY) && > > > + !xfs_sb_version_haslazysbcount(sbp); > > > + > > > + if (conv) > > > + xfs_warn(mp, "enabling lazy-counters..."); > > > > > > for (index = 0; index < agcount; index++) { > > > + struct xfs_trans *tp = NULL; > > > + struct xfs_buf *agfbp; > > > + > > > + if (conv) { > > > + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_sb, > > > + 0, 0, 0, &tp); > > > + if (error) > > > + return error; > > > + } > > > + > > > /* > > > - * read the agf, then the agi. This gets us > > > + * read the agi, then the agf. This gets us > > > * all the information we need and populates the > > > * per-ag structures for us. > > > */ > > > - error = xfs_alloc_pagf_init(mp, NULL, index, 0); > > > - if (error) > > > + error = xfs_ialloc_pagi_init(mp, tp, index); > > > + if (error) { > > > +err_out: > > > + if (tp) > > > + xfs_trans_cancel(tp); > > > return error; > > > + } > > > > > > - error = xfs_ialloc_pagi_init(mp, NULL, index); > > > + error = xfs_alloc_read_agf(mp, tp, index, 0, &agfbp); > > > if (error) > > > - return error; > > > - pag = xfs_perag_get(mp, index); > > > + goto err_out; > > > + pag = agfbp->b_pag; > > > ifree += pag->pagi_freecount; > > > ialloc += pag->pagi_count; > > > bfree += pag->pagf_freeblks; > > > bfreelst += pag->pagf_flcount; > > > + if (tp) { > > > + error = xfs_fixup_agf_btreeblks(mp, tp, agfbp, index); > > > > Lazysbcount upgrades should be done from a separate function, not mixed > > in with perag initialization. > > I've seen some previous discussion about multiple AG total scan time cost. > Yeah, if another extra scan really accepts here, I could update instead. > > > Also, why is it necessary to walk all the space btrees to set agf_btreeblks? > > If my understanding is correct, I think because without lazysbcount, > even pagf_btreeblks is updated unconditionally now, but that counter > is unreliable for quite ancient kernels which don't have such update > logic. > > Kindly correct me if I'm wrong here. Ah, you're right. The agf_btreeblks field in the AGF only exists if lazysbcount is enabled, which means that adding the feature means that we have to scan every AG to compute the correct value. Still, we only need to do this walk once per filesystem, so I'd prefer not to clutter up the xfs_initialize_perag_data code for the sake of a onetime upgrade for a deprecated ondisk format. In my mind it's a feature to be able to do: #if IS_ENABLED(CONFIG_XFS_V4) int xfs_fs_set_lazycount(...) { /* walk AGs, fix AGF... */ /* lock super */ /* set lazysbcount */ /* bwrite super */ /* log super changes */ /* commit the whole mess */ } #else # define xfs_fs_set_lazycount(..) (-ENOSYS) #endif Because then we know that this is all XFSv4 code and can easily make it go away. The other question I have is: Do we /really/ want to QA and support this in the kernel? Considering that we already have xfs_admin -c1? > > > > > + xfs_trans_commit(tp); > > > + } else { > > > + xfs_buf_relse(agfbp); > > > + } > > > btree += pag->pagf_btreeblks; > > > - xfs_perag_put(pag); > > > } > > > fdblocks = bfree + bfreelst + btree; > > > > > > @@ -900,6 +974,11 @@ xfs_initialize_perag_data( > > > goto out; > > > } > > > > > > + if (conv) { > > > + xfs_sb_version_addlazysbcount(sbp); > > > + mp->m_update_sb = true; > > > + xfs_warn(mp, "lazy-counters has been enabled."); > > > > But we don't log the sb update? > > > > As far as the feature upgrade goes, is it necessary to bwrite the > > primary super to disk (and then log the change)[1] to prevent a truly > > ancient kernel that doesn't support lazysbcount from trying to recover > > the log and ending up with an unsupported feature set? > > Not quite sure if it does harm to ancient kernels with such > unsupported feature. may I ask for more details? :) 1. Walk AG to update btreeblks. 2. Commit feature flag update in superblock. 3. Log flushes to disk before the superblock update gets written to sector 0. <crash> 4. Boot ancient kernel that doesn't understand lazysbcount (from USB recovery stick). 5. Mount begins, because sector 0 doesn't have the lazysbcount bit set. 6. Log recovery replays the primary super update over sector 0, and the new contents of sector 0 say lazysbcount is enabled. 7. Superblock now says it has lazysbcount, what does the kernel do? > > but yeah, if any issues here, I should follow > 1) bwrite sb block first; > 2) log sb > > > > > [1] https://lore.kernel.org/linux-xfs/161723934343.3149451.16679733325094950568.stgit@magnolia/ > > > > > + } > > > /* Overwrite incore superblock counters with just-read data */ > > > spin_lock(&mp->m_sb_lock); > > > sbp->sb_ifree = ifree; > > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c > > > index cb1e2c4702c3..b3b13acd45d6 100644 > > > --- a/fs/xfs/xfs_mount.c > > > +++ b/fs/xfs/xfs_mount.c > > > @@ -626,7 +626,7 @@ xfs_check_summary_counts( > > > * superblock to be correct and we don't need to do anything here. > > > * Otherwise, recalculate the summary counters. > > > */ > > > - if ((!xfs_sb_version_haslazysbcount(&mp->m_sb) || > > > + if ((xfs_sb_version_haslazysbcount(&mp->m_sb) && > > > > Not clear why the logic here inverts? > > .. thus xfs_initialize_perag_data() below can be called then. That seems like all the more reason to make it a separate function, TBH. --D > > Thanks, > Gao Xiang > > > > > --D > > > > > XFS_LAST_UNMOUNT_WAS_CLEAN(mp)) && > > > !xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS)) > > > return 0; > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > > > index a2dab05332ac..16197a890c15 100644 > > > --- a/fs/xfs/xfs_super.c > > > +++ b/fs/xfs/xfs_super.c > > > @@ -1678,6 +1678,11 @@ xfs_remount_rw( > > > } > > > > > > mp->m_flags &= ~XFS_MOUNT_RDONLY; > > > + if (!xfs_sb_version_haslazysbcount(sbp)) { > > > + error = xfs_initialize_perag_data(mp, sbp->sb_agcount); > > > + if (error) > > > + return error; > > > + } > > > > > > /* > > > * If this is the first remount to writeable state we might have some > > > -- > > > 2.27.0 > > > > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] xfs: turn on lazysbcount unconditionally 2021-04-22 0:01 ` Darrick J. Wong @ 2021-04-22 1:51 ` Gao Xiang 2021-04-22 5:11 ` Zorro Lang 0 siblings, 1 reply; 17+ messages in thread From: Gao Xiang @ 2021-04-22 1:51 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs, Dave Chinner, Zorro Lang On Wed, Apr 21, 2021 at 05:01:40PM -0700, Darrick J. Wong wrote: > On Wed, Apr 21, 2021 at 04:00:29AM +0800, Gao Xiang wrote: ... > > > > /* > > > > * xfs_initialize_perag_data > > > > * > > > > @@ -864,27 +914,51 @@ xfs_initialize_perag_data( > > > > uint64_t btree = 0; > > > > uint64_t fdblocks; > > > > int error = 0; > > > > + bool conv = !(mp->m_flags & XFS_MOUNT_RDONLY) && > > > > + !xfs_sb_version_haslazysbcount(sbp); > > > > + > > > > + if (conv) > > > > + xfs_warn(mp, "enabling lazy-counters..."); > > > > > > > > for (index = 0; index < agcount; index++) { > > > > + struct xfs_trans *tp = NULL; > > > > + struct xfs_buf *agfbp; > > > > + > > > > + if (conv) { > > > > + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_sb, > > > > + 0, 0, 0, &tp); > > > > + if (error) > > > > + return error; > > > > + } > > > > + > > > > /* > > > > - * read the agf, then the agi. This gets us > > > > + * read the agi, then the agf. This gets us > > > > * all the information we need and populates the > > > > * per-ag structures for us. > > > > */ > > > > - error = xfs_alloc_pagf_init(mp, NULL, index, 0); > > > > - if (error) > > > > + error = xfs_ialloc_pagi_init(mp, tp, index); > > > > + if (error) { > > > > +err_out: > > > > + if (tp) > > > > + xfs_trans_cancel(tp); > > > > return error; > > > > + } > > > > > > > > - error = xfs_ialloc_pagi_init(mp, NULL, index); > > > > + error = xfs_alloc_read_agf(mp, tp, index, 0, &agfbp); > > > > if (error) > > > > - return error; > > > > - pag = xfs_perag_get(mp, index); > > > > + goto err_out; > > > > + pag = agfbp->b_pag; > > > > ifree += pag->pagi_freecount; > > > > ialloc += pag->pagi_count; > > > > bfree += pag->pagf_freeblks; > > > > bfreelst += pag->pagf_flcount; > > > > + if (tp) { > > > > + error = xfs_fixup_agf_btreeblks(mp, tp, agfbp, index); > > > > > > Lazysbcount upgrades should be done from a separate function, not mixed > > > in with perag initialization. > > > > I've seen some previous discussion about multiple AG total scan time cost. > > Yeah, if another extra scan really accepts here, I could update instead. > > > > > Also, why is it necessary to walk all the space btrees to set agf_btreeblks? > > > > If my understanding is correct, I think because without lazysbcount, > > even pagf_btreeblks is updated unconditionally now, but that counter > > is unreliable for quite ancient kernels which don't have such update > > logic. > > > > Kindly correct me if I'm wrong here. > > Ah, you're right. The agf_btreeblks field in the AGF only exists if > lazysbcount is enabled, which means that adding the feature means that > we have to scan every AG to compute the correct value. > > Still, we only need to do this walk once per filesystem, so I'd prefer > not to clutter up the xfs_initialize_perag_data code for the sake of a > onetime upgrade for a deprecated ondisk format. > > In my mind it's a feature to be able to do: > > #if IS_ENABLED(CONFIG_XFS_V4) > int > xfs_fs_set_lazycount(...) > { > /* walk AGs, fix AGF... */ > /* lock super */ > /* set lazysbcount */ > /* bwrite super */ > /* log super changes */ > /* commit the whole mess */ > } > #else > # define xfs_fs_set_lazycount(..) (-ENOSYS) > #endif > > Because then we know that this is all XFSv4 code and can easily make it > go away. Yeah, sounds better. I could refine this in the next version. :) > > The other question I have is: Do we /really/ want to QA and support > this in the kernel? Considering that we already have xfs_admin -c1? I think we might ask Zorro for this whole thing, since no end users actually report this. :) (Cc Zorro here.) Although the reality is we still support !lazysbcount fses even it isn't looked after at all. My another suggestion completely forbids !lazysbcount from mounting in months (or right now.) Just warn users to use xfs_admin -c1 to convert this. And after months, warn users to convert this and also forbid it from mounting. > > > > > > > > + xfs_trans_commit(tp); > > > > + } else { > > > > + xfs_buf_relse(agfbp); > > > > + } > > > > btree += pag->pagf_btreeblks; > > > > - xfs_perag_put(pag); > > > > } > > > > fdblocks = bfree + bfreelst + btree; > > > > > > > > @@ -900,6 +974,11 @@ xfs_initialize_perag_data( > > > > goto out; > > > > } > > > > > > > > + if (conv) { > > > > + xfs_sb_version_addlazysbcount(sbp); > > > > + mp->m_update_sb = true; > > > > + xfs_warn(mp, "lazy-counters has been enabled."); > > > > > > But we don't log the sb update? > > > > > > As far as the feature upgrade goes, is it necessary to bwrite the > > > primary super to disk (and then log the change)[1] to prevent a truly > > > ancient kernel that doesn't support lazysbcount from trying to recover > > > the log and ending up with an unsupported feature set? > > > > Not quite sure if it does harm to ancient kernels with such > > unsupported feature. may I ask for more details? :) > > 1. Walk AG to update btreeblks. > 2. Commit feature flag update in superblock. > 3. Log flushes to disk before the superblock update gets written to > sector 0. > <crash> > 4. Boot ancient kernel that doesn't understand lazysbcount > (from USB recovery stick). > 5. Mount begins, because sector 0 doesn't have the lazysbcount bit set. > 6. Log recovery replays the primary super update over sector 0, and the > new contents of sector 0 say lazysbcount is enabled. > 7. Superblock now says it has lazysbcount, what does the kernel do? Yeah, but I'm not sure if it has some bad effect if ancient kernels do it in this way. I mean (I think) it's somewhat different from log_incompat thing. If ancient kernels just replay the log, and then sb read verified and refuse to proceed (but fs is not corrupted...) I think that would be fine? I'm not sure about the whole thing on ancient kernels. So very curious about this. I will look into the whole thing as well. > > > > > but yeah, if any issues here, I should follow > > 1) bwrite sb block first; > > 2) log sb > > > > > > > > [1] https://lore.kernel.org/linux-xfs/161723934343.3149451.16679733325094950568.stgit@magnolia/ > > > > > > > + } > > > > /* Overwrite incore superblock counters with just-read data */ > > > > spin_lock(&mp->m_sb_lock); > > > > sbp->sb_ifree = ifree; > > > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c > > > > index cb1e2c4702c3..b3b13acd45d6 100644 > > > > --- a/fs/xfs/xfs_mount.c > > > > +++ b/fs/xfs/xfs_mount.c > > > > @@ -626,7 +626,7 @@ xfs_check_summary_counts( > > > > * superblock to be correct and we don't need to do anything here. > > > > * Otherwise, recalculate the summary counters. > > > > */ > > > > - if ((!xfs_sb_version_haslazysbcount(&mp->m_sb) || > > > > + if ((xfs_sb_version_haslazysbcount(&mp->m_sb) && > > > > > > Not clear why the logic here inverts? > > > > .. thus xfs_initialize_perag_data() below can be called then. > > That seems like all the more reason to make it a separate function, TBH. Yeah, will refine this later. Thanks, Gao Xiang > > --D > > > > > Thanks, > > Gao Xiang > > > > > > > > --D > > > > > > > XFS_LAST_UNMOUNT_WAS_CLEAN(mp)) && > > > > !xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS)) > > > > return 0; > > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > > > > index a2dab05332ac..16197a890c15 100644 > > > > --- a/fs/xfs/xfs_super.c > > > > +++ b/fs/xfs/xfs_super.c > > > > @@ -1678,6 +1678,11 @@ xfs_remount_rw( > > > > } > > > > > > > > mp->m_flags &= ~XFS_MOUNT_RDONLY; > > > > + if (!xfs_sb_version_haslazysbcount(sbp)) { > > > > + error = xfs_initialize_perag_data(mp, sbp->sb_agcount); > > > > + if (error) > > > > + return error; > > > > + } > > > > > > > > /* > > > > * If this is the first remount to writeable state we might have some > > > > -- > > > > 2.27.0 > > > > > > > > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/2] xfs: turn on lazysbcount unconditionally 2021-04-22 1:51 ` Gao Xiang @ 2021-04-22 5:11 ` Zorro Lang 0 siblings, 0 replies; 17+ messages in thread From: Zorro Lang @ 2021-04-22 5:11 UTC (permalink / raw) To: Gao Xiang; +Cc: Darrick J. Wong, linux-xfs, Dave Chinner On Thu, Apr 22, 2021 at 09:51:26AM +0800, Gao Xiang wrote: > On Wed, Apr 21, 2021 at 05:01:40PM -0700, Darrick J. Wong wrote: > > On Wed, Apr 21, 2021 at 04:00:29AM +0800, Gao Xiang wrote: > > ... > > > > > > /* > > > > > * xfs_initialize_perag_data > > > > > * > > > > > @@ -864,27 +914,51 @@ xfs_initialize_perag_data( > > > > > uint64_t btree = 0; > > > > > uint64_t fdblocks; > > > > > int error = 0; > > > > > + bool conv = !(mp->m_flags & XFS_MOUNT_RDONLY) && > > > > > + !xfs_sb_version_haslazysbcount(sbp); > > > > > + > > > > > + if (conv) > > > > > + xfs_warn(mp, "enabling lazy-counters..."); > > > > > > > > > > for (index = 0; index < agcount; index++) { > > > > > + struct xfs_trans *tp = NULL; > > > > > + struct xfs_buf *agfbp; > > > > > + > > > > > + if (conv) { > > > > > + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_sb, > > > > > + 0, 0, 0, &tp); > > > > > + if (error) > > > > > + return error; > > > > > + } > > > > > + > > > > > /* > > > > > - * read the agf, then the agi. This gets us > > > > > + * read the agi, then the agf. This gets us > > > > > * all the information we need and populates the > > > > > * per-ag structures for us. > > > > > */ > > > > > - error = xfs_alloc_pagf_init(mp, NULL, index, 0); > > > > > - if (error) > > > > > + error = xfs_ialloc_pagi_init(mp, tp, index); > > > > > + if (error) { > > > > > +err_out: > > > > > + if (tp) > > > > > + xfs_trans_cancel(tp); > > > > > return error; > > > > > + } > > > > > > > > > > - error = xfs_ialloc_pagi_init(mp, NULL, index); > > > > > + error = xfs_alloc_read_agf(mp, tp, index, 0, &agfbp); > > > > > if (error) > > > > > - return error; > > > > > - pag = xfs_perag_get(mp, index); > > > > > + goto err_out; > > > > > + pag = agfbp->b_pag; > > > > > ifree += pag->pagi_freecount; > > > > > ialloc += pag->pagi_count; > > > > > bfree += pag->pagf_freeblks; > > > > > bfreelst += pag->pagf_flcount; > > > > > + if (tp) { > > > > > + error = xfs_fixup_agf_btreeblks(mp, tp, agfbp, index); > > > > > > > > Lazysbcount upgrades should be done from a separate function, not mixed > > > > in with perag initialization. > > > > > > I've seen some previous discussion about multiple AG total scan time cost. > > > Yeah, if another extra scan really accepts here, I could update instead. > > > > > > > Also, why is it necessary to walk all the space btrees to set agf_btreeblks? > > > > > > If my understanding is correct, I think because without lazysbcount, > > > even pagf_btreeblks is updated unconditionally now, but that counter > > > is unreliable for quite ancient kernels which don't have such update > > > logic. > > > > > > Kindly correct me if I'm wrong here. > > > > Ah, you're right. The agf_btreeblks field in the AGF only exists if > > lazysbcount is enabled, which means that adding the feature means that > > we have to scan every AG to compute the correct value. > > > > Still, we only need to do this walk once per filesystem, so I'd prefer > > not to clutter up the xfs_initialize_perag_data code for the sake of a > > onetime upgrade for a deprecated ondisk format. > > > > In my mind it's a feature to be able to do: > > > > #if IS_ENABLED(CONFIG_XFS_V4) > > int > > xfs_fs_set_lazycount(...) > > { > > /* walk AGs, fix AGF... */ > > /* lock super */ > > /* set lazysbcount */ > > /* bwrite super */ > > /* log super changes */ > > /* commit the whole mess */ > > } > > #else > > # define xfs_fs_set_lazycount(..) (-ENOSYS) > > #endif > > > > Because then we know that this is all XFSv4 code and can easily make it > > go away. > > Yeah, sounds better. I could refine this in the next version. :) > > > > > The other question I have is: Do we /really/ want to QA and support > > this in the kernel? Considering that we already have xfs_admin -c1? > > I think we might ask Zorro for this whole thing, since no end users > actually report this. :) (Cc Zorro here.) Although the reality is > we still support !lazysbcount fses even it isn't looked after at all. There's not complaint from cumtomers yet. Due to lazy-count only can be disabled on V4 xfs, and not disabled by default. So nearly no one use it now (Except I'm the one who's picky:) I never thougth it brings in so much argument. I think if the logic is clear, and easy to fix, better to fix it simply, then remove the whole related code later when it's deprecated. If there's risk, we can keep it unitl it's deprecated, especially there's not more complaints on it. Thanks, Zorro > > My another suggestion completely forbids !lazysbcount from mounting > in months (or right now.) > > Just warn users to use xfs_admin -c1 to convert this. And after months, > warn users to convert this and also forbid it from mounting. > > > > > > > > > > > > + xfs_trans_commit(tp); > > > > > + } else { > > > > > + xfs_buf_relse(agfbp); > > > > > + } > > > > > btree += pag->pagf_btreeblks; > > > > > - xfs_perag_put(pag); > > > > > } > > > > > fdblocks = bfree + bfreelst + btree; > > > > > > > > > > @@ -900,6 +974,11 @@ xfs_initialize_perag_data( > > > > > goto out; > > > > > } > > > > > > > > > > + if (conv) { > > > > > + xfs_sb_version_addlazysbcount(sbp); > > > > > + mp->m_update_sb = true; > > > > > + xfs_warn(mp, "lazy-counters has been enabled."); > > > > > > > > But we don't log the sb update? > > > > > > > > As far as the feature upgrade goes, is it necessary to bwrite the > > > > primary super to disk (and then log the change)[1] to prevent a truly > > > > ancient kernel that doesn't support lazysbcount from trying to recover > > > > the log and ending up with an unsupported feature set? > > > > > > Not quite sure if it does harm to ancient kernels with such > > > unsupported feature. may I ask for more details? :) > > > > 1. Walk AG to update btreeblks. > > 2. Commit feature flag update in superblock. > > 3. Log flushes to disk before the superblock update gets written to > > sector 0. > > <crash> > > 4. Boot ancient kernel that doesn't understand lazysbcount > > (from USB recovery stick). > > 5. Mount begins, because sector 0 doesn't have the lazysbcount bit set. > > 6. Log recovery replays the primary super update over sector 0, and the > > new contents of sector 0 say lazysbcount is enabled. > > 7. Superblock now says it has lazysbcount, what does the kernel do? > > Yeah, but I'm not sure if it has some bad effect if ancient kernels > do it in this way. I mean (I think) it's somewhat different from > log_incompat thing. > > If ancient kernels just replay the log, and then sb read verified > and refuse to proceed (but fs is not corrupted...) I think that > would be fine? > > I'm not sure about the whole thing on ancient kernels. So very > curious about this. I will look into the whole thing as well. > > > > > > > > > but yeah, if any issues here, I should follow > > > 1) bwrite sb block first; > > > 2) log sb > > > > > > > > > > > [1] https://lore.kernel.org/linux-xfs/161723934343.3149451.16679733325094950568.stgit@magnolia/ > > > > > > > > > + } > > > > > /* Overwrite incore superblock counters with just-read data */ > > > > > spin_lock(&mp->m_sb_lock); > > > > > sbp->sb_ifree = ifree; > > > > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c > > > > > index cb1e2c4702c3..b3b13acd45d6 100644 > > > > > --- a/fs/xfs/xfs_mount.c > > > > > +++ b/fs/xfs/xfs_mount.c > > > > > @@ -626,7 +626,7 @@ xfs_check_summary_counts( > > > > > * superblock to be correct and we don't need to do anything here. > > > > > * Otherwise, recalculate the summary counters. > > > > > */ > > > > > - if ((!xfs_sb_version_haslazysbcount(&mp->m_sb) || > > > > > + if ((xfs_sb_version_haslazysbcount(&mp->m_sb) && > > > > > > > > Not clear why the logic here inverts? > > > > > > .. thus xfs_initialize_perag_data() below can be called then. > > > > That seems like all the more reason to make it a separate function, TBH. > > Yeah, will refine this later. > > Thanks, > Gao Xiang > > > > > --D > > > > > > > > Thanks, > > > Gao Xiang > > > > > > > > > > > --D > > > > > > > > > XFS_LAST_UNMOUNT_WAS_CLEAN(mp)) && > > > > > !xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS)) > > > > > return 0; > > > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > > > > > index a2dab05332ac..16197a890c15 100644 > > > > > --- a/fs/xfs/xfs_super.c > > > > > +++ b/fs/xfs/xfs_super.c > > > > > @@ -1678,6 +1678,11 @@ xfs_remount_rw( > > > > > } > > > > > > > > > > mp->m_flags &= ~XFS_MOUNT_RDONLY; > > > > > + if (!xfs_sb_version_haslazysbcount(sbp)) { > > > > > + error = xfs_initialize_perag_data(mp, sbp->sb_agcount); > > > > > + if (error) > > > > > + return error; > > > > > + } > > > > > > > > > > /* > > > > > * If this is the first remount to writeable state we might have some > > > > > -- > > > > > 2.27.0 > > > > > > > > > > > > > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount 2021-04-20 11:08 [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount Gao Xiang 2021-04-20 11:08 ` [PATCH v2 2/2] xfs: turn on lazysbcount unconditionally Gao Xiang @ 2021-04-20 17:42 ` Darrick J. Wong 2021-04-20 21:25 ` Dave Chinner 2 siblings, 0 replies; 17+ messages in thread From: Darrick J. Wong @ 2021-04-20 17:42 UTC (permalink / raw) To: Gao Xiang; +Cc: linux-xfs, Dave Chinner, Zorro Lang, Carlos Maiolino On Tue, Apr 20, 2021 at 07:08:54PM +0800, Gao Xiang wrote: > There are many paths which could trigger xfs_log_sb(), e.g. > xfs_bmap_add_attrfork() > -> xfs_log_sb() > , which overrides on-disk fdblocks by in-core per-CPU fdblocks. > > However, for !lazysbcount cases, on-disk fdblocks is actually updated > by xfs_trans_apply_sb_deltas(), and generally it isn't equal to > in-core per-CPU fdblocks due to xfs_reserve_blocks() or whatever, > see the comment in xfs_unmountfs(). > > It could be observed by the following steps reported by Zorro: > > 1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev > 2. mount $dev $mnt > 3. fsstress -d $mnt -p 100 -n 1000 (maybe need more or less io load) > 4. umount $mnt > 5. xfs_repair -n $dev > > yet due to commit f46e5a174655 ("xfs: fold sbcount quiesce logging > into log covering"), xfs_sync_sb() will also be triggered if log > covering is needed and !lazysbcount when xfs_unmountfs(), so hard > to reproduce on kernel 5.12+ for clean unmount. > > on-disk sb_icount and sb_ifree are also updated in > xfs_trans_apply_sb_deltas() for !lazysbcount cases, however, which > are always equal to per-CPU counters, so only fdblocks matters. > > After this patch, I've seen no strange so far on older kernels > for the testcase above without lazysbcount. > > Reported-by: Zorro Lang <zlang@redhat.com> > Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> > Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Still seems a bit roundabout to me, but fmeh, this is all deprecated anyways, so: Reviewed-by: Darrick J. Wong <djwong@kernel.org> --D > --- > changes since v1: > - update commit message. > > fs/xfs/libxfs/xfs_sb.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c > index 60e6d255e5e2..423dada3f64c 100644 > --- a/fs/xfs/libxfs/xfs_sb.c > +++ b/fs/xfs/libxfs/xfs_sb.c > @@ -928,7 +928,13 @@ xfs_log_sb( > > mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount); > mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree); > - mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks); > + if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) { > + struct xfs_dsb *dsb = bp->b_addr; > + > + mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks); > + } else { > + mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks); > + } > > xfs_sb_to_disk(bp->b_addr, &mp->m_sb); > xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF); > -- > 2.27.0 > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount 2021-04-20 11:08 [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount Gao Xiang 2021-04-20 11:08 ` [PATCH v2 2/2] xfs: turn on lazysbcount unconditionally Gao Xiang 2021-04-20 17:42 ` [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount Darrick J. Wong @ 2021-04-20 21:25 ` Dave Chinner 2021-04-20 21:54 ` Gao Xiang 2 siblings, 1 reply; 17+ messages in thread From: Dave Chinner @ 2021-04-20 21:25 UTC (permalink / raw) To: Gao Xiang; +Cc: linux-xfs, Darrick J. Wong, Zorro Lang, Carlos Maiolino On Tue, Apr 20, 2021 at 07:08:54PM +0800, Gao Xiang wrote: > There are many paths which could trigger xfs_log_sb(), e.g. > xfs_bmap_add_attrfork() > -> xfs_log_sb() > , which overrides on-disk fdblocks by in-core per-CPU fdblocks. > > However, for !lazysbcount cases, on-disk fdblocks is actually updated > by xfs_trans_apply_sb_deltas(), and generally it isn't equal to > in-core per-CPU fdblocks due to xfs_reserve_blocks() or whatever, > see the comment in xfs_unmountfs(). > > It could be observed by the following steps reported by Zorro: > > 1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev > 2. mount $dev $mnt > 3. fsstress -d $mnt -p 100 -n 1000 (maybe need more or less io load) > 4. umount $mnt > 5. xfs_repair -n $dev > > yet due to commit f46e5a174655 ("xfs: fold sbcount quiesce logging > into log covering"), xfs_sync_sb() will also be triggered if log > covering is needed and !lazysbcount when xfs_unmountfs(), so hard > to reproduce on kernel 5.12+ for clean unmount. > > on-disk sb_icount and sb_ifree are also updated in > xfs_trans_apply_sb_deltas() for !lazysbcount cases, however, which > are always equal to per-CPU counters, so only fdblocks matters. > > After this patch, I've seen no strange so far on older kernels > for the testcase above without lazysbcount. > > Reported-by: Zorro Lang <zlang@redhat.com> > Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> > Signed-off-by: Gao Xiang <hsiangkao@redhat.com> > --- > changes since v1: > - update commit message. > > fs/xfs/libxfs/xfs_sb.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c > index 60e6d255e5e2..423dada3f64c 100644 > --- a/fs/xfs/libxfs/xfs_sb.c > +++ b/fs/xfs/libxfs/xfs_sb.c > @@ -928,7 +928,13 @@ xfs_log_sb( > > mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount); > mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree); > - mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks); > + if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) { > + struct xfs_dsb *dsb = bp->b_addr; > + > + mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks); > + } else { > + mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks); > + } THis really needs a comment explaining why this is done this way. It's not obvious from reading the code why we pull the the fdblock count off disk and then, in xfs_sb_to_disk(), we write it straight back to disk. It's also not clear to me that summing the inode counters is correct in the case of the !lazysbcount for the similar reasons - the percpu counter is not guaranteed to be absolutely accurate here, yet the values in the disk buffer are. Perhaps we should be updating the m_sb values in xfs_trans_apply_sb_deltas() for the !lazycount case, and only summing them here for the lazycount case... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount 2021-04-20 21:25 ` Dave Chinner @ 2021-04-20 21:54 ` Gao Xiang 2021-04-21 1:45 ` Dave Chinner 0 siblings, 1 reply; 17+ messages in thread From: Gao Xiang @ 2021-04-20 21:54 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs, Darrick J. Wong, Zorro Lang, Carlos Maiolino Hi Dave, On Wed, Apr 21, 2021 at 07:25:06AM +1000, Dave Chinner wrote: > On Tue, Apr 20, 2021 at 07:08:54PM +0800, Gao Xiang wrote: > > There are many paths which could trigger xfs_log_sb(), e.g. > > xfs_bmap_add_attrfork() > > -> xfs_log_sb() > > , which overrides on-disk fdblocks by in-core per-CPU fdblocks. > > > > However, for !lazysbcount cases, on-disk fdblocks is actually updated > > by xfs_trans_apply_sb_deltas(), and generally it isn't equal to > > in-core per-CPU fdblocks due to xfs_reserve_blocks() or whatever, > > see the comment in xfs_unmountfs(). > > > > It could be observed by the following steps reported by Zorro: > > > > 1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev > > 2. mount $dev $mnt > > 3. fsstress -d $mnt -p 100 -n 1000 (maybe need more or less io load) > > 4. umount $mnt > > 5. xfs_repair -n $dev > > > > yet due to commit f46e5a174655 ("xfs: fold sbcount quiesce logging > > into log covering"), xfs_sync_sb() will also be triggered if log > > covering is needed and !lazysbcount when xfs_unmountfs(), so hard > > to reproduce on kernel 5.12+ for clean unmount. > > > > on-disk sb_icount and sb_ifree are also updated in > > xfs_trans_apply_sb_deltas() for !lazysbcount cases, however, which > > are always equal to per-CPU counters, so only fdblocks matters. > > > > After this patch, I've seen no strange so far on older kernels > > for the testcase above without lazysbcount. > > > > Reported-by: Zorro Lang <zlang@redhat.com> > > Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> > > Signed-off-by: Gao Xiang <hsiangkao@redhat.com> > > --- > > changes since v1: > > - update commit message. > > > > fs/xfs/libxfs/xfs_sb.c | 8 +++++++- > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c > > index 60e6d255e5e2..423dada3f64c 100644 > > --- a/fs/xfs/libxfs/xfs_sb.c > > +++ b/fs/xfs/libxfs/xfs_sb.c > > @@ -928,7 +928,13 @@ xfs_log_sb( > > > > mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount); > > mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree); > > - mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks); > > + if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) { > > + struct xfs_dsb *dsb = bp->b_addr; > > + > > + mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks); > > + } else { > > + mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks); > > + } > > THis really needs a comment explaining why this is done this way. > It's not obvious from reading the code why we pull the the fdblock > count off disk and then, in xfs_sb_to_disk(), we write it straight > back to disk. > > It's also not clear to me that summing the inode counters is correct > in the case of the !lazysbcount for the similar reasons - the percpu > counter is not guaranteed to be absolutely accurate here, yet the > values in the disk buffer are. Perhaps we should be updating the > m_sb values in xfs_trans_apply_sb_deltas() for the !lazycount case, > and only summing them here for the lazycount case... But if updating m_sb values in xfs_trans_apply_sb_deltas(), we should also update on-disk sb counters in xfs_trans_apply_sb_deltas() and log sb for !lazysbcount (since for such cases, sb counter update should be considered immediately.) That will indeed cause more modification, I'm not quite sure if it's quite ok honestly. But if you assume that's more clear, I could submit an alternative instead later. Thanks, Gao Xiang > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount 2021-04-20 21:54 ` Gao Xiang @ 2021-04-21 1:45 ` Dave Chinner 2021-04-21 3:01 ` Gao Xiang 0 siblings, 1 reply; 17+ messages in thread From: Dave Chinner @ 2021-04-21 1:45 UTC (permalink / raw) To: Gao Xiang; +Cc: linux-xfs, Darrick J. Wong, Zorro Lang, Carlos Maiolino On Wed, Apr 21, 2021 at 05:54:43AM +0800, Gao Xiang wrote: > Hi Dave, > > On Wed, Apr 21, 2021 at 07:25:06AM +1000, Dave Chinner wrote: > > On Tue, Apr 20, 2021 at 07:08:54PM +0800, Gao Xiang wrote: > > > There are many paths which could trigger xfs_log_sb(), e.g. > > > xfs_bmap_add_attrfork() > > > -> xfs_log_sb() > > > , which overrides on-disk fdblocks by in-core per-CPU fdblocks. > > > > > > However, for !lazysbcount cases, on-disk fdblocks is actually updated > > > by xfs_trans_apply_sb_deltas(), and generally it isn't equal to > > > in-core per-CPU fdblocks due to xfs_reserve_blocks() or whatever, > > > see the comment in xfs_unmountfs(). > > > > > > It could be observed by the following steps reported by Zorro: > > > > > > 1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev > > > 2. mount $dev $mnt > > > 3. fsstress -d $mnt -p 100 -n 1000 (maybe need more or less io load) > > > 4. umount $mnt > > > 5. xfs_repair -n $dev > > > > > > yet due to commit f46e5a174655 ("xfs: fold sbcount quiesce logging > > > into log covering"), xfs_sync_sb() will also be triggered if log > > > covering is needed and !lazysbcount when xfs_unmountfs(), so hard > > > to reproduce on kernel 5.12+ for clean unmount. > > > > > > on-disk sb_icount and sb_ifree are also updated in > > > xfs_trans_apply_sb_deltas() for !lazysbcount cases, however, which > > > are always equal to per-CPU counters, so only fdblocks matters. > > > > > > After this patch, I've seen no strange so far on older kernels > > > for the testcase above without lazysbcount. > > > > > > Reported-by: Zorro Lang <zlang@redhat.com> > > > Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> > > > Signed-off-by: Gao Xiang <hsiangkao@redhat.com> > > > --- > > > changes since v1: > > > - update commit message. > > > > > > fs/xfs/libxfs/xfs_sb.c | 8 +++++++- > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c > > > index 60e6d255e5e2..423dada3f64c 100644 > > > --- a/fs/xfs/libxfs/xfs_sb.c > > > +++ b/fs/xfs/libxfs/xfs_sb.c > > > @@ -928,7 +928,13 @@ xfs_log_sb( > > > > > > mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount); > > > mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree); > > > - mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks); > > > + if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) { > > > + struct xfs_dsb *dsb = bp->b_addr; > > > + > > > + mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks); > > > + } else { > > > + mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks); > > > + } > > > > THis really needs a comment explaining why this is done this way. > > It's not obvious from reading the code why we pull the the fdblock > > count off disk and then, in xfs_sb_to_disk(), we write it straight > > back to disk. > > > > It's also not clear to me that summing the inode counters is correct > > in the case of the !lazysbcount for the similar reasons - the percpu > > counter is not guaranteed to be absolutely accurate here, yet the > > values in the disk buffer are. Perhaps we should be updating the > > m_sb values in xfs_trans_apply_sb_deltas() for the !lazycount case, > > and only summing them here for the lazycount case... > > But if updating m_sb values in xfs_trans_apply_sb_deltas(), we > should also update on-disk sb counters in xfs_trans_apply_sb_deltas() > and log sb for !lazysbcount (since for such cases, sb counter update > should be considered immediately.) I don't follow - xfs_trans_apply_sb_deltas() already logs the changes to the superblock made in the transaction. xfs_trans_unreserve_and_mod_sb() does the in-memory counter updates after xfs_trans_apply_sb_deltas() applies them to the on-disk superblock in the buffer and logs them. But nowhere on a !lazysbcount setup are mp->m_sb.sb_fdcount/ifree/ icount values being updated, and hence they are not valid at any time except for during log quiesce where all the in memory reservations have been removed and the per-cpu counters are synced to mp->m_sb. I'm suggesting that xfs_trans_unreserve_and_mod_sb() also updates the mp->m_sb.sb_fdcount/ifree/icount values for !lazysbcount, as we currently do not do this and this will keep them uptodate for any caller of xfs_sb_to_disk(). i.e. we have three choices: 1. avoid writing the counters in xfs_sb_to_disk() for !lazycount. 2. read them from the buffer before writing them back to the buffer. 3. keep them up to date correctly via xfs_trans_unreserve_and_mod_sb. #1 is bad because there are cases where we want to write the counters even for !lazysbcount filesystems (e.g. mkfs, repair, etc). #2 is essentially a hack around the fact that mp->m_sb is not kept up to date in the in-memory superblock for !lazysbcount filesystems. #3 keeps the in-memory superblock up to date for !lazysbcount case so they are coherent with the on-disk values and hence we only need to update the in-memory superblock counts for lazysbcount filesystems before calling xfs_sb_to_disk(). #3 is my preferred solution. > That will indeed cause more modification, I'm not quite sure if it's > quite ok honestly. But if you assume that's more clear, I could submit > an alternative instead later. I think the version you posted doesn't fix the entire problem. It merely slaps a band-aid over the symptom that is being seen, and doesn't address all the non-coherent data that can be written to the superblock here. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount 2021-04-21 1:45 ` Dave Chinner @ 2021-04-21 3:01 ` Gao Xiang 2021-04-22 1:44 ` Dave Chinner 0 siblings, 1 reply; 17+ messages in thread From: Gao Xiang @ 2021-04-21 3:01 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs, Darrick J. Wong, Zorro Lang, Carlos Maiolino On Wed, Apr 21, 2021 at 11:45:26AM +1000, Dave Chinner wrote: > On Wed, Apr 21, 2021 at 05:54:43AM +0800, Gao Xiang wrote: > > Hi Dave, > > > > On Wed, Apr 21, 2021 at 07:25:06AM +1000, Dave Chinner wrote: > > > On Tue, Apr 20, 2021 at 07:08:54PM +0800, Gao Xiang wrote: > > > > There are many paths which could trigger xfs_log_sb(), e.g. > > > > xfs_bmap_add_attrfork() > > > > -> xfs_log_sb() > > > > , which overrides on-disk fdblocks by in-core per-CPU fdblocks. > > > > > > > > However, for !lazysbcount cases, on-disk fdblocks is actually updated > > > > by xfs_trans_apply_sb_deltas(), and generally it isn't equal to > > > > in-core per-CPU fdblocks due to xfs_reserve_blocks() or whatever, > > > > see the comment in xfs_unmountfs(). > > > > > > > > It could be observed by the following steps reported by Zorro: > > > > > > > > 1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev > > > > 2. mount $dev $mnt > > > > 3. fsstress -d $mnt -p 100 -n 1000 (maybe need more or less io load) > > > > 4. umount $mnt > > > > 5. xfs_repair -n $dev > > > > > > > > yet due to commit f46e5a174655 ("xfs: fold sbcount quiesce logging > > > > into log covering"), xfs_sync_sb() will also be triggered if log > > > > covering is needed and !lazysbcount when xfs_unmountfs(), so hard > > > > to reproduce on kernel 5.12+ for clean unmount. > > > > > > > > on-disk sb_icount and sb_ifree are also updated in > > > > xfs_trans_apply_sb_deltas() for !lazysbcount cases, however, which > > > > are always equal to per-CPU counters, so only fdblocks matters. > > > > > > > > After this patch, I've seen no strange so far on older kernels > > > > for the testcase above without lazysbcount. > > > > > > > > Reported-by: Zorro Lang <zlang@redhat.com> > > > > Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> > > > > Signed-off-by: Gao Xiang <hsiangkao@redhat.com> > > > > --- > > > > changes since v1: > > > > - update commit message. > > > > > > > > fs/xfs/libxfs/xfs_sb.c | 8 +++++++- > > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c > > > > index 60e6d255e5e2..423dada3f64c 100644 > > > > --- a/fs/xfs/libxfs/xfs_sb.c > > > > +++ b/fs/xfs/libxfs/xfs_sb.c > > > > @@ -928,7 +928,13 @@ xfs_log_sb( > > > > > > > > mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount); > > > > mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree); > > > > - mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks); > > > > + if (!xfs_sb_version_haslazysbcount(&mp->m_sb)) { > > > > + struct xfs_dsb *dsb = bp->b_addr; > > > > + > > > > + mp->m_sb.sb_fdblocks = be64_to_cpu(dsb->sb_fdblocks); > > > > + } else { > > > > + mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks); > > > > + } > > > > > > THis really needs a comment explaining why this is done this way. > > > It's not obvious from reading the code why we pull the the fdblock > > > count off disk and then, in xfs_sb_to_disk(), we write it straight > > > back to disk. > > > > > > It's also not clear to me that summing the inode counters is correct > > > in the case of the !lazysbcount for the similar reasons - the percpu > > > counter is not guaranteed to be absolutely accurate here, yet the > > > values in the disk buffer are. Perhaps we should be updating the > > > m_sb values in xfs_trans_apply_sb_deltas() for the !lazycount case, > > > and only summing them here for the lazycount case... > > > > But if updating m_sb values in xfs_trans_apply_sb_deltas(), we > > should also update on-disk sb counters in xfs_trans_apply_sb_deltas() > > and log sb for !lazysbcount (since for such cases, sb counter update > > should be considered immediately.) > > I don't follow - xfs_trans_apply_sb_deltas() already logs the > changes to the superblock made in the transaction. > > xfs_trans_unreserve_and_mod_sb() does the in-memory counter updates > after xfs_trans_apply_sb_deltas() applies them to the on-disk > superblock in the buffer and logs them. > > But nowhere on a !lazysbcount setup are mp->m_sb.sb_fdcount/ifree/ > icount values being updated, and hence they are not valid at any > time except for during log quiesce where all the in memory > reservations have been removed and the per-cpu counters are synced > to mp->m_sb. > > I'm suggesting that xfs_trans_unreserve_and_mod_sb() also updates > the mp->m_sb.sb_fdcount/ifree/icount values for !lazysbcount, as we > currently do not do this and this will keep them uptodate for any > caller of xfs_sb_to_disk(). > > i.e. we have three choices: > > 1. avoid writing the counters in xfs_sb_to_disk() for !lazycount. > 2. read them from the buffer before writing them back to the buffer. > 3. keep them up to date correctly via xfs_trans_unreserve_and_mod_sb. > > #1 is bad because there are cases where we want to write the > counters even for !lazysbcount filesystems (e.g. mkfs, repair, etc). > > #2 is essentially a hack around the fact that mp->m_sb is not kept > up to date in the in-memory superblock for !lazysbcount filesystems. > > #3 keeps the in-memory superblock up to date for !lazysbcount case > so they are coherent with the on-disk values and hence we only need > to update the in-memory superblock counts for lazysbcount > filesystems before calling xfs_sb_to_disk(). > > #3 is my preferred solution. > > > That will indeed cause more modification, I'm not quite sure if it's > > quite ok honestly. But if you assume that's more clear, I could submit > > an alternative instead later. > > I think the version you posted doesn't fix the entire problem. It > merely slaps a band-aid over the symptom that is being seen, and > doesn't address all the non-coherent data that can be written to the > superblock here. As I explained on IRC as well, I think for !lazysbcount cases, fdblocks, icount and ifree are protected by sb buffer lock. and the only users of these three are: 1) xfs_trans_apply_sb_deltas() 2) xfs_log_sb() So I've seen no need to update sb_icount, sb_ifree in that way (I mean my v2, although I agree it's a bit hacky.) only sb_fdblocks matters. But the reason why this patch exist is only to backport to old stable kernels, since after [PATCH v2 2/2], we can get rid of all of !lazysbcount cases upstream. But if we'd like to do more e.g. by taking m_sb_lock, I've seen the xfs codebase quite varies these years. and I modified some version like http://paste.debian.net/1194481/ and I'm not sure if we be64_add_cpu() on-disk sb counters in xfs_trans_apply_sb_deltas() and then mp->m_sb.sb_fdblocks += tp->t_fdblocks_delta again in xfs_trans_unreserve_and_mod_sb() seems much better. Thanks, Gao Xiang > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount 2021-04-21 3:01 ` Gao Xiang @ 2021-04-22 1:44 ` Dave Chinner 2021-04-22 2:06 ` Gao Xiang 0 siblings, 1 reply; 17+ messages in thread From: Dave Chinner @ 2021-04-22 1:44 UTC (permalink / raw) To: Gao Xiang; +Cc: linux-xfs, Darrick J. Wong, Zorro Lang, Carlos Maiolino On Wed, Apr 21, 2021 at 11:01:29AM +0800, Gao Xiang wrote: > On Wed, Apr 21, 2021 at 11:45:26AM +1000, Dave Chinner wrote: > > On Wed, Apr 21, 2021 at 05:54:43AM +0800, Gao Xiang wrote: > > #1 is bad because there are cases where we want to write the > > counters even for !lazysbcount filesystems (e.g. mkfs, repair, etc). > > > > #2 is essentially a hack around the fact that mp->m_sb is not kept > > up to date in the in-memory superblock for !lazysbcount filesystems. > > > > #3 keeps the in-memory superblock up to date for !lazysbcount case > > so they are coherent with the on-disk values and hence we only need > > to update the in-memory superblock counts for lazysbcount > > filesystems before calling xfs_sb_to_disk(). > > > > #3 is my preferred solution. > > > > > That will indeed cause more modification, I'm not quite sure if it's > > > quite ok honestly. But if you assume that's more clear, I could submit > > > an alternative instead later. > > > > I think the version you posted doesn't fix the entire problem. It > > merely slaps a band-aid over the symptom that is being seen, and > > doesn't address all the non-coherent data that can be written to the > > superblock here. > > As I explained on IRC as well, I think for !lazysbcount cases, fdblocks, > icount and ifree are protected by sb buffer lock. and the only users of > these three are: > 1) xfs_trans_apply_sb_deltas() > 2) xfs_log_sb() That's just a happy accident and not intentional in any way. Just fixing the case that occurs while holding the sb buffer lock doesn't actually fix the underlying problem, it just uses this as a bandaid. > > So I've seen no need to update sb_icount, sb_ifree in that way (I mean > my v2, although I agree it's a bit hacky.) only sb_fdblocks matters. > > But the reason why this patch exist is only to backport to old stable > kernels, since after [PATCH v2 2/2], we can get rid of all of > !lazysbcount cases upstream. > > But if we'd like to do more e.g. by taking m_sb_lock, I've seen the > xfs codebase quite varies these years. and I modified some version > like http://paste.debian.net/1194481/ I said on IRC that this is what xfs_trans_unreserve_and_mod_sb() is for. For !lazysbcount filesystems the transaction will be marked dirty (i.e XFS_TRANS_SB_DIRTY is set) and so we'll always run the slow path that takes the m_sb_lock and updates mp->m_sb. It's faster for me to explain this by patch than any other way. See below. Cheers, Dave. -- Dave Chinner david@fromorbit.com xfs: update superblock counters correctly for !lazysbcount From: Dave Chinner <dchinner@redhat.com> Keep the mount superblock counters up to date for !lazysbcount filesystems so that when we log the superblock they do not need updating in any way because they are already correct. Signed-off-by: Dave Chinner <dchinner@redhat.com> --- fs/xfs/libxfs/xfs_sb.c | 16 +++++++++++++--- fs/xfs/xfs_trans.c | 3 +++ 2 files changed, 16 insertions(+), 3 deletions(-) diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index 9630f9e2f540..7d4c238540d4 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -794,9 +794,19 @@ xfs_log_sb( struct xfs_mount *mp = tp->t_mountp; struct xfs_buf *bp = xfs_trans_getsb(tp); - mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount); - mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree); - mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks); + /* + * Lazy sb counters don't update the in-core superblock so do that now. + * If this is at unmount, the counters will be exactly correct, but at + * any other time they will only be ballpark correct because of + * reservations that have been taken out percpu counters. If we have an + * unclean shutdown, this will be corrected by log recovery rebuilding + * the counters from the AGF block counts. + */ + if (xfs_sb_version_haslazysbcount(&mp->m_sb)) { + mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount); + mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree); + mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks); + } xfs_sb_to_disk(bp->b_addr, &mp->m_sb); xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF); diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index bcc978011869..438e41931b55 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -629,6 +629,9 @@ xfs_trans_unreserve_and_mod_sb( /* apply remaining deltas */ spin_lock(&mp->m_sb_lock); + mp->m_sb.sb_fdblocks += blkdelta; + mp->m_sb.sb_icount += idelta; + mp->m_sb.sb_ifree += ifreedelta; mp->m_sb.sb_frextents += rtxdelta; mp->m_sb.sb_dblocks += tp->t_dblocks_delta; mp->m_sb.sb_agcount += tp->t_agcount_delta; ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount 2021-04-22 1:44 ` Dave Chinner @ 2021-04-22 2:06 ` Gao Xiang 2021-04-22 3:01 ` Dave Chinner 0 siblings, 1 reply; 17+ messages in thread From: Gao Xiang @ 2021-04-22 2:06 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs, Darrick J. Wong, Zorro Lang, Carlos Maiolino Hi Dave, On Thu, Apr 22, 2021 at 11:44:46AM +1000, Dave Chinner wrote: > On Wed, Apr 21, 2021 at 11:01:29AM +0800, Gao Xiang wrote: > > On Wed, Apr 21, 2021 at 11:45:26AM +1000, Dave Chinner wrote: > > > On Wed, Apr 21, 2021 at 05:54:43AM +0800, Gao Xiang wrote: > > > #1 is bad because there are cases where we want to write the > > > counters even for !lazysbcount filesystems (e.g. mkfs, repair, etc). > > > > > > #2 is essentially a hack around the fact that mp->m_sb is not kept > > > up to date in the in-memory superblock for !lazysbcount filesystems. > > > > > > #3 keeps the in-memory superblock up to date for !lazysbcount case > > > so they are coherent with the on-disk values and hence we only need > > > to update the in-memory superblock counts for lazysbcount > > > filesystems before calling xfs_sb_to_disk(). > > > > > > #3 is my preferred solution. > > > > > > > That will indeed cause more modification, I'm not quite sure if it's > > > > quite ok honestly. But if you assume that's more clear, I could submit > > > > an alternative instead later. > > > > > > I think the version you posted doesn't fix the entire problem. It > > > merely slaps a band-aid over the symptom that is being seen, and > > > doesn't address all the non-coherent data that can be written to the > > > superblock here. > > > > As I explained on IRC as well, I think for !lazysbcount cases, fdblocks, > > icount and ifree are protected by sb buffer lock. and the only users of > > these three are: > > 1) xfs_trans_apply_sb_deltas() > > 2) xfs_log_sb() > > That's just a happy accident and not intentional in any way. Just > fixing the case that occurs while holding the sb buffer lock doesn't > actually fix the underlying problem, it just uses this as a bandaid. I think for !lazysbcases, sb buffer lock is only a reliable lock that can be relied on for serialzing (since we need to make sure each sb write matches the corresponding fdblocks, ifree, icount. So sb buffer needs be locked every time. So so need to recalc on dirty log.) > > > > > So I've seen no need to update sb_icount, sb_ifree in that way (I mean > > my v2, although I agree it's a bit hacky.) only sb_fdblocks matters. > > > > But the reason why this patch exist is only to backport to old stable > > kernels, since after [PATCH v2 2/2], we can get rid of all of > > !lazysbcount cases upstream. > > > > But if we'd like to do more e.g. by taking m_sb_lock, I've seen the > > xfs codebase quite varies these years. and I modified some version > > like http://paste.debian.net/1194481/ > > I said on IRC that this is what xfs_trans_unreserve_and_mod_sb() is > for. For !lazysbcount filesystems the transaction will be marked > dirty (i.e XFS_TRANS_SB_DIRTY is set) and so we'll always run the > slow path that takes the m_sb_lock and updates mp->m_sb. > > It's faster for me to explain this by patch than any other way. See > below. I know what you mean, but there exists 3 things: 1) we be64_add_cpu() on-disk fdblocks, ifree, icount at xfs_trans_apply_sb_deltas(), and then do the same bahavior in xfs_trans_unreserve_and_mod_sb() for in-memory counters again. that is (somewhat) fragile. 2) m_sb_lock behaves no effect at this. This lock between xfs_log_sb() and xfs_trans_unreserve_and_mod_sb() is still sb buffer lock for !lazysbcount cases. 3) in-memory sb counters are serialized by some spinlock now, so I'm not sure sb per-CPU counters behave for lazysbcount cases, are these used for better performance? > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > > xfs: update superblock counters correctly for !lazysbcount > > From: Dave Chinner <dchinner@redhat.com> > > Keep the mount superblock counters up to date for !lazysbcount > filesystems so that when we log the superblock they do not need > updating in any way because they are already correct. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > --- > fs/xfs/libxfs/xfs_sb.c | 16 +++++++++++++--- > fs/xfs/xfs_trans.c | 3 +++ > 2 files changed, 16 insertions(+), 3 deletions(-) > > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c > index 9630f9e2f540..7d4c238540d4 100644 > --- a/fs/xfs/libxfs/xfs_sb.c > +++ b/fs/xfs/libxfs/xfs_sb.c > @@ -794,9 +794,19 @@ xfs_log_sb( > struct xfs_mount *mp = tp->t_mountp; > struct xfs_buf *bp = xfs_trans_getsb(tp); > > - mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount); > - mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree); > - mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks); > + /* > + * Lazy sb counters don't update the in-core superblock so do that now. > + * If this is at unmount, the counters will be exactly correct, but at > + * any other time they will only be ballpark correct because of > + * reservations that have been taken out percpu counters. If we have an > + * unclean shutdown, this will be corrected by log recovery rebuilding > + * the counters from the AGF block counts. > + */ > + if (xfs_sb_version_haslazysbcount(&mp->m_sb)) { > + mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount); > + mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree); > + mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks); > + } > > xfs_sb_to_disk(bp->b_addr, &mp->m_sb); > xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF); > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c > index bcc978011869..438e41931b55 100644 > --- a/fs/xfs/xfs_trans.c > +++ b/fs/xfs/xfs_trans.c > @@ -629,6 +629,9 @@ xfs_trans_unreserve_and_mod_sb( > > /* apply remaining deltas */ > spin_lock(&mp->m_sb_lock); > + mp->m_sb.sb_fdblocks += blkdelta; not sure that is quite equal to blkdelta, since (I think) we might need to apply t_res_fdblocks_delta for !lazysbcount cases but not lazysbcount cases, but I'm not quite sure, just saw the comment above xfs_trans_unreserve_and_mod_sb() and the implementation of xfs_trans_apply_sb_deltas(). Thanks, Gao Xiang > + mp->m_sb.sb_icount += idelta; > + mp->m_sb.sb_ifree += ifreedelta; > mp->m_sb.sb_frextents += rtxdelta; > mp->m_sb.sb_dblocks += tp->t_dblocks_delta; > mp->m_sb.sb_agcount += tp->t_agcount_delta; > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount 2021-04-22 2:06 ` Gao Xiang @ 2021-04-22 3:01 ` Dave Chinner 2021-04-22 3:12 ` Gao Xiang 0 siblings, 1 reply; 17+ messages in thread From: Dave Chinner @ 2021-04-22 3:01 UTC (permalink / raw) To: Gao Xiang; +Cc: linux-xfs, Darrick J. Wong, Zorro Lang, Carlos Maiolino On Thu, Apr 22, 2021 at 10:06:13AM +0800, Gao Xiang wrote: > Hi Dave, > > On Thu, Apr 22, 2021 at 11:44:46AM +1000, Dave Chinner wrote: > > On Wed, Apr 21, 2021 at 11:01:29AM +0800, Gao Xiang wrote: > > > On Wed, Apr 21, 2021 at 11:45:26AM +1000, Dave Chinner wrote: > > > > On Wed, Apr 21, 2021 at 05:54:43AM +0800, Gao Xiang wrote: > > > > #1 is bad because there are cases where we want to write the > > > > counters even for !lazysbcount filesystems (e.g. mkfs, repair, etc). > > > > > > > > #2 is essentially a hack around the fact that mp->m_sb is not kept > > > > up to date in the in-memory superblock for !lazysbcount filesystems. > > > > > > > > #3 keeps the in-memory superblock up to date for !lazysbcount case > > > > so they are coherent with the on-disk values and hence we only need > > > > to update the in-memory superblock counts for lazysbcount > > > > filesystems before calling xfs_sb_to_disk(). > > > > > > > > #3 is my preferred solution. > > > > > > > > > That will indeed cause more modification, I'm not quite sure if it's > > > > > quite ok honestly. But if you assume that's more clear, I could submit > > > > > an alternative instead later. > > > > > > > > I think the version you posted doesn't fix the entire problem. It > > > > merely slaps a band-aid over the symptom that is being seen, and > > > > doesn't address all the non-coherent data that can be written to the > > > > superblock here. > > > > > > As I explained on IRC as well, I think for !lazysbcount cases, fdblocks, > > > icount and ifree are protected by sb buffer lock. and the only users of > > > these three are: > > > 1) xfs_trans_apply_sb_deltas() > > > 2) xfs_log_sb() > > > > That's just a happy accident and not intentional in any way. Just > > fixing the case that occurs while holding the sb buffer lock doesn't > > actually fix the underlying problem, it just uses this as a bandaid. > > I think for !lazysbcases, sb buffer lock is only a reliable lock that > can be relied on for serialzing (since we need to make sure each sb > write matches the corresponding fdblocks, ifree, icount. So sb buffer > needs be locked every time. So so need to recalc on dirty log.) > > > > > > > > So I've seen no need to update sb_icount, sb_ifree in that way (I mean > > > my v2, although I agree it's a bit hacky.) only sb_fdblocks matters. > > > > > > But the reason why this patch exist is only to backport to old stable > > > kernels, since after [PATCH v2 2/2], we can get rid of all of > > > !lazysbcount cases upstream. > > > > > > But if we'd like to do more e.g. by taking m_sb_lock, I've seen the > > > xfs codebase quite varies these years. and I modified some version > > > like http://paste.debian.net/1194481/ > > > > I said on IRC that this is what xfs_trans_unreserve_and_mod_sb() is > > for. For !lazysbcount filesystems the transaction will be marked > > dirty (i.e XFS_TRANS_SB_DIRTY is set) and so we'll always run the > > slow path that takes the m_sb_lock and updates mp->m_sb. > > > > It's faster for me to explain this by patch than any other way. See > > below. > > I know what you mean, but there exists 3 things: > 1) we be64_add_cpu() on-disk fdblocks, ifree, icount at > xfs_trans_apply_sb_deltas(), and then do the same bahavior in > xfs_trans_unreserve_and_mod_sb() for in-memory counters again. > that is (somewhat) fragile. That's exactly how the superblock updates have been done since the mid 1990s. It's the way it was intended to work: - xfs_trans_apply_sb_deltas() applies the changes to the on disk superblock - xfs_trans_unreserve_and_mod_sb() applies the changes to the in-memory superblock. All my patch does is follow the long established separation of update responsibilities. It is actually returning the code to the behaviour we had before lazy superblock counters were introduced. > 2) m_sb_lock behaves no effect at this. This lock between > xfs_log_sb() and xfs_trans_unreserve_and_mod_sb() is still > sb buffer lock for !lazysbcount cases. The m_sb_lock doesn't need to have any effect on this. It's to prevent concurrent updates of the in-core superblock, not to prevent access to the superblock buffer. i.e. the superblock buffer lock protects against concurrent updates of the superblock buffer, and hence while progating and logging changes to the superblock buffer we have to have the superblock buffer locked. > 3) in-memory sb counters are serialized by some spinlock now, No, they are not. Lazysbcount does not set XFS_TRANS_SB_DIRTY for pure ifree/icount/fdblock updates, so it never runs the code I modified in xfs_trans_unreserve_and_mod_sb() unless some other part of the superblock is changed. For !lazysbcount, we always run this path because XFS_TRANS_SB_DIRTY is always set. > so I'm not sure sb per-CPU counters behave for lazysbcount > cases, are these used for better performance? It does not change behaviour of anything at all, execpt the counter values for !lazysbcount filesystems are now always kept correctly up to date. > > xfs_sb_to_disk(bp->b_addr, &mp->m_sb); > > xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF); > > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c > > index bcc978011869..438e41931b55 100644 > > --- a/fs/xfs/xfs_trans.c > > +++ b/fs/xfs/xfs_trans.c > > @@ -629,6 +629,9 @@ xfs_trans_unreserve_and_mod_sb( > > > > /* apply remaining deltas */ > > spin_lock(&mp->m_sb_lock); > > + mp->m_sb.sb_fdblocks += blkdelta; > > not sure that is quite equal to blkdelta, since (I think) we might need > to apply t_res_fdblocks_delta for !lazysbcount cases but not lazysbcount > cases, but I'm not quite sure, just saw the comment above > xfs_trans_unreserve_and_mod_sb() and the implementation of > xfs_trans_apply_sb_deltas(). Yes, I forgot about the special delayed allocation space accounting. We'll have to add that, too, so it becomes: + mp->m_sb.sb_fdblocks += blkdelta + tp->t_res_fdblocks_delta; + mp->m_sb.sb_icount += idelta; + mp->m_sb.sb_ifree += ifreedelta; But this doesn't change the structure of the patch in any way. CHeers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount 2021-04-22 3:01 ` Dave Chinner @ 2021-04-22 3:12 ` Gao Xiang 2021-04-22 15:58 ` Darrick J. Wong 0 siblings, 1 reply; 17+ messages in thread From: Gao Xiang @ 2021-04-22 3:12 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs, Darrick J. Wong, Zorro Lang, Carlos Maiolino On Thu, Apr 22, 2021 at 01:01:02PM +1000, Dave Chinner wrote: > On Thu, Apr 22, 2021 at 10:06:13AM +0800, Gao Xiang wrote: > > Hi Dave, > > > > On Thu, Apr 22, 2021 at 11:44:46AM +1000, Dave Chinner wrote: > > > On Wed, Apr 21, 2021 at 11:01:29AM +0800, Gao Xiang wrote: > > > > On Wed, Apr 21, 2021 at 11:45:26AM +1000, Dave Chinner wrote: > > > > > On Wed, Apr 21, 2021 at 05:54:43AM +0800, Gao Xiang wrote: > > > > > #1 is bad because there are cases where we want to write the > > > > > counters even for !lazysbcount filesystems (e.g. mkfs, repair, etc). > > > > > > > > > > #2 is essentially a hack around the fact that mp->m_sb is not kept > > > > > up to date in the in-memory superblock for !lazysbcount filesystems. > > > > > > > > > > #3 keeps the in-memory superblock up to date for !lazysbcount case > > > > > so they are coherent with the on-disk values and hence we only need > > > > > to update the in-memory superblock counts for lazysbcount > > > > > filesystems before calling xfs_sb_to_disk(). > > > > > > > > > > #3 is my preferred solution. > > > > > > > > > > > That will indeed cause more modification, I'm not quite sure if it's > > > > > > quite ok honestly. But if you assume that's more clear, I could submit > > > > > > an alternative instead later. > > > > > > > > > > I think the version you posted doesn't fix the entire problem. It > > > > > merely slaps a band-aid over the symptom that is being seen, and > > > > > doesn't address all the non-coherent data that can be written to the > > > > > superblock here. > > > > > > > > As I explained on IRC as well, I think for !lazysbcount cases, fdblocks, > > > > icount and ifree are protected by sb buffer lock. and the only users of > > > > these three are: > > > > 1) xfs_trans_apply_sb_deltas() > > > > 2) xfs_log_sb() > > > > > > That's just a happy accident and not intentional in any way. Just > > > fixing the case that occurs while holding the sb buffer lock doesn't > > > actually fix the underlying problem, it just uses this as a bandaid. > > > > I think for !lazysbcases, sb buffer lock is only a reliable lock that > > can be relied on for serialzing (since we need to make sure each sb > > write matches the corresponding fdblocks, ifree, icount. So sb buffer > > needs be locked every time. So so need to recalc on dirty log.) > > > > > > > > > > > So I've seen no need to update sb_icount, sb_ifree in that way (I mean > > > > my v2, although I agree it's a bit hacky.) only sb_fdblocks matters. > > > > > > > > But the reason why this patch exist is only to backport to old stable > > > > kernels, since after [PATCH v2 2/2], we can get rid of all of > > > > !lazysbcount cases upstream. > > > > > > > > But if we'd like to do more e.g. by taking m_sb_lock, I've seen the > > > > xfs codebase quite varies these years. and I modified some version > > > > like http://paste.debian.net/1194481/ > > > > > > I said on IRC that this is what xfs_trans_unreserve_and_mod_sb() is > > > for. For !lazysbcount filesystems the transaction will be marked > > > dirty (i.e XFS_TRANS_SB_DIRTY is set) and so we'll always run the > > > slow path that takes the m_sb_lock and updates mp->m_sb. > > > > > > It's faster for me to explain this by patch than any other way. See > > > below. > > > > I know what you mean, but there exists 3 things: > > 1) we be64_add_cpu() on-disk fdblocks, ifree, icount at > > xfs_trans_apply_sb_deltas(), and then do the same bahavior in > > xfs_trans_unreserve_and_mod_sb() for in-memory counters again. > > that is (somewhat) fragile. > > That's exactly how the superblock updates have been done since the > mid 1990s. It's the way it was intended to work: > > - xfs_trans_apply_sb_deltas() applies the changes to the on > disk superblock > - xfs_trans_unreserve_and_mod_sb() applies the changes to the > in-memory superblock. > > All my patch does is follow the long established separation of > update responsibilities. It is actually returning the code to the > behaviour we had before lazy superblock counters were introduced. > > > 2) m_sb_lock behaves no effect at this. This lock between > > xfs_log_sb() and xfs_trans_unreserve_and_mod_sb() is still > > sb buffer lock for !lazysbcount cases. > > The m_sb_lock doesn't need to have any effect on this. It's to > prevent concurrent updates of the in-core superblock, not to prevent > access to the superblock buffer. > > i.e. the superblock buffer lock protects against concurrent updates > of the superblock buffer, and hence while progating and logging > changes to the superblock buffer we have to have the superblock > buffer locked. > > > 3) in-memory sb counters are serialized by some spinlock now, > > No, they are not. Lazysbcount does not set XFS_TRANS_SB_DIRTY > for pure ifree/icount/fdblock updates, so it never runs the code > I modified in xfs_trans_unreserve_and_mod_sb() unless some other > part of the superblock is changed. > > For !lazysbcount, we always run this path because XFS_TRANS_SB_DIRTY > is always set. > > > so I'm not sure sb per-CPU counters behave for lazysbcount > > cases, are these used for better performance? > > It does not change behaviour of anything at all, execpt the counter > values for !lazysbcount filesystems are now always kept correctly up > to date. > > > > xfs_sb_to_disk(bp->b_addr, &mp->m_sb); > > > xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF); > > > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c > > > index bcc978011869..438e41931b55 100644 > > > --- a/fs/xfs/xfs_trans.c > > > +++ b/fs/xfs/xfs_trans.c > > > @@ -629,6 +629,9 @@ xfs_trans_unreserve_and_mod_sb( > > > > > > /* apply remaining deltas */ > > > spin_lock(&mp->m_sb_lock); > > > + mp->m_sb.sb_fdblocks += blkdelta; > > > > not sure that is quite equal to blkdelta, since (I think) we might need > > to apply t_res_fdblocks_delta for !lazysbcount cases but not lazysbcount > > cases, but I'm not quite sure, just saw the comment above > > xfs_trans_unreserve_and_mod_sb() and the implementation of > > xfs_trans_apply_sb_deltas(). > > Yes, I forgot about the special delayed allocation space accounting. > We'll have to add that, too, so it becomes: > > + mp->m_sb.sb_fdblocks += blkdelta + tp->t_res_fdblocks_delta; > + mp->m_sb.sb_icount += idelta; > + mp->m_sb.sb_ifree += ifreedelta; > > But this doesn't change the structure of the patch in any way. Anyway, I think this'd be absolutely fine to fix this issue as well, so: Reviewed-by: Gao Xiang <hsiangkao@redhat.com> (Although I still insist on my v2 [just my own thought] since in-memory sb counters are totally unused/reserved compared with on-disk sb counters for sb_fdblocks and per-CPU sb counters for sb_ifree / sb_icount for the whole !lazysbcount cases, maybe adding some comments is better. But I'm also fine if the patch goes like this ;) ) Thanks, Gao Xiang > > CHeers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount 2021-04-22 3:12 ` Gao Xiang @ 2021-04-22 15:58 ` Darrick J. Wong 0 siblings, 0 replies; 17+ messages in thread From: Darrick J. Wong @ 2021-04-22 15:58 UTC (permalink / raw) To: Gao Xiang; +Cc: Dave Chinner, linux-xfs, Zorro Lang, Carlos Maiolino On Thu, Apr 22, 2021 at 11:12:15AM +0800, Gao Xiang wrote: > On Thu, Apr 22, 2021 at 01:01:02PM +1000, Dave Chinner wrote: > > On Thu, Apr 22, 2021 at 10:06:13AM +0800, Gao Xiang wrote: > > > Hi Dave, > > > > > > On Thu, Apr 22, 2021 at 11:44:46AM +1000, Dave Chinner wrote: > > > > On Wed, Apr 21, 2021 at 11:01:29AM +0800, Gao Xiang wrote: > > > > > On Wed, Apr 21, 2021 at 11:45:26AM +1000, Dave Chinner wrote: > > > > > > On Wed, Apr 21, 2021 at 05:54:43AM +0800, Gao Xiang wrote: > > > > > > #1 is bad because there are cases where we want to write the > > > > > > counters even for !lazysbcount filesystems (e.g. mkfs, repair, etc). > > > > > > > > > > > > #2 is essentially a hack around the fact that mp->m_sb is not kept > > > > > > up to date in the in-memory superblock for !lazysbcount filesystems. > > > > > > > > > > > > #3 keeps the in-memory superblock up to date for !lazysbcount case > > > > > > so they are coherent with the on-disk values and hence we only need > > > > > > to update the in-memory superblock counts for lazysbcount > > > > > > filesystems before calling xfs_sb_to_disk(). > > > > > > > > > > > > #3 is my preferred solution. > > > > > > > > > > > > > That will indeed cause more modification, I'm not quite sure if it's > > > > > > > quite ok honestly. But if you assume that's more clear, I could submit > > > > > > > an alternative instead later. > > > > > > > > > > > > I think the version you posted doesn't fix the entire problem. It > > > > > > merely slaps a band-aid over the symptom that is being seen, and > > > > > > doesn't address all the non-coherent data that can be written to the > > > > > > superblock here. > > > > > > > > > > As I explained on IRC as well, I think for !lazysbcount cases, fdblocks, > > > > > icount and ifree are protected by sb buffer lock. and the only users of > > > > > these three are: > > > > > 1) xfs_trans_apply_sb_deltas() > > > > > 2) xfs_log_sb() > > > > > > > > That's just a happy accident and not intentional in any way. Just > > > > fixing the case that occurs while holding the sb buffer lock doesn't > > > > actually fix the underlying problem, it just uses this as a bandaid. > > > > > > I think for !lazysbcases, sb buffer lock is only a reliable lock that > > > can be relied on for serialzing (since we need to make sure each sb > > > write matches the corresponding fdblocks, ifree, icount. So sb buffer > > > needs be locked every time. So so need to recalc on dirty log.) > > > > > > > > > > > > > > So I've seen no need to update sb_icount, sb_ifree in that way (I mean > > > > > my v2, although I agree it's a bit hacky.) only sb_fdblocks matters. > > > > > > > > > > But the reason why this patch exist is only to backport to old stable > > > > > kernels, since after [PATCH v2 2/2], we can get rid of all of > > > > > !lazysbcount cases upstream. > > > > > > > > > > But if we'd like to do more e.g. by taking m_sb_lock, I've seen the > > > > > xfs codebase quite varies these years. and I modified some version > > > > > like http://paste.debian.net/1194481/ > > > > > > > > I said on IRC that this is what xfs_trans_unreserve_and_mod_sb() is > > > > for. For !lazysbcount filesystems the transaction will be marked > > > > dirty (i.e XFS_TRANS_SB_DIRTY is set) and so we'll always run the > > > > slow path that takes the m_sb_lock and updates mp->m_sb. > > > > > > > > It's faster for me to explain this by patch than any other way. See > > > > below. > > > > > > I know what you mean, but there exists 3 things: > > > 1) we be64_add_cpu() on-disk fdblocks, ifree, icount at > > > xfs_trans_apply_sb_deltas(), and then do the same bahavior in > > > xfs_trans_unreserve_and_mod_sb() for in-memory counters again. > > > that is (somewhat) fragile. > > > > That's exactly how the superblock updates have been done since the > > mid 1990s. It's the way it was intended to work: > > > > - xfs_trans_apply_sb_deltas() applies the changes to the on > > disk superblock > > - xfs_trans_unreserve_and_mod_sb() applies the changes to the > > in-memory superblock. > > > > All my patch does is follow the long established separation of > > update responsibilities. It is actually returning the code to the > > behaviour we had before lazy superblock counters were introduced. > > > > > 2) m_sb_lock behaves no effect at this. This lock between > > > xfs_log_sb() and xfs_trans_unreserve_and_mod_sb() is still > > > sb buffer lock for !lazysbcount cases. > > > > The m_sb_lock doesn't need to have any effect on this. It's to > > prevent concurrent updates of the in-core superblock, not to prevent > > access to the superblock buffer. > > > > i.e. the superblock buffer lock protects against concurrent updates > > of the superblock buffer, and hence while progating and logging > > changes to the superblock buffer we have to have the superblock > > buffer locked. > > > > > 3) in-memory sb counters are serialized by some spinlock now, > > > > No, they are not. Lazysbcount does not set XFS_TRANS_SB_DIRTY > > for pure ifree/icount/fdblock updates, so it never runs the code > > I modified in xfs_trans_unreserve_and_mod_sb() unless some other > > part of the superblock is changed. > > > > For !lazysbcount, we always run this path because XFS_TRANS_SB_DIRTY > > is always set. > > > > > so I'm not sure sb per-CPU counters behave for lazysbcount > > > cases, are these used for better performance? > > > > It does not change behaviour of anything at all, execpt the counter > > values for !lazysbcount filesystems are now always kept correctly up > > to date. > > > > > > xfs_sb_to_disk(bp->b_addr, &mp->m_sb); > > > > xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF); > > > > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c > > > > index bcc978011869..438e41931b55 100644 > > > > --- a/fs/xfs/xfs_trans.c > > > > +++ b/fs/xfs/xfs_trans.c > > > > @@ -629,6 +629,9 @@ xfs_trans_unreserve_and_mod_sb( > > > > > > > > /* apply remaining deltas */ > > > > spin_lock(&mp->m_sb_lock); > > > > + mp->m_sb.sb_fdblocks += blkdelta; > > > > > > not sure that is quite equal to blkdelta, since (I think) we might need > > > to apply t_res_fdblocks_delta for !lazysbcount cases but not lazysbcount > > > cases, but I'm not quite sure, just saw the comment above > > > xfs_trans_unreserve_and_mod_sb() and the implementation of > > > xfs_trans_apply_sb_deltas(). > > > > Yes, I forgot about the special delayed allocation space accounting. > > We'll have to add that, too, so it becomes: > > > > + mp->m_sb.sb_fdblocks += blkdelta + tp->t_res_fdblocks_delta; > > + mp->m_sb.sb_icount += idelta; > > + mp->m_sb.sb_ifree += ifreedelta; > > > > But this doesn't change the structure of the patch in any way. > > Anyway, I think this'd be absolutely fine to fix this issue as well, > so: > > Reviewed-by: Gao Xiang <hsiangkao@redhat.com> > > (Although I still insist on my v2 [just my own thought] since in-memory > sb counters are totally unused/reserved compared with on-disk sb counters > for sb_fdblocks and per-CPU sb counters for sb_ifree / sb_icount for > the whole !lazysbcount cases, maybe adding some comments is better. > But I'm also fine if the patch goes like this ;) ) Does this patch (+ other fixes) fix the problem? If Zorro says it's ok, please send this as a formal patch submission so it isn't buried in a thread. --D > Thanks, > Gao Xiang > > > > > CHeers, > > > > Dave. > > -- > > Dave Chinner > > david@fromorbit.com > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2021-04-22 15:58 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-04-20 11:08 [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount Gao Xiang 2021-04-20 11:08 ` [PATCH v2 2/2] xfs: turn on lazysbcount unconditionally Gao Xiang 2021-04-20 16:22 ` Darrick J. Wong 2021-04-20 20:00 ` Gao Xiang 2021-04-22 0:01 ` Darrick J. Wong 2021-04-22 1:51 ` Gao Xiang 2021-04-22 5:11 ` Zorro Lang 2021-04-20 17:42 ` [PATCH v2 1/2] xfs: don't use in-core per-cpu fdblocks for !lazysbcount Darrick J. Wong 2021-04-20 21:25 ` Dave Chinner 2021-04-20 21:54 ` Gao Xiang 2021-04-21 1:45 ` Dave Chinner 2021-04-21 3:01 ` Gao Xiang 2021-04-22 1:44 ` Dave Chinner 2021-04-22 2:06 ` Gao Xiang 2021-04-22 3:01 ` Dave Chinner 2021-04-22 3:12 ` Gao Xiang 2021-04-22 15:58 ` Darrick J. Wong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox