All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Wengang Wang <wen.gang.wang@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>
Subject: Re: [PATCH] xfs: make sure sb_fdblocks is non-negative
Date: Mon, 3 Jun 2024 11:43:27 -0700	[thread overview]
Message-ID: <20240603184327.GC52987@frogsfrogsfrogs> (raw)
In-Reply-To: <39E20DD5-EDB2-4239-B6EE-237B228845F5@oracle.com>

On Mon, Jun 03, 2024 at 05:23:40PM +0000, Wengang Wang wrote:
> 
> 
> > On Jun 2, 2024, at 4:37 PM, Dave Chinner <david@fromorbit.com> wrote:
> > 
> > On Fri, May 31, 2024 at 03:44:56PM +0000, Wengang Wang wrote:
> >> Hi Dave,
> >> 
> >> Do you have further comments and/or suggestions? Or give a RB pls :D
> > 
> > Sorry, LSFMM intervened and I didn't notice your comment until now.
> > 
> No worries!
> 
> >>> On May 13, 2024, at 10:06 AM, Wengang Wang <wen.gang.wang@oracle.com> wrote:
> >>> 
> >>> Things is that we have a metadump, looking at the fdblocks from super block 0, it is good.
> >>> 
> >>> $ xfs_db -c "sb 0" -c "p" cust.img |egrep "dblocks|ifree|icount"
> >>> dblocks = 26214400
> >>> icount = 512
> >>> ifree = 337
> >>> fdblocks = 25997100
> >>> 
> >>> And when looking at the log, we have the following:
> >>> 
> >>> $ egrep -a "fdblocks|icount|ifree" cust.log |tail
> >>> sb_fdblocks 37
> >>> sb_icount 1056
> >>> sb_ifree 87
> >>> sb_fdblocks 37
> >>> sb_icount 1056
> >>> sb_ifree 87
> >>> sb_fdblocks 37
> >>> sb_icount 1056
> >>> sb_ifree 87
> >>> sb_fdblocks 18446744073709551604
> >>> 
> >>> # cust.log is output of my script which tries to parse the log buffer.
> >>> 
> >>> 18446744073709551604ULL == 0xfffffffffffffff4 or -12LL 
> >>> 
> >>> With upstream kernel (6.7.0-rc3), when I tried to mount (log recover) the metadump,
> >>> I got the following in dmesg:
> >>> 
> >>> [   52.927796] XFS (loop0): SB summary counter sanity check failed
> >>> [   52.928889] XFS (loop0): Metadata corruption detected at xfs_sb_write_verify+0x60/0x110 [xfs], xfs_sb block 0x0
> >>> [   52.930890] XFS (loop0): Unmount and run xfs_repair
> >>> [   52.931797] XFS (loop0): First 128 bytes of corrupted metadata buffer:
> >>> [   52.932954] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 01 90 00 00  XFSB............
> >>> [   52.934333] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >>> [   52.935733] 00000020: c9 c1 ed ae 84 ed 46 b9 a1 f0 09 57 4a a9 98 42  ......F....WJ..B
> >>> [   52.937120] 00000030: 00 00 00 00 01 00 00 06 00 00 00 00 00 00 00 80  ................
> >>> [   52.938515] 00000040: 00 00 00 00 00 00 00 81 00 00 00 00 00 00 00 82  ................
> >>> [   52.939919] 00000050: 00 00 00 01 00 64 00 00 00 00 00 04 00 00 00 00  .....d..........
> >>> [   52.941293] 00000060: 00 00 64 00 b4 a5 02 00 02 00 00 08 00 00 00 00  ..d.............
> >>> [   52.942661] 00000070: 00 00 00 00 00 00 00 00 0c 09 09 03 17 00 00 19  ................
> >>> [   52.944046] XFS (loop0): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply+0x38b/0x3a0 [xfs] (fs/xfs/xfs_buf.c:1559).  Shutting down filesystem.
> >>> [   52.946710] XFS (loop0): Please unmount the filesystem and rectify the problem(s)
> >>> [   52.948099] XFS (loop0): log mount/recovery failed: error -117
> >>> [   52.949810] XFS (loop0): log mount failed
> > 
> > And that's what should be in the commit message, as it explains
> > exactly how the problem occurred, the symptom that was seen, and
> > why the change is necessary. It also means that anyone else who sees
> > a similar problem and is grepping the commit history will see this
> > and recognise it, thereby knowing that this is the fix they need...
> > 
> 
> OK, got it.
> 
> >>> Looking at corresponding code:
> >>> 231 xfs_validate_sb_write(
> >>> 232         struct xfs_mount        *mp,
> >>> 233         struct xfs_buf          *bp,
> >>> 234         struct xfs_sb           *sbp)
> >>> 235 {
> >>> 236         /*
> >>> 237          * Carry out additional sb summary counter sanity checks when we write
> >>> 238          * the superblock.  We skip this in the read validator because there
> >>> 239          * could be newer superblocks in the log and if the values are garbage
> >>> 240          * even after replay we'll recalculate them at the end of log mount.
> >>> 241          *
> >>> 242          * mkfs has traditionally written zeroed counters to inprogress and
> >>> 243          * secondary superblocks, so allow this usage to continue because
> >>> 244          * we never read counters from such superblocks.
> >>> 245          */
> >>> 246         if (xfs_buf_daddr(bp) == XFS_SB_DADDR && !sbp->sb_inprogress &&
> >>> 247             (sbp->sb_fdblocks > sbp->sb_dblocks ||
> >>> 248              !xfs_verify_icount(mp, sbp->sb_icount) ||
> >>> 249              sbp->sb_ifree > sbp->sb_icount)) {
> >>> 250                 xfs_warn(mp, "SB summary counter sanity check failed");
> >>> 251                 return -EFSCORRUPTED;
> >>> 252         }
> >>> 
> >>> From dmesg and code, we know the check failure was due to bad sb_ifree vs sb_icount or bad sb_fdblocks vs sb_dblocks.
> >>> 
> >>> Looking at the super block dump and log dump,
> >>> We know ifree and icount are good, what’s bad is sb_fdblocks. And that sb_fdblocks is from log.
> >>> # I verified that sb_fdblocks is 0xfffffffffffffff4 with a UEK debug kernel (though not 6.7.0-rc3)
> >>> 
> >>> So the sb_fdblocks is updated from log to incore at xfs_log_sb() -> xfs_validate_sb_write() path though
> >>> Should be may re-calculated from AGs.
> >>> 
> >>> The fix aims to make xfs_validate_sb_write() happy.
> > 
> > What about the sb_icount and sb_ifree counters? They are also percpu
> > counters, and they can return transient negative numbers, too,
> > right? If they end up in the log, the same as this transient
> > negative sb_fdblocks count, won't that also cause exactly the same
> > issue?
> > 
> 
> Yes, sb_icount and sb_ifree are also percpu counters. They have been addressed by
> commit 59f6ab40fd8735c9a1a15401610a31cc06a0bbd6, right?

icount and ifree don't go through xfs_mod_freecounter, which means that
they never do the "subtract and see if we went negative" dance that
fdblocks/frextents does.  percpu_counter_sum_positive isn't necessary.

--D

> > i.e. if we need to fix the sb_fdblocks sum to always be positive,
> > then we need to do the same thing with the other lazy superblock
> > per-cpu counters so they don't trip the over the same transient
> > underflow issue...
> > 
> 
> Agreed. While, I think we don’t have further percpu counters problems after this patch.  
> 
> Will send a new patch with line breakness.
> 
> Thanks,
> Wengang
> 
> > -Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com
> 

  reply	other threads:[~2024-06-03 18:43 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-11  0:34 [PATCH] xfs: make sure sb_fdblocks is non-negative Wengang Wang
2024-05-11  1:17 ` Dave Chinner
2024-05-13 17:06   ` Wengang Wang
2024-05-31 15:44     ` Wengang Wang
2024-06-02 23:37       ` Dave Chinner
2024-06-03 17:23         ` Wengang Wang
2024-06-03 18:43           ` Darrick J. Wong [this message]
2024-06-04  4:06             ` Dave Chinner
2024-05-31 15:52 ` Darrick J. Wong
2024-05-31 18:03   ` Wengang Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240603184327.GC52987@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    --cc=wen.gang.wang@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.