From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/3] xfs: allow sparse inode records at the end of runt AGs
Date: Sun, 27 Oct 2024 08:47:34 +1100
Message-ID: <Zx1jduWy+v0VLjWB@dread.disaster.area>
In-Reply-To: <20241025221919.GP2386201@frogsfrogsfrogs>

On Fri, Oct 25, 2024 at 03:19:19PM -0700, Darrick J. Wong wrote:
> On Fri, Oct 25, 2024 at 05:43:41PM +1100, Dave Chinner wrote:
> > On Thu, Oct 24, 2024 at 10:00:38AM -0700, Darrick J. Wong wrote:
> > > On Thu, Oct 24, 2024 at 01:51:04PM +1100, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > >
> > > > Due to the failure to correctly limit sparse inode chunk allocation
> > > > in runt AGs, we now have many production filesystems with sparse
> > > > inode chunks allocated across the end of the runt AG. xfs_repair
> > > > or a growfs is needed to fix this situation, neither of which is
> > > > particularly appealing.
> > > >
> > > > The on disk layout from the metadump shows AG 12 as a runt that is
> > > > 1031 blocks in length and the last inode chunk allocated on disk at
> > > > agino 8192.
> > >
> > > Does this problem also happen on non-runt AGs?
> >
> > No. The highest agbno an inode chunk can be allocated at in a full
> > size AG is aligned by rounding down from sb_agblocks. Hence
> > sb_agblocks can be unaligned and nothing will go wrong. The problem
> > is purely that the runt AG is shorter than sb_agblocks and so
> > this highest agbno allocation guard is set beyond the end of the
> > AG...
>
> Ah, right, and we don't want sparse inode chunks to cross EOAG because
> then you'd have a chunk whose clusters would cross into the next AG, at
> least in the linear LBA space. That's why (for sparse inode fses) it
> makes sense that we want to round last_agino down by the chunk for
> non-last AGs, and round it down by only the cluster for the last AG.
>
> Waitaminute, what if the last AG is less than a chunk but more than a
> cluster's worth of blocks short of sb_agblocks? Or what if sb_agblocks
> doesn't align with a chunk boundary? I think the new code:
>
> 	if (xfs_has_sparseinodes(mp) && agno == mp->m_sb.sb_agcount - 1)
> 		end_align = mp->m_sb.sb_spino_align;
> 	else
> 		end_align = M_IGEO(mp)->cluster_align;
> 	bno = round_down(eoag, end_align);
> 	*last = XFS_AGB_TO_AGINO(mp, bno) - 1;
>
> will allow a sparse chunk that (erroneously) crosses sb_agblocks, right?
> Let's say sb_spino_align == 4, sb_inoalignmt == 8, sb_agcount == 2,
> sb_agblocks == 100,007, and sb_dblocks == 200,014.
>
> For AG 0, eoag is 100007, end_align == cluster_align == 8, so bno is
> rounded down to 100000. *last is thus set to the inode at the end of
> block 99999.
>
> For AG 1, eoag is also 100007, but now end_align == 4. bno is rounded
> down to 100,004. *last is set to the inode at the end of block 100003,
> not 99999.
>
> But now let's say we growfs another 100007 blocks onto the filesystem.
> Now we have 3x AGs, each with 100007 blocks. But now *last for AG 1
> becomes 99999 even though we might've allocated an inode in block
> 100000 before the growfs. That will cause a corruption error too,
> right?

Yes, I overlooked that case. Good catch.

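For anyone following along, the arithmetic in the example above can be checked with a rough userspace sketch. This is not kernel code: round_down_u32() and reviewed_end_bno() are hypothetical names invented here, and the modulo-based rounding is only a stand-in for the kernel's round_down(). The geometry is the one from the example (sb_spino_align == 4, sb_inoalignmt == 8, sb_agblocks == 100007).

```c
/* Hypothetical stand-in for the kernel's round_down(); valid for align > 0. */
static unsigned int round_down_u32(unsigned int x, unsigned int align)
{
	return x - (x % align);
}

/*
 * Sketch of the end-of-AG block limit as computed by the snippet under
 * review: sparse inode alignment for the last AG of a sparse inode
 * filesystem, cluster alignment for every other AG.
 */
static unsigned int reviewed_end_bno(unsigned int eoag, int is_sparse_last_ag,
				     unsigned int spino_align,
				     unsigned int cluster_align)
{
	unsigned int end_align;

	end_align = is_sparse_last_ag ? spino_align : cluster_align;
	return round_down_u32(eoag, end_align);
}
```

With eoag == 100007, AG 1 gets a limit of 100004 while it is the last AG and 100000 once growfs has added AG 2, so an inode cluster placed in blocks 100000-100003 before the grow falls outside the limit afterwards.
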
> IOWs, don't we want something more like this?
>
> 	/*
> 	 * The preferred inode cluster allocation size cannot ever cross
> 	 * sb_agblocks. cluster_align is one of the following:
> 	 *
> 	 * - For sparse inodes, this is an inode chunk.
> 	 * - For aligned non-sparse inodes, this is an inode cluster.
> 	 */
> 	bno = round_down(sb_agblocks, cluster_align);
> 	if (xfs_has_sparseinodes(mp) &&
> 	    agno == mp->m_sb.sb_agcount - 1) {
> 		/*
> 		 * For a filesystem with sparse inodes, an inode chunk
> 		 * still cannot cross sb_agblocks, but it can cross eoag
> 		 * if eoag < agblocks. Inode clusters cannot cross eoag.
> 		 */
> 		last_clus_bno = round_down(eoag, sb_spino_align);
> 		bno = min(bno, last_clus_bno);
> 	}
> 	*last = XFS_AGB_TO_AGINO(mp, bno) - 1;

Yes, something like that is needed.

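As a sanity check of the suggested logic, here is a minimal userspace sketch, assuming the same example geometry (sb_spino_align == 4, sb_inoalignmt == 8, sb_agblocks == 100007). The helper names align_down_u32() and proposed_end_bno() are made up for illustration and the rounding helper only approximates the kernel's round_down(); it is not the final patch.

```c
/* Hypothetical stand-in for the kernel's round_down(); valid for align > 0. */
static unsigned int align_down_u32(unsigned int x, unsigned int align)
{
	return x - (x % align);
}

/*
 * Sketch of the suggested limit: never let the preferred allocation unit
 * cross sb_agblocks, and additionally clamp the last AG of a sparse inode
 * filesystem so that no inode cluster crosses eoag.
 */
static unsigned int proposed_end_bno(unsigned int agblocks, unsigned int eoag,
				     int is_sparse_last_ag,
				     unsigned int spino_align,
				     unsigned int cluster_align)
{
	unsigned int bno = align_down_u32(agblocks, cluster_align);

	if (is_sparse_last_ag) {
		unsigned int last_clus_bno = align_down_u32(eoag, spino_align);

		if (last_clus_bno < bno)
			bno = last_clus_bno;
	}
	return bno;
}
```

For a full-size AG 1 this gives 100000 both before and after the growfs, so the limit no longer moves when the AG stops being the last one. For a runt last AG, say eoag == 1031 (a length borrowed from the commit message purely as an illustration, with the example's sb_agblocks), the limit becomes 1028.
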
> > > If the only free space
> > > that could be turned into a sparse cluster is unaligned space at the
> > > end of AG 0, would you still get the same corruption error?
> >
> > It will only happen if AG 0 is a runt AG, and then the same error
> > would occur. We don't currently allow single AG filesystems, nor
> > when they are set up do we create them as a runt - they are always
> > full size. So current single AG filesystems made by mkfs won't have
> > this problem.
>
> Hmm, do you have a quick means to simulate this last-AG unaligned
> icluster situation?

No, I haven't been able to reproduce it on demand - nothing I've
tried has specifically landed a sparse inode cluster in exactly the
right position to trigger this. I typically get ENOSPC when I think
it should trigger and it's not immediately obvious what I'm missing
in the way of pre-conditions to trigger it. I've been able to test the
fixes on a metadump that has the sparse chunk already on disk
(which came from one of the production systems hitting this).

-Dave.
--
Dave Chinner
david@fromorbit.com