public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/3] xfs: allow sparse inode records at the end of runt AGs
Date: Sun, 27 Oct 2024 08:47:34 +1100	[thread overview]
Message-ID: <Zx1jduWy+v0VLjWB@dread.disaster.area> (raw)
In-Reply-To: <20241025221919.GP2386201@frogsfrogsfrogs>

On Fri, Oct 25, 2024 at 03:19:19PM -0700, Darrick J. Wong wrote:
> On Fri, Oct 25, 2024 at 05:43:41PM +1100, Dave Chinner wrote:
> > On Thu, Oct 24, 2024 at 10:00:38AM -0700, Darrick J. Wong wrote:
> > > On Thu, Oct 24, 2024 at 01:51:04PM +1100, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > 
> > > > Due to the failure to correctly limit sparse inode chunk allocation
> > > > in runt AGs, we now have many production filesystems with sparse
> > > > inode chunks allocated across the end of the runt AG. xfs_repair
> > > > or a growfs is needed to fix this situation, neither of which are
> > > > particularly appealing.
> > > > 
> > > > The on disk layout from the metadump shows AG 12 as a runt that is
> > > > 1031 blocks in length and the last inode chunk allocated on disk at
> > > > agino 8192.
> > > 
> > > Does this problem also happen on non-runt AGs?
> > 
> > No. The highest agbno an inode chunk can be allocated at in a full
> > size AG is aligned by rounding down from sb_agblocks.  Hence
> > sb_agblocks can be unaligned and nothing will go wrong. The problem
> > is purely that the runt AG being shorter than sb_agblocks and so
> > this highest agbno allocation guard is set beyond the end of the
> > AG...
> 
> Ah, right, and we don't want sparse inode chunks to cross EOAG because
> then you'd have a chunk whose clusters would cross into the next AG, at
> least in the linear LBA space.  That's why (for sparse inode fses) it
> makes sense that we want to round last_agino down by the chunk for
> non-last AGs, and round it down by only the cluster for the last AG.
> 
> Waitaminute, what if the last AG is less than a chunk but more than a
> cluster's worth of blocks short of sb_agblocks?  Or what if sb_agblocks
> doesn't align with a chunk boundary?  I think the new code:
> 
> 	if (xfs_has_sparseinodes(mp) && agno == mp->m_sb.sb_agcount - 1)
> 		end_align = mp->m_sb.sb_spino_align;
> 	else
> 		end_align = M_IGEO(mp)->cluster_align;
> 	bno = round_down(eoag, end_align);
> 	*last = XFS_AGB_TO_AGINO(mp, bno) - 1;
> 
> will allow a sparse chunk that (erroneously) crosses sb_agblocks, right?
> Let's say sb_spino_align == 4, sb_inoalignmt == 8, sb_agcount == 2,
> sb_agblocks == 100,007, and sb_dblocks == 200,014.
> 
> For AG 0, eoag is 100007, end_align == cluster_align == 8, so bno is
> rounded down to 100000.  *last is thus set to the inode at the end of
> block 99999.
> 
> For AG 1, eoag is also 100007, but now end_align == 4.  bno is rounded
> down to 100,004.  *last is set to the inode at the end of block 100003,
> not 99999.
> 
> But now let's say we growfs another 100007 blocks onto the filesystem.
> Now we have 3x AGs, each with 100007 blocks.  But now *last for AG 1
> becomes 99999 even though we might've allocated an inode in block
> 100000 before the growfs.  That will cause a corruption error too,
> right?

Yes, I overlooked that case. Good catch.

> IOWs, don't we want something more like this?
> 
> 	/*
> 	 * The preferred inode cluster allocation size cannot ever cross
> 	 * sb_agblocks.  cluster_align is one of the following:
> 	 *
> 	 * - For sparse inodes, this is an inode chunk.
> 	 * - For aligned non-sparse inodes, this is an inode cluster.
> 	 */
> 	bno = round_down(sb_agblocks, cluster_align);
> 	if (xfs_has_sparseinodes(mp) &&
> 	    agno == mp->m_sb.sb_agcount - 1) {
> 		/*
> 		 * For a filesystem with sparse inodes, an inode chunk
> 		 * still cannot cross sb_agblocks, but it can cross eoag
> 		 * if eoag < agblocks.  Inode clusters cannot cross eoag.
> 		 */
> 		last_clus_bno = round_down(eoag, sb_spino_align);
> 		bno = min(bno, last_clus_bno);
> 	}
> 	*last = XFS_AGB_TO_AGINO(mp, bno) - 1;

Yes, something like that is needed.

> > > If the only free space
> > > that could be turned into a sparse cluster is unaligned space at the
> > > end of AG 0, would you still get the same corruption error?
> > 
> > It will only happen if AG 0 is a runt AG, and then the same error
> > would occur. We don't currently allow single AG filesystems, nor
> > when they are set up  do we create them as a runt - the are always
> > full size. So current single AG filesystems made by mkfs won't have
> > this problem.
> 
> Hmm, do you have a quick means to simulate this last-AG unaligned
> icluster situation?

No, I haven't been able to reproduce it on demand - nothing I've
tried has specifically landed a sparse inode cluster in exactly the
right position to trigger this. I typically get ENOSPC when I think
it should trigger and it's not immediately obvious what I'm missing
in way of pre-conditions to trigger it. I've been able to test the
fixes on a metadump that has the sparse chunk already on disk
(which came from one of the production systems hitting this).

-Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2024-10-26 21:47 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-24  2:51 [PATCH 0/3] xfs: sparse inodes overlap end of filesystem Dave Chinner
2024-10-24  2:51 ` [PATCH 1/3] xfs: fix sparse inode limits on runt AG Dave Chinner
2024-10-24  2:51 ` [PATCH 2/3] xfs: allow sparse inode records at the end of runt AGs Dave Chinner
2024-10-24 17:00   ` Darrick J. Wong
2024-10-25  6:43     ` Dave Chinner
2024-10-25 22:19       ` Darrick J. Wong
2024-10-26 21:47         ` Dave Chinner [this message]
2024-10-24  2:51 ` [PATCH 3/3] xfs: sb_spino_align is not verified Dave Chinner
2024-10-24 16:55   ` Darrick J. Wong
2024-10-25  6:33     ` Dave Chinner
2024-12-07  0:25   ` Luis Chamberlain
2024-12-07  0:32     ` Darrick J. Wong
2024-12-07  0:36       ` Luis Chamberlain
2024-12-07 11:34         ` Carlos Maiolino
2024-10-24 13:20 ` [PATCH 0/3] xfs: sparse inodes overlap end of filesystem Brian Foster
2024-10-25  0:48   ` Dave Chinner
2024-10-25 19:33     ` Brian Foster
2024-10-24 16:38 ` Darrick J. Wong
2024-10-25  6:29   ` Dave Chinner
2024-10-29 16:14 ` Eric Sandeen
2024-10-31 11:44   ` Carlos Maiolino
2024-10-31 20:45     ` Eric Sandeen
2024-10-31 22:13     ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zx1jduWy+v0VLjWB@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox