Re: [PATCH] [RFC] xfs: filesystem expansion design documentation

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Dave Chinner <david@fromorbit.com>
To: Eric Sandeen <sandeen@sandeen.net>
Cc: "Darrick J. Wong" <djwong@kernel.org>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH] [RFC] xfs: filesystem expansion design documentation
Date: Thu, 25 Jul 2024 11:42:41 +1000	[thread overview]
Message-ID: <ZqGtkbc+QpE5YyyR@dread.disaster.area> (raw)
In-Reply-To: <5ccb5a51-75c0-41eb-92ff-a2ce5674ac0f@sandeen.net>

On Wed, Jul 24, 2024 at 05:38:46PM -0500, Eric Sandeen wrote:
> On 7/24/24 5:12 PM, Darrick J. Wong wrote:
> > On Wed, Jul 24, 2024 at 05:06:58PM -0500, Eric Sandeen wrote:
> >> On 7/21/24 6:01 PM, Dave Chinner wrote:
> >>> +The solution should already be obvious: we can exploit the sparseness of FSBNO
> >>> +addressing to allow AGs to grow to 1TB (maximum size) simply by configuring
> >>> +sb_agblklog appropriately at mkfs.xfs time. Hence if we have 16MB AGs (minimum
> >>> +size) and sb_agblklog = 30 (1TB max AG size), we can expand the AG size up
> >>> +to their maximum size before we start appending new AGs.
> >>
> >> there's a check in xfs_validate_sb_common() that tests whether sg_agblklog is
> >> really the next power of two from sb_agblklog:
> >>
> >> sbp->sb_agblklog != xfs_highbit32(sbp->sb_agblocks - 1) + 1
> >>
> >> so I think the proposed idea would require a feature flag, right?
> >>
> >> That might make it a little trickier as a drop-in replacement for cloud
> >> providers because these new expandable filesystem images would only work on
> >> kernels that understand the (trivial) new feature, unless I'm missing
> >> something.

Yes, I think that's the case.

I don't see this as a showstopper - golden images that VMs get
cloned from tend to be specific to the distro release they contain,
so as long as both the orchestration nodes and the distro kernel
within the image support the feature bit it will just work, yes?

This essentially makes it an image build time decision to support
the expand feature.  i.e. if the fs in the image has the feature bit
set, deployment can use expand, otherwise skip it and go straight to
grow.

> > agblklog and agblocks would both be set correctly if you did mkfs.xfs -d
> > agsize=1T on a small image; afaict it's only mkfs that cares that
> > dblocks >= agblocks.

I don't think we can change sb_agblocks like this 

Fundamentally, sb_agblocks is the physical length of AGs, not the
theoretical maximum size. Setting it to the maximum possible size
fails in all sorts of ways for multiple AG filesystems
(__xfs_ag_block_count() is the simplest example to point out), and
even on single AG filesystems it will be problematic.

sb->sb_agblklog is not a physical size - it is decoupled from the
physical size of the AG so it can be used to calculate the address
space locations of inodes and FSBNOs. Hence we can change
sb_agblklog without changing sb_agblocks and other physical lengths
because it doesn't change anything to do with the physical layout of
the filesystem.

> Yes, in a single AG image like you've suggested.

I'm not so sure about that.

We're not so lucky with code that validate against the AG header
lengths or just use sb_agblocks blindly.

e.g. for single AG fs's to work, they need to use
__xfs_ag_block_count() everywhere to get the size of the AG and have
it return sb_dblocks instead.  We have code that assumes that
sb_agblocks is the highest AGBNO possible in an AG and doesn't take
into account that the size of the runt AG (e.g.
xfs_alloc_ag_max_usable() is one of these).

I've always understood there to be an implicit requirement for
sb_agblocks and agf->agf_length/agi->agi_length to be the same and
define the actual maximum valid block number in the AG.

I don't think we can change that, and I don't think we should try to
change this even for single AG filesystems. We don't even support
single AG filesystems. And I don't think we should try to change
such a fundamental physical layout definition for what would be a
filesystem format that only physically exists temporarily in a cloud
deployment context.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

next prev parent reply	other threads:[~2024-07-25  1:42 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-21 23:01 [PATCH] [RFC] xfs: filesystem expansion design documentation Dave Chinner
2024-07-23 23:58 ` Darrick J. Wong
2024-07-24  0:46   ` Dave Chinner
2024-07-24 15:41     ` Eric Sandeen
2024-07-24 15:44       ` Eric Sandeen
2024-07-24 17:23         ` Darrick J. Wong
2024-07-24 17:33           ` Eric Sandeen
2024-07-24 21:08     ` Darrick J. Wong
2024-07-24 22:50       ` Eric Sandeen
2024-07-24 23:48         ` Darrick J. Wong
2024-07-25  0:41       ` Dave Chinner
2024-08-02 12:09         ` Brian Foster
2024-08-06  2:01           ` Dave Chinner
2024-08-09 13:31             ` Brian Foster
2024-07-24 22:06 ` Eric Sandeen
2024-07-24 22:12   ` Darrick J. Wong
2024-07-24 22:38     ` Eric Sandeen
2024-07-25  1:42       ` Dave Chinner [this message]
2024-07-29  1:45       ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZqGtkbc+QpE5YyyR@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=sandeen@sandeen.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox