public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Eric Sandeen <sandeen@sandeen.net>
Cc: "Darrick J. Wong" <djwong@kernel.org>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH] [RFC] xfs: filesystem expansion design documentation
Date: Mon, 29 Jul 2024 11:45:40 +1000	[thread overview]
Message-ID: <Zqb0RABbeVvs1nKX@dread.disaster.area> (raw)
In-Reply-To: <5ccb5a51-75c0-41eb-92ff-a2ce5674ac0f@sandeen.net>

On Wed, Jul 24, 2024 at 05:38:46PM -0500, Eric Sandeen wrote:
> On 7/24/24 5:12 PM, Darrick J. Wong wrote:
> > On Wed, Jul 24, 2024 at 05:06:58PM -0500, Eric Sandeen wrote:
> >> On 7/21/24 6:01 PM, Dave Chinner wrote:
> >>> +The solution should already be obvious: we can exploit the sparseness of FSBNO
> >>> +addressing to allow AGs to grow to 1TB (maximum size) simply by configuring
> >>> +sb_agblklog appropriately at mkfs.xfs time. Hence if we have 16MB AGs (minimum
> >>> +size) and sb_agblklog = 30 (1TB max AG size), we can expand the AG size up
> >>> +to their maximum size before we start appending new AGs.
> >>
> >> there's a check in xfs_validate_sb_common() that tests whether sg_agblklog is
> >> really the next power of two from sb_agblklog:
> >>
> >> sbp->sb_agblklog != xfs_highbit32(sbp->sb_agblocks - 1) + 1
> >>
> >> so I think the proposed idea would require a feature flag, right?
> >>
> >> That might make it a little trickier as a drop-in replacement for cloud
> >> providers because these new expandable filesystem images would only work on
> >> kernels that understand the (trivial) new feature, unless I'm missing
> >> something.
> > 
> > agblklog and agblocks would both be set correctly if you did mkfs.xfs -d
> > agsize=1T on a small image; afaict it's only mkfs that cares that
> > dblocks >= agblocks.
> 
> Yes, in a single AG image like you've suggested.
> 
> In Dave's proposal, with multiple AGs, I think it would need to be handled.

So in looking into this in more detail, there is one thing I've
overlooked that will absolutely require a change of on-disk format:
we encode the physical location (daddr) of some types of metadata
into the metadata.

This was added so that we could detect misdirected reads or writes;
the metadata knows it's physical location and so checks that it
matches the physical location the buffer was read from or is being
written to.

This includes the btree block headers, symlink blocks and
directory/attr blocks, and I'm pretty sure that the locations
recorded by the rmapbt's are physical locations as well.

To allow expansion to physically relocate metadata, I think we're
going to have to change all of these to record fsbno rather than
daddr so that we can change the physical location of the metadata
without needing to change all the metadata location references. We
calculate daddr from FSBNO or AGBNO everywhere, so I think all the
info we need to do this already available in the code.

Daddrs and fsbnos are also the same size on disk (__be64) so there's
no change to metadata layouts or anything like that. It's just a
change of what that those fields store from physical to internal
sparse addressing.

Hence I don't think this is a showstopper issue - it will just
require verifiers and metadata header initialisation to use the
correct encoding at runtime based on the superblock feature bit.
It's not really that complex, just a bit more work that I originally
thought it would be.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

      parent reply	other threads:[~2024-07-29  1:45 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-21 23:01 [PATCH] [RFC] xfs: filesystem expansion design documentation Dave Chinner
2024-07-23 23:58 ` Darrick J. Wong
2024-07-24  0:46   ` Dave Chinner
2024-07-24 15:41     ` Eric Sandeen
2024-07-24 15:44       ` Eric Sandeen
2024-07-24 17:23         ` Darrick J. Wong
2024-07-24 17:33           ` Eric Sandeen
2024-07-24 21:08     ` Darrick J. Wong
2024-07-24 22:50       ` Eric Sandeen
2024-07-24 23:48         ` Darrick J. Wong
2024-07-25  0:41       ` Dave Chinner
2024-08-02 12:09         ` Brian Foster
2024-08-06  2:01           ` Dave Chinner
2024-08-09 13:31             ` Brian Foster
2024-07-24 22:06 ` Eric Sandeen
2024-07-24 22:12   ` Darrick J. Wong
2024-07-24 22:38     ` Eric Sandeen
2024-07-25  1:42       ` Dave Chinner
2024-07-29  1:45       ` Dave Chinner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zqb0RABbeVvs1nKX@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=sandeen@sandeen.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox