Re: [PATCH 0/7] xfs: refactor and tablise growfs

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 0/7] xfs: refactor and tablise growfs
Date: Wed, 7 Feb 2018 18:10:45 +1100	[thread overview]
Message-ID: <20180207071045.5nlw7b7fozldohgs@destitution> (raw)
In-Reply-To: <20180206234407.GN4849@magnolia>

On Tue, Feb 06, 2018 at 03:44:07PM -0800, Darrick J. Wong wrote:
> On Thu, Feb 01, 2018 at 05:41:55PM +1100, Dave Chinner wrote:
> > Hi folks,
> > 
> > This is a series I posted months ago with the first thinspace
> > filesystem support. There was no comments on any of these patches
> > because all the heat and light got focussed on the growfs API.
> > I'm posting this separately to avoid that problem again....
> 
> ...just in time to collide head-on with the online repair series that I
> am planning to push out for review for 4.17. :)

Yuck. :(

> > Anyway, the core of this change is to make the growfs code much
> > simpler to extend. Most of the code that does structure
> > initialisation is cookie-cutter code and it's whacked into one great
> > big function. This patch set splits it up into separate functions
> > and uses common helper functions where possible. The different
> > structures and their initialisation definitions are now held in a
> > table, so when we add new stuctures or modify existing structures
> > it's a simple and isolate change.
> > 
> > The reworked initialisation code is suitable for moving to libxfs
> > and converting mkfs.xfs to use it for the initial formatting of
> > the filesystem. This will take more work to acheive, so this
> > patch set stops short of moving the code to libxfs.
> 
> Or maybe I'll just pull the patches into my dev tree and move all the
> code to libxfs /now/ since I don't see much difference between growing
> extra limbs and regrowing new body parts.  The "repair" code I've
> written so far chooses to rebuild the entire data structure from other
> parts rather than trying to save an existing structure:
> 
> 1. Lock the AG{I,F,FL} header/inode/whatever we're repairing.
> 2. Gather all the data that would have been in that data structure.
> 3. Make a list of all the blocks with the relevant rmap owner.
> 4. Make a list of all the blocks with the relevant rmap owner that are
>    owned by any other structure.  For example, if we're rebuilding the
>    inobt then we make a list of all the OWN_INOBT blocks, and then we
>    iterate the finobt to make a list of finobt blocks.
> 5. Allocate a new block for the root, if necessary.
> 6. Initialize the data structure.
> 7. Import all the data gathered in step 2.
> 8. Subtract the list made in step 4 from the list made in step 3.  These
>    are all the blocks that were owned by the structure we just rebuilt,
>    so free them.
> 9. Commit transaction, release locks, we're done.
> 
> (Steps 7-8 involve rolling transactions.)
> 
> I think growfs'ing a new AG is basically steps 1, 5, 6, 9, with the only
> twist being that growfs uses a delwri list instead of joining things to
> a transaction. 

Actually, it's a lot more different than just that. It's not
transactional, and we can't use cached buffers (and hence
transactions) because we're writing beyond the current valid end
of the filesystem. i.e. we're initialising all the structures before
we modify the superblock to indicate they are valid. That's also why
all the dirty buffers are gathered on a delwri list rather than in a
transaction context and we don't have to worry about locking...

> For this to work there needs to be separate functions to
> initialize a block and to deal with writing the xfs_buf to disk; I think
> I see this happening in the patchset, but tbh I suck at reading diff. :)

Yeah, there are separate functions to initialise the different
structures, but common code controls the delwri list addition and
writeback.

> > There are also mods to the secondary superblock update algorithm
> > which make it more resilient in the face of writeback failures. We
> > use a two pass update now - the main growfs loop now initialised
> > secondary superblocks with sb_inprogess = 1 to indicate it is not
> > in a valid state before we make any modifications, then after teh
> > transactional grow we do a second pass to set sb_inprogess = 0 and
> > mark them valid.
> > 
> > This means that if we fail to write any secondary superblock, repair
> > is not going to get confused by partial grow state. If we crash
> > during the initial write, nothing has changed in the primary
> > superblock. If we crash after the primary sb grow, then we'll know
> > exactly what secondary superblocks did not get updated because
> > they'll be the ones with sb_inprogress = 1 in them. Hence the
> > recovery process becomes much easier as the parts of the fs that
> > need updating are obvious....
> 
> I assume there will eventually be some kind of code to detect
> sb_inprogress==1 and fix it as soon as we try to write an AG?

Well, sort of. We never actually read secondary superblocks, except
in repair and now scrub. But reapir needs to know if secondary
superblocks are stale or not yet valid - right now it's an interesting
situation to be in if grow succeeds and then fails to write all the
secondary superblocks because we can't tell the difference between
per-grow and post-grow sbs. At least with sb_inprogress set in these
secondary superblocks we'll be able to tell how far the update got
before failing, and hence that makes it easier to recover to
consistent state.

Ultimately, I want to make the secondary superblock update loop a
set of ordered buffers associated with the primary superblock change
transaction. That way we can make log recovery correctly recover
from crashes during grow operations (i.e. it can re-run the
secondary sb updates based on the sb_inprogress flags...).

Or alternatively, turn grow into a set of defered ops with intents
so we can do the superblock size change first and always be able to
complete the secondary superblock updates in a transactional
manner...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

     prev parent reply	other threads:[~2018-02-07  7:09 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-01  6:41 [PATCH 0/7] xfs: refactor and tablise growfs Dave Chinner
2018-02-01  6:41 ` [PATCH 1/7] xfs: factor out AG header initialisation from growfs core Dave Chinner
2018-02-08 18:53   ` Brian Foster
2018-02-01  6:41 ` [PATCH 2/7] xfs: convert growfs AG header init to use buffer lists Dave Chinner
2018-02-08 18:53   ` Brian Foster
2018-02-01  6:41 ` [PATCH 3/7] xfs: factor ag btree reoot block initialisation Dave Chinner
2018-02-08 18:54   ` Brian Foster
2018-02-08 20:00     ` Darrick J. Wong
2018-02-09 13:10       ` Brian Foster
2018-02-12  0:45         ` Darrick J. Wong
2018-02-15  5:53           ` Darrick J. Wong
2018-02-01  6:41 ` [PATCH 4/7] xfs: turn ag header initialisation into a table driven operation Dave Chinner
2018-02-09 16:11   ` Brian Foster
2018-02-01  6:42 ` [PATCH 5/7] xfs: make imaxpct changes in growfs separate Dave Chinner
2018-02-09 16:11   ` Brian Foster
2018-02-15 22:10     ` Dave Chinner
2018-02-01  6:42 ` [PATCH 6/7] xfs: separate secondary sb update in growfs Dave Chinner
2018-02-09 16:11   ` Brian Foster
2018-02-15 22:23     ` Dave Chinner
2018-02-16 12:31       ` Brian Foster
2018-02-01  6:42 ` [PATCH 7/7] xfs: rework secondary superblock updates " Dave Chinner
2018-02-09 16:12   ` Brian Foster
2018-02-15 22:31     ` Dave Chinner
2018-02-16 12:56       ` Brian Foster
2018-02-16 16:20         ` Darrick J. Wong
2018-02-19  2:16         ` Dave Chinner
2018-02-19 13:21           ` Brian Foster
2018-02-19 22:14             ` Dave Chinner
2018-02-20 12:44               ` Brian Foster
2018-03-24  0:37                 ` Darrick J. Wong
2018-02-06 23:44 ` [PATCH 0/7] xfs: refactor and tablise growfs Darrick J. Wong
2018-02-07  7:10   ` Dave Chinner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180207071045.5nlw7b7fozldohgs@destitution \
    --to=david@fromorbit.com \
    --cc=darrick.wong@oracle.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox