linux-xfs.vger.kernel.org archive mirror
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCHSET v26.0 0/9] xfs: fix online repair block reaping
Date: Wed, 9 Aug 2023 16:17:28 -0700
Message-ID: <20230809231728.GY11352@frogsfrogsfrogs>
In-Reply-To: <ZNHP4TqsOQPIpiqf@dread.disaster.area>

On Tue, Aug 08, 2023 at 03:17:21PM +1000, Dave Chinner wrote:
> On Mon, Aug 07, 2023 at 05:40:07PM -0700, Darrick J. Wong wrote:
> > On Mon, Aug 07, 2023 at 04:19:11PM +1000, Dave Chinner wrote:
> > > On Thu, Jul 27, 2023 at 03:18:32PM -0700, Darrick J. Wong wrote:
> > > > Hi all,
> > > > 
> > > > These patches fix a few problems that I noticed in the code that deals
> > > > with old btree blocks after a successful repair.
> > > > 
> > > > First, I observed that it is possible for repair to incorrectly
> > > > invalidate and delete old btree blocks if they were crosslinked.  The
> > > > solution here is to consult the reverse mappings for each block in the
> > > > extent -- singly owned blocks are invalidated and freed, whereas for
> > > > crosslinked blocks, we merely drop the incorrect reverse mapping.
> > > > 
> > > > A largeish change in this patchset is moving the reaping code to a
> > > > separate file, because the code is mostly interrelated static
> > > > functions.  For now this also drops the ability to reap file blocks,
> > > > which will return when we add the bmbt repair functions.
> > > > 
> > > > Second, we convert the reap function to use EFIs so that we can commit
> > > > to freeing as many blocks in as few transactions as we dare.  We would
> > > > like to free as many old blocks as we can in the same transaction that
> > > > commits the new structure to the ondisk filesystem to minimize the
> > > > number of blocks that leak if the system crashes before the repair fully
> > > > completes.
> > > > 
> > > > The third change made in this series is to avoid tripping buffer cache
> > > > assertions if we're merely scanning the buffer cache for buffers to
> > > > invalidate, and find a non-stale buffer of the wrong length.  This is
> > > > primarily cosmetic, but makes my life easier.
> > > > 
> > > > The fourth change restructures the reaping code to try to process as many
> > > > blocks in one go as possible, to reduce logging traffic.
> > > > 
> > > > The last change switches the reaping mechanism to use per-AG bitmaps
> > > > defined in a previous patchset.  This should reduce type confusion when
> > > > reading the source code.
> > > > 
> > > > If you're going to start using this mess, you probably ought to just
> > > > pull from my git trees, which are linked below.
> > > > 
> > > > This is an extraordinary way to destroy everything.  Enjoy!
> > > > Comments and questions are, as always, welcome.
> > > 
> > > Overall I don't see any red flags, so from that perspective I think
> > > it's good to merge as is. The buffer cache interactions are much
> > > neater this time around.
> > > 
> > > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> > 
> > Thanks!
> > 
> > > The main thing I noticed is that the deferred freeing mechanism for
> > > bulk reaping will add up to 128 XEFIs to the transaction. That
> > > could result in a single EFI with up to 128 extents in it, right?
> > 
> > Welllp... the defer ops code only logs up to 16 extents per EFI log item
> > due to my, er, butchering of max_items.  So in the end, we log up to 8
> > EFI items, each of which has up to 16 extents...
> > 
> > > What happens when we try to free that many extents in a single
> > > transaction loop? The extent free processing doesn't have a "have we
> > > run out of transaction reservation" check in it like the refcount
> > > item processing does, so I don't think it can roll to renew the
> > > transaction reservation if it is needed. Do we need to catch this
> > > and renew the reservation by returning -EAGAIN from
> > > xfs_extent_free_finish_item() if there isn't enough of a reservation
> > > remaining to free an extent?
> > 
> > ...and by my estimation, those eight items consume a fraction of the
> > reservation available with tr_itruncate:
> > 
> > 16 x xfs_extent_64_t   = 256 bytes
> > 1 x xfs_efi_log_format = 16 bytes
> >                        = 272 bytes per EFI
> > 
> > 8 x EFI                = 2176 bytes
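The byte counts above can be checked with a small userspace sketch. The struct names below are hypothetical stand-ins, not the kernel definitions; they assume a 16-byte extent record and a 16-byte EFI header, which is what the 272-byte per-EFI total implies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one logged extent (assumed 16 bytes). */
struct demo_extent_64 {
	uint64_t ext_start;
	uint32_t ext_len;
	uint32_t ext_pad;
};

/* Hypothetical stand-in for the EFI header (assumed 16 bytes). */
struct demo_efi_log_format {
	uint16_t efi_type;
	uint16_t efi_size;
	uint32_t efi_nextents;
	uint64_t efi_id;
	/* the extent array would follow the header */
};

static_assert(sizeof(struct demo_extent_64) == 16, "extent is 16 bytes");
static_assert(sizeof(struct demo_efi_log_format) == 16, "header is 16 bytes");

/* Bytes needed to log one EFI item carrying nr_extents extents. */
static size_t demo_efi_item_bytes(unsigned int nr_extents)
{
	return sizeof(struct demo_efi_log_format) +
	       (size_t)nr_extents * sizeof(struct demo_extent_64);
}

/* Bytes needed for nr_items EFI items of nr_extents extents each. */
static size_t demo_efi_total_bytes(unsigned int nr_items,
				   unsigned int nr_extents)
{
	return (size_t)nr_items * demo_efi_item_bytes(nr_extents);
}
```

With these assumptions, demo_efi_item_bytes(16) gives the per-EFI figure and demo_efi_total_bytes(8, 16) gives the figure for eight EFIs. This only covers the intent items themselves, not the cost of processing them.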
> 
> I'm not worried by the EFIs themselves when they are created and
> committed, it's the processing of the XEFIs which are all done in a
> single transaction unless a ->finish_item() call returns -EAGAIN.

*OH*.  You're right, we don't really have a guarantee that someone won't
queue 16 extents to an EFI logitem and then ->finish_item will blow out
the reservation...

> i.e. it's the xfs_trans_free_extent() calls that are done one after
> another, and potentially log different AG metadata blocks on each
> extent free operation....
> 
> And it's not just runtime we have to worry about - if we crash and
> have to recover one of these EFIs with 16 extents in it, we have the
> problem of processing a 16 extent EFI on a single transaction
> reservation, right?

...so to answer your question, there isn't anything in the
xfs_trans_free_extent codepath that would trigger a transaction roll,
nor is there anything to prevent repair from logging a huge EFI.
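For illustration, here is a minimal userspace sketch of the -EAGAIN idea raised earlier in the thread: a finish step that refuses to run when the remaining reservation is too small, and a caller that rolls and retries. Every name here (demo_trans, demo_finish_item, demo_roll, the flat per-extent cost) is hypothetical; this is not the xfs_extent_free_finish_item() code path, and a real cost estimate would be much harder to come by.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a transaction's log reservation. */
struct demo_trans {
	unsigned int res_total;	/* bytes granted on each roll */
	unsigned int res_left;	/* bytes still unused */
	unsigned int rolls;	/* number of rolls so far */
};

/* Finish one item, or ask the caller to roll if we might overrun. */
static int demo_finish_item(struct demo_trans *tp, unsigned int cost)
{
	if (tp->res_left < cost)
		return -EAGAIN;
	tp->res_left -= cost;
	return 0;
}

/* Rolling renews the reservation in this model. */
static void demo_roll(struct demo_trans *tp)
{
	tp->res_left = tp->res_total;
	tp->rolls++;
}

/* Finish nr_items items, rolling whenever an item returns -EAGAIN. */
static int demo_finish_all(struct demo_trans *tp, unsigned int nr_items,
			   unsigned int cost)
{
	unsigned int done = 0;

	/* An item that can never fit would otherwise loop forever. */
	if (cost > tp->res_total)
		return -ENOSPC;

	while (done < nr_items) {
		int error = demo_finish_item(tp, cost);

		if (error == -EAGAIN) {
			demo_roll(tp);
			continue;
		}
		if (error)
			return error;
		done++;
	}
	return 0;
}
```

In this model, 16 items at a flat cost of 3000 bytes against a 10000-byte reservation finish three per transaction and need five rolls. Without the -EAGAIN check, the same 16 items would try to consume 48000 bytes from a single 10000-byte reservation.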

I also don't see anything preventing *other* parts of the filesystem
from logging a huge number of deferred frees and having them end up as
one big EFI.  Maybe I should monitor that to see what fstests comes up
with?

The only situation where a lot of extents get queued to a single EFI
logitem (I think) would be the xfs_refcount code, which could end up
freeing a lot of small extents while decrementing one physical extent's
refcount.

> > So far, I haven't seen any overflows with the reaping code -- for the AG
> > btree rebuilders, we end up logging and relogging the same bnobt/cntbt
> > buffers over and over again.  tr_itruncate gives us ~320K per transaction,
> > and I haven't seen any overflows yet.
> 
> I suspect it might be different with aged filesystems where the
> extents being freed could be spread across many, many btree leaf
> nodes...

Hmm.  I already think the refcount overhead calculation thing is sort of
handwavy -- the estimates are (hopefully) deliberately overlarge to
avoid triggering a shutdown.  Last time I checked, there wasn't a good
way to figure out how much of a transaction's reservation has actually
been used, since we don't really know that until log item formatting
time, right?

I wonder if we'd be better off lowering XFS_EFI_MAX_FAST_EXTENTS...?
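A rough sketch of that tradeoff, assuming (as discussed above) that all extents attached to one EFI are finished on a single reservation: a lower cap means more EFI items to log, but less worst-case work per item. The helper names and the flat per-extent cost are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* EFI items needed to cover nr_extents with at most max_per_efi each. */
static unsigned int demo_nr_efi_items(unsigned int nr_extents,
				      unsigned int max_per_efi)
{
	assert(max_per_efi > 0);
	return (nr_extents + max_per_efi - 1) / max_per_efi;
}

/* Worst-case bytes dirtied while finishing one full EFI item. */
static size_t demo_worst_case_bytes(unsigned int max_per_efi,
				    size_t cost_per_extent)
{
	return (size_t)max_per_efi * cost_per_extent;
}
```

For 128 queued extents, a cap of 16 yields 8 EFI items and a cap of 4 yields 32, while the worst-case work per item shrinks by the same factor of four.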

--D

> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
