From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 4/7] xfs: automatic freeing of freshly allocated unwritten space
Date: Fri, 24 Nov 2023 15:32:10 -0800 [thread overview]
Message-ID: <20231124233210.GI36190@frogsfrogsfrogs> (raw)
In-Reply-To: <20231006051251.GR21298@frogsfrogsfrogs>
On Thu, Oct 05, 2023 at 10:12:51PM -0700, Darrick J. Wong wrote:
> On Thu, Oct 05, 2023 at 02:47:50PM +1100, Dave Chinner wrote:
> > On Tue, Sep 26, 2023 at 04:32:09PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > >
> > > As mentioned in the previous commit, online repair wants to allocate
> > > space to write out a new metadata structure, and it also wants to hedge
> > > against system crashes during repairs by logging (and later cancelling)
> > > EFIs to free the space if we crash before committing the new data
> > > structure.
> > >
> > > Therefore, create a trio of functions to schedule automatic reaping of
> > > freshly allocated unwritten space. xfs_alloc_schedule_autoreap creates
> > > a paused EFI representing the space we just allocated. Once the
> > > allocations are made and the autoreaps scheduled, we can start writing
> > > to disk.
> > >
> > > If the writes succeed, xfs_alloc_cancel_autoreap marks the EFI work
> > > items as stale and unpauses the pending deferred work item. Assuming
> > > that's done in the same transaction that commits the new structure into
> > > the filesystem, we guarantee that either the new object is fully
> > > visible, or that all the space gets reclaimed.
> > >
> > > If the writes succeed but only part of an extent was used, repair must
> > > call the same _cancel_autoreap function to kill the first EFI and then
> > > log a new EFI to free the unused space. The first EFI is already
> > > committed, so it cannot be changed.
> > >
> > > For full extents that aren't used, xfs_alloc_commit_autoreap will
> > > unpause the EFI, which results in the space being freed during the next
> > > _defer_finish cycle.
> > >
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > ---
> > > fs/xfs/libxfs/xfs_alloc.c | 104 +++++++++++++++++++++++++++++++++++++++++++--
> > > fs/xfs/libxfs/xfs_alloc.h | 12 +++++
> > > fs/xfs/xfs_extfree_item.c | 11 +++--
> > > 3 files changed, 120 insertions(+), 7 deletions(-)
> > >
> > >
> > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > > index 295d11a27f632..c1ee1862cc1af 100644
> > > --- a/fs/xfs/libxfs/xfs_alloc.c
> > > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > > @@ -2501,14 +2501,15 @@ xfs_defer_agfl_block(
> > > * Add the extent to the list of extents to be free at transaction end.
> > > * The list is maintained sorted (by block number).
> > > */
> > > -int
> > > -xfs_free_extent_later(
> > > +static int
> > > +__xfs_free_extent_later(
> > > struct xfs_trans *tp,
> > > xfs_fsblock_t bno,
> > > xfs_filblks_t len,
> > > const struct xfs_owner_info *oinfo,
> > > enum xfs_ag_resv_type type,
> > > - bool skip_discard)
> > > + bool skip_discard,
> > > + struct xfs_defer_pending **dfpp)
> >
> > I was happy that __xfs_free_extent_later() went away in the last
> > patch.
> >
> > I am sad that it has immediately come back....
> >
> > Can we please name this after what it does? Say
> > xfs_defer_extent_free()?
>
> Done. Thank you for a better name! :)
>
> > > {
> > > struct xfs_extent_free_item *xefi;
> > > struct xfs_mount *mp = tp->t_mountp;
> > > @@ -2556,10 +2557,105 @@ xfs_free_extent_later(
> > > XFS_FSB_TO_AGBNO(tp->t_mountp, bno), len);
> > >
> > > xfs_extent_free_get_group(mp, xefi);
> > > - xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_FREE, &xefi->xefi_list);
> > > + *dfpp = xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_FREE, &xefi->xefi_list);
> > > return 0;
> > > }
> > >
> > > +int
> > > +xfs_free_extent_later(
> > > + struct xfs_trans *tp,
> > > + xfs_fsblock_t bno,
> > > + xfs_filblks_t len,
> > > + const struct xfs_owner_info *oinfo,
> > > + enum xfs_ag_resv_type type,
> > > + bool skip_discard)
> > > +{
> > > + struct xfs_defer_pending *dontcare = NULL;
> > > +
> > > + return __xfs_free_extent_later(tp, bno, len, oinfo, type, skip_discard,
> > > + &dontcare);
> > > +}
> > > +
> > > +/*
> > > + * Set up automatic freeing of unwritten space in the filesystem.
> > > + *
> > > + * This function attached a paused deferred extent free item to the
> > > + * transaction. Pausing means that the EFI will be logged in the next
> > > + * transaction commit, but the pending EFI will not be finished until the
> > > + * pending item is unpaused.
> > > + *
> > > + * If the system goes down after the EFI has been persisted to the log but
> > > + * before the pending item is unpaused, log recovery will find the EFI, fail to
> > > + * find the EFD, and free the space.
> > > + *
> > > + * If the pending item is unpaused, the next transaction commit will log an EFD
> > > + * without freeing the space.
> > > + *
> > > + * Caller must ensure that the tp, fsbno, len, oinfo, and resv flags of the
> > > + * @args structure are set to the relevant values.
> > > + */
> > > +int
> > > +xfs_alloc_schedule_autoreap(
> > > + const struct xfs_alloc_arg *args,
> > > + bool skip_discard,
> > > + struct xfs_alloc_autoreap *aarp)
> > > +{
> > > + int error;
> > > +
> > > + error = __xfs_free_extent_later(args->tp, args->fsbno, args->len,
> > > + &args->oinfo, args->resv, skip_discard, &aarp->dfp);
> > > + if (error)
> > > + return error;
> > > +
> > > + xfs_defer_item_pause(args->tp, aarp->dfp);
> > > + return 0;
> > > +}
> >
> > Then this becomes much more readable - xfs_defer_free_extent()
> > returns the defer_pending work item for the EFI, which we then
> > immediately pause....
>
> <nod>
>
> > > +
> > > +/*
> > > + * Cancel automatic freeing of unwritten space in the filesystem.
> > > + *
> > > + * Earlier, we created a paused deferred extent free item and attached it to
> > > + * this transaction so that we could automatically roll back a new space
> > > + * allocation if the system went down. Now we want to cancel the paused work
> > > + * item by marking the EFI stale so we don't actually free the space, unpausing
> > > + * the pending item and logging an EFD.
> > > + *
> > > + * The caller generally should have already mapped the space into the ondisk
> > > + * filesystem. If the reserved space was partially used, the caller must call
> > > + * xfs_free_extent_later to create a new EFI to free the unused space.
> > > + */
> > > +void
> > > +xfs_alloc_cancel_autoreap(
> > > + struct xfs_trans *tp,
> > > + struct xfs_alloc_autoreap *aarp)
> > > +{
> > > + struct xfs_defer_pending *dfp = aarp->dfp;
> > > + struct xfs_extent_free_item *xefi;
> > > +
> > > + if (!dfp)
> > > + return;
> > > +
> > > + list_for_each_entry(xefi, &dfp->dfp_work, xefi_list)
> > > + xefi->xefi_flags |= XFS_EFI_STALE;
> > > +
> > > + xfs_defer_item_unpause(tp, dfp);
> > > +}
> >
> > Hmmmm. I see what you are trying to do here, though I'm not sure
> > that "stale" is the right word to describe it. The EFI has been
> > cancelled, so we want and EFD to be logged without freeing the
> > extent.
> >
> > To me, "stale" means "contents longer valid, do not touch" as per
> > XFS_ISTALE, xfs_buf_stale(), etc
> >
> > Whereas we use "cancelled" to indicate that the pending operations
> > on the buffer should not be actioned, such as XFS_BLF_CANCEL buffer
> > items in log recovery to indicate the buffer has been freed and
> > while it is cancelled we should not replay the changes found in the
> > log for that buffer....
> >
> > Hence I think this is better named XFS_EFI_CANCELLED, and it also
> > matches what the calling function is doing - cancelling the autoreap
> > of the extent....
>
> Funny story -- CANCELLED was the original name for this flag. I'll put
> it back.
>
> > > diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
> > > index 9e7b58f3566c0..98c2667d369e8 100644
> > > --- a/fs/xfs/xfs_extfree_item.c
> > > +++ b/fs/xfs/xfs_extfree_item.c
> > > @@ -392,9 +392,14 @@ xfs_trans_free_extent(
> > > trace_xfs_bmap_free_deferred(tp->t_mountp, xefi->xefi_pag->pag_agno, 0,
> > > agbno, xefi->xefi_blockcount);
> > >
> > > - error = __xfs_free_extent(tp, xefi->xefi_pag, agbno,
> > > - xefi->xefi_blockcount, &oinfo, xefi->xefi_agresv,
> > > - xefi->xefi_flags & XFS_EFI_SKIP_DISCARD);
> > > + if (xefi->xefi_flags & XFS_EFI_STALE) {
> > > + error = 0;
> > > + } else {
> > > + error = __xfs_free_extent(tp, xefi->xefi_pag, agbno,
> > > + xefi->xefi_blockcount, &oinfo,
> > > + xefi->xefi_agresv,
> > > + xefi->xefi_flags & XFS_EFI_SKIP_DISCARD);
> > > + }
> >
> > Just init error to 0 when it is declared, then this becomes:
> >
> > if (!(xefi->xefi_flags & XFS_EFI_CANCELLED)) {
> > error = __xfs_free_extent(tp, xefi->xefi_pag, agbno,
> > xefi->xefi_blockcount, &oinfo,
> > xefi->xefi_agresv,
> > xefi->xefi_flags & XFS_EFI_SKIP_DISCARD);
> > }
>
> <nod> Eventually the rt reflink patch will trip over this (since rtdev
> vs. datadev selection is also an xefi_flag) but that'll come later.
...only now it won't, since hch and I have been hacking on getting rt
modernization reviewed; and that involved splitting out the rt paths so
that the the existing functions aren't littered with conditionals.
--D
> > Otherwise the code looks ok.
>
> Yay!
>
> --D
>
> > --
> > Dave Chinner
> > david@fromorbit.com
next prev parent reply other threads:[~2023-11-24 23:32 UTC|newest]
Thread overview: 63+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-26 23:14 [MEGAPATCHSET v27] xfs: online repair, second part of part 1 Darrick J. Wong
2023-09-26 23:29 ` [PATCHSET v27.0 0/1] xfs: prevent livelocks in xchk_iget Darrick J. Wong
2023-09-26 23:31 ` [PATCH 1/1] xfs: make xchk_iget safer in the presence of corrupt inode btrees Darrick J. Wong
2023-09-28 5:54 ` Dave Chinner
2023-09-28 17:01 ` Darrick J. Wong
2023-09-26 23:29 ` [PATCHSET v27.0 0/7] xfs: reserve disk space for online repairs Darrick J. Wong
2023-09-26 23:31 ` [PATCH 1/7] xfs: don't append work items to logged xfs_defer_pending objects Darrick J. Wong
2023-10-05 2:55 ` Dave Chinner
2023-09-26 23:31 ` [PATCH 2/7] xfs: allow pausing of pending deferred work items Darrick J. Wong
2023-10-05 3:00 ` Dave Chinner
2023-09-26 23:31 ` [PATCH 3/7] xfs: remove __xfs_free_extent_later Darrick J. Wong
2023-10-05 3:30 ` Dave Chinner
2023-09-26 23:32 ` [PATCH 4/7] xfs: automatic freeing of freshly allocated unwritten space Darrick J. Wong
2023-10-05 3:47 ` Dave Chinner
2023-10-06 5:12 ` Darrick J. Wong
2023-11-24 23:32 ` Darrick J. Wong [this message]
2023-10-12 5:05 ` [PATCH v27.1 " Darrick J. Wong
2023-09-26 23:32 ` [PATCH 5/7] xfs: implement block reservation accounting for btrees we're staging Darrick J. Wong
2023-10-05 4:53 ` Dave Chinner
2023-10-06 5:18 ` Darrick J. Wong
2023-09-26 23:32 ` [PATCH 6/7] xfs: log EFIs for all btree blocks being used to stage a btree Darrick J. Wong
2023-10-05 5:12 ` Dave Chinner
2023-09-26 23:32 ` [PATCH 7/7] xfs: force small EFIs for reaping btree extents Darrick J. Wong
2023-10-05 5:13 ` Dave Chinner
2023-10-04 23:32 ` [PATCHSET v27.0 0/7] xfs: reserve disk space for online repairs Darrick J. Wong
2023-09-26 23:29 ` [PATCHSET v27.0 0/4] xfs: prepare repair for bulk loading Darrick J. Wong
2023-09-26 23:33 ` [PATCH 1/4] xfs: force all buffers to be written during btree bulk load Darrick J. Wong
2023-09-26 23:33 ` [PATCH 2/4] xfs: add debug knobs to control btree bulk load slack factors Darrick J. Wong
2023-09-26 23:33 ` [PATCH 3/4] xfs: move btree bulkload record initialization to ->get_record implementations Darrick J. Wong
2023-09-26 23:33 ` [PATCH 4/4] xfs: constrain dirty buffers while formatting a staged btree Darrick J. Wong
2023-09-26 23:29 ` [PATCHSET v27.0 0/4] xfs: online repair of AG btrees Darrick J. Wong
2023-09-26 23:34 ` [PATCH 1/4] xfs: roll the scrub transaction after completing a repair Darrick J. Wong
2023-09-26 23:34 ` [PATCH 2/4] xfs: repair free space btrees Darrick J. Wong
2023-09-26 23:34 ` [PATCH 3/4] xfs: repair inode btrees Darrick J. Wong
2023-09-26 23:35 ` [PATCH 4/4] xfs: repair refcount btrees Darrick J. Wong
2023-09-26 23:30 ` [PATCHSET v27.0 0/7] xfs: online repair of inodes and forks Darrick J. Wong
2023-09-26 23:35 ` [PATCH 1/7] xfs: disable online repair quota helpers when quota not enabled Darrick J. Wong
2023-09-26 23:35 ` [PATCH 2/7] xfs: try to attach dquots to files before repairing them Darrick J. Wong
2023-09-26 23:35 ` [PATCH 3/7] xfs: repair inode records Darrick J. Wong
2023-09-26 23:36 ` [PATCH 4/7] xfs: zap broken inode forks Darrick J. Wong
2023-09-26 23:36 ` [PATCH 5/7] xfs: abort directory parent scrub scans if we encounter a zapped directory Darrick J. Wong
2023-09-26 23:36 ` [PATCH 6/7] xfs: skip the rmapbt search on an empty attr fork unless we know it was zapped Darrick J. Wong
2023-09-26 23:36 ` [PATCH 7/7] xfs: repair obviously broken inode modes Darrick J. Wong
2023-09-26 23:30 ` [PATCHSET v27.0 0/5] xfs: online repair of file fork mappings Darrick J. Wong
2023-09-26 23:37 ` [PATCH 1/5] xfs: reintroduce reaping of file metadata blocks to xrep_reap_extents Darrick J. Wong
2023-09-26 23:37 ` [PATCH 2/5] xfs: repair inode fork block mapping data structures Darrick J. Wong
2023-09-26 23:37 ` [PATCH 3/5] xfs: refactor repair forcing tests into a repair.c helper Darrick J. Wong
2023-09-26 23:37 ` [PATCH 4/5] xfs: create a ranged query function for refcount btrees Darrick J. Wong
2023-09-26 23:38 ` [PATCH 5/5] xfs: repair problems in CoW forks Darrick J. Wong
2023-09-26 23:30 ` [PATCHSET v27.0 0/4] xfs: online repair of rt bitmap file Darrick J. Wong
2023-09-26 23:38 ` [PATCH 1/4] xfs: repair the inode core and forks of a metadata inode Darrick J. Wong
2023-09-26 23:38 ` [PATCH 2/4] xfs: create a new inode fork block unmap helper Darrick J. Wong
2023-09-26 23:38 ` [PATCH 3/4] xfs: always check the rtbitmap and rtsummary files Darrick J. Wong
2023-09-26 23:39 ` [PATCH 4/4] xfs: online repair of realtime bitmaps Darrick J. Wong
2023-09-26 23:30 ` [PATCHSET v27.0 0/5] xfs: online repair of quota and rt metadata files Darrick J. Wong
2023-09-26 23:39 ` [PATCH 1/5] xfs: check the ondisk space mapping behind a dquot Darrick J. Wong
2023-09-26 23:39 ` [PATCH 2/5] xfs: check dquot resource timers Darrick J. Wong
2023-09-26 23:40 ` [PATCH 3/5] xfs: pull xfs_qm_dqiterate back into scrub Darrick J. Wong
2023-09-26 23:40 ` [PATCH 4/5] xfs: improve dquot iteration for scrub Darrick J. Wong
2023-09-26 23:40 ` [PATCH 5/5] xfs: repair quotas Darrick J. Wong
2023-11-24 23:30 ` [MEGAPATCHSET v28] xfs: online repair, second part of part 1 Darrick J. Wong
-- strict thread matches above, loose matches on Subject: below --
2023-11-24 23:45 [PATCHSET v28.0 0/7] xfs: reserve disk space for online repairs Darrick J. Wong
2023-11-24 23:47 ` [PATCH 4/7] xfs: automatic freeing of freshly allocated unwritten space Darrick J. Wong
2023-11-25 5:06 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231124233210.GI36190@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).