From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/3] xfs: EFI recovery needs it's own transaction reservation
Date: Thu, 10 Sep 2020 09:18:10 -0400 [thread overview]
Message-ID: <20200910131810.GA1143857@bfoster> (raw)
In-Reply-To: <20200909214455.GQ12131@dread.disaster.area>
On Thu, Sep 10, 2020 at 07:44:55AM +1000, Dave Chinner wrote:
> On Wed, Sep 09, 2020 at 09:31:11AM -0400, Brian Foster wrote:
> > On Wed, Sep 09, 2020 at 06:19:10PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > >
> > > Recovering an EFI currently uses a itruncate reservation, which is
> > > designed for a rolling transaction that modifies the BMBT and
> > > logs the EFI in one commit, then frees the space and logs the EFD in
> > > the second commit.
> > >
> > > Recovering the EFI only requires the second transaction in this
> > > pair, and hence has a smaller log space requirement than a truncate
> > > operation. Hence when the extent free is being processed at runtime,
> > > the log reservation that is held by the filesystem is only enough to
> > > complete the extent free, not the entire truncate operation.
> > >
> > > Hence if the EFI pins the tail of the log and the log fills up while
> > > the extent is being freed, the amount of reserved free space in the
> > > log is not enough to start another entire truncate operation. Hence
> > > if we crash at this point, log recovery will deadlock with the EFI
> > > pinning the tail of the log and the log not having enough free space
> > > to reserve an itruncate transaction.
> > >
> > > As such, EFI recovery needs it's own log space reservation separate
> > > to the itruncate reservation. We only need what is required free the
> > > extent, and this matches the space we have reserved at runtime for
> > > this operation and hence should prevent the recovery deadlock from
> > > occurring.
> > >
> > > This patch adds the new reservation in a way that minimises the
> > > change such that it should be back-portable to older kernels easily.
> > > Follow up patches will factor and rework the reservations to be more
> > > correct and more tightly defined.
> > >
> > > Note: this would appear to be a generic problem with intent
> > > recovery; we use the entire operation reservation for recovery,
> > > not the reservation that was held at runtime after the intent was
> > > logged. I suspect all intents are going to require their own
> > > reservation as a result.
> > >
> >
> > It might be worth explicitly pointing out that support for EFI/EFD
> > intents goes farther back than the various intents associated with newer
> > features, hence the value of a targeted fix.
>
> Ok.
>
> > > @@ -916,6 +916,16 @@ xfs_trans_resv_calc(
> > > resp->tr_qm_dqalloc.tr_logcount = XFS_WRITE_LOG_COUNT;
> > > resp->tr_qm_dqalloc.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
> > >
> > > + /*
> > > + * Log recovery reservations for intent replay
> > > + *
> > > + * EFI recovery is itruncate minus the initial transaction that logs
> > > + * logs the EFI.
> > > + */
> > > + resp->tr_efi.tr_logres = resp->tr_itruncate.tr_logres;
> > > + resp->tr_efi.tr_logcount = resp->tr_itruncate.tr_logcount - 1;
> >
> > tr_itruncate.tr_logcount looks like it's either 2 or 8 depending on
> > whether reflink is enabled. On one hand this seems conservative enough,
> > but do we know exactly what those extra unit counts are accounted for in
> > the reflink case?
>
> Right, in the reflink case we may have to roll the transaction many more
> times to do the refcount btree and reverse map btree modifications.
> Those are done under separate intents and so we have to roll and
> commit the defered ops more times on a reflink/rmap based
> filesystem. Hence the logcount is higher so that the initial
> reservation can roll more times before regrant during a roll has to
> go and physically reserve more write space in the log to continue
> rolling the transaction.
>
> > It looks like extents are only freed when the last
> > reference is dropped (otherwise we log a refcount intent), which makes
> > me wonder whether we really need 7 log count units if recovery
> > encounters an EFI.
>
> I don't know if the numbers are correct, and it really is out of
> scope for this patch to audit/fix that. I really think we need to
> map this whole thing out in a diagram at this point because I now
> suspect that the allocfree log count calculation is not correct,
> either...
>
I agree up to the point where it relates to this specific EFI recovery
issue. reflink is enabled by default, which means the default EFI
recovery reservation is going to have 7 logcount units. Is that actually
enough of a reduction to prevent this same recovery problem on newer
fs'? I'm wondering if the tr_efi logcount should just be set to 1, for
example, at least for the short term fix.
Brian
> > Also, while looking through the code I noticed that truncate does the
> > following:
> >
> > ...
> > error = xfs_defer_finish(&tp);
> > if (error)
> > goto out;
> >
> > error = xfs_trans_roll_inode(&tp, ip);
> > if (error)
> > goto out;
> > ...
> >
> > ... which looks like it rolls the transaction an extra time per-extent.
> > I don't think that contributes to this problem vs just being a runtime
> > inefficiency, so maybe I'll fling a patch up for that separately.
>
> Yeah, I'm not sure if this is correct/needed or not.
>
> > > * The following transactions are logged in logical format with
> > > * a default log count.
> > > diff --git a/fs/xfs/libxfs/xfs_trans_resv.h b/fs/xfs/libxfs/xfs_trans_resv.h
> > > index 7241ab28cf84..13173b3eaac9 100644
> > > --- a/fs/xfs/libxfs/xfs_trans_resv.h
> > > +++ b/fs/xfs/libxfs/xfs_trans_resv.h
> > > @@ -50,6 +50,8 @@ struct xfs_trans_resv {
> > > struct xfs_trans_res tr_qm_equotaoff;/* end of turn quota off */
> > > struct xfs_trans_res tr_sb; /* modify superblock */
> > > struct xfs_trans_res tr_fsyncts; /* update timestamps on fsync */
> > > + struct xfs_trans_res tr_efi; /* EFI log item recovery */
> > > +
> >
> > Extra whitespace line.
>
> Will fix.
>
> Thanks!
>
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
next prev parent reply other threads:[~2020-09-10 13:21 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-09 8:19 [PATCH 0/3] [RFC] xfs: intent recovery reservation changes Dave Chinner
2020-09-09 8:19 ` [PATCH 1/3] xfs: EFI recovery needs it's own transaction reservation Dave Chinner
2020-09-09 13:31 ` Brian Foster
2020-09-09 21:44 ` Dave Chinner
2020-09-10 13:18 ` Brian Foster [this message]
2020-09-10 21:29 ` Dave Chinner
2020-09-11 12:37 ` Brian Foster
2020-09-09 8:19 ` [PATCH 2/3] xfs: add a free space extent change reservation Dave Chinner
2020-09-09 13:35 ` Brian Foster
2020-09-09 22:51 ` Dave Chinner
2020-09-10 13:19 ` Brian Foster
2020-09-09 8:19 ` [PATCH 3/3] xfs: factor free space tree transaciton reservations Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200910131810.GA1143857@bfoster \
--to=bfoster@redhat.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox