Re: [PATCH 1/3] xfs: EFI recovery needs it's own transaction reservation

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/3] xfs: EFI recovery needs it's own transaction reservation
Date: Fri, 11 Sep 2020 08:37:42 -0400	[thread overview]
Message-ID: <20200911123742.GB1194939@bfoster> (raw)
In-Reply-To: <20200910212927.GT12131@dread.disaster.area>

On Fri, Sep 11, 2020 at 07:29:27AM +1000, Dave Chinner wrote:
> On Thu, Sep 10, 2020 at 09:18:10AM -0400, Brian Foster wrote:
> > On Thu, Sep 10, 2020 at 07:44:55AM +1000, Dave Chinner wrote:
> > > On Wed, Sep 09, 2020 at 09:31:11AM -0400, Brian Foster wrote:
> > > > It looks like extents are only freed when the last
> > > > reference is dropped (otherwise we log a refcount intent), which makes
> > > > me wonder whether we really need 7 log count units if recovery
> > > > encounters an EFI.
> > > 
> > > I don't know if the numbers are correct, and it really is out of
> > > scope for this patch to audit/fix that. I really think we need to
> > > map this whole thing out in a diagram at this point because I now
> > > suspect that the allocfree log count calculation is not correct,
> > > either...
> > 
> > I agree up to the point where it relates to this specific EFI recovery
> > issue. reflink is enabled by default, which means the default EFI
> > recovery reservation is going to have 7 logcount units. Is that actually
> > enough of a reduction to prevent this same recovery problem on newer
> > fs'? I'm wondering if the tr_efi logcount should just be set to 1, for
> > example, at least for the short term fix.
> 
> I spent yesterday mapping out the whole "free extent" chain to
> understand exactly what was necessary, and in discussing this with
> Darrick on #xfs I came to the conclusion that we cannot -ever- have
> a logcount of more than 1 for an intent recovery of any sort.
> 
> That's because the transaction may have rolled enough times to
> exhaust the initial grant of unit * logcount, so the only
> reservation that the runtime kernel has when it crashs is a single
> transaction unit reservation (which gets re-reserved on every roll).
> 
> Hence recovery cannot assume that intent that was being processed has more
> than a single unit reservation of log space available to be used,
> and hence all intent recovery reservations must start with a log
> count of 1.
> 

Makes sense.

> THere are other restrictions we need to deal with, too. multiple
> intents may pin the tail of the log, so we can't just process a
> single intent chain at a time as that will result in using all the
> log space for a single intent chain and reservation deadlocking on
> one of the intents we haven't yet started.
> 
> Hence the first thing we have to do in recovering intents is take an
> active reservation for -all- intents in the log to match the
> reservation state at runtime. Only then can we guarantee that
> intents that pin the tail of the log will have the reservation space
> needed to be able to unpin the log tail.
> 

Which brings up the ordering question when multiple intents pin the
tail...

> Further, because we now may have consumed all the available log
> reservation space to guarantee forwards progress, the first commit
> of the first intent may block trying to regrant space if another
> intent also pins the tail of the log. This means we cannot replay
> intents in a serial manner as we currently do. Each intent chain and it's
> transaction context needs to run in it's own process context so it
> can block waiting for other intents to make progress, just like
> happens at runtime when the system crashed. IOWs, intent replay
> needs to be done with a task context per intent chain (probably via
> a workqueue).
> 

Indeed.

> AFAICT, this is the only way we can make intent recovery deadlock
> free - we have to recreate the complete log reservation state in
> memory before we start recovery of a single intent, we can only
> assume an intent had a single unit reservation of log space assigned
> to it, and intent chain execution needs to run concurrently so
> commits can block waiting for log space to become available as other
> intents commit and unpin the tail of the log...
> 

Ok, so the primary recovery task would either need to acquire and
distribute the initial reservation or otherwise implement some kind of
execution barrier for the intent recovery wq tasks to ensure that every
currently log resident intent acquires a transaction before any other is
allowed to roll. That doesn't seem _too_ limiting a restriction given
that it technically should allow many intents (that don't result in
intent chains) to execute to completion before waiting on others at all,
though it does sound like a complete rework of the current
post-recovery, single threaded intent recovery approach.

I'm kind of wondering if we had some mechanism to freeze/unfreeze all
transaction rolls, if we could then rework recovery to issue intent
recovery wq tasks during pass2 as log records are written back to the
fs. For example, with transaction rolls frozen, pass2 of recovery
completes a particular log record and on I/O completion of all
associated buffers, kicks off workqueue tasks for each unfinished intent
in that particular record. The workqueue tasks will either complete and
exit or unconditionally block on the next transaction roll (regrant).
The main recovery task carries on and repeats for all subsequent records
in the log, unfreezes regrants and waits for all outstanding
tasks/transactions to (hopefully) complete. Hm?

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>

next prev parent reply	other threads:[~2020-09-11 17:42 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-09  8:19 [PATCH 0/3] [RFC] xfs: intent recovery reservation changes Dave Chinner
2020-09-09  8:19 ` [PATCH 1/3] xfs: EFI recovery needs it's own transaction reservation Dave Chinner
2020-09-09 13:31   ` Brian Foster
2020-09-09 21:44     ` Dave Chinner
2020-09-10 13:18       ` Brian Foster
2020-09-10 21:29         ` Dave Chinner
2020-09-11 12:37           ` Brian Foster [this message]
2020-09-09  8:19 ` [PATCH 2/3] xfs: add a free space extent change reservation Dave Chinner
2020-09-09 13:35   ` Brian Foster
2020-09-09 22:51     ` Dave Chinner
2020-09-10 13:19       ` Brian Foster
2020-09-09  8:19 ` [PATCH 3/3] xfs: factor free space tree transaciton reservations Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200911123742.GB1194939@bfoster \
    --to=bfoster@redhat.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox