Re: [PATCH 2/2] xfs: log recovery stage split EFIs with multiple extents

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Wengang Wang <wen.gang.wang@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>
Subject: Re: [PATCH 2/2] xfs: log recovery stage split EFIs with multiple extents
Date: Fri, 21 Apr 2023 00:32:15 +0000	[thread overview]
Message-ID: <A6789F19-BE44-4545-B26A-7AE96AB5F4CF@oracle.com> (raw)
In-Reply-To: <20230420225418.GY3223426@dread.disaster.area>



> On Apr 20, 2023, at 3:54 PM, Dave Chinner <david@fromorbit.com> wrote:
> 
> On Thu, Apr 20, 2023 at 05:10:42PM +0000, Wengang Wang wrote:
>> 
>> 
>>> On Apr 19, 2023, at 5:30 PM, Dave Chinner <david@fromorbit.com> wrote:
>>> 
>>> On Fri, Apr 14, 2023 at 03:58:36PM -0700, Wengang Wang wrote:
>>>> At log recovery stage, we need to split EFIs with multiple extents. For each
>>>> orginal multiple-extent EFI, split it into new EFIs each including one extent
>>>> from the original EFI. By that we avoid deadlock when allocating blocks for
>>>> AGFL waiting for the held busy extents by current transaction to be flushed.
>>>> 
>>>> For the original EFI, the process is
>>>> 1. Create and log new EFIs each covering one extent from the
>>>>   original EFI.
>>>> 2. Don't free extent with the original EFI.
>>>> 3. Log EFD for the original EFI.
>>>>   Make sure we log the new EFIs and original EFD in this order:
>>>>     new EFI 1
>>>>     new EFI 2
>>>>     ...
>>>>     new EFI N
>>>>     original EFD
>>>> The original extents are freed with the new EFIs.
>>> 
>>> We may not have the log space available during recovery to explode a
>>> single EFI out into many EFIs like this. The EFI only had enough
>>> space reserved for processing a single EFI, and exploding a single
>>> EFI out like this requires an individual log reservation for each
>>> new EFI. Hence this de-multiplexing process risks running out of log
>>> space and deadlocking before we've been able to process anything.
>>> 
>> 
>> Oh, yes, got it.
>> 
>>> Hence the only option we really have here is to replicate how CUIs
>>> are handled.  We must process the first extent with a whole EFD and
>>> a new EFI containing the remaining unprocessed extents as defered
>>> operations.  i.e.
>>> 
>>> 1. free the first extent in the original EFI
>>> 2. log an EFD for the original EFI
>>> 3. Add all the remaining extents in the original EFI to an xefi chain
>>> 4. Call xfs_defer_ops_capture_and_commit() to create a new EFI from
>>>  the xefi chain and commit the current transaction.
>>> 
>>> xfs_defer_ops_capture_and_commit() will then add a work item to the
>>> defered list which will come back to the new EFI and process it
>>> through the normal runtime deferred ops intent processing path.
>>> 
>> 
>> So you meant this?
>> 
>> Orig EFI with extent1 extent2 extent3
>> free first extent1
>> Full EFD to orig EFI
>> transaction roll,
>> xfs_defer_ops_capture_and_commit() to take care of extent2 and extent3
> 
> No. We do not need a transaction roll there if we rebuild a new
> xefi list with the remaining extents from the original efi. At that
> point, we call:
> 
> <create transaction>
> <free first extent>
> <create new xefi list>
> <create and log EFD>
> xfs_defer_ops_capture_and_commit()
>   xfs_defer_ops_capture()
>     xfs_defer_create_intents()
>       for each tp->t_dfops
>         ->create intent
>   xfs_extent_free_create_intent()
>     create new EFI
>     walk each xefi and add it to the new intent
>     <captures remaining defered work>
>   xfs_trans_commit()
> 
> i.e. xfs_defer_ops_capture_and_commit() can builds new EFI for us
> from the xefi list as part of defering the work that remains to be
> done. Once it has done that, it queues the remaining work and
> commits the transaction. Hence all we need to do in recovery of the
> first extent is free it, create the xefi list and log the full EFD.
> xfs_defer_ops_capture_and_commit() does the rest.

Yes, xfs_defer_ops_capture_and_commit will commit the transaction so we
don’t need a roll.

> 
>> If so, I don’t think it’s safe.
>> Consider that case that kernel panic happened after the transaction roll,
>> during next log replay, the original EFI has the matching EFD, so this EFI
>> is ignored, but actually extent2 and extent3 are not freed.
>> 
>> If you didn’t mean above, but instead this:
>> 
>> Orig EFI with extent1 extent2 extent3
>> free first extent1
>> New EFI extent2 extent3
>> Full EFD to orig EFI
>> transaction roll,
>> xfs_defer_ops_capture_and_commit() to take care of extent2 and extent3
>> 
>> The problem will comeback to the log space issue, are we ensured we have
>> the space for the new EFI? 
> 
> Yes, because logging the full EFD cancels the original EFI and so it
> no longer consumes log space. hence we can log a new EFI using the
> space the original EFI consumed.
> 

Make sense.

>>> The first patch changed that path to only create intents with a
>>> single extent, so the continued defer ops would then do the right
>>> thing with that change in place. However, I think that we also need
>>> the runtime code to process a single extent per intent per commit in
>>> the same manner as above. i.e. we process the first extent in the
>>> intent, then relog all the remaining unprocessed extents as a single
>>> new intent.
>>> 
>>> Note that this is similar to how we already relog intents to roll
>>> them forward in the journal. The only difference for single extent
>>> processing is that an intent relog duplicates the entire extent list
>>> in the EFD and the new EFI, whilst what we want is the new EFI to
>>> contain all the extents except the one we just processed...
>>> 
>> 
>> The problem to me is that where we place the new EFI, it can’t be after the EFD.
>> I explained why above.
> 
> Yes it can. The only thing that matters is that the EFD and new EFI
> are committed in the same transaction. Remember: transactions are
> atomic change sets - either all the changes in the transaction are
> replayed on recovery, or none of them are.

Yes, will try this way.

thanks,
wengang

     prev parent reply	other threads:[~2023-04-21  0:32 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-14 22:58 [PATCH 0/2] xfs: one extent per EFI Wengang Wang
2023-04-14 22:58 ` [PATCH 1/2] xfs: IO time " Wengang Wang
2023-04-19 23:55   ` Dave Chinner
2023-04-20 17:31     ` Wengang Wang
2023-04-20 23:22       ` Dave Chinner
2023-04-21  0:24         ` Wengang Wang
2023-04-21  9:34           ` Dave Chinner
2023-04-21 18:23             ` Wengang Wang
2023-04-22  3:22               ` Wengang Wang
2023-04-24 15:53                 ` Wengang Wang
2023-04-24 22:52                   ` Wengang Wang
2023-04-14 22:58 ` [PATCH 2/2] xfs: log recovery stage split EFIs with multiple extents Wengang Wang
2023-04-20  0:30   ` Dave Chinner
2023-04-20 17:10     ` Wengang Wang
2023-04-20 22:54       ` Dave Chinner
2023-04-21  0:32         ` Wengang Wang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=A6789F19-BE44-4545-B26A-7AE96AB5F4CF@oracle.com \
    --to=wen.gang.wang@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox