From: Wengang Wang <wen.gang.wang@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>
Subject: Re: [PATCH 2/2] xfs: log recovery stage split EFIs with multiple extents
Date: Fri, 21 Apr 2023 00:32:15 +0000 [thread overview]
Message-ID: <A6789F19-BE44-4545-B26A-7AE96AB5F4CF@oracle.com> (raw)
In-Reply-To: <20230420225418.GY3223426@dread.disaster.area>
> On Apr 20, 2023, at 3:54 PM, Dave Chinner <david@fromorbit.com> wrote:
>
> On Thu, Apr 20, 2023 at 05:10:42PM +0000, Wengang Wang wrote:
>>
>>
>>> On Apr 19, 2023, at 5:30 PM, Dave Chinner <david@fromorbit.com> wrote:
>>>
>>> On Fri, Apr 14, 2023 at 03:58:36PM -0700, Wengang Wang wrote:
>>>> At log recovery stage, we need to split EFIs with multiple extents. For each
>>>> orginal multiple-extent EFI, split it into new EFIs each including one extent
>>>> from the original EFI. By that we avoid deadlock when allocating blocks for
>>>> AGFL waiting for the held busy extents by current transaction to be flushed.
>>>>
>>>> For the original EFI, the process is
>>>> 1. Create and log new EFIs each covering one extent from the
>>>> original EFI.
>>>> 2. Don't free extent with the original EFI.
>>>> 3. Log EFD for the original EFI.
>>>> Make sure we log the new EFIs and original EFD in this order:
>>>> new EFI 1
>>>> new EFI 2
>>>> ...
>>>> new EFI N
>>>> original EFD
>>>> The original extents are freed with the new EFIs.
>>>
>>> We may not have the log space available during recovery to explode a
>>> single EFI out into many EFIs like this. The EFI only had enough
>>> space reserved for processing a single EFI, and exploding a single
>>> EFI out like this requires an individual log reservation for each
>>> new EFI. Hence this de-multiplexing process risks running out of log
>>> space and deadlocking before we've been able to process anything.
>>>
>>
>> Oh, yes, got it.
>>
>>> Hence the only option we really have here is to replicate how CUIs
>>> are handled. We must process the first extent with a whole EFD and
>>> a new EFI containing the remaining unprocessed extents as defered
>>> operations. i.e.
>>>
>>> 1. free the first extent in the original EFI
>>> 2. log an EFD for the original EFI
>>> 3. Add all the remaining extents in the original EFI to an xefi chain
>>> 4. Call xfs_defer_ops_capture_and_commit() to create a new EFI from
>>> the xefi chain and commit the current transaction.
>>>
>>> xfs_defer_ops_capture_and_commit() will then add a work item to the
>>> defered list which will come back to the new EFI and process it
>>> through the normal runtime deferred ops intent processing path.
>>>
>>
>> So you meant this?
>>
>> Orig EFI with extent1 extent2 extent3
>> free first extent1
>> Full EFD to orig EFI
>> transaction roll,
>> xfs_defer_ops_capture_and_commit() to take care of extent2 and extent3
>
> No. We do not need a transaction roll there if we rebuild a new
> xefi list with the remaining extents from the original efi. At that
> point, we call:
>
> <create transaction>
> <free first extent>
> <create new xefi list>
> <create and log EFD>
> xfs_defer_ops_capture_and_commit()
> xfs_defer_ops_capture()
> xfs_defer_create_intents()
> for each tp->t_dfops
> ->create intent
> xfs_extent_free_create_intent()
> create new EFI
> walk each xefi and add it to the new intent
> <captures remaining defered work>
> xfs_trans_commit()
>
> i.e. xfs_defer_ops_capture_and_commit() can builds new EFI for us
> from the xefi list as part of defering the work that remains to be
> done. Once it has done that, it queues the remaining work and
> commits the transaction. Hence all we need to do in recovery of the
> first extent is free it, create the xefi list and log the full EFD.
> xfs_defer_ops_capture_and_commit() does the rest.
Yes, xfs_defer_ops_capture_and_commit will commit the transaction so we
don’t need a roll.
>
>> If so, I don’t think it’s safe.
>> Consider that case that kernel panic happened after the transaction roll,
>> during next log replay, the original EFI has the matching EFD, so this EFI
>> is ignored, but actually extent2 and extent3 are not freed.
>>
>> If you didn’t mean above, but instead this:
>>
>> Orig EFI with extent1 extent2 extent3
>> free first extent1
>> New EFI extent2 extent3
>> Full EFD to orig EFI
>> transaction roll,
>> xfs_defer_ops_capture_and_commit() to take care of extent2 and extent3
>>
>> The problem will comeback to the log space issue, are we ensured we have
>> the space for the new EFI?
>
> Yes, because logging the full EFD cancels the original EFI and so it
> no longer consumes log space. hence we can log a new EFI using the
> space the original EFI consumed.
>
Make sense.
>>> The first patch changed that path to only create intents with a
>>> single extent, so the continued defer ops would then do the right
>>> thing with that change in place. However, I think that we also need
>>> the runtime code to process a single extent per intent per commit in
>>> the same manner as above. i.e. we process the first extent in the
>>> intent, then relog all the remaining unprocessed extents as a single
>>> new intent.
>>>
>>> Note that this is similar to how we already relog intents to roll
>>> them forward in the journal. The only difference for single extent
>>> processing is that an intent relog duplicates the entire extent list
>>> in the EFD and the new EFI, whilst what we want is the new EFI to
>>> contain all the extents except the one we just processed...
>>>
>>
>> The problem to me is that where we place the new EFI, it can’t be after the EFD.
>> I explained why above.
>
> Yes it can. The only thing that matters is that the EFD and new EFI
> are committed in the same transaction. Remember: transactions are
> atomic change sets - either all the changes in the transaction are
> replayed on recovery, or none of them are.
Yes, will try this way.
thanks,
wengang
prev parent reply other threads:[~2023-04-21 0:32 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-14 22:58 [PATCH 0/2] xfs: one extent per EFI Wengang Wang
2023-04-14 22:58 ` [PATCH 1/2] xfs: IO time " Wengang Wang
2023-04-19 23:55 ` Dave Chinner
2023-04-20 17:31 ` Wengang Wang
2023-04-20 23:22 ` Dave Chinner
2023-04-21 0:24 ` Wengang Wang
2023-04-21 9:34 ` Dave Chinner
2023-04-21 18:23 ` Wengang Wang
2023-04-22 3:22 ` Wengang Wang
2023-04-24 15:53 ` Wengang Wang
2023-04-24 22:52 ` Wengang Wang
2023-04-14 22:58 ` [PATCH 2/2] xfs: log recovery stage split EFIs with multiple extents Wengang Wang
2023-04-20 0:30 ` Dave Chinner
2023-04-20 17:10 ` Wengang Wang
2023-04-20 22:54 ` Dave Chinner
2023-04-21 0:32 ` Wengang Wang [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=A6789F19-BE44-4545-B26A-7AE96AB5F4CF@oracle.com \
--to=wen.gang.wang@oracle.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox