From: Dave Chinner <david@fromorbit.com>
To: Wengang Wang <wen.gang.wang@oracle.com>
Cc: "linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
Srikanth C S <srikanth.c.s@oracle.com>
Subject: Re: Question: reserve log space at IO time for recover
Date: Wed, 19 Jul 2023 10:11:03 +1000
Message-ID: <ZLcqF2/7ZBI44C65@dread.disaster.area>
In-Reply-To: <1DB9F8BB-4A7C-4422-B447-90A08E310E17@oracle.com>
On Tue, Jul 18, 2023 at 10:57:38PM +0000, Wengang Wang wrote:
> Hi,
>
> I have an XFS metadump (it was running 4.14.35 plus some backported patches);
> mounting it (log recovery) hangs at log space reservation. There are 181760 bytes of on-disk
> free journal space, while the transaction needs to reserve 360416 bytes to start the recovery.
> Thus the mount hangs forever.
Most likely something went wrong at runtime on the 4.14.35 kernel
prior to the crash, leaving the on-disk state impossible to
recover. The probable cause is an accounting leak in a transaction
reservation somewhere, perhaps in passing the space used from the
transaction to the CIL. We've had bugs in this area before; they
eventually manifest as log hangs like this, either at runtime or
during recovery...
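To put the reported numbers side by side, here is a minimal sketch in Python of the comparison the mount is stuck on. The function and constant names are made up for illustration and are not kernel identifiers; only the two byte counts come from the report above.

```python
def log_space_shortfall(free_bytes: int, needed_bytes: int) -> int:
    """Return how many bytes short a reservation is (0 if it fits)."""
    return max(0, needed_bytes - free_bytes)


# Figures taken from the report above; the names are illustrative only.
FREE_ON_DISK = 181760    # free journal space found on disk
RECOVERY_NEED = 360416   # reservation requested by the recovery transaction

print(log_space_shortfall(FREE_ON_DISK, RECOVERY_NEED))  # 178656
```

As long as the shortfall is non-zero and nothing frees journal space, the reservation cannot be granted, which matches the wait in xlog_grant_head_wait() shown in the stack trace below.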
> That happens with 4.14.35 kernel and also upstream
> kernel (6.4.0).
Upgrading the kernel won't fix recovery - it is likely that the
journal state on disk is invalid and so the mount cannot complete.
> This is the related stack dump (6.4.0 kernel):
>
> [<0>] xlog_grant_head_wait+0xbd/0x200 [xfs]
> [<0>] xlog_grant_head_check+0xd9/0x100 [xfs]
> [<0>] xfs_log_reserve+0xbc/0x1e0 [xfs]
> [<0>] xfs_trans_reserve+0x138/0x170 [xfs]
> [<0>] xfs_trans_alloc+0xe8/0x220 [xfs]
> [<0>] xfs_efi_item_recover+0x110/0x250 [xfs]
> [<0>] xlog_recover_process_intents.isra.28+0xba/0x2d0 [xfs]
> [<0>] xlog_recover_finish+0x33/0x310 [xfs]
> [<0>] xfs_log_mount_finish+0xdb/0x160 [xfs]
> [<0>] xfs_mountfs+0x51c/0x900 [xfs]
> [<0>] xfs_fs_fill_super+0x4b8/0x940 [xfs]
> [<0>] get_tree_bdev+0x193/0x280
> [<0>] vfs_get_tree+0x26/0xd0
> [<0>] path_mount+0x69d/0x9b0
> [<0>] do_mount+0x7d/0xa0
> [<0>] __x64_sys_mount+0xdc/0x100
> [<0>] do_syscall_64+0x3b/0x90
> [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>
> Thus we can say 4.14.35 kernel didn’t reserve log space at IO time to make log recover
> safe. Upstream kernel doesn’t do that either if I read the source code right (I might be wrong).
Sure they do.
Log space usage is what the grant heads track; transactions are not
allowed to start if there isn't both reserve and write grant head
space available for them, and transaction rolls get held until there
is write grant space available for them (i.e. they can block in
xfs_trans_roll() -> xfs_trans_reserve() waiting for write grant head
space).
There have been bugs in the grant head accounting mechanisms in the
past, and there may well still be bugs in them. But it is the grant head
mechanism that is supposed to guarantee there is always space in
the journal for a transaction to commit, and by extension, ensure
that we always have space in the journal for a transaction to be
fully recovered.
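The admission rule described above can be sketched as a toy model in Python. This is a deliberately simplified illustration, not the kernel implementation: the class, its fields and its method names are all hypothetical, the real code sleeps instead of returning False, and releasing space when the log tail moves is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class ToyLog:
    """Toy journal with two grant heads (hypothetical, not kernel code)."""
    size: int               # total journal size in bytes
    reserve_used: int = 0   # bytes accounted against the reserve grant head
    write_used: int = 0     # bytes accounted against the write grant head

    def reserve_free(self) -> int:
        return self.size - self.reserve_used

    def write_free(self) -> int:
        return self.size - self.write_used

    def try_start(self, need: int) -> bool:
        """A new transaction needs space in BOTH grant heads."""
        if need > self.reserve_free() or need > self.write_free():
            return False  # the real code would block here instead
        self.reserve_used += need
        self.write_used += need
        return True

    def try_roll(self, need: int) -> bool:
        """A rolling transaction only waits for write grant space."""
        if need > self.write_free():
            return False  # the real code would block here instead
        self.write_used += need
        return True


log = ToyLog(size=1000)
print(log.try_start(600))  # True: both heads have 1000 bytes free
print(log.try_start(600))  # False: only 400 bytes left in each head
print(log.try_roll(300))   # True: 400 bytes of write grant space left
print(log.try_roll(300))   # False: only 100 bytes of write grant space left
```

In this model a caller is only admitted when the space it asks for is already accounted for, which is the property that is meant to keep enough journal space available for both commit and recovery. An accounting bug would correspond to the used counters drifting away from what is really in the journal.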
> So shall we reserve a proper amount of log space at IO time, call it Unflush-Reserve (UR), to
> ensure log recovery is safe? The size of the UR is determined by the current unflushed log items.
> It gets increased just after a transaction is committed and gets decreased when log items are
> flushed. With the UR, we are sure to have enough log space for the transactions used by log
> recovery.
The grant heads already track log space usage and reservations like
this. If you want to learn more about the nitty gritty details, look
at this patch set that is aimed at changing how the grant heads
track the used/reserved log space to improve performance:
https://lore.kernel.org/linux-xfs/20221220232308.3482960-1-david@fromorbit.com/
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com