From: Long Li <leo.lilong@huawei.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: <chandanbabu@kernel.org>, <linux-xfs@vger.kernel.org>,
<yi.zhang@huawei.com>, <houtao1@huawei.com>,
<yangerkun@huawei.com>
Subject: Re: [PATCH] xfs: ensure submit buffers on LSN boundaries in error handlers
Date: Thu, 4 Jan 2024 21:03:14 +0800 [thread overview]
Message-ID: <20240104130314.GA1815758@ceph-admin> (raw)
In-Reply-To: <20240104020121.GS361584@frogsfrogsfrogs>
On Wed, Jan 03, 2024 at 06:01:21PM -0800, Darrick J. Wong wrote:
> On Thu, Dec 28, 2023 at 08:46:46PM +0800, Long Li wrote:
> > While performing the IO fault injection test, I caught the following data
> > corruption report:
> >
> > XFS (dm-0): Internal error ltbno + ltlen > bno at line 1957 of file fs/xfs/libxfs/xfs_alloc.c. Caller xfs_free_ag_extent+0x79c/0x1130
> > CPU: 3 PID: 33 Comm: kworker/3:0 Not tainted 6.5.0-rc7-next-20230825-00001-g7f8666926889 #214
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
> > Workqueue: xfs-inodegc/dm-0 xfs_inodegc_worker
> > Call Trace:
> > <TASK>
> > dump_stack_lvl+0x50/0x70
> > xfs_corruption_error+0x134/0x150
> > xfs_free_ag_extent+0x7d3/0x1130
> > __xfs_free_extent+0x201/0x3c0
> > xfs_trans_free_extent+0x29b/0xa10
> > xfs_extent_free_finish_item+0x2a/0xb0
> > xfs_defer_finish_noroll+0x8d1/0x1b40
> > xfs_defer_finish+0x21/0x200
> > xfs_itruncate_extents_flags+0x1cb/0x650
> > xfs_free_eofblocks+0x18f/0x250
> > xfs_inactive+0x485/0x570
> > xfs_inodegc_worker+0x207/0x530
> > process_scheduled_works+0x24a/0xe10
> > worker_thread+0x5ac/0xc60
> > kthread+0x2cd/0x3c0
> > ret_from_fork+0x4a/0x80
> > ret_from_fork_asm+0x11/0x20
> > </TASK>
> > XFS (dm-0): Corruption detected. Unmount and run xfs_repair
> >
> > After analyzing the disk image, it was found that the corruption was
> > triggered by the fact that extent was recorded in both the inode and AGF
> > btrees. After a long time of reproduction and analysis, we found that the
> > root cause of the problem was that the AGF btree block was not recovered.
> >
> > Consider the following situation, Transaction A and Transaction B are in
> > the same record, so Transaction A and Transaction B share the same LSN1.
> > If the buf item in Transaction A has been recovered, then the buf item in
> > Transaction B cannot be recovered, because log recovery skips items with a
> > metadata LSN >= the current LSN of the recovery item. If there is still an
> > inode item in transaction B that records the Extent X, the Extent X will
> > be recorded in both the inode and the AGF btree block after transaction B
> > is recovered.
> >
> > |------------Record (LSN1)------------------|---Record (LSN2)---|
> > |----------Trans A------------|-------------Trans B-------------|
> > | Buf Item(Extent X) | Buf Item / Inode item(Extent X) |
> > | Extent X is freed | Extent X is allocated |
> >
> > After commit 12818d24db8a ("xfs: rework log recovery to submit buffers on
> > LSN boundaries") was introduced, we submit buffers on lsn boundaries during
> > log recovery. The above problem can be avoided under normal paths, but it's
> > not guaranteed under abnormal paths. Consider the following process, if an
> > error was encountered after recover buf item in transaction A and before
> > recover buf item in transaction B, buffers that have been added to
> > buffer_list will still be submitted, this violates the submits rule on lsn
> > boundaries. So buf item in Transaction B cannot be recovered on the next
> > mount due to current lsn of transaction equal to metadata lsn on disk.
> >
> > xlog_do_recovery_pass
> > error = xlog_recover_process
> > xlog_recover_process_data
> > ...
> > xlog_recover_buf_commit_pass2
> > xlog_recover_do_reg_buffer //recover buf item in Trans A
> > xfs_buf_delwri_queue(bp, buffer_list)
> > ...
> > ====> Encountered error and returned
> > ...
> > xlog_recover_buf_commit_pass2
> > xlog_recover_do_reg_buffer //recover buf item in Trans B
> > xfs_buf_delwri_queue(bp, buffer_list)
> > if (!list_empty(&buffer_list))
> > xfs_buf_delwri_submit(&buffer_list); //submit regardless of error
> >
> > In order to make sure that submits buffers on lsn boundaries in the
> > abnormal paths, we need to check error status before submit buffers that
> > have been added from the last record processed. If error status exist,
> > buffers in the bufffer_list should be canceled.
>
> What was the error, specifically? I would have though that recovery
> would abort after "Encountered error and returned". Does the recovery
> somehow keep running and then finds the buf item in Trans B?
>
That was not what I meant. I'm just trying to point out that any error
that occurs after recovering buf item in Transaction A and before
recovering buf item in Transaction B can trigger the problem. It
doesn't matter what the error is, for example the buf read error
that occurred during this period.
> Or is the problem here that after the error, xfs submits the delwri
> buffers? And then the user tried to recover a second time, only this
> time the recovery attempt reads Trans B, but then doesn't actually write
> anything because the ondisk buffer now has the same LSN as Trans B?
>
Yes, that's what I want to said. I think I should change the description
of the process that triggered the issue in commit message to avoid
misunderstandings.
Thanks
Long Li
> <confused>
>
> --D
>
> > Canceling the buffers in the buffer_list directly isn't correct, unlike
> > any other place where write list was canceled, these buffers has been
> > initialized by xfs_buf_item_init() during recovery and held by buf
> > item, buf items will not be released in xfs_buf_delwri_cancel(). If
> > these buffers are submitted successfully, buf items assocated with
> > the buffer will be released in io end process. So releasing buf item
> > in write list cacneling process is needed.
> >
> > Fixes: 50d5c8d8e938 ("xfs: check LSN ordering for v5 superblocks during recovery")
> > Signed-off-by: Long Li <leo.lilong@huawei.com>
> > ---
> > fs/xfs/xfs_buf.c | 2 ++
> > fs/xfs/xfs_log_recover.c | 22 +++++++++++++---------
> > 2 files changed, 15 insertions(+), 9 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index 8e5bd50d29fe..6a1b26aaf97e 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
> > @@ -2075,6 +2075,8 @@ xfs_buf_delwri_cancel(
> > xfs_buf_lock(bp);
> > bp->b_flags &= ~_XBF_DELWRI_Q;
> > xfs_buf_list_del(bp);
> > + if (bp->b_log_item)
> > + xfs_buf_item_relse(bp);
> > xfs_buf_relse(bp);
> > }
> > }
> > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > index 1251c81e55f9..2cda6c90890d 100644
> > --- a/fs/xfs/xfs_log_recover.c
> > +++ b/fs/xfs/xfs_log_recover.c
> > @@ -2964,7 +2964,6 @@ xlog_do_recovery_pass(
> > char *offset;
> > char *hbp, *dbp;
> > int error = 0, h_size, h_len;
> > - int error2 = 0;
> > int bblks, split_bblks;
> > int hblks, split_hblks, wrapped_hblks;
> > int i;
> > @@ -3203,16 +3202,21 @@ xlog_do_recovery_pass(
> > bread_err1:
> > kmem_free(hbp);
> >
> > - /*
> > - * Submit buffers that have been added from the last record processed,
> > - * regardless of error status.
> > - */
> > - if (!list_empty(&buffer_list))
> > - error2 = xfs_buf_delwri_submit(&buffer_list);
> > -
> > if (error && first_bad)
> > *first_bad = rhead_blk;
> >
> > + /*
> > + * If there are no error, submit buffers that have been added from the
> > + * last record processed, othrewise cancel the write list, to ensure
> > + * submit buffers on LSN boundaries.
> > + */
> > + if (!list_empty(&buffer_list)) {
> > + if (error)
> > + xfs_buf_delwri_cancel(&buffer_list);
> > + else
> > + error = xfs_buf_delwri_submit(&buffer_list);
> > + }
> > +
> > /*
> > * Transactions are freed at commit time but transactions without commit
> > * records on disk are never committed. Free any that may be left in the
> > @@ -3226,7 +3230,7 @@ xlog_do_recovery_pass(
> > xlog_recover_free_trans(trans);
> > }
> >
> > - return error ? error : error2;
> > + return error;
> > }
> >
> > /*
> > --
> > 2.31.1
> >
> >
>
next prev parent reply other threads:[~2024-01-04 13:00 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-28 12:46 [PATCH] xfs: ensure submit buffers on LSN boundaries in error handlers Long Li
2024-01-04 2:01 ` Darrick J. Wong
2024-01-04 13:03 ` Long Li [this message]
2024-01-07 22:13 ` Dave Chinner
2024-01-08 12:28 ` Long Li
2024-01-08 23:40 ` Dave Chinner
2024-01-09 14:52 ` Brian Foster
2024-01-09 21:43 ` Dave Chinner
2024-01-10 7:03 ` Long Li
2024-01-10 23:08 ` Dave Chinner
2024-01-11 14:54 ` Brian Foster
2024-01-10 8:38 ` Long Li
2024-01-11 0:47 ` Dave Chinner
2024-01-12 12:55 ` Long Li
2024-01-12 18:42 ` Brian Foster
2024-01-15 13:31 ` Long Li
2024-01-15 14:14 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240104130314.GA1815758@ceph-admin \
--to=leo.lilong@huawei.com \
--cc=chandanbabu@kernel.org \
--cc=djwong@kernel.org \
--cc=houtao1@huawei.com \
--cc=linux-xfs@vger.kernel.org \
--cc=yangerkun@huawei.com \
--cc=yi.zhang@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox