* [PATCH v2] xfs: fix mount hang during primary superblock recovery failure
@ 2025-01-09 2:13 Long Li
2025-01-09 4:41 ` Darrick J. Wong
2025-01-09 6:10 ` Christoph Hellwig
0 siblings, 2 replies; 5+ messages in thread
From: Long Li @ 2025-01-09 2:13 UTC (permalink / raw)
To: djwong, cem
Cc: linux-xfs, david, yi.zhang, houtao1, leo.lilong, yangerkun,
lonuxli.64
When mounting an image containing a log with sb modifications that require
log replay, the mount process hang all the time and stack as follows:
[root@localhost ~]# cat /proc/557/stack
[<0>] xfs_buftarg_wait+0x31/0x70
[<0>] xfs_buftarg_drain+0x54/0x350
[<0>] xfs_mountfs+0x66e/0xe80
[<0>] xfs_fs_fill_super+0x7f1/0xec0
[<0>] get_tree_bdev_flags+0x186/0x280
[<0>] get_tree_bdev+0x18/0x30
[<0>] xfs_fs_get_tree+0x1d/0x30
[<0>] vfs_get_tree+0x2d/0x110
[<0>] path_mount+0xb59/0xfc0
[<0>] do_mount+0x92/0xc0
[<0>] __x64_sys_mount+0xc2/0x160
[<0>] x64_sys_call+0x2de4/0x45c0
[<0>] do_syscall_64+0xa7/0x240
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
During log recovery, while updating the in-memory superblock from the
primary SB buffer, if an error is encountered, such as superblock
corruption occurs or some other reasons, we will proceed to out_release
and release the xfs_buf. However, this is insufficient because the
xfs_buf's log item has already been initialized and the xfs_buf is held
by the buffer log item as follows, the xfs_buf will not be released,
causing the mount thread to hang.
xlog_recover_do_primary_sb_buffer
xlog_recover_do_reg_buffer
xlog_recover_validate_buf_type
xfs_buf_item_init(bp, mp)
The solution is straightforward, we simply need to allow it to be
handled by the normal buffer write process. The filesystem will be
shutdown before the submission of buffer_list in xlog_do_recovery_pass(),
ensuring the correct release of the xfs_buf as follows:
xlog_do_recovery_pass
error = xlog_recover_process
xlog_recover_process_data
xlog_recover_process_ophdr
xlog_recovery_process_trans
...
xlog_recover_buf_commit_pass2
error = xlog_recover_do_primary_sb_buffer
//Encounter error and return
if (error)
goto out_writebuf
...
out_writebuf:
xfs_buf_delwri_queue(bp, buffer_list) //add bp to list
return error
...
if (!list_empty(&buffer_list))
if (error)
xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); //shutdown first
xfs_buf_delwri_submit(&buffer_list); //write buffer in list
__xfs_buf_submit
if (bp->b_mount->m_log && xlog_is_shutdown(bp->b_mount->m_log))
xfs_buf_ioend_fail(bp) //release bp correctly
Fixes: 6a18765b54e2 ("xfs: update the file system geometry after recoverying superblock buffers")
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
v1->v2: Add code comments and add the fixed stack description to the
commit message.
fs/xfs/xfs_buf_item_recover.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
index 3d0c6402cb36..04122bbdd5f3 100644
--- a/fs/xfs/xfs_buf_item_recover.c
+++ b/fs/xfs/xfs_buf_item_recover.c
@@ -1079,7 +1079,7 @@ xlog_recover_buf_commit_pass2(
error = xlog_recover_do_primary_sb_buffer(mp, item, bp, buf_f,
current_lsn);
if (error)
- goto out_release;
+ goto out_writebuf;
/* Update the rt superblock if we have one. */
if (xfs_has_rtsb(mp) && mp->m_rtsb_bp) {
@@ -1096,6 +1096,15 @@ xlog_recover_buf_commit_pass2(
xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn);
}
+ /*
+ * Buffer held by buf log item during 'normal' buffer recovery must
+ * be committed through buffer I/O submission path to ensure proper
+ * release. When error occurs during do sb buffer recovery, log
+ * shutdown will be done before submitting buffer list so that buffers
+ * can be released correctly through ioend failure path.
+ */
+out_writebuf:
+
/*
* Perform delayed write on the buffer. Asynchronous writes will be
* slower when taking into account all the buffers to be flushed.
--
2.39.2
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v2] xfs: fix mount hang during primary superblock recovery failure
2025-01-09 2:13 [PATCH v2] xfs: fix mount hang during primary superblock recovery failure Long Li
@ 2025-01-09 4:41 ` Darrick J. Wong
2025-01-10 1:40 ` Long Li
2025-01-09 6:10 ` Christoph Hellwig
1 sibling, 1 reply; 5+ messages in thread
From: Darrick J. Wong @ 2025-01-09 4:41 UTC (permalink / raw)
To: Long Li; +Cc: cem, linux-xfs, david, yi.zhang, houtao1, yangerkun, lonuxli.64
On Thu, Jan 09, 2025 at 10:13:20AM +0800, Long Li wrote:
> When mounting an image containing a log with sb modifications that require
> log replay, the mount process hang all the time and stack as follows:
>
> [root@localhost ~]# cat /proc/557/stack
> [<0>] xfs_buftarg_wait+0x31/0x70
> [<0>] xfs_buftarg_drain+0x54/0x350
> [<0>] xfs_mountfs+0x66e/0xe80
> [<0>] xfs_fs_fill_super+0x7f1/0xec0
> [<0>] get_tree_bdev_flags+0x186/0x280
> [<0>] get_tree_bdev+0x18/0x30
> [<0>] xfs_fs_get_tree+0x1d/0x30
> [<0>] vfs_get_tree+0x2d/0x110
> [<0>] path_mount+0xb59/0xfc0
> [<0>] do_mount+0x92/0xc0
> [<0>] __x64_sys_mount+0xc2/0x160
> [<0>] x64_sys_call+0x2de4/0x45c0
> [<0>] do_syscall_64+0xa7/0x240
> [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> During log recovery, while updating the in-memory superblock from the
> primary SB buffer, if an error is encountered, such as superblock
> corruption occurs or some other reasons, we will proceed to out_release
> and release the xfs_buf. However, this is insufficient because the
> xfs_buf's log item has already been initialized and the xfs_buf is held
> by the buffer log item as follows, the xfs_buf will not be released,
> causing the mount thread to hang.
Can you post a regression test for us, pretty please? :)
> xlog_recover_do_primary_sb_buffer
> xlog_recover_do_reg_buffer
> xlog_recover_validate_buf_type
> xfs_buf_item_init(bp, mp)
>
> The solution is straightforward, we simply need to allow it to be
> handled by the normal buffer write process. The filesystem will be
> shutdown before the submission of buffer_list in xlog_do_recovery_pass(),
> ensuring the correct release of the xfs_buf as follows:
>
> xlog_do_recovery_pass
> error = xlog_recover_process
> xlog_recover_process_data
> xlog_recover_process_ophdr
> xlog_recovery_process_trans
> ...
> xlog_recover_buf_commit_pass2
> error = xlog_recover_do_primary_sb_buffer
> //Encounter error and return
> if (error)
> goto out_writebuf
> ...
> out_writebuf:
> xfs_buf_delwri_queue(bp, buffer_list) //add bp to list
> return error
> ...
> if (!list_empty(&buffer_list))
> if (error)
> xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); //shutdown first
> xfs_buf_delwri_submit(&buffer_list); //write buffer in list
> __xfs_buf_submit
> if (bp->b_mount->m_log && xlog_is_shutdown(bp->b_mount->m_log))
> xfs_buf_ioend_fail(bp) //release bp correctly
>
Please add:
Cc: <stable@vger.kernel.org> # v6.12
> Fixes: 6a18765b54e2 ("xfs: update the file system geometry after recoverying superblock buffers")
> Signed-off-by: Long Li <leo.lilong@huawei.com>
> ---
> v1->v2: Add code comments and add the fixed stack description to the
> commit message.
> fs/xfs/xfs_buf_item_recover.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> index 3d0c6402cb36..04122bbdd5f3 100644
> --- a/fs/xfs/xfs_buf_item_recover.c
> +++ b/fs/xfs/xfs_buf_item_recover.c
> @@ -1079,7 +1079,7 @@ xlog_recover_buf_commit_pass2(
> error = xlog_recover_do_primary_sb_buffer(mp, item, bp, buf_f,
> current_lsn);
> if (error)
> - goto out_release;
> + goto out_writebuf;
>
> /* Update the rt superblock if we have one. */
> if (xfs_has_rtsb(mp) && mp->m_rtsb_bp) {
> @@ -1096,6 +1096,15 @@ xlog_recover_buf_commit_pass2(
> xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn);
> }
>
> + /*
> + * Buffer held by buf log item during 'normal' buffer recovery must
> + * be committed through buffer I/O submission path to ensure proper
> + * release. When error occurs during do sb buffer recovery, log
"...during sb buffer recovery..."
and with those two things amended,
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
--D
> + * shutdown will be done before submitting buffer list so that buffers
> + * can be released correctly through ioend failure path.
> + */
> +out_writebuf:
> +
> /*
> * Perform delayed write on the buffer. Asynchronous writes will be
> * slower when taking into account all the buffers to be flushed.
> --
> 2.39.2
>
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2] xfs: fix mount hang during primary superblock recovery failure
2025-01-09 2:13 [PATCH v2] xfs: fix mount hang during primary superblock recovery failure Long Li
2025-01-09 4:41 ` Darrick J. Wong
@ 2025-01-09 6:10 ` Christoph Hellwig
1 sibling, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2025-01-09 6:10 UTC (permalink / raw)
To: Long Li
Cc: djwong, cem, linux-xfs, david, yi.zhang, houtao1, yangerkun,
lonuxli.64
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2] xfs: fix mount hang during primary superblock recovery failure
2025-01-09 4:41 ` Darrick J. Wong
@ 2025-01-10 1:40 ` Long Li
2025-01-10 1:43 ` Long Li
0 siblings, 1 reply; 5+ messages in thread
From: Long Li @ 2025-01-10 1:40 UTC (permalink / raw)
To: Darrick J. Wong
Cc: cem, linux-xfs, david, yi.zhang, houtao1, yangerkun, lonuxli.64
On Wed, Jan 08, 2025 at 08:41:42PM -0800, Darrick J. Wong wrote:
> On Thu, Jan 09, 2025 at 10:13:20AM +0800, Long Li wrote:
> > When mounting an image containing a log with sb modifications that require
> > log replay, the mount process hang all the time and stack as follows:
> >
> > [root@localhost ~]# cat /proc/557/stack
> > [<0>] xfs_buftarg_wait+0x31/0x70
> > [<0>] xfs_buftarg_drain+0x54/0x350
> > [<0>] xfs_mountfs+0x66e/0xe80
> > [<0>] xfs_fs_fill_super+0x7f1/0xec0
> > [<0>] get_tree_bdev_flags+0x186/0x280
> > [<0>] get_tree_bdev+0x18/0x30
> > [<0>] xfs_fs_get_tree+0x1d/0x30
> > [<0>] vfs_get_tree+0x2d/0x110
> > [<0>] path_mount+0xb59/0xfc0
> > [<0>] do_mount+0x92/0xc0
> > [<0>] __x64_sys_mount+0xc2/0x160
> > [<0>] x64_sys_call+0x2de4/0x45c0
> > [<0>] do_syscall_64+0xa7/0x240
> > [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> > During log recovery, while updating the in-memory superblock from the
> > primary SB buffer, if an error is encountered, such as superblock
> > corruption occurs or some other reasons, we will proceed to out_release
> > and release the xfs_buf. However, this is insufficient because the
> > xfs_buf's log item has already been initialized and the xfs_buf is held
> > by the buffer log item as follows, the xfs_buf will not be released,
> > causing the mount thread to hang.
>
> Can you post a regression test for us, pretty please? :)
>
I performed regression testing by mounting specific images that can be
obtained through fault injection on kernels without metadir feature support.
I can provide it if anyone needs it.The image is big and inconvenient to send
in the mail. The detailed steps are as follows:
1) Kernel Build
- The latest realtime AG update bug [1] remains unfixed
- Build kernel without CONFIG_XFS_RT
2) Mount XFS Image (superblock needs replay, incompatible with metadir and
no realtime subvolume)
3) Mount Result Verification
- Without the current patch: mount thread hangs indefinitely
- With the current patch: mount thread does not hang, but XFS is shut down
The xfstests already have the fault injection test, and this test requires
mounting specific images on specifically-compiled kernels, making it impractical
to incorporate into xfstests.
> > xlog_recover_do_primary_sb_buffer
> > xlog_recover_do_reg_buffer
> > xlog_recover_validate_buf_type
> > xfs_buf_item_init(bp, mp)
> >
> > The solution is straightforward, we simply need to allow it to be
> > handled by the normal buffer write process. The filesystem will be
> > shutdown before the submission of buffer_list in xlog_do_recovery_pass(),
> > ensuring the correct release of the xfs_buf as follows:
> >
> > xlog_do_recovery_pass
> > error = xlog_recover_process
> > xlog_recover_process_data
> > xlog_recover_process_ophdr
> > xlog_recovery_process_trans
> > ...
> > xlog_recover_buf_commit_pass2
> > error = xlog_recover_do_primary_sb_buffer
> > //Encounter error and return
> > if (error)
> > goto out_writebuf
> > ...
> > out_writebuf:
> > xfs_buf_delwri_queue(bp, buffer_list) //add bp to list
> > return error
> > ...
> > if (!list_empty(&buffer_list))
> > if (error)
> > xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); //shutdown first
> > xfs_buf_delwri_submit(&buffer_list); //write buffer in list
> > __xfs_buf_submit
> > if (bp->b_mount->m_log && xlog_is_shutdown(bp->b_mount->m_log))
> > xfs_buf_ioend_fail(bp) //release bp correctly
> >
>
> Please add:
> Cc: <stable@vger.kernel.org> # v6.12
>
> > Fixes: 6a18765b54e2 ("xfs: update the file system geometry after recoverying superblock buffers")
> > Signed-off-by: Long Li <leo.lilong@huawei.com>
> > ---
> > v1->v2: Add code comments and add the fixed stack description to the
> > commit message.
> > fs/xfs/xfs_buf_item_recover.c | 11 ++++++++++-
> > 1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> > index 3d0c6402cb36..04122bbdd5f3 100644
> > --- a/fs/xfs/xfs_buf_item_recover.c
> > +++ b/fs/xfs/xfs_buf_item_recover.c
> > @@ -1079,7 +1079,7 @@ xlog_recover_buf_commit_pass2(
> > error = xlog_recover_do_primary_sb_buffer(mp, item, bp, buf_f,
> > current_lsn);
> > if (error)
> > - goto out_release;
> > + goto out_writebuf;
> >
> > /* Update the rt superblock if we have one. */
> > if (xfs_has_rtsb(mp) && mp->m_rtsb_bp) {
> > @@ -1096,6 +1096,15 @@ xlog_recover_buf_commit_pass2(
> > xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn);
> > }
> >
> > + /*
> > + * Buffer held by buf log item during 'normal' buffer recovery must
> > + * be committed through buffer I/O submission path to ensure proper
> > + * release. When error occurs during do sb buffer recovery, log
>
> "...during sb buffer recovery..."
>
> and with those two things amended,
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
>
> --D
Ok, thanks for your review.
Long Li.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2] xfs: fix mount hang during primary superblock recovery failure
2025-01-10 1:40 ` Long Li
@ 2025-01-10 1:43 ` Long Li
0 siblings, 0 replies; 5+ messages in thread
From: Long Li @ 2025-01-10 1:43 UTC (permalink / raw)
To: Darrick J. Wong
Cc: cem, linux-xfs, david, yi.zhang, houtao1, yangerkun, lonuxli.64
On Fri, Jan 10, 2025 at 09:40:16AM +0800, Long Li wrote:
> On Wed, Jan 08, 2025 at 08:41:42PM -0800, Darrick J. Wong wrote:
> > On Thu, Jan 09, 2025 at 10:13:20AM +0800, Long Li wrote:
> > > When mounting an image containing a log with sb modifications that require
> > > log replay, the mount process hang all the time and stack as follows:
> > >
> > > [root@localhost ~]# cat /proc/557/stack
> > > [<0>] xfs_buftarg_wait+0x31/0x70
> > > [<0>] xfs_buftarg_drain+0x54/0x350
> > > [<0>] xfs_mountfs+0x66e/0xe80
> > > [<0>] xfs_fs_fill_super+0x7f1/0xec0
> > > [<0>] get_tree_bdev_flags+0x186/0x280
> > > [<0>] get_tree_bdev+0x18/0x30
> > > [<0>] xfs_fs_get_tree+0x1d/0x30
> > > [<0>] vfs_get_tree+0x2d/0x110
> > > [<0>] path_mount+0xb59/0xfc0
> > > [<0>] do_mount+0x92/0xc0
> > > [<0>] __x64_sys_mount+0xc2/0x160
> > > [<0>] x64_sys_call+0x2de4/0x45c0
> > > [<0>] do_syscall_64+0xa7/0x240
> > > [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > >
> > > During log recovery, while updating the in-memory superblock from the
> > > primary SB buffer, if an error is encountered, such as superblock
> > > corruption occurs or some other reasons, we will proceed to out_release
> > > and release the xfs_buf. However, this is insufficient because the
> > > xfs_buf's log item has already been initialized and the xfs_buf is held
> > > by the buffer log item as follows, the xfs_buf will not be released,
> > > causing the mount thread to hang.
> >
> > Can you post a regression test for us, pretty please? :)
> >
>
> I performed regression testing by mounting specific images that can be
> obtained through fault injection on kernels without metadir feature support.
> I can provide it if anyone needs it.The image is big and inconvenient to send
> in the mail. The detailed steps are as follows:
>
> 1) Kernel Build
> - The latest realtime AG update bug [1] remains unfixed
> - Build kernel without CONFIG_XFS_RT
>
> 2) Mount XFS Image (superblock needs replay, incompatible with metadir and
> no realtime subvolume)
>
> 3) Mount Result Verification
> - Without the current patch: mount thread hangs indefinitely
> - With the current patch: mount thread does not hang, but XFS is shut down
>
> The xfstests already have the fault injection test, and this test requires
> mounting specific images on specifically-compiled kernels, making it impractical
> to incorporate into xfstests.
>
Sorry, forgot to add the fixed patch link:
[1] https://patchwork.kernel.org/project/xfs/patch/20241217042737.1755365-1-hch@lst.de/
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-01-10 1:47 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-09 2:13 [PATCH v2] xfs: fix mount hang during primary superblock recovery failure Long Li
2025-01-09 4:41 ` Darrick J. Wong
2025-01-10 1:40 ` Long Li
2025-01-10 1:43 ` Long Li
2025-01-09 6:10 ` Christoph Hellwig
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.