From: Long Li <leo.lilong@huawei.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: <cem@kernel.org>, <linux-xfs@vger.kernel.org>,
	<david@fromorbit.com>, <yi.zhang@huawei.com>,
	<houtao1@huawei.com>, <yangerkun@huawei.com>,
	<lonuxli.64@gmail.com>
Subject: Re: [PATCH v2] xfs: fix mount hang during primary superblock recovery failure
Date: Fri, 10 Jan 2025 09:40:15 +0800
Message-ID: <Z4B6f9DdpQX0IIbj@localhost.localdomain>
In-Reply-To: <20250109044142.GM1306365@frogsfrogsfrogs>

On Wed, Jan 08, 2025 at 08:41:42PM -0800, Darrick J. Wong wrote:
> On Thu, Jan 09, 2025 at 10:13:20AM +0800, Long Li wrote:
> > When mounting an image containing a log with sb modifications that require
> > log replay, the mount process hangs indefinitely with the following stack:
> > 
> >   [root@localhost ~]# cat /proc/557/stack
> >   [<0>] xfs_buftarg_wait+0x31/0x70
> >   [<0>] xfs_buftarg_drain+0x54/0x350
> >   [<0>] xfs_mountfs+0x66e/0xe80
> >   [<0>] xfs_fs_fill_super+0x7f1/0xec0
> >   [<0>] get_tree_bdev_flags+0x186/0x280
> >   [<0>] get_tree_bdev+0x18/0x30
> >   [<0>] xfs_fs_get_tree+0x1d/0x30
> >   [<0>] vfs_get_tree+0x2d/0x110
> >   [<0>] path_mount+0xb59/0xfc0
> >   [<0>] do_mount+0x92/0xc0
> >   [<0>] __x64_sys_mount+0xc2/0x160
> >   [<0>] x64_sys_call+0x2de4/0x45c0
> >   [<0>] do_syscall_64+0xa7/0x240
> >   [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > 
> > During log recovery, if an error (such as superblock corruption) is
> > encountered while updating the in-memory superblock from the primary
> > SB buffer, we jump to out_release and release the xfs_buf. However,
> > this is insufficient: the xfs_buf's log item has already been
> > initialized and holds the xfs_buf via the following call chain, so the
> > xfs_buf is never released and the mount thread hangs.
> 
> Can you post a regression test for us, pretty please? :)
> 

I did the regression testing by mounting specific images, which can be
obtained through fault injection on kernels without metadir feature support.
I can provide an image if anyone needs it, but it is big and inconvenient to
send by mail. The detailed steps are as follows:

1) Kernel Build
  - The latest realtime AG update bug [1] remains unfixed
  - Build kernel without CONFIG_XFS_RT
  
2) Mount XFS Image (superblock needs replay, incompatible with metadir and
   no realtime subvolume)

3) Mount Result Verification
  - Without the current patch: mount thread hangs indefinitely
  - With the current patch: mount thread does not hang, but XFS is shut down

xfstests already has a fault injection test. This test, however, requires
mounting specific images on specially built kernels, which makes it
impractical to incorporate into xfstests.

> >   xlog_recover_do_primary_sb_buffer
> >     xlog_recover_do_reg_buffer
> >       xlog_recover_validate_buf_type
> >         xfs_buf_item_init(bp, mp)
> > 
> > The solution is straightforward: let the buffer be handled by the
> > normal buffer write process. The filesystem will be shut down before
> > buffer_list is submitted in xlog_do_recovery_pass(), ensuring the
> > correct release of the xfs_buf as follows:
> > 
> >   xlog_do_recovery_pass
> >     error = xlog_recover_process
> >       xlog_recover_process_data
> >         xlog_recover_process_ophdr
> >           xlog_recovery_process_trans
> >             ...
> >               xlog_recover_buf_commit_pass2
> >                 error = xlog_recover_do_primary_sb_buffer
> >                   //Encounter error and return
> >                 if (error)
> >                   goto out_writebuf
> >                 ...
> >               out_writebuf:
> >                 xfs_buf_delwri_queue(bp, buffer_list) //add bp to list
> >                 return  error
> >             ...
> >     if (!list_empty(&buffer_list))
> >       if (error)
> >         xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); //shutdown first
> >       xfs_buf_delwri_submit(&buffer_list); //write buffer in list
> >         __xfs_buf_submit
> >           if (bp->b_mount->m_log && xlog_is_shutdown(bp->b_mount->m_log))
> >             xfs_buf_ioend_fail(bp)  //release bp correctly
> > 
> 
> Please add:
> Cc: <stable@vger.kernel.org> # v6.12
> 
> > Fixes: 6a18765b54e2 ("xfs: update the file system geometry after recoverying superblock buffers")
> > Signed-off-by: Long Li <leo.lilong@huawei.com>
> > ---
> > v1->v2: Add code comments and add the fixed stack description to the 
> >         commit message.
> >  fs/xfs/xfs_buf_item_recover.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> > index 3d0c6402cb36..04122bbdd5f3 100644
> > --- a/fs/xfs/xfs_buf_item_recover.c
> > +++ b/fs/xfs/xfs_buf_item_recover.c
> > @@ -1079,7 +1079,7 @@ xlog_recover_buf_commit_pass2(
> >  		error = xlog_recover_do_primary_sb_buffer(mp, item, bp, buf_f,
> >  				current_lsn);
> >  		if (error)
> > -			goto out_release;
> > +			goto out_writebuf;
> >  
> >  		/* Update the rt superblock if we have one. */
> >  		if (xfs_has_rtsb(mp) && mp->m_rtsb_bp) {
> > @@ -1096,6 +1096,15 @@ xlog_recover_buf_commit_pass2(
> >  		xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn);
> >  	}
> >  
> > +	/*
> > +	 * Buffer held by buf log item during 'normal' buffer recovery must
> > +	 * be committed through buffer I/O submission path to ensure proper
> > +	 * release. When error occurs during do sb buffer recovery, log
> 
> "...during sb buffer recovery..."
> 
> and with those two things amended,
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
> 
> --D

Ok, thanks for your review.

Long Li.
