From: Long Li <leo.lilong@huawei.com>
To: Dave Chinner <david@fromorbit.com>,
"Darrick J. Wong" <djwong@kernel.org>
Cc: <linux-xfs@vger.kernel.org>, <houtao1@huawei.com>,
<yi.zhang@huawei.com>, <guoxuenan@huawei.com>
Subject: Re: [PATCH v2] xfs: fix hung when transaction commit fail in xfs_inactive_ifree
Date: Thu, 9 Mar 2023 22:10:30 +0800 [thread overview]
Message-ID: <20230309141030.GA2546427@ceph-admin> (raw)
In-Reply-To: <20230301050135.GG360264@dread.disaster.area>
On Wed, Mar 01, 2023 at 04:01:35PM +1100, Dave Chinner wrote:
> On Tue, Feb 28, 2023 at 05:05:26PM -0800, Darrick J. Wong wrote:
> > On Mon, Feb 27, 2023 at 02:29:52PM +0800, Long Li wrote:
> > > After running unplug disk test and unmount filesystem, the umount thread
> > > hung all the time.
> > >
> > > crash> dmesg
> > > sd 0:0:0:0: rejecting I/O to offline device
> > > XFS (sda): log I/O error -5
> > > XFS (sda): Corruption of in-memory data (0x8) detected at xfs_defer_finish_noroll+0x12e0/0x1cf0
> > > (fs/xfs/libxfs/xfs_defer.c:504). Shutting down filesystem.
> > > XFS (sda): Please unmount the filesystem and rectify the problem(s)
> > > XFS (sda): xfs_inactive_ifree: xfs_trans_commit returned error -5
> > > XFS (sda): Unmounting Filesystem
> > >
> > > crash> bt 3368
> > > PID: 3368 TASK: ffff88801bcd8040 CPU: 3 COMMAND: "umount"
> > > #0 [ffffc900086a7ae0] __schedule at ffffffff83d3fd25
> > > #1 [ffffc900086a7be8] schedule at ffffffff83d414dd
> > > #2 [ffffc900086a7c10] xfs_ail_push_all_sync at ffffffff8256db24
> > > #3 [ffffc900086a7d18] xfs_unmount_flush_inodes at ffffffff824ee7e2
> > > #4 [ffffc900086a7d28] xfs_unmountfs at ffffffff824f2eff
> > > #5 [ffffc900086a7da8] xfs_fs_put_super at ffffffff82503e69
> > > #6 [ffffc900086a7de8] generic_shutdown_super at ffffffff81aeb8cd
> > > #7 [ffffc900086a7e10] kill_block_super at ffffffff81aefcfa
> > > #8 [ffffc900086a7e30] deactivate_locked_super at ffffffff81aeb2da
> > > #9 [ffffc900086a7e48] deactivate_super at ffffffff81aeb639
> > > #10 [ffffc900086a7e68] cleanup_mnt at ffffffff81b6ddd5
> > > #11 [ffffc900086a7ea0] __cleanup_mnt at ffffffff81b6dfdf
> > > #12 [ffffc900086a7eb0] task_work_run at ffffffff8126e5cf
> > > #13 [ffffc900086a7ef8] exit_to_user_mode_prepare at ffffffff813fa136
> > > #14 [ffffc900086a7f28] syscall_exit_to_user_mode at ffffffff83d25dbb
> > > #15 [ffffc900086a7f40] do_syscall_64 at ffffffff83d1f8d9
> > > #16 [ffffc900086a7f50] entry_SYSCALL_64_after_hwframe at ffffffff83e00085
> > >
> > > When we free a cluster buffer from xfs_ifree_cluster, all the inodes in
> > > cache are marked XFS_ISTALE. On journal commit dirty stale inodes as are
> > > handled by both buffer and inode log items, inodes marked as XFS_ISTALE
> > > in AIL will be removed from the AIL because the buffer log item will clean
> > > it. If the transaction commit fails in the xfs_inactive_ifree(), inodes
> > > marked as XFS_ISTALE will be left in AIL due to buf log item is not
> > > committed,
> >
> > Ah. So the inode log items *are* in the AIL, but the buffer log item
> > for the inode cluster buffer is /not/ in the AIL?
>
> Which is the rare case, and I think can only happen if an unlinked
> file is held open until the unlinked list mods that last logged the
> buffer have been written to disk. We can keep modifying the inode
> and having it logged while the buffer is clean and has no active log
> item...
>
Yes, buffer log item may be in the AIL, but it could be delete from AIL by
xfs_buf_item_release when transaction commit fail.
>
> > Is it possible for neither inode nor cluster buffer are in the AIL?
> > I think the answer is no because freeing the inode will put it in the
> > AIL?
>
> I think the answer is yes, because after an unlink the buffer log
> item should be in the AIL at the same LSN as the inode log item when
> the unlink transaction updated both of them. Pushing a dirty inode
> flush all the dirty inodes and so both the inode and buffer items
> get cleaned and removed from the AIL in the same IO completion.
>
> Hence if the unlinked inode has been held open long enough for
> metadata writeback to complete, close() can trigger inactivation on
> both a clean inode cluster buffer and clean inode log item. i.e.
> neither are in the AIL at the time the inactivation and inode chunk
> freeing starts, and the commit has to insert both.
>
>
> > > @@ -679,6 +681,19 @@ xfs_buf_item_release(
> > > (ordered && dirty && !xfs_buf_item_dirty_format(bip)));
> > > ASSERT(!stale || (bip->__bli_format.blf_flags & XFS_BLF_CANCEL));
> > >
> > > + /*
> > > + * If it is an inode buffer and item marked as stale, abort flushing
> > > + * inodes associated with the buf, prevent inode item left in AIL.
> > > + */
> > > + if (aborted && (bip->bli_flags & XFS_BLI_STALE_INODE)) {
> > > + list_for_each_entry_safe(lp, n, &bp->b_li_list, li_bio_list) {
> > > + iip = (struct xfs_inode_log_item *)lp;
> >
> > Use container_of(), not raw casting.
> >
> > > +
> > > + if (xfs_iflags_test(iip->ili_inode, XFS_ISTALE))
> > > + xfs_iflush_abort(iip->ili_inode);
> > > + }
> > > + }
> > > +
>
> This is closer to the sort of fix that is needed, but this should
> not be done until the last reference to the buf log item goes away.
> i.e. in xfs_buf_item_put().
>
Agree with you, Take a look at the other code, aborting stale inodes are
after last reference to the buf log item.
> But then I look at the same conditions in xfs_buf_item_unpin(),
> which is the normal path that runs this stale inode cleanup, it does
> this for stale buffers if it drops the last reference to the buf log
> item.
>
> /*
> * If we get called here because of an IO error, we may or may
> * not have the item on the AIL. xfs_trans_ail_delete() will
> * take care of that situation. xfs_trans_ail_delete() drops
> * the AIL lock.
> */
> if (bip->bli_flags & XFS_BLI_STALE_INODE) {
> xfs_buf_item_done(bp);
> xfs_buf_inode_iodone(bp);
> ASSERT(list_empty(&bp->b_li_list));
> } else {
> xfs_trans_ail_delete(lip, SHUTDOWN_LOG_IO_ERROR);
> xfs_buf_item_relse(bp);
> ASSERT(bp->b_log_item == NULL);
> }
> xfs_buf_relse(bp);
>
> And if the buffer is not stale, then it runs it through
> xfs_buf_ioend_fail() to actually mark the attached log items as
> failed.
>
> So it seems to me that the cleanup needed here is more complex than
> unconditionally aborting stale inodes, but I haven't had a chance to
> think it through fully yet. This is one of the more complex corners
> of the buffer/inode item life cycles, and it's been a source of
> shutdown issues for a long time....
>
Sorry I replied so late, I tried to think a little clearer. The normal path
that runs stale inode cleanup in xfs_buf_item_unpin() just release buf log
item and aborting stale inodes, three will be no non-stale inode in the buf
b_li_list list because of buf lock, so I think it is enougth aborting stale
inodes. There may be something I haven't thought through, thanks for pointing
it out.
As you said, it is a complex corners of the buffer/inode item life cycles.
The problem [1] about inode log item UAF that I tried to solve previously,
it can't be solved simply due to inode log item life cycles, and I don't
have a new solution yet.
[1] https://patchwork.kernel.org/project/xfs/patch/20230211022941.GA1515023@ceph-admin/
Thanks
Long Li
prev parent reply other threads:[~2023-03-09 13:47 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-02-27 6:29 [PATCH v2] xfs: fix hung when transaction commit fail in xfs_inactive_ifree Long Li
2023-03-01 1:05 ` Darrick J. Wong
2023-03-01 5:01 ` Dave Chinner
2023-03-09 14:10 ` Long Li [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230309141030.GA2546427@ceph-admin \
--to=leo.lilong@huawei.com \
--cc=david@fromorbit.com \
--cc=djwong@kernel.org \
--cc=guoxuenan@huawei.com \
--cc=houtao1@huawei.com \
--cc=linux-xfs@vger.kernel.org \
--cc=yi.zhang@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox