From: Jan Kara <jack@suse.cz>
To: Christian Brauner <brauner@kernel.org>
Cc: Ye Bin <yebin@huaweicloud.com>,
viro@zeniv.linux.org.uk, jack@suse.cz,
linux-fsdevel@vger.kernel.org, axboe@kernel.dk,
linux-block@vger.kernel.org, agruenba@redhat.com,
gfs2@lists.linux.dev, amir73il@gmail.com, mic@digikod.net,
gnoack@google.com, paul@paul-moore.com, jmorris@namei.org,
serge@hallyn.com, linux-security-module@vger.kernel.org,
yebin10@huawei.com, zhangxiaoxu5@huawei.com,
Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH 10/11] fs: fix hungtask due to repeated traversal of inodes list
Date: Wed, 4 Dec 2024 15:16:20 +0100 [thread overview]
Message-ID: <20241204141620.vgklclfh5guezcvb@quack3> (raw)
In-Reply-To: <20241204-worden-tontechnik-3ce77e9f3bad@brauner>
On Wed 04-12-24 12:17:49, Christian Brauner wrote:
> On Mon, Nov 18, 2024 at 07:45:07PM +0800, Ye Bin wrote:
> > From: Ye Bin <yebin10@huawei.com>
> >
> > There's a issue when remove scsi disk, the invalidate_inodes() function
> > cannot exit for a long time, then trigger hungtask:
> > INFO: task kworker/56:0:1391396 blocked for more than 122 seconds.
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Workqueue: events_freezable virtscsi_handle_event [virtio_scsi]
> > Call Trace:
> > __schedule+0x33c/0x7f0
> > schedule+0x46/0xb0
> > schedule_preempt_disabled+0xa/0x10
> > __mutex_lock.constprop.0+0x22b/0x490
> > mutex_lock+0x52/0x70
> > scsi_scan_target+0x6d/0xf0
> > virtscsi_handle_event+0x152/0x1a0 [virtio_scsi]
> > process_one_work+0x1b2/0x350
> > worker_thread+0x49/0x310
> > kthread+0xfb/0x140
> > ret_from_fork+0x1f/0x30
> >
> > PID: 540499 TASK: ffff9b15e504c080 CPU: 44 COMMAND: "kworker/44:0"
> > Call trace:
> > invalidate_inodes at ffffffff8f3b4784
> > __invalidate_device at ffffffff8f3dfea3
> > invalidate_partition at ffffffff8f526b49
> > del_gendisk at ffffffff8f5280fb
> > sd_remove at ffffffffc0186455 [sd_mod]
> > __device_release_driver at ffffffff8f738ab2
> > device_release_driver at ffffffff8f738bc4
> > bus_remove_device at ffffffff8f737f66
> > device_del at ffffffff8f73341b
> > __scsi_remove_device at ffffffff8f780340
> > scsi_remove_device at ffffffff8f7803a2
> > virtscsi_handle_event at ffffffffc017204f [virtio_scsi]
> > process_one_work at ffffffff8f1041f2
> > worker_thread at ffffffff8f104789
> > kthread at ffffffff8f109abb
> > ret_from_fork at ffffffff8f001d6f
> >
> > As commit 04646aebd30b ("fs: avoid softlockups in s_inodes iterators")
> > introduces the retry logic. In the problem environment, the 'i_count'
> > of millions of files is not zero. As a result, the time slice for each
> > traversal to the matching inode process is almost used up, and then the
> > traversal is started from scratch. The worst-case scenario is that only
> > one inode can be processed after each wakeup. Because this process holds
> > a lock, other processes will be stuck for a long time, causing a series
> > of problems.
> > To solve the problem of repeated traversal from the beginning, each time
> > the CPU needs to be freed, a cursor is inserted into the linked list, and
> > the traversal continues from the cursor next time.
> >
> > Fixes: 04646aebd30b ("fs: avoid softlockups in s_inodes iterators")
> > Signed-off-by: Ye Bin <yebin10@huawei.com>
> > ---
> > fs/inode.c | 22 ++++++++++++++++++++--
> > 1 file changed, 20 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/inode.c b/fs/inode.c
> > index dc966990bda6..b78895af8779 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -857,11 +857,16 @@ static void dispose_list(struct list_head *head)
> > void evict_inodes(struct super_block *sb)
> > {
> > struct inode *inode, *next;
> > + struct inode cursor;
>
> It seems pretty adventurous to me to just add in a random inode whose
> only fiels that is initialized is i_state. That would need a proper
> analysis and argument that this is safe to do and won't cause trouble
> for any filesystem.
>
> Jan, do you have thoughts on this?
Yeah, I think in the current state where there are several instances of
hand-crafted inode iteration code it is somewhat fragile to use the cursor
approach. I was staying silent because I was hoping Dave Chinner's patches
to clean up inode iteration get to a more ready state. Then either we have
well consolidated inode iteration code so additions like this can be easily
verified for correctness or we could even get as far as removing
sb->s_inodes list altogether as Dave outlined [1] which would nicely deal
with the issue solved here as well. Sadly that patch series seems to have
lost traction. Hopefully we can either revive it or at least scavenge the
nice preparatory cleanups...
Honza
[1] https://lore.kernel.org/all/ZwRvshM65rxXTwxd@dread.disaster.area
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
next prev parent reply other threads:[~2024-12-04 14:16 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
2024-11-18 11:44 ` [PATCH 01/11] fs: introduce I_CURSOR flag for inode Ye Bin
2024-11-18 11:44 ` [PATCH 02/11] block: use sb_for_each_inodes API Ye Bin
2024-11-18 11:45 ` [PATCH 03/11] fs: " Ye Bin
2024-11-18 11:45 ` [PATCH 04/11] gfs2: " Ye Bin
2024-11-18 11:45 ` [PATCH 05/11] fs: use sb_for_each_inodes_safe API Ye Bin
2024-11-18 11:45 ` [PATCH 06/11] fsnotify: use sb_for_each_inodes API Ye Bin
2024-11-18 11:45 ` [PATCH 07/11] quota: " Ye Bin
2024-11-18 11:45 ` [PATCH 08/11] fs/super.c: " Ye Bin
2024-11-18 11:45 ` [PATCH 09/11] landlock: " Ye Bin
2024-11-18 11:45 ` [PATCH 10/11] fs: fix hungtask due to repeated traversal of inodes list Ye Bin
2024-12-04 11:17 ` Christian Brauner
2024-12-04 14:16 ` Jan Kara [this message]
2024-11-18 11:45 ` [PATCH 11/11] fs: fix potential soft lockup when 'sb->s_inodes' has large number of inodes Ye Bin
2024-12-04 7:04 ` [PATCH 00/11] fix hungtask due to repeated traversal of inodes list yebin (H)
2024-12-04 11:04 ` Mateusz Guzik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241204141620.vgklclfh5guezcvb@quack3 \
--to=jack@suse.cz \
--cc=agruenba@redhat.com \
--cc=amir73il@gmail.com \
--cc=axboe@kernel.dk \
--cc=brauner@kernel.org \
--cc=david@fromorbit.com \
--cc=gfs2@lists.linux.dev \
--cc=gnoack@google.com \
--cc=jmorris@namei.org \
--cc=linux-block@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-security-module@vger.kernel.org \
--cc=mic@digikod.net \
--cc=paul@paul-moore.com \
--cc=serge@hallyn.com \
--cc=viro@zeniv.linux.org.uk \
--cc=yebin10@huawei.com \
--cc=yebin@huaweicloud.com \
--cc=zhangxiaoxu5@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox