* [PATCH 01/11] fs: introduce I_CURSOR flag for inode
2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
@ 2024-11-18 11:44 ` Ye Bin
2024-11-18 11:44 ` [PATCH 02/11] block: use sb_for_each_inodes API Ye Bin
` (11 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: Ye Bin @ 2024-11-18 11:44 UTC (permalink / raw)
To: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
linux-security-module
Cc: yebin10, zhangxiaoxu5
From: Ye Bin <yebin10@huawei.com>
This patch introduce I_CURSOR flag for inode and introduce
sb_for_each_inodes_safe/sb_for_each_inodes/sb_for_each_inodes_continue_safe
API for foreach super_block->s_inodes.
Signed-off-by: Ye Bin <yebin10@huawei.com>
---
include/linux/fs.h | 45 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0a152c31d1bf..cf2734e0b2cd 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2473,6 +2473,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
#define I_DONTCACHE (1 << 15)
#define I_SYNC_QUEUED (1 << 16)
#define I_PINNING_NETFS_WB (1 << 17)
+#define I_CURSOR (1 << 18)
#define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
#define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
@@ -3809,4 +3810,48 @@ static inline bool vfs_empty_path(int dfd, const char __user *path)
int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter);
+static inline bool inode_is_cursor(struct inode *inode)
+{
+ return inode->i_state & I_CURSOR;
+}
+
+static inline struct inode *next_sb_inode(struct inode *pos,
+ struct list_head *head)
+{
+ struct inode *inode;
+ struct list_head *start;
+
+ if (!pos)
+ return NULL;
+
+ start = &pos->i_sb_list;
+
+ list_for_each_continue(start, head) {
+ inode = list_entry(start, typeof(*inode), i_sb_list);
+ if (likely(!inode_is_cursor(inode)))
+ return inode;
+ }
+
+ return NULL;
+}
+
+#define sb_for_each_inodes_safe(pos, n, head) \
+ for (pos = list_entry(head, typeof(*pos), i_sb_list), \
+ pos = next_sb_inode(pos, head), \
+ n = next_sb_inode(pos, head); \
+ pos != NULL; \
+ pos = n, n = next_sb_inode(n, head))
+
+#define sb_for_each_inodes(pos, head) \
+ for (pos = list_entry(head, typeof(*pos), i_sb_list), \
+ pos = next_sb_inode(pos, head); \
+ pos != NULL; \
+ pos = next_sb_inode(pos, head))
+
+#define sb_for_each_inodes_continue_safe(pos, n, head) \
+ for (pos = next_sb_inode(pos, head), \
+ n = next_sb_inode(pos, head); \
+ pos != NULL; \
+ pos = n, n = next_sb_inode(n, head))
+
#endif /* _LINUX_FS_H */
--
2.34.1
^ permalink raw reply related [flat|nested] 16+ messages in thread* [PATCH 02/11] block: use sb_for_each_inodes API
2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
2024-11-18 11:44 ` [PATCH 01/11] fs: introduce I_CURSOR flag for inode Ye Bin
@ 2024-11-18 11:44 ` Ye Bin
2024-11-18 11:45 ` [PATCH 03/11] fs: " Ye Bin
` (10 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: Ye Bin @ 2024-11-18 11:44 UTC (permalink / raw)
To: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
linux-security-module
Cc: yebin10, zhangxiaoxu5
From: Ye Bin <yebin10@huawei.com>
Use sb_for_each_inodes API foreach super_block->s_inodes.
Signed-off-by: Ye Bin <yebin10@huawei.com>
---
block/bdev.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/block/bdev.c b/block/bdev.c
index 738e3c8457e7..b29e2c5c7c6e 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -478,7 +478,7 @@ long nr_blockdev_pages(void)
long ret = 0;
spin_lock(&blockdev_superblock->s_inode_list_lock);
- list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list)
+ sb_for_each_inodes(inode, &blockdev_superblock->s_inodes)
ret += inode->i_mapping->nrpages;
spin_unlock(&blockdev_superblock->s_inode_list_lock);
@@ -1219,7 +1219,7 @@ void sync_bdevs(bool wait)
struct inode *inode, *old_inode = NULL;
spin_lock(&blockdev_superblock->s_inode_list_lock);
- list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list) {
+ sb_for_each_inodes(inode, &blockdev_superblock->s_inodes) {
struct address_space *mapping = inode->i_mapping;
struct block_device *bdev;
--
2.34.1
^ permalink raw reply related [flat|nested] 16+ messages in thread* [PATCH 03/11] fs: use sb_for_each_inodes API
2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
2024-11-18 11:44 ` [PATCH 01/11] fs: introduce I_CURSOR flag for inode Ye Bin
2024-11-18 11:44 ` [PATCH 02/11] block: use sb_for_each_inodes API Ye Bin
@ 2024-11-18 11:45 ` Ye Bin
2024-11-18 11:45 ` [PATCH 04/11] gfs2: " Ye Bin
` (9 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: Ye Bin @ 2024-11-18 11:45 UTC (permalink / raw)
To: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
linux-security-module
Cc: yebin10, zhangxiaoxu5
From: Ye Bin <yebin10@huawei.com>
Use sb_for_each_inodes API foreach super_block->s_inodes.
Signed-off-by: Ye Bin <yebin10@huawei.com>
---
fs/drop_caches.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index d45ef541d848..43a31fa2b7c2 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -21,7 +21,7 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused)
struct inode *inode, *toput_inode = NULL;
spin_lock(&sb->s_inode_list_lock);
- list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+ sb_for_each_inodes(inode, &sb->s_inodes) {
spin_lock(&inode->i_lock);
/*
* We must skip inodes in unusual state. We may also skip
--
2.34.1
^ permalink raw reply related [flat|nested] 16+ messages in thread* [PATCH 04/11] gfs2: use sb_for_each_inodes API
2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
` (2 preceding siblings ...)
2024-11-18 11:45 ` [PATCH 03/11] fs: " Ye Bin
@ 2024-11-18 11:45 ` Ye Bin
2024-11-18 11:45 ` [PATCH 05/11] fs: use sb_for_each_inodes_safe API Ye Bin
` (8 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: Ye Bin @ 2024-11-18 11:45 UTC (permalink / raw)
To: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
linux-security-module
Cc: yebin10, zhangxiaoxu5
From: Ye Bin <yebin10@huawei.com>
Use sb_for_each_inodes API foreach super_block->s_inodes.
Signed-off-by: Ye Bin <yebin10@huawei.com>
---
fs/gfs2/ops_fstype.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index e83d293c3614..e9dd94f01b90 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -1742,7 +1742,7 @@ static void gfs2_evict_inodes(struct super_block *sb)
set_bit(SDF_EVICTING, &sdp->sd_flags);
spin_lock(&sb->s_inode_list_lock);
- list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+ sb_for_each_inodes(inode, &sb->s_inodes) {
spin_lock(&inode->i_lock);
if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) &&
!need_resched()) {
--
2.34.1
^ permalink raw reply related [flat|nested] 16+ messages in thread* [PATCH 05/11] fs: use sb_for_each_inodes_safe API
2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
` (3 preceding siblings ...)
2024-11-18 11:45 ` [PATCH 04/11] gfs2: " Ye Bin
@ 2024-11-18 11:45 ` Ye Bin
2024-11-18 11:45 ` [PATCH 06/11] fsnotify: use sb_for_each_inodes API Ye Bin
` (7 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: Ye Bin @ 2024-11-18 11:45 UTC (permalink / raw)
To: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
linux-security-module
Cc: yebin10, zhangxiaoxu5
From: Ye Bin <yebin10@huawei.com>
Use sb_for_each_inodes_safe API foreach super_block->s_inodes.
Signed-off-by: Ye Bin <yebin10@huawei.com>
---
fs/inode.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index e5a60084a7a9..dc966990bda6 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -861,7 +861,7 @@ void evict_inodes(struct super_block *sb)
again:
spin_lock(&sb->s_inode_list_lock);
- list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
+ sb_for_each_inodes_safe(inode, next, &sb->s_inodes) {
if (atomic_read(&inode->i_count))
continue;
@@ -911,7 +911,7 @@ void invalidate_inodes(struct super_block *sb)
again:
spin_lock(&sb->s_inode_list_lock);
- list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
+ sb_for_each_inodes_safe(inode, next, &sb->s_inodes) {
spin_lock(&inode->i_lock);
if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
spin_unlock(&inode->i_lock);
--
2.34.1
^ permalink raw reply related [flat|nested] 16+ messages in thread* [PATCH 06/11] fsnotify: use sb_for_each_inodes API
2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
` (4 preceding siblings ...)
2024-11-18 11:45 ` [PATCH 05/11] fs: use sb_for_each_inodes_safe API Ye Bin
@ 2024-11-18 11:45 ` Ye Bin
2024-11-18 11:45 ` [PATCH 07/11] quota: " Ye Bin
` (6 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: Ye Bin @ 2024-11-18 11:45 UTC (permalink / raw)
To: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
linux-security-module
Cc: yebin10, zhangxiaoxu5
From: Ye Bin <yebin10@huawei.com>
Use sb_for_each_inodes API foreach super_block->s_inodes.
Signed-off-by: Ye Bin <yebin10@huawei.com>
---
fs/notify/fsnotify.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 82ae8254c068..11a4eb61b5b2 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -40,7 +40,7 @@ static void fsnotify_unmount_inodes(struct super_block *sb)
struct inode *inode, *iput_inode = NULL;
spin_lock(&sb->s_inode_list_lock);
- list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+ sb_for_each_inodes(inode, &sb->s_inodes) {
/*
* We cannot __iget() an inode in state I_FREEING,
* I_WILL_FREE, or I_NEW which is fine because by that point
--
2.34.1
^ permalink raw reply related [flat|nested] 16+ messages in thread* [PATCH 07/11] quota: use sb_for_each_inodes API
2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
` (5 preceding siblings ...)
2024-11-18 11:45 ` [PATCH 06/11] fsnotify: use sb_for_each_inodes API Ye Bin
@ 2024-11-18 11:45 ` Ye Bin
2024-11-18 11:45 ` [PATCH 08/11] fs/super.c: " Ye Bin
` (5 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: Ye Bin @ 2024-11-18 11:45 UTC (permalink / raw)
To: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
linux-security-module
Cc: yebin10, zhangxiaoxu5
From: Ye Bin <yebin10@huawei.com>
Use sb_for_each_inodes API foreach super_block->s_inodes.
Signed-off-by: Ye Bin <yebin10@huawei.com>
---
fs/quota/dquot.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index b40410cd39af..414f92bb762c 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -1027,7 +1027,7 @@ static int add_dquot_ref(struct super_block *sb, int type)
int err = 0;
spin_lock(&sb->s_inode_list_lock);
- list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+ sb_for_each_inodes(inode, &sb->s_inodes) {
spin_lock(&inode->i_lock);
if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
!atomic_read(&inode->i_writecount) ||
@@ -1083,7 +1083,7 @@ static void remove_dquot_ref(struct super_block *sb, int type)
#endif
spin_lock(&sb->s_inode_list_lock);
- list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+ sb_for_each_inodes(inode, &sb->s_inodes) {
/*
* We have to scan also I_NEW inodes because they can already
* have quota pointer initialized. Luckily, we need to touch
--
2.34.1
^ permalink raw reply related [flat|nested] 16+ messages in thread* [PATCH 08/11] fs/super.c: use sb_for_each_inodes API
2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
` (6 preceding siblings ...)
2024-11-18 11:45 ` [PATCH 07/11] quota: " Ye Bin
@ 2024-11-18 11:45 ` Ye Bin
2024-11-18 11:45 ` [PATCH 09/11] landlock: " Ye Bin
` (4 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: Ye Bin @ 2024-11-18 11:45 UTC (permalink / raw)
To: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
linux-security-module
Cc: yebin10, zhangxiaoxu5
From: Ye Bin <yebin10@huawei.com>
Use sb_for_each_inodes API foreach super_block->s_inodes.
Signed-off-by: Ye Bin <yebin10@huawei.com>
---
fs/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/super.c b/fs/super.c
index c9c7223bc2a2..71bd0b41468e 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -658,7 +658,7 @@ void generic_shutdown_super(struct super_block *sb)
struct inode *inode;
spin_lock(&sb->s_inode_list_lock);
- list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+ sb_for_each_inodes(inode, &sb->s_inodes) {
inode->i_op = VFS_PTR_POISON;
inode->i_sb = VFS_PTR_POISON;
inode->i_mapping = VFS_PTR_POISON;
--
2.34.1
^ permalink raw reply related [flat|nested] 16+ messages in thread* [PATCH 09/11] landlock: use sb_for_each_inodes API
2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
` (7 preceding siblings ...)
2024-11-18 11:45 ` [PATCH 08/11] fs/super.c: " Ye Bin
@ 2024-11-18 11:45 ` Ye Bin
2024-11-18 11:45 ` [PATCH 10/11] fs: fix hungtask due to repeated traversal of inodes list Ye Bin
` (3 subsequent siblings)
12 siblings, 0 replies; 16+ messages in thread
From: Ye Bin @ 2024-11-18 11:45 UTC (permalink / raw)
To: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
linux-security-module
Cc: yebin10, zhangxiaoxu5
From: Ye Bin <yebin10@huawei.com>
Use sb_for_each_inodes API foreach super_block->s_inodes.
Signed-off-by: Ye Bin <yebin10@huawei.com>
---
security/landlock/fs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index dd9a7297ea4e..ccb7951f1404 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -1217,7 +1217,7 @@ static void hook_sb_delete(struct super_block *const sb)
return;
spin_lock(&sb->s_inode_list_lock);
- list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+ sb_for_each_inodes(inode, &sb->s_inodes) {
struct landlock_object *object;
/* Only handles referenced inodes. */
--
2.34.1
^ permalink raw reply related [flat|nested] 16+ messages in thread* [PATCH 10/11] fs: fix hungtask due to repeated traversal of inodes list
2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
` (8 preceding siblings ...)
2024-11-18 11:45 ` [PATCH 09/11] landlock: " Ye Bin
@ 2024-11-18 11:45 ` Ye Bin
2024-12-04 11:17 ` Christian Brauner
2024-11-18 11:45 ` [PATCH 11/11] fs: fix potential soft lockup when 'sb->s_inodes' has large number of inodes Ye Bin
` (2 subsequent siblings)
12 siblings, 1 reply; 16+ messages in thread
From: Ye Bin @ 2024-11-18 11:45 UTC (permalink / raw)
To: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
linux-security-module
Cc: yebin10, zhangxiaoxu5
From: Ye Bin <yebin10@huawei.com>
There's a issue when remove scsi disk, the invalidate_inodes() function
cannot exit for a long time, then trigger hungtask:
INFO: task kworker/56:0:1391396 blocked for more than 122 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Workqueue: events_freezable virtscsi_handle_event [virtio_scsi]
Call Trace:
__schedule+0x33c/0x7f0
schedule+0x46/0xb0
schedule_preempt_disabled+0xa/0x10
__mutex_lock.constprop.0+0x22b/0x490
mutex_lock+0x52/0x70
scsi_scan_target+0x6d/0xf0
virtscsi_handle_event+0x152/0x1a0 [virtio_scsi]
process_one_work+0x1b2/0x350
worker_thread+0x49/0x310
kthread+0xfb/0x140
ret_from_fork+0x1f/0x30
PID: 540499 TASK: ffff9b15e504c080 CPU: 44 COMMAND: "kworker/44:0"
Call trace:
invalidate_inodes at ffffffff8f3b4784
__invalidate_device at ffffffff8f3dfea3
invalidate_partition at ffffffff8f526b49
del_gendisk at ffffffff8f5280fb
sd_remove at ffffffffc0186455 [sd_mod]
__device_release_driver at ffffffff8f738ab2
device_release_driver at ffffffff8f738bc4
bus_remove_device at ffffffff8f737f66
device_del at ffffffff8f73341b
__scsi_remove_device at ffffffff8f780340
scsi_remove_device at ffffffff8f7803a2
virtscsi_handle_event at ffffffffc017204f [virtio_scsi]
process_one_work at ffffffff8f1041f2
worker_thread at ffffffff8f104789
kthread at ffffffff8f109abb
ret_from_fork at ffffffff8f001d6f
As commit 04646aebd30b ("fs: avoid softlockups in s_inodes iterators")
introduces the retry logic. In the problem environment, the 'i_count'
of millions of files is not zero. As a result, the time slice for each
traversal to the matching inode process is almost used up, and then the
traversal is started from scratch. The worst-case scenario is that only
one inode can be processed after each wakeup. Because this process holds
a lock, other processes will be stuck for a long time, causing a series
of problems.
To solve the problem of repeated traversal from the beginning, each time
the CPU needs to be freed, a cursor is inserted into the linked list, and
the traversal continues from the cursor next time.
Fixes: 04646aebd30b ("fs: avoid softlockups in s_inodes iterators")
Signed-off-by: Ye Bin <yebin10@huawei.com>
---
fs/inode.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index dc966990bda6..b78895af8779 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -857,11 +857,16 @@ static void dispose_list(struct list_head *head)
void evict_inodes(struct super_block *sb)
{
struct inode *inode, *next;
+ struct inode cursor;
LIST_HEAD(dispose);
+ cursor.i_state = I_CURSOR;
+ INIT_LIST_HEAD(&cursor.i_sb_list);
+ inode = list_entry(&sb->s_inodes, typeof(*inode), i_sb_list);
+
again:
spin_lock(&sb->s_inode_list_lock);
- sb_for_each_inodes_safe(inode, next, &sb->s_inodes) {
+ sb_for_each_inodes_continue_safe(inode, next, &sb->s_inodes) {
if (atomic_read(&inode->i_count))
continue;
@@ -886,12 +891,16 @@ void evict_inodes(struct super_block *sb)
* bit so we don't livelock.
*/
if (need_resched()) {
+ list_del(&cursor.i_sb_list);
+ list_add(&cursor.i_sb_list, &inode->i_sb_list);
+ inode = &cursor;
spin_unlock(&sb->s_inode_list_lock);
cond_resched();
dispose_list(&dispose);
goto again;
}
}
+ list_del(&cursor.i_sb_list);
spin_unlock(&sb->s_inode_list_lock);
dispose_list(&dispose);
@@ -907,11 +916,16 @@ EXPORT_SYMBOL_GPL(evict_inodes);
void invalidate_inodes(struct super_block *sb)
{
struct inode *inode, *next;
+ struct inode cursor;
LIST_HEAD(dispose);
+ cursor.i_state = I_CURSOR;
+ INIT_LIST_HEAD(&cursor.i_sb_list);
+ inode = list_entry(&sb->s_inodes, typeof(*inode), i_sb_list);
+
again:
spin_lock(&sb->s_inode_list_lock);
- sb_for_each_inodes_safe(inode, next, &sb->s_inodes) {
+ sb_for_each_inodes_continue_safe(inode, next, &sb->s_inodes) {
spin_lock(&inode->i_lock);
if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
spin_unlock(&inode->i_lock);
@@ -927,12 +941,16 @@ void invalidate_inodes(struct super_block *sb)
spin_unlock(&inode->i_lock);
list_add(&inode->i_lru, &dispose);
if (need_resched()) {
+ list_del(&cursor.i_sb_list);
+ list_add(&cursor.i_sb_list, &inode->i_sb_list);
+ inode = &cursor;
spin_unlock(&sb->s_inode_list_lock);
cond_resched();
dispose_list(&dispose);
goto again;
}
}
+ list_del(&cursor.i_sb_list);
spin_unlock(&sb->s_inode_list_lock);
dispose_list(&dispose);
--
2.34.1
^ permalink raw reply related [flat|nested] 16+ messages in thread* Re: [PATCH 10/11] fs: fix hungtask due to repeated traversal of inodes list
2024-11-18 11:45 ` [PATCH 10/11] fs: fix hungtask due to repeated traversal of inodes list Ye Bin
@ 2024-12-04 11:17 ` Christian Brauner
2024-12-04 14:16 ` Jan Kara
0 siblings, 1 reply; 16+ messages in thread
From: Christian Brauner @ 2024-12-04 11:17 UTC (permalink / raw)
To: Ye Bin
Cc: viro, jack, linux-fsdevel, axboe, linux-block, agruenba, gfs2,
amir73il, mic, gnoack, paul, jmorris, serge,
linux-security-module, yebin10, zhangxiaoxu5
On Mon, Nov 18, 2024 at 07:45:07PM +0800, Ye Bin wrote:
> From: Ye Bin <yebin10@huawei.com>
>
> There's a issue when remove scsi disk, the invalidate_inodes() function
> cannot exit for a long time, then trigger hungtask:
> INFO: task kworker/56:0:1391396 blocked for more than 122 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Workqueue: events_freezable virtscsi_handle_event [virtio_scsi]
> Call Trace:
> __schedule+0x33c/0x7f0
> schedule+0x46/0xb0
> schedule_preempt_disabled+0xa/0x10
> __mutex_lock.constprop.0+0x22b/0x490
> mutex_lock+0x52/0x70
> scsi_scan_target+0x6d/0xf0
> virtscsi_handle_event+0x152/0x1a0 [virtio_scsi]
> process_one_work+0x1b2/0x350
> worker_thread+0x49/0x310
> kthread+0xfb/0x140
> ret_from_fork+0x1f/0x30
>
> PID: 540499 TASK: ffff9b15e504c080 CPU: 44 COMMAND: "kworker/44:0"
> Call trace:
> invalidate_inodes at ffffffff8f3b4784
> __invalidate_device at ffffffff8f3dfea3
> invalidate_partition at ffffffff8f526b49
> del_gendisk at ffffffff8f5280fb
> sd_remove at ffffffffc0186455 [sd_mod]
> __device_release_driver at ffffffff8f738ab2
> device_release_driver at ffffffff8f738bc4
> bus_remove_device at ffffffff8f737f66
> device_del at ffffffff8f73341b
> __scsi_remove_device at ffffffff8f780340
> scsi_remove_device at ffffffff8f7803a2
> virtscsi_handle_event at ffffffffc017204f [virtio_scsi]
> process_one_work at ffffffff8f1041f2
> worker_thread at ffffffff8f104789
> kthread at ffffffff8f109abb
> ret_from_fork at ffffffff8f001d6f
>
> As commit 04646aebd30b ("fs: avoid softlockups in s_inodes iterators")
> introduces the retry logic. In the problem environment, the 'i_count'
> of millions of files is not zero. As a result, the time slice for each
> traversal to the matching inode process is almost used up, and then the
> traversal is started from scratch. The worst-case scenario is that only
> one inode can be processed after each wakeup. Because this process holds
> a lock, other processes will be stuck for a long time, causing a series
> of problems.
> To solve the problem of repeated traversal from the beginning, each time
> the CPU needs to be freed, a cursor is inserted into the linked list, and
> the traversal continues from the cursor next time.
>
> Fixes: 04646aebd30b ("fs: avoid softlockups in s_inodes iterators")
> Signed-off-by: Ye Bin <yebin10@huawei.com>
> ---
> fs/inode.c | 22 ++++++++++++++++++++--
> 1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index dc966990bda6..b78895af8779 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -857,11 +857,16 @@ static void dispose_list(struct list_head *head)
> void evict_inodes(struct super_block *sb)
> {
> struct inode *inode, *next;
> + struct inode cursor;
It seems pretty adventurous to me to just add in a random inode whose
only fiels that is initialized is i_state. That would need a proper
analysis and argument that this is safe to do and won't cause trouble
for any filesystem.
Jan, do you have thoughts on this?
> LIST_HEAD(dispose);
>
> + cursor.i_state = I_CURSOR;
> + INIT_LIST_HEAD(&cursor.i_sb_list);
> + inode = list_entry(&sb->s_inodes, typeof(*inode), i_sb_list);
> +
> again:
> spin_lock(&sb->s_inode_list_lock);
> - sb_for_each_inodes_safe(inode, next, &sb->s_inodes) {
> + sb_for_each_inodes_continue_safe(inode, next, &sb->s_inodes) {
> if (atomic_read(&inode->i_count))
> continue;
>
> @@ -886,12 +891,16 @@ void evict_inodes(struct super_block *sb)
> * bit so we don't livelock.
> */
> if (need_resched()) {
> + list_del(&cursor.i_sb_list);
> + list_add(&cursor.i_sb_list, &inode->i_sb_list);
> + inode = &cursor;
> spin_unlock(&sb->s_inode_list_lock);
> cond_resched();
> dispose_list(&dispose);
> goto again;
> }
> }
> + list_del(&cursor.i_sb_list);
> spin_unlock(&sb->s_inode_list_lock);
>
> dispose_list(&dispose);
> @@ -907,11 +916,16 @@ EXPORT_SYMBOL_GPL(evict_inodes);
> void invalidate_inodes(struct super_block *sb)
> {
> struct inode *inode, *next;
> + struct inode cursor;
> LIST_HEAD(dispose);
>
> + cursor.i_state = I_CURSOR;
> + INIT_LIST_HEAD(&cursor.i_sb_list);
> + inode = list_entry(&sb->s_inodes, typeof(*inode), i_sb_list);
> +
> again:
> spin_lock(&sb->s_inode_list_lock);
> - sb_for_each_inodes_safe(inode, next, &sb->s_inodes) {
> + sb_for_each_inodes_continue_safe(inode, next, &sb->s_inodes) {
> spin_lock(&inode->i_lock);
> if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
> spin_unlock(&inode->i_lock);
> @@ -927,12 +941,16 @@ void invalidate_inodes(struct super_block *sb)
> spin_unlock(&inode->i_lock);
> list_add(&inode->i_lru, &dispose);
> if (need_resched()) {
> + list_del(&cursor.i_sb_list);
> + list_add(&cursor.i_sb_list, &inode->i_sb_list);
> + inode = &cursor;
> spin_unlock(&sb->s_inode_list_lock);
> cond_resched();
> dispose_list(&dispose);
> goto again;
> }
> }
> + list_del(&cursor.i_sb_list);
> spin_unlock(&sb->s_inode_list_lock);
>
> dispose_list(&dispose);
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: [PATCH 10/11] fs: fix hungtask due to repeated traversal of inodes list
2024-12-04 11:17 ` Christian Brauner
@ 2024-12-04 14:16 ` Jan Kara
0 siblings, 0 replies; 16+ messages in thread
From: Jan Kara @ 2024-12-04 14:16 UTC (permalink / raw)
To: Christian Brauner
Cc: Ye Bin, viro, jack, linux-fsdevel, axboe, linux-block, agruenba,
gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
linux-security-module, yebin10, zhangxiaoxu5, Dave Chinner
On Wed 04-12-24 12:17:49, Christian Brauner wrote:
> On Mon, Nov 18, 2024 at 07:45:07PM +0800, Ye Bin wrote:
> > From: Ye Bin <yebin10@huawei.com>
> >
> > There's a issue when remove scsi disk, the invalidate_inodes() function
> > cannot exit for a long time, then trigger hungtask:
> > INFO: task kworker/56:0:1391396 blocked for more than 122 seconds.
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Workqueue: events_freezable virtscsi_handle_event [virtio_scsi]
> > Call Trace:
> > __schedule+0x33c/0x7f0
> > schedule+0x46/0xb0
> > schedule_preempt_disabled+0xa/0x10
> > __mutex_lock.constprop.0+0x22b/0x490
> > mutex_lock+0x52/0x70
> > scsi_scan_target+0x6d/0xf0
> > virtscsi_handle_event+0x152/0x1a0 [virtio_scsi]
> > process_one_work+0x1b2/0x350
> > worker_thread+0x49/0x310
> > kthread+0xfb/0x140
> > ret_from_fork+0x1f/0x30
> >
> > PID: 540499 TASK: ffff9b15e504c080 CPU: 44 COMMAND: "kworker/44:0"
> > Call trace:
> > invalidate_inodes at ffffffff8f3b4784
> > __invalidate_device at ffffffff8f3dfea3
> > invalidate_partition at ffffffff8f526b49
> > del_gendisk at ffffffff8f5280fb
> > sd_remove at ffffffffc0186455 [sd_mod]
> > __device_release_driver at ffffffff8f738ab2
> > device_release_driver at ffffffff8f738bc4
> > bus_remove_device at ffffffff8f737f66
> > device_del at ffffffff8f73341b
> > __scsi_remove_device at ffffffff8f780340
> > scsi_remove_device at ffffffff8f7803a2
> > virtscsi_handle_event at ffffffffc017204f [virtio_scsi]
> > process_one_work at ffffffff8f1041f2
> > worker_thread at ffffffff8f104789
> > kthread at ffffffff8f109abb
> > ret_from_fork at ffffffff8f001d6f
> >
> > As commit 04646aebd30b ("fs: avoid softlockups in s_inodes iterators")
> > introduces the retry logic. In the problem environment, the 'i_count'
> > of millions of files is not zero. As a result, the time slice for each
> > traversal to the matching inode process is almost used up, and then the
> > traversal is started from scratch. The worst-case scenario is that only
> > one inode can be processed after each wakeup. Because this process holds
> > a lock, other processes will be stuck for a long time, causing a series
> > of problems.
> > To solve the problem of repeated traversal from the beginning, each time
> > the CPU needs to be freed, a cursor is inserted into the linked list, and
> > the traversal continues from the cursor next time.
> >
> > Fixes: 04646aebd30b ("fs: avoid softlockups in s_inodes iterators")
> > Signed-off-by: Ye Bin <yebin10@huawei.com>
> > ---
> > fs/inode.c | 22 ++++++++++++++++++++--
> > 1 file changed, 20 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/inode.c b/fs/inode.c
> > index dc966990bda6..b78895af8779 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -857,11 +857,16 @@ static void dispose_list(struct list_head *head)
> > void evict_inodes(struct super_block *sb)
> > {
> > struct inode *inode, *next;
> > + struct inode cursor;
>
> It seems pretty adventurous to me to just add in a random inode whose
> only fiels that is initialized is i_state. That would need a proper
> analysis and argument that this is safe to do and won't cause trouble
> for any filesystem.
>
> Jan, do you have thoughts on this?
Yeah, I think in the current state where there are several instances of
hand-crafted inode iteration code it is somewhat fragile to use the cursor
approach. I was staying silent because I was hoping Dave Chinner's patches
to clean up inode iteration get to a more ready state. Then either we have
well consolidated inode iteration code so additions like this can be easily
verified for correctness or we could even get as far as removing
sb->s_inodes list altogether as Dave outlined [1] which would nicely deal
with the issue solved here as well. Sadly that patch series seems to have
lost traction. Hopefully we can either revive it or at least scavenge the
nice preparatory cleanups...
Honza
[1] https://lore.kernel.org/all/ZwRvshM65rxXTwxd@dread.disaster.area
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 11/11] fs: fix potential soft lockup when 'sb->s_inodes' has large number of inodes
2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
` (9 preceding siblings ...)
2024-11-18 11:45 ` [PATCH 10/11] fs: fix hungtask due to repeated traversal of inodes list Ye Bin
@ 2024-11-18 11:45 ` Ye Bin
2024-12-04 7:04 ` [PATCH 00/11] fix hungtask due to repeated traversal of inodes list yebin (H)
2024-12-04 11:04 ` Mateusz Guzik
12 siblings, 0 replies; 16+ messages in thread
From: Ye Bin @ 2024-11-18 11:45 UTC (permalink / raw)
To: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
linux-security-module
Cc: yebin10, zhangxiaoxu5
From: Ye Bin <yebin10@huawei.com>
If the memory is sufficient, a large number of inodes that do not
meet the conditions may exist in the 'sb->s_inodes' list when
evict_inodes()/invalidate_inodes() traverse the 'sb->s_inodes' list.
Then it maybe trigger soft lockup.
To solve potential soft lockup, move need_resched() check from tail
to head when traverse the 'sb->s_inodes' list.
Signed-off-by: Ye Bin <yebin10@huawei.com>
---
fs/inode.c | 49 +++++++++++++++++++++++++------------------------
1 file changed, 25 insertions(+), 24 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index b78895af8779..e865fc1f5a95 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -867,6 +867,21 @@ void evict_inodes(struct super_block *sb)
again:
spin_lock(&sb->s_inode_list_lock);
sb_for_each_inodes_continue_safe(inode, next, &sb->s_inodes) {
+ /*
+ * We can have a ton of inodes to evict at unmount time given
+ * enough memory, check to see if we need to go to sleep for a
+ * bit so we don't livelock.
+ */
+ if (need_resched()) {
+ list_del(&cursor.i_sb_list);
+ list_add_tail(&cursor.i_sb_list, &inode->i_sb_list);
+ inode = &cursor;
+ spin_unlock(&sb->s_inode_list_lock);
+ cond_resched();
+ dispose_list(&dispose);
+ goto again;
+ }
+
if (atomic_read(&inode->i_count))
continue;
@@ -884,21 +899,6 @@ void evict_inodes(struct super_block *sb)
inode_lru_list_del(inode);
spin_unlock(&inode->i_lock);
list_add(&inode->i_lru, &dispose);
-
- /*
- * We can have a ton of inodes to evict at unmount time given
- * enough memory, check to see if we need to go to sleep for a
- * bit so we don't livelock.
- */
- if (need_resched()) {
- list_del(&cursor.i_sb_list);
- list_add(&cursor.i_sb_list, &inode->i_sb_list);
- inode = &cursor;
- spin_unlock(&sb->s_inode_list_lock);
- cond_resched();
- dispose_list(&dispose);
- goto again;
- }
}
list_del(&cursor.i_sb_list);
spin_unlock(&sb->s_inode_list_lock);
@@ -926,6 +926,16 @@ void invalidate_inodes(struct super_block *sb)
again:
spin_lock(&sb->s_inode_list_lock);
sb_for_each_inodes_continue_safe(inode, next, &sb->s_inodes) {
+ if (need_resched()) {
+ list_del(&cursor.i_sb_list);
+ list_add_tail(&cursor.i_sb_list, &inode->i_sb_list);
+ inode = &cursor;
+ spin_unlock(&sb->s_inode_list_lock);
+ cond_resched();
+ dispose_list(&dispose);
+ goto again;
+ }
+
spin_lock(&inode->i_lock);
if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
spin_unlock(&inode->i_lock);
@@ -940,15 +950,6 @@ void invalidate_inodes(struct super_block *sb)
inode_lru_list_del(inode);
spin_unlock(&inode->i_lock);
list_add(&inode->i_lru, &dispose);
- if (need_resched()) {
- list_del(&cursor.i_sb_list);
- list_add(&cursor.i_sb_list, &inode->i_sb_list);
- inode = &cursor;
- spin_unlock(&sb->s_inode_list_lock);
- cond_resched();
- dispose_list(&dispose);
- goto again;
- }
}
list_del(&cursor.i_sb_list);
spin_unlock(&sb->s_inode_list_lock);
--
2.34.1
^ permalink raw reply related [flat|nested] 16+ messages in thread* Re: [PATCH 00/11] fix hungtask due to repeated traversal of inodes list
2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
` (10 preceding siblings ...)
2024-11-18 11:45 ` [PATCH 11/11] fs: fix potential soft lockup when 'sb->s_inodes' has large number of inodes Ye Bin
@ 2024-12-04 7:04 ` yebin (H)
2024-12-04 11:04 ` Mateusz Guzik
12 siblings, 0 replies; 16+ messages in thread
From: yebin (H) @ 2024-12-04 7:04 UTC (permalink / raw)
To: Ye Bin, viro, brauner, jack, linux-fsdevel, axboe, linux-block,
agruenba, gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
linux-security-module
Cc: zhangxiaoxu5
Friendly ping...
On 2024/11/18 19:44, Ye Bin wrote:
> From: Ye Bin <yebin10@huawei.com>
>
> As commit 04646aebd30b ("fs: avoid softlockups in s_inodes iterators")
> introduces the retry logic. In the problem environment, the 'i_count'
> of millions of files is not zero. As a result, the time slice for each
> traversal to the matching inode process is almost used up, and then the
> traversal is started from scratch. The worst-case scenario is that only
> one inode can be processed after each wakeup. Because this process holds
> a lock, other processes will be stuck for a long time, causing a series
> of problems.
> To solve the problem of repeated traversal from the beginning, each time
> the CPU needs to be freed, a cursor is inserted into the linked list, and
> the traversal continues from the cursor next time.
>
> Ye Bin (11):
> fs: introduce I_CURSOR flag for inode
> block: use sb_for_each_inodes API
> fs: use sb_for_each_inodes API
> gfs2: use sb_for_each_inodes API
> fs: use sb_for_each_inodes_safe API
> fsnotify: use sb_for_each_inodes API
> quota: use sb_for_each_inodes API
> fs/super.c: use sb_for_each_inodes API
> landlock: use sb_for_each_inodes API
> fs: fix hungtask due to repeated traversal of inodes list
> fs: fix potential soft lockup when 'sb->s_inodes' has large number of
> inodes
>
> block/bdev.c | 4 +--
> fs/drop_caches.c | 2 +-
> fs/gfs2/ops_fstype.c | 2 +-
> fs/inode.c | 59 ++++++++++++++++++++++++++++--------------
> fs/notify/fsnotify.c | 2 +-
> fs/quota/dquot.c | 4 +--
> fs/super.c | 2 +-
> include/linux/fs.h | 45 ++++++++++++++++++++++++++++++++
> security/landlock/fs.c | 2 +-
> 9 files changed, 93 insertions(+), 29 deletions(-)
>
^ permalink raw reply [flat|nested] 16+ messages in thread* Re: [PATCH 00/11] fix hungtask due to repeated traversal of inodes list
2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
` (11 preceding siblings ...)
2024-12-04 7:04 ` [PATCH 00/11] fix hungtask due to repeated traversal of inodes list yebin (H)
@ 2024-12-04 11:04 ` Mateusz Guzik
12 siblings, 0 replies; 16+ messages in thread
From: Mateusz Guzik @ 2024-12-04 11:04 UTC (permalink / raw)
To: Ye Bin
Cc: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
linux-security-module, yebin10, zhangxiaoxu5
On Mon, Nov 18, 2024 at 07:44:57PM +0800, Ye Bin wrote:
> From: Ye Bin <yebin10@huawei.com>
>
> As commit 04646aebd30b ("fs: avoid softlockups in s_inodes iterators")
> introduces the retry logic. In the problem environment, the 'i_count'
> of millions of files is not zero. As a result, the time slice for each
> traversal to the matching inode process is almost used up, and then the
> traversal is started from scratch. The worst-case scenario is that only
> one inode can be processed after each wakeup. Because this process holds
> a lock, other processes will be stuck for a long time, causing a series
> of problems.
> To solve the problem of repeated traversal from the beginning, each time
> the CPU needs to be freed, a cursor is inserted into the linked list, and
> the traversal continues from the cursor next time.
>
I'm surprised this was not sorted out eons ago.
My non-maintainer $0,03 is as follows:
Insertion of a marker is indeed the idiomatic way to fix this, but I
disagree with the specific implementation.
All cond_resched() handling should be moved into the magic iteration
machinery, instead of random consumers open-coding it.
^ permalink raw reply [flat|nested] 16+ messages in thread