[PATCH 00/11] fix hungtask due to repeated traversal of inodes list

linux-security-module.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 00/11] fix hungtask due to repeated traversal of inodes list
@ 2024-11-18 11:44 Ye Bin
  2024-11-18 11:44 ` [PATCH 01/11] fs: introduce I_CURSOR flag for inode Ye Bin
                   ` (12 more replies)
  0 siblings, 13 replies; 16+ messages in thread
From: Ye Bin @ 2024-11-18 11:44 UTC (permalink / raw)
  To: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
	gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
	linux-security-module
  Cc: yebin10, zhangxiaoxu5

From: Ye Bin <yebin10@huawei.com>

As commit 04646aebd30b ("fs: avoid softlockups in s_inodes iterators")
introduces the retry logic. In the problem environment, the 'i_count'
of millions of files is not zero. As a result, the time slice for each
traversal to the matching inode process is almost used up, and then the
traversal is started from scratch. The worst-case scenario is that only
one inode can be processed after each wakeup. Because this process holds
a lock, other processes will be stuck for a long time, causing a series
of problems.
To solve the problem of repeated traversal from the beginning, each time
the CPU needs to be freed, a cursor is inserted into the linked list, and
the traversal continues from the cursor next time.

Ye Bin (11):
  fs: introduce I_CURSOR flag for inode
  block: use sb_for_each_inodes API
  fs: use sb_for_each_inodes API
  gfs2: use sb_for_each_inodes API
  fs: use sb_for_each_inodes_safe API
  fsnotify: use sb_for_each_inodes API
  quota: use sb_for_each_inodes API
  fs/super.c: use sb_for_each_inodes API
  landlock: use sb_for_each_inodes API
  fs: fix hungtask due to repeated traversal of inodes list
  fs: fix potential soft lockup when 'sb->s_inodes' has large number of
    inodes

 block/bdev.c           |  4 +--
 fs/drop_caches.c       |  2 +-
 fs/gfs2/ops_fstype.c   |  2 +-
 fs/inode.c             | 59 ++++++++++++++++++++++++++++--------------
 fs/notify/fsnotify.c   |  2 +-
 fs/quota/dquot.c       |  4 +--
 fs/super.c             |  2 +-
 include/linux/fs.h     | 45 ++++++++++++++++++++++++++++++++
 security/landlock/fs.c |  2 +-
 9 files changed, 93 insertions(+), 29 deletions(-)

-- 
2.34.1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 01/11] fs: introduce I_CURSOR flag for inode
  2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
@ 2024-11-18 11:44 ` Ye Bin
  2024-11-18 11:44 ` [PATCH 02/11] block: use sb_for_each_inodes API Ye Bin
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Ye Bin @ 2024-11-18 11:44 UTC (permalink / raw)
  To: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
	gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
	linux-security-module
  Cc: yebin10, zhangxiaoxu5

From: Ye Bin <yebin10@huawei.com>

This patch introduce I_CURSOR flag for inode and introduce
sb_for_each_inodes_safe/sb_for_each_inodes/sb_for_each_inodes_continue_safe
API for foreach super_block->s_inodes.

Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 include/linux/fs.h | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0a152c31d1bf..cf2734e0b2cd 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2473,6 +2473,7 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
 #define I_DONTCACHE		(1 << 15)
 #define I_SYNC_QUEUED		(1 << 16)
 #define I_PINNING_NETFS_WB	(1 << 17)
+#define I_CURSOR		(1 << 18)
 
 #define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
 #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
@@ -3809,4 +3810,48 @@ static inline bool vfs_empty_path(int dfd, const char __user *path)
 
 int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter);
 
+static inline bool inode_is_cursor(struct inode *inode)
+{
+	return inode->i_state & I_CURSOR;
+}
+
+static inline struct inode *next_sb_inode(struct inode *pos,
+					  struct list_head *head)
+{
+	struct inode *inode;
+	struct list_head *start;
+
+	if (!pos)
+		return NULL;
+
+	start = &pos->i_sb_list;
+
+	list_for_each_continue(start, head) {
+		inode = list_entry(start, typeof(*inode), i_sb_list);
+		if (likely(!inode_is_cursor(inode)))
+			return inode;
+	}
+
+	return NULL;
+}
+
+#define sb_for_each_inodes_safe(pos, n, head)                 \
+	for (pos = list_entry(head, typeof(*pos), i_sb_list), \
+		pos = next_sb_inode(pos, head),               \
+		n = next_sb_inode(pos, head);                 \
+	     pos != NULL;                                     \
+	     pos = n, n = next_sb_inode(n, head))
+
+#define sb_for_each_inodes(pos, head)                          \
+	for (pos = list_entry(head, typeof(*pos), i_sb_list), \
+		pos = next_sb_inode(pos, head);                \
+	     pos != NULL;                                     \
+	     pos = next_sb_inode(pos, head))
+
+#define sb_for_each_inodes_continue_safe(pos, n, head)        \
+	for (pos = next_sb_inode(pos, head),                  \
+		n = next_sb_inode(pos, head);                 \
+	     pos != NULL;                                     \
+	     pos = n, n = next_sb_inode(n, head))
+
 #endif /* _LINUX_FS_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 02/11] block: use sb_for_each_inodes API
  2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
  2024-11-18 11:44 ` [PATCH 01/11] fs: introduce I_CURSOR flag for inode Ye Bin
@ 2024-11-18 11:44 ` Ye Bin
  2024-11-18 11:45 ` [PATCH 03/11] fs: " Ye Bin
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Ye Bin @ 2024-11-18 11:44 UTC (permalink / raw)
  To: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
	gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
	linux-security-module
  Cc: yebin10, zhangxiaoxu5

From: Ye Bin <yebin10@huawei.com>

Use sb_for_each_inodes API foreach super_block->s_inodes.

Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 block/bdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 738e3c8457e7..b29e2c5c7c6e 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -478,7 +478,7 @@ long nr_blockdev_pages(void)
 	long ret = 0;
 
 	spin_lock(&blockdev_superblock->s_inode_list_lock);
-	list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list)
+	sb_for_each_inodes(inode, &blockdev_superblock->s_inodes)
 		ret += inode->i_mapping->nrpages;
 	spin_unlock(&blockdev_superblock->s_inode_list_lock);
 
@@ -1219,7 +1219,7 @@ void sync_bdevs(bool wait)
 	struct inode *inode, *old_inode = NULL;
 
 	spin_lock(&blockdev_superblock->s_inode_list_lock);
-	list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list) {
+	sb_for_each_inodes(inode, &blockdev_superblock->s_inodes) {
 		struct address_space *mapping = inode->i_mapping;
 		struct block_device *bdev;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 03/11] fs: use sb_for_each_inodes API
  2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
  2024-11-18 11:44 ` [PATCH 01/11] fs: introduce I_CURSOR flag for inode Ye Bin
  2024-11-18 11:44 ` [PATCH 02/11] block: use sb_for_each_inodes API Ye Bin
@ 2024-11-18 11:45 ` Ye Bin
  2024-11-18 11:45 ` [PATCH 04/11] gfs2: " Ye Bin
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Ye Bin @ 2024-11-18 11:45 UTC (permalink / raw)
  To: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
	gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
	linux-security-module
  Cc: yebin10, zhangxiaoxu5

From: Ye Bin <yebin10@huawei.com>

Use sb_for_each_inodes API foreach super_block->s_inodes.

Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 fs/drop_caches.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index d45ef541d848..43a31fa2b7c2 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -21,7 +21,7 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused)
 	struct inode *inode, *toput_inode = NULL;
 
 	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+	sb_for_each_inodes(inode, &sb->s_inodes) {
 		spin_lock(&inode->i_lock);
 		/*
 		 * We must skip inodes in unusual state. We may also skip
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 04/11] gfs2: use sb_for_each_inodes API
  2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
                   ` (2 preceding siblings ...)
  2024-11-18 11:45 ` [PATCH 03/11] fs: " Ye Bin
@ 2024-11-18 11:45 ` Ye Bin
  2024-11-18 11:45 ` [PATCH 05/11] fs: use sb_for_each_inodes_safe API Ye Bin
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Ye Bin @ 2024-11-18 11:45 UTC (permalink / raw)
  To: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
	gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
	linux-security-module
  Cc: yebin10, zhangxiaoxu5

From: Ye Bin <yebin10@huawei.com>

Use sb_for_each_inodes API foreach super_block->s_inodes.

Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 fs/gfs2/ops_fstype.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index e83d293c3614..e9dd94f01b90 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -1742,7 +1742,7 @@ static void gfs2_evict_inodes(struct super_block *sb)
 	set_bit(SDF_EVICTING, &sdp->sd_flags);
 
 	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+	sb_for_each_inodes(inode, &sb->s_inodes) {
 		spin_lock(&inode->i_lock);
 		if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) &&
 		    !need_resched()) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 05/11] fs: use sb_for_each_inodes_safe API
  2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
                   ` (3 preceding siblings ...)
  2024-11-18 11:45 ` [PATCH 04/11] gfs2: " Ye Bin
@ 2024-11-18 11:45 ` Ye Bin
  2024-11-18 11:45 ` [PATCH 06/11] fsnotify: use sb_for_each_inodes API Ye Bin
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Ye Bin @ 2024-11-18 11:45 UTC (permalink / raw)
  To: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
	gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
	linux-security-module
  Cc: yebin10, zhangxiaoxu5

From: Ye Bin <yebin10@huawei.com>

Use sb_for_each_inodes_safe API foreach super_block->s_inodes.

Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 fs/inode.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index e5a60084a7a9..dc966990bda6 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -861,7 +861,7 @@ void evict_inodes(struct super_block *sb)
 
 again:
 	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
+	sb_for_each_inodes_safe(inode, next, &sb->s_inodes) {
 		if (atomic_read(&inode->i_count))
 			continue;
 
@@ -911,7 +911,7 @@ void invalidate_inodes(struct super_block *sb)
 
 again:
 	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
+	sb_for_each_inodes_safe(inode, next, &sb->s_inodes) {
 		spin_lock(&inode->i_lock);
 		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
 			spin_unlock(&inode->i_lock);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 06/11] fsnotify: use sb_for_each_inodes API
  2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
                   ` (4 preceding siblings ...)
  2024-11-18 11:45 ` [PATCH 05/11] fs: use sb_for_each_inodes_safe API Ye Bin
@ 2024-11-18 11:45 ` Ye Bin
  2024-11-18 11:45 ` [PATCH 07/11] quota: " Ye Bin
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Ye Bin @ 2024-11-18 11:45 UTC (permalink / raw)
  To: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
	gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
	linux-security-module
  Cc: yebin10, zhangxiaoxu5

From: Ye Bin <yebin10@huawei.com>

Use sb_for_each_inodes API foreach super_block->s_inodes.

Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 fs/notify/fsnotify.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 82ae8254c068..11a4eb61b5b2 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -40,7 +40,7 @@ static void fsnotify_unmount_inodes(struct super_block *sb)
 	struct inode *inode, *iput_inode = NULL;
 
 	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+	sb_for_each_inodes(inode, &sb->s_inodes) {
 		/*
 		 * We cannot __iget() an inode in state I_FREEING,
 		 * I_WILL_FREE, or I_NEW which is fine because by that point
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 07/11] quota: use sb_for_each_inodes API
  2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
                   ` (5 preceding siblings ...)
  2024-11-18 11:45 ` [PATCH 06/11] fsnotify: use sb_for_each_inodes API Ye Bin
@ 2024-11-18 11:45 ` Ye Bin
  2024-11-18 11:45 ` [PATCH 08/11] fs/super.c: " Ye Bin
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Ye Bin @ 2024-11-18 11:45 UTC (permalink / raw)
  To: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
	gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
	linux-security-module
  Cc: yebin10, zhangxiaoxu5

From: Ye Bin <yebin10@huawei.com>

Use sb_for_each_inodes API foreach super_block->s_inodes.

Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 fs/quota/dquot.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index b40410cd39af..414f92bb762c 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -1027,7 +1027,7 @@ static int add_dquot_ref(struct super_block *sb, int type)
 	int err = 0;
 
 	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+	sb_for_each_inodes(inode, &sb->s_inodes) {
 		spin_lock(&inode->i_lock);
 		if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
 		    !atomic_read(&inode->i_writecount) ||
@@ -1083,7 +1083,7 @@ static void remove_dquot_ref(struct super_block *sb, int type)
 #endif
 
 	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+	sb_for_each_inodes(inode, &sb->s_inodes) {
 		/*
 		 *  We have to scan also I_NEW inodes because they can already
 		 *  have quota pointer initialized. Luckily, we need to touch
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 08/11] fs/super.c: use sb_for_each_inodes API
  2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
                   ` (6 preceding siblings ...)
  2024-11-18 11:45 ` [PATCH 07/11] quota: " Ye Bin
@ 2024-11-18 11:45 ` Ye Bin
  2024-11-18 11:45 ` [PATCH 09/11] landlock: " Ye Bin
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Ye Bin @ 2024-11-18 11:45 UTC (permalink / raw)
  To: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
	gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
	linux-security-module
  Cc: yebin10, zhangxiaoxu5

From: Ye Bin <yebin10@huawei.com>

Use sb_for_each_inodes API foreach super_block->s_inodes.

Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 fs/super.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/super.c b/fs/super.c
index c9c7223bc2a2..71bd0b41468e 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -658,7 +658,7 @@ void generic_shutdown_super(struct super_block *sb)
 			struct inode *inode;
 
 			spin_lock(&sb->s_inode_list_lock);
-			list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+			sb_for_each_inodes(inode, &sb->s_inodes) {
 				inode->i_op = VFS_PTR_POISON;
 				inode->i_sb = VFS_PTR_POISON;
 				inode->i_mapping = VFS_PTR_POISON;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 09/11] landlock: use sb_for_each_inodes API
  2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
                   ` (7 preceding siblings ...)
  2024-11-18 11:45 ` [PATCH 08/11] fs/super.c: " Ye Bin
@ 2024-11-18 11:45 ` Ye Bin
  2024-11-18 11:45 ` [PATCH 10/11] fs: fix hungtask due to repeated traversal of inodes list Ye Bin
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 16+ messages in thread
From: Ye Bin @ 2024-11-18 11:45 UTC (permalink / raw)
  To: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
	gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
	linux-security-module
  Cc: yebin10, zhangxiaoxu5

From: Ye Bin <yebin10@huawei.com>

Use sb_for_each_inodes API foreach super_block->s_inodes.

Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 security/landlock/fs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index dd9a7297ea4e..ccb7951f1404 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -1217,7 +1217,7 @@ static void hook_sb_delete(struct super_block *const sb)
 		return;
 
 	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+	sb_for_each_inodes(inode, &sb->s_inodes) {
 		struct landlock_object *object;
 
 		/* Only handles referenced inodes. */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 10/11] fs: fix hungtask due to repeated traversal of inodes list
  2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
                   ` (8 preceding siblings ...)
  2024-11-18 11:45 ` [PATCH 09/11] landlock: " Ye Bin
@ 2024-11-18 11:45 ` Ye Bin
  2024-12-04 11:17   ` Christian Brauner
  2024-11-18 11:45 ` [PATCH 11/11] fs: fix potential soft lockup when 'sb->s_inodes' has large number of inodes Ye Bin
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 16+ messages in thread
From: Ye Bin @ 2024-11-18 11:45 UTC (permalink / raw)
  To: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
	gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
	linux-security-module
  Cc: yebin10, zhangxiaoxu5

From: Ye Bin <yebin10@huawei.com>

There's a issue when remove scsi disk, the invalidate_inodes() function
cannot exit for a long time, then trigger hungtask:
INFO: task kworker/56:0:1391396 blocked for more than 122 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Workqueue: events_freezable virtscsi_handle_event [virtio_scsi]
Call Trace:
 __schedule+0x33c/0x7f0
 schedule+0x46/0xb0
 schedule_preempt_disabled+0xa/0x10
 __mutex_lock.constprop.0+0x22b/0x490
 mutex_lock+0x52/0x70
 scsi_scan_target+0x6d/0xf0
 virtscsi_handle_event+0x152/0x1a0 [virtio_scsi]
 process_one_work+0x1b2/0x350
 worker_thread+0x49/0x310
 kthread+0xfb/0x140
 ret_from_fork+0x1f/0x30

PID: 540499  TASK: ffff9b15e504c080  CPU: 44  COMMAND: "kworker/44:0"
Call trace:
 invalidate_inodes at ffffffff8f3b4784
 __invalidate_device at ffffffff8f3dfea3
 invalidate_partition at ffffffff8f526b49
 del_gendisk at ffffffff8f5280fb
 sd_remove at ffffffffc0186455 [sd_mod]
 __device_release_driver at ffffffff8f738ab2
 device_release_driver at ffffffff8f738bc4
 bus_remove_device at ffffffff8f737f66
 device_del at ffffffff8f73341b
 __scsi_remove_device at ffffffff8f780340
 scsi_remove_device at ffffffff8f7803a2
 virtscsi_handle_event at ffffffffc017204f [virtio_scsi]
 process_one_work at ffffffff8f1041f2
 worker_thread at ffffffff8f104789
 kthread at ffffffff8f109abb
 ret_from_fork at ffffffff8f001d6f

As commit 04646aebd30b ("fs: avoid softlockups in s_inodes iterators")
introduces the retry logic. In the problem environment, the 'i_count'
of millions of files is not zero. As a result, the time slice for each
traversal to the matching inode process is almost used up, and then the
traversal is started from scratch. The worst-case scenario is that only
one inode can be processed after each wakeup. Because this process holds
a lock, other processes will be stuck for a long time, causing a series
of problems.
To solve the problem of repeated traversal from the beginning, each time
the CPU needs to be freed, a cursor is inserted into the linked list, and
the traversal continues from the cursor next time.

Fixes: 04646aebd30b ("fs: avoid softlockups in s_inodes iterators")
Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 fs/inode.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index dc966990bda6..b78895af8779 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -857,11 +857,16 @@ static void dispose_list(struct list_head *head)
 void evict_inodes(struct super_block *sb)
 {
 	struct inode *inode, *next;
+	struct inode cursor;
 	LIST_HEAD(dispose);
 
+	cursor.i_state = I_CURSOR;
+	INIT_LIST_HEAD(&cursor.i_sb_list);
+	inode = list_entry(&sb->s_inodes, typeof(*inode), i_sb_list);
+
 again:
 	spin_lock(&sb->s_inode_list_lock);
-	sb_for_each_inodes_safe(inode, next, &sb->s_inodes) {
+	sb_for_each_inodes_continue_safe(inode, next, &sb->s_inodes) {
 		if (atomic_read(&inode->i_count))
 			continue;
 
@@ -886,12 +891,16 @@ void evict_inodes(struct super_block *sb)
 		 * bit so we don't livelock.
 		 */
 		if (need_resched()) {
+			list_del(&cursor.i_sb_list);
+			list_add(&cursor.i_sb_list, &inode->i_sb_list);
+			inode = &cursor;
 			spin_unlock(&sb->s_inode_list_lock);
 			cond_resched();
 			dispose_list(&dispose);
 			goto again;
 		}
 	}
+	list_del(&cursor.i_sb_list);
 	spin_unlock(&sb->s_inode_list_lock);
 
 	dispose_list(&dispose);
@@ -907,11 +916,16 @@ EXPORT_SYMBOL_GPL(evict_inodes);
 void invalidate_inodes(struct super_block *sb)
 {
 	struct inode *inode, *next;
+	struct inode cursor;
 	LIST_HEAD(dispose);
 
+	cursor.i_state = I_CURSOR;
+	INIT_LIST_HEAD(&cursor.i_sb_list);
+	inode = list_entry(&sb->s_inodes, typeof(*inode), i_sb_list);
+
 again:
 	spin_lock(&sb->s_inode_list_lock);
-	sb_for_each_inodes_safe(inode, next, &sb->s_inodes) {
+	sb_for_each_inodes_continue_safe(inode, next, &sb->s_inodes) {
 		spin_lock(&inode->i_lock);
 		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
 			spin_unlock(&inode->i_lock);
@@ -927,12 +941,16 @@ void invalidate_inodes(struct super_block *sb)
 		spin_unlock(&inode->i_lock);
 		list_add(&inode->i_lru, &dispose);
 		if (need_resched()) {
+			list_del(&cursor.i_sb_list);
+			list_add(&cursor.i_sb_list, &inode->i_sb_list);
+			inode = &cursor;
 			spin_unlock(&sb->s_inode_list_lock);
 			cond_resched();
 			dispose_list(&dispose);
 			goto again;
 		}
 	}
+	list_del(&cursor.i_sb_list);
 	spin_unlock(&sb->s_inode_list_lock);
 
 	dispose_list(&dispose);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 10/11] fs: fix hungtask due to repeated traversal of inodes list
  2024-11-18 11:45 ` [PATCH 10/11] fs: fix hungtask due to repeated traversal of inodes list Ye Bin
@ 2024-12-04 11:17   ` Christian Brauner
  2024-12-04 14:16     ` Jan Kara
  0 siblings, 1 reply; 16+ messages in thread
From: Christian Brauner @ 2024-12-04 11:17 UTC (permalink / raw)
  To: Ye Bin
  Cc: viro, jack, linux-fsdevel, axboe, linux-block, agruenba, gfs2,
	amir73il, mic, gnoack, paul, jmorris, serge,
	linux-security-module, yebin10, zhangxiaoxu5

On Mon, Nov 18, 2024 at 07:45:07PM +0800, Ye Bin wrote:
> From: Ye Bin <yebin10@huawei.com>
> 
> There's a issue when remove scsi disk, the invalidate_inodes() function
> cannot exit for a long time, then trigger hungtask:
> INFO: task kworker/56:0:1391396 blocked for more than 122 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Workqueue: events_freezable virtscsi_handle_event [virtio_scsi]
> Call Trace:
>  __schedule+0x33c/0x7f0
>  schedule+0x46/0xb0
>  schedule_preempt_disabled+0xa/0x10
>  __mutex_lock.constprop.0+0x22b/0x490
>  mutex_lock+0x52/0x70
>  scsi_scan_target+0x6d/0xf0
>  virtscsi_handle_event+0x152/0x1a0 [virtio_scsi]
>  process_one_work+0x1b2/0x350
>  worker_thread+0x49/0x310
>  kthread+0xfb/0x140
>  ret_from_fork+0x1f/0x30
> 
> PID: 540499  TASK: ffff9b15e504c080  CPU: 44  COMMAND: "kworker/44:0"
> Call trace:
>  invalidate_inodes at ffffffff8f3b4784
>  __invalidate_device at ffffffff8f3dfea3
>  invalidate_partition at ffffffff8f526b49
>  del_gendisk at ffffffff8f5280fb
>  sd_remove at ffffffffc0186455 [sd_mod]
>  __device_release_driver at ffffffff8f738ab2
>  device_release_driver at ffffffff8f738bc4
>  bus_remove_device at ffffffff8f737f66
>  device_del at ffffffff8f73341b
>  __scsi_remove_device at ffffffff8f780340
>  scsi_remove_device at ffffffff8f7803a2
>  virtscsi_handle_event at ffffffffc017204f [virtio_scsi]
>  process_one_work at ffffffff8f1041f2
>  worker_thread at ffffffff8f104789
>  kthread at ffffffff8f109abb
>  ret_from_fork at ffffffff8f001d6f
> 
> As commit 04646aebd30b ("fs: avoid softlockups in s_inodes iterators")
> introduces the retry logic. In the problem environment, the 'i_count'
> of millions of files is not zero. As a result, the time slice for each
> traversal to the matching inode process is almost used up, and then the
> traversal is started from scratch. The worst-case scenario is that only
> one inode can be processed after each wakeup. Because this process holds
> a lock, other processes will be stuck for a long time, causing a series
> of problems.
> To solve the problem of repeated traversal from the beginning, each time
> the CPU needs to be freed, a cursor is inserted into the linked list, and
> the traversal continues from the cursor next time.
> 
> Fixes: 04646aebd30b ("fs: avoid softlockups in s_inodes iterators")
> Signed-off-by: Ye Bin <yebin10@huawei.com>
> ---
>  fs/inode.c | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index dc966990bda6..b78895af8779 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -857,11 +857,16 @@ static void dispose_list(struct list_head *head)
>  void evict_inodes(struct super_block *sb)
>  {
>  	struct inode *inode, *next;
> +	struct inode cursor;

It seems pretty adventurous to me to just add in a random inode whose
only fiels that is initialized is i_state. That would need a proper
analysis and argument that this is safe to do and won't cause trouble
for any filesystem.

Jan, do you have thoughts on this?

>  	LIST_HEAD(dispose);
>  
> +	cursor.i_state = I_CURSOR;
> +	INIT_LIST_HEAD(&cursor.i_sb_list);
> +	inode = list_entry(&sb->s_inodes, typeof(*inode), i_sb_list);
> +
>  again:
>  	spin_lock(&sb->s_inode_list_lock);
> -	sb_for_each_inodes_safe(inode, next, &sb->s_inodes) {
> +	sb_for_each_inodes_continue_safe(inode, next, &sb->s_inodes) {
>  		if (atomic_read(&inode->i_count))
>  			continue;
>  
> @@ -886,12 +891,16 @@ void evict_inodes(struct super_block *sb)
>  		 * bit so we don't livelock.
>  		 */
>  		if (need_resched()) {
> +			list_del(&cursor.i_sb_list);
> +			list_add(&cursor.i_sb_list, &inode->i_sb_list);
> +			inode = &cursor;
>  			spin_unlock(&sb->s_inode_list_lock);
>  			cond_resched();
>  			dispose_list(&dispose);
>  			goto again;
>  		}
>  	}
> +	list_del(&cursor.i_sb_list);
>  	spin_unlock(&sb->s_inode_list_lock);
>  
>  	dispose_list(&dispose);
> @@ -907,11 +916,16 @@ EXPORT_SYMBOL_GPL(evict_inodes);
>  void invalidate_inodes(struct super_block *sb)
>  {
>  	struct inode *inode, *next;
> +	struct inode cursor;
>  	LIST_HEAD(dispose);
>  
> +	cursor.i_state = I_CURSOR;
> +	INIT_LIST_HEAD(&cursor.i_sb_list);
> +	inode = list_entry(&sb->s_inodes, typeof(*inode), i_sb_list);
> +
>  again:
>  	spin_lock(&sb->s_inode_list_lock);
> -	sb_for_each_inodes_safe(inode, next, &sb->s_inodes) {
> +	sb_for_each_inodes_continue_safe(inode, next, &sb->s_inodes) {
>  		spin_lock(&inode->i_lock);
>  		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
>  			spin_unlock(&inode->i_lock);
> @@ -927,12 +941,16 @@ void invalidate_inodes(struct super_block *sb)
>  		spin_unlock(&inode->i_lock);
>  		list_add(&inode->i_lru, &dispose);
>  		if (need_resched()) {
> +			list_del(&cursor.i_sb_list);
> +			list_add(&cursor.i_sb_list, &inode->i_sb_list);
> +			inode = &cursor;
>  			spin_unlock(&sb->s_inode_list_lock);
>  			cond_resched();
>  			dispose_list(&dispose);
>  			goto again;
>  		}
>  	}
> +	list_del(&cursor.i_sb_list);
>  	spin_unlock(&sb->s_inode_list_lock);
>  
>  	dispose_list(&dispose);
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 10/11] fs: fix hungtask due to repeated traversal of inodes list
  2024-12-04 11:17   ` Christian Brauner
@ 2024-12-04 14:16     ` Jan Kara
  0 siblings, 0 replies; 16+ messages in thread
From: Jan Kara @ 2024-12-04 14:16 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Ye Bin, viro, jack, linux-fsdevel, axboe, linux-block, agruenba,
	gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
	linux-security-module, yebin10, zhangxiaoxu5, Dave Chinner

On Wed 04-12-24 12:17:49, Christian Brauner wrote:
> On Mon, Nov 18, 2024 at 07:45:07PM +0800, Ye Bin wrote:
> > From: Ye Bin <yebin10@huawei.com>
> > 
> > There's a issue when remove scsi disk, the invalidate_inodes() function
> > cannot exit for a long time, then trigger hungtask:
> > INFO: task kworker/56:0:1391396 blocked for more than 122 seconds.
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Workqueue: events_freezable virtscsi_handle_event [virtio_scsi]
> > Call Trace:
> >  __schedule+0x33c/0x7f0
> >  schedule+0x46/0xb0
> >  schedule_preempt_disabled+0xa/0x10
> >  __mutex_lock.constprop.0+0x22b/0x490
> >  mutex_lock+0x52/0x70
> >  scsi_scan_target+0x6d/0xf0
> >  virtscsi_handle_event+0x152/0x1a0 [virtio_scsi]
> >  process_one_work+0x1b2/0x350
> >  worker_thread+0x49/0x310
> >  kthread+0xfb/0x140
> >  ret_from_fork+0x1f/0x30
> > 
> > PID: 540499  TASK: ffff9b15e504c080  CPU: 44  COMMAND: "kworker/44:0"
> > Call trace:
> >  invalidate_inodes at ffffffff8f3b4784
> >  __invalidate_device at ffffffff8f3dfea3
> >  invalidate_partition at ffffffff8f526b49
> >  del_gendisk at ffffffff8f5280fb
> >  sd_remove at ffffffffc0186455 [sd_mod]
> >  __device_release_driver at ffffffff8f738ab2
> >  device_release_driver at ffffffff8f738bc4
> >  bus_remove_device at ffffffff8f737f66
> >  device_del at ffffffff8f73341b
> >  __scsi_remove_device at ffffffff8f780340
> >  scsi_remove_device at ffffffff8f7803a2
> >  virtscsi_handle_event at ffffffffc017204f [virtio_scsi]
> >  process_one_work at ffffffff8f1041f2
> >  worker_thread at ffffffff8f104789
> >  kthread at ffffffff8f109abb
> >  ret_from_fork at ffffffff8f001d6f
> > 
> > As commit 04646aebd30b ("fs: avoid softlockups in s_inodes iterators")
> > introduces the retry logic. In the problem environment, the 'i_count'
> > of millions of files is not zero. As a result, the time slice for each
> > traversal to the matching inode process is almost used up, and then the
> > traversal is started from scratch. The worst-case scenario is that only
> > one inode can be processed after each wakeup. Because this process holds
> > a lock, other processes will be stuck for a long time, causing a series
> > of problems.
> > To solve the problem of repeated traversal from the beginning, each time
> > the CPU needs to be freed, a cursor is inserted into the linked list, and
> > the traversal continues from the cursor next time.
> > 
> > Fixes: 04646aebd30b ("fs: avoid softlockups in s_inodes iterators")
> > Signed-off-by: Ye Bin <yebin10@huawei.com>
> > ---
> >  fs/inode.c | 22 ++++++++++++++++++++--
> >  1 file changed, 20 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/inode.c b/fs/inode.c
> > index dc966990bda6..b78895af8779 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -857,11 +857,16 @@ static void dispose_list(struct list_head *head)
> >  void evict_inodes(struct super_block *sb)
> >  {
> >  	struct inode *inode, *next;
> > +	struct inode cursor;
> 
> It seems pretty adventurous to me to just add in a random inode whose
> only fiels that is initialized is i_state. That would need a proper
> analysis and argument that this is safe to do and won't cause trouble
> for any filesystem.
> 
> Jan, do you have thoughts on this?

Yeah, I think in the current state where there are several instances of
hand-crafted inode iteration code it is somewhat fragile to use the cursor
approach. I was staying silent because I was hoping Dave Chinner's patches
to clean up inode iteration get to a more ready state. Then either we have
well consolidated inode iteration code so additions like this can be easily
verified for correctness or we could even get as far as removing
sb->s_inodes list altogether as Dave outlined [1] which would nicely deal
with the issue solved here as well. Sadly that patch series seems to have
lost traction. Hopefully we can either revive it or at least scavenge the
nice preparatory cleanups...

								Honza

[1] https://lore.kernel.org/all/ZwRvshM65rxXTwxd@dread.disaster.area
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 11/11] fs: fix potential soft lockup when 'sb->s_inodes' has large number of inodes
  2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
                   ` (9 preceding siblings ...)
  2024-11-18 11:45 ` [PATCH 10/11] fs: fix hungtask due to repeated traversal of inodes list Ye Bin
@ 2024-11-18 11:45 ` Ye Bin
  2024-12-04  7:04 ` [PATCH 00/11] fix hungtask due to repeated traversal of inodes list yebin (H)
  2024-12-04 11:04 ` Mateusz Guzik
  12 siblings, 0 replies; 16+ messages in thread
From: Ye Bin @ 2024-11-18 11:45 UTC (permalink / raw)
  To: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
	gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
	linux-security-module
  Cc: yebin10, zhangxiaoxu5

From: Ye Bin <yebin10@huawei.com>

If the memory is sufficient, a large number of inodes that do not
meet the conditions may exist in the 'sb->s_inodes' list when
evict_inodes()/invalidate_inodes() traverse the 'sb->s_inodes' list.
Then it maybe trigger soft lockup.
To solve potential soft lockup, move need_resched() check from tail
to head when traverse the 'sb->s_inodes' list.

Signed-off-by: Ye Bin <yebin10@huawei.com>
---
 fs/inode.c | 49 +++++++++++++++++++++++++------------------------
 1 file changed, 25 insertions(+), 24 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index b78895af8779..e865fc1f5a95 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -867,6 +867,21 @@ void evict_inodes(struct super_block *sb)
 again:
 	spin_lock(&sb->s_inode_list_lock);
 	sb_for_each_inodes_continue_safe(inode, next, &sb->s_inodes) {
+		/*
+		 * We can have a ton of inodes to evict at unmount time given
+		 * enough memory, check to see if we need to go to sleep for a
+		 * bit so we don't livelock.
+		 */
+		if (need_resched()) {
+			list_del(&cursor.i_sb_list);
+			list_add_tail(&cursor.i_sb_list, &inode->i_sb_list);
+			inode = &cursor;
+			spin_unlock(&sb->s_inode_list_lock);
+			cond_resched();
+			dispose_list(&dispose);
+			goto again;
+		}
+
 		if (atomic_read(&inode->i_count))
 			continue;
 
@@ -884,21 +899,6 @@ void evict_inodes(struct super_block *sb)
 		inode_lru_list_del(inode);
 		spin_unlock(&inode->i_lock);
 		list_add(&inode->i_lru, &dispose);
-
-		/*
-		 * We can have a ton of inodes to evict at unmount time given
-		 * enough memory, check to see if we need to go to sleep for a
-		 * bit so we don't livelock.
-		 */
-		if (need_resched()) {
-			list_del(&cursor.i_sb_list);
-			list_add(&cursor.i_sb_list, &inode->i_sb_list);
-			inode = &cursor;
-			spin_unlock(&sb->s_inode_list_lock);
-			cond_resched();
-			dispose_list(&dispose);
-			goto again;
-		}
 	}
 	list_del(&cursor.i_sb_list);
 	spin_unlock(&sb->s_inode_list_lock);
@@ -926,6 +926,16 @@ void invalidate_inodes(struct super_block *sb)
 again:
 	spin_lock(&sb->s_inode_list_lock);
 	sb_for_each_inodes_continue_safe(inode, next, &sb->s_inodes) {
+		if (need_resched()) {
+			list_del(&cursor.i_sb_list);
+			list_add_tail(&cursor.i_sb_list, &inode->i_sb_list);
+			inode = &cursor;
+			spin_unlock(&sb->s_inode_list_lock);
+			cond_resched();
+			dispose_list(&dispose);
+			goto again;
+		}
+
 		spin_lock(&inode->i_lock);
 		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
 			spin_unlock(&inode->i_lock);
@@ -940,15 +950,6 @@ void invalidate_inodes(struct super_block *sb)
 		inode_lru_list_del(inode);
 		spin_unlock(&inode->i_lock);
 		list_add(&inode->i_lru, &dispose);
-		if (need_resched()) {
-			list_del(&cursor.i_sb_list);
-			list_add(&cursor.i_sb_list, &inode->i_sb_list);
-			inode = &cursor;
-			spin_unlock(&sb->s_inode_list_lock);
-			cond_resched();
-			dispose_list(&dispose);
-			goto again;
-		}
 	}
 	list_del(&cursor.i_sb_list);
 	spin_unlock(&sb->s_inode_list_lock);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 00/11] fix hungtask due to repeated traversal of inodes list
  2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
                   ` (10 preceding siblings ...)
  2024-11-18 11:45 ` [PATCH 11/11] fs: fix potential soft lockup when 'sb->s_inodes' has large number of inodes Ye Bin
@ 2024-12-04  7:04 ` yebin (H)
  2024-12-04 11:04 ` Mateusz Guzik
  12 siblings, 0 replies; 16+ messages in thread
From: yebin (H) @ 2024-12-04  7:04 UTC (permalink / raw)
  To: Ye Bin, viro, brauner, jack, linux-fsdevel, axboe, linux-block,
	agruenba, gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
	linux-security-module
  Cc: zhangxiaoxu5

Friendly ping...

On 2024/11/18 19:44, Ye Bin wrote:
> From: Ye Bin <yebin10@huawei.com>
>
> As commit 04646aebd30b ("fs: avoid softlockups in s_inodes iterators")
> introduces the retry logic. In the problem environment, the 'i_count'
> of millions of files is not zero. As a result, the time slice for each
> traversal to the matching inode process is almost used up, and then the
> traversal is started from scratch. The worst-case scenario is that only
> one inode can be processed after each wakeup. Because this process holds
> a lock, other processes will be stuck for a long time, causing a series
> of problems.
> To solve the problem of repeated traversal from the beginning, each time
> the CPU needs to be freed, a cursor is inserted into the linked list, and
> the traversal continues from the cursor next time.
>
> Ye Bin (11):
>    fs: introduce I_CURSOR flag for inode
>    block: use sb_for_each_inodes API
>    fs: use sb_for_each_inodes API
>    gfs2: use sb_for_each_inodes API
>    fs: use sb_for_each_inodes_safe API
>    fsnotify: use sb_for_each_inodes API
>    quota: use sb_for_each_inodes API
>    fs/super.c: use sb_for_each_inodes API
>    landlock: use sb_for_each_inodes API
>    fs: fix hungtask due to repeated traversal of inodes list
>    fs: fix potential soft lockup when 'sb->s_inodes' has large number of
>      inodes
>
>   block/bdev.c           |  4 +--
>   fs/drop_caches.c       |  2 +-
>   fs/gfs2/ops_fstype.c   |  2 +-
>   fs/inode.c             | 59 ++++++++++++++++++++++++++++--------------
>   fs/notify/fsnotify.c   |  2 +-
>   fs/quota/dquot.c       |  4 +--
>   fs/super.c             |  2 +-
>   include/linux/fs.h     | 45 ++++++++++++++++++++++++++++++++
>   security/landlock/fs.c |  2 +-
>   9 files changed, 93 insertions(+), 29 deletions(-)
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 00/11] fix hungtask due to repeated traversal of inodes list
  2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
                   ` (11 preceding siblings ...)
  2024-12-04  7:04 ` [PATCH 00/11] fix hungtask due to repeated traversal of inodes list yebin (H)
@ 2024-12-04 11:04 ` Mateusz Guzik
  12 siblings, 0 replies; 16+ messages in thread
From: Mateusz Guzik @ 2024-12-04 11:04 UTC (permalink / raw)
  To: Ye Bin
  Cc: viro, brauner, jack, linux-fsdevel, axboe, linux-block, agruenba,
	gfs2, amir73il, mic, gnoack, paul, jmorris, serge,
	linux-security-module, yebin10, zhangxiaoxu5

On Mon, Nov 18, 2024 at 07:44:57PM +0800, Ye Bin wrote:
> From: Ye Bin <yebin10@huawei.com>
> 
> As commit 04646aebd30b ("fs: avoid softlockups in s_inodes iterators")
> introduces the retry logic. In the problem environment, the 'i_count'
> of millions of files is not zero. As a result, the time slice for each
> traversal to the matching inode process is almost used up, and then the
> traversal is started from scratch. The worst-case scenario is that only
> one inode can be processed after each wakeup. Because this process holds
> a lock, other processes will be stuck for a long time, causing a series
> of problems.
> To solve the problem of repeated traversal from the beginning, each time
> the CPU needs to be freed, a cursor is inserted into the linked list, and
> the traversal continues from the cursor next time.
> 

I'm surprised this was not sorted out eons ago.

My non-maintainer $0,03 is as follows:

Insertion of a marker is indeed the idiomatic way to fix this, but I
disagree with the specific implementation.

All cond_resched() handling should be moved into the magic iteration
machinery, instead of random consumers open-coding it.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2024-12-04 14:16 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-18 11:44 [PATCH 00/11] fix hungtask due to repeated traversal of inodes list Ye Bin
2024-11-18 11:44 ` [PATCH 01/11] fs: introduce I_CURSOR flag for inode Ye Bin
2024-11-18 11:44 ` [PATCH 02/11] block: use sb_for_each_inodes API Ye Bin
2024-11-18 11:45 ` [PATCH 03/11] fs: " Ye Bin
2024-11-18 11:45 ` [PATCH 04/11] gfs2: " Ye Bin
2024-11-18 11:45 ` [PATCH 05/11] fs: use sb_for_each_inodes_safe API Ye Bin
2024-11-18 11:45 ` [PATCH 06/11] fsnotify: use sb_for_each_inodes API Ye Bin
2024-11-18 11:45 ` [PATCH 07/11] quota: " Ye Bin
2024-11-18 11:45 ` [PATCH 08/11] fs/super.c: " Ye Bin
2024-11-18 11:45 ` [PATCH 09/11] landlock: " Ye Bin
2024-11-18 11:45 ` [PATCH 10/11] fs: fix hungtask due to repeated traversal of inodes list Ye Bin
2024-12-04 11:17   ` Christian Brauner
2024-12-04 14:16     ` Jan Kara
2024-11-18 11:45 ` [PATCH 11/11] fs: fix potential soft lockup when 'sb->s_inodes' has large number of inodes Ye Bin
2024-12-04  7:04 ` [PATCH 00/11] fix hungtask due to repeated traversal of inodes list yebin (H)
2024-12-04 11:04 ` Mateusz Guzik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).