* [PATCH v2 0/4] writeback: Avoid lockups when switching inodes
From: Jan Kara @ 2025-09-12 10:38 UTC
To: Christian Brauner; +Cc: linux-fsdevel, Tejun Heo, Jan Kara
Hello!
This patch series addresses lockups reported by users when a systemd unit
that has read lots of files from a filesystem mounted with the lazytime
mount option exits. See patch 3 for more details and the reproducer.

There are two main problems when switching many inodes between wbs:

1) Multiple workers are spawned to do the switching, but they all contend
on the same wb->list_lock, making the parallelism pointless and just
wasting time.

2) Sorting inodes into the wb->b_dirty list by dirtied_time_when is
inherently slow: each insertion scans O(n) list entries, so switching n
inodes costs O(n^2) overall.
Patches 1-3 address these problems, patch 4 adds a tracepoint for better
observability of inode writeback switching.
Changes since v1:
* Added Acked-by's from Tejun
* Modified patch 1 to insert isw items directly into a list of switches
  to make, instead of queueing an rcu_work for each of them. This
  actually sped up large-scale switching about 2x.
Honza
Previous versions:
Link: http://lore.kernel.org/r/20250909143734.30801-1-jack@suse.cz # v1
* [PATCH v2 1/4] writeback: Avoid contention on wb->list_lock when switching inodes
From: Jan Kara @ 2025-09-12 10:38 UTC
To: Christian Brauner; +Cc: linux-fsdevel, Tejun Heo, Jan Kara
There can be multiple inode switch works trying to switch inodes to or
from the same wb. This can happen in particular if some cgroup which
owns many (thousands of) inodes exits and we need to switch them all.
In this case several inode_switch_wbs_work_fn() instances will just be
spinning on the same wb->list_lock while only one of them makes forward
progress. This wastes CPU cycles and quickly leads to softlockup reports
and an unusable system.

Instead of running several inode_switch_wbs_work_fn() instances in
parallel, all switching to the same wb and contending on wb->list_lock,
run just one work item per wb and manage a queue of isw items switching
to this wb.
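
For illustration, the scheme boils down to the pattern below: producers
push contexts onto a lock-free list, and only the pusher that finds the
list empty schedules the single worker, which then detaches and drains
the whole list in a loop. This is a minimal userspace C analogue using
C11 atomics; it is not the kernel code, and all names in it are made up:

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct isw_ctx {
        struct isw_ctx *next;
        int id;
};

static _Atomic(struct isw_ctx *) queue_head;

/* Push one context; returns true if the list was empty, like llist_add(). */
static bool ctx_push(struct isw_ctx *ctx)
{
        struct isw_ctx *old = atomic_load(&queue_head);

        do {
                ctx->next = old;
        } while (!atomic_compare_exchange_weak(&queue_head, &old, ctx));
        return old == NULL;
}

/* Detach the whole list in one atomic step, like llist_del_all(). */
static struct isw_ctx *ctx_del_all(void)
{
        return atomic_exchange(&queue_head, NULL);
}

/* The single worker: drain until a detach finds the queue empty. */
static void worker(void)
{
        struct isw_ctx *list, *c;

        while ((list = ctx_del_all()) != NULL)
                for (c = list; c; c = c->next)  /* LIFO order, as with llist */
                        printf("processing ctx %d\n", c->id);
}

int main(void)
{
        struct isw_ctx a = { .id = 1 }, b = { .id = 2 };
        bool need_worker;

        need_worker = ctx_push(&a);     /* true: schedule the worker */
        ctx_push(&b);                   /* false: worker already scheduled */
        if (need_worker)
                worker();
        return 0;
}

In the kernel version a racing push is either picked up by the worker's
next llist_del_all() or sees an empty list and requeues the work, so no
context can be lost.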
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/fs-writeback.c | 100 ++++++++++++++++++++-----------
include/linux/backing-dev-defs.h | 4 ++
include/linux/writeback.h | 2 +
mm/backing-dev.c | 5 ++
4 files changed, 75 insertions(+), 36 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index a07b8cf73ae2..f2265aa9b4c2 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -368,7 +368,8 @@ static struct bdi_writeback *inode_to_wb_and_lock_list(struct inode *inode)
}
struct inode_switch_wbs_context {
- struct rcu_work work;
+ /* List of queued switching contexts for the wb */
+ struct llist_node list;
/*
* Multiple inodes can be switched at once. The switching procedure
@@ -378,7 +379,6 @@ struct inode_switch_wbs_context {
* array embedded into struct inode_switch_wbs_context. Otherwise
* an inode could be left in a non-consistent state.
*/
- struct bdi_writeback *new_wb;
struct inode *inodes[];
};
@@ -486,13 +486,11 @@ static bool inode_do_switch_wbs(struct inode *inode,
return switched;
}
-static void inode_switch_wbs_work_fn(struct work_struct *work)
+static void process_inode_switch_wbs_work(struct bdi_writeback *new_wb,
+ struct inode_switch_wbs_context *isw)
{
- struct inode_switch_wbs_context *isw =
- container_of(to_rcu_work(work), struct inode_switch_wbs_context, work);
struct backing_dev_info *bdi = inode_to_bdi(isw->inodes[0]);
struct bdi_writeback *old_wb = isw->inodes[0]->i_wb;
- struct bdi_writeback *new_wb = isw->new_wb;
unsigned long nr_switched = 0;
struct inode **inodep;
@@ -543,6 +541,39 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
atomic_dec(&isw_nr_in_flight);
}
+void inode_switch_wbs_work_fn(struct work_struct *work)
+{
+ struct bdi_writeback *new_wb = container_of(work, struct bdi_writeback,
+ switch_work);
+ struct inode_switch_wbs_context *isw, *next_isw;
+ struct llist_node *list;
+
+ /*
+ * Grab our reference to wb so that it cannot get freed under us
+ * after we process all the isw items.
+ */
+ wb_get(new_wb);
+ while (1) {
+ list = llist_del_all(&new_wb->switch_wbs_ctxs);
+ /* Nothing to do? */
+ if (!list) {
+ wb_put(new_wb);
+ return;
+ }
+ /*
+ * In addition to synchronizing among switchers, I_WB_SWITCH
+ * tells the RCU protected stat update paths to grab the i_page
+ * lock so that stat transfer can synchronize against them.
+ * Let's continue after I_WB_SWITCH is guaranteed to be
+ * visible.
+ */
+ synchronize_rcu();
+
+ llist_for_each_entry_safe(isw, next_isw, list, list)
+ process_inode_switch_wbs_work(new_wb, isw);
+ }
+}
+
static bool inode_prepare_wbs_switch(struct inode *inode,
struct bdi_writeback *new_wb)
{
@@ -572,6 +603,13 @@ static bool inode_prepare_wbs_switch(struct inode *inode,
return true;
}
+static void wb_queue_isw(struct bdi_writeback *wb,
+ struct inode_switch_wbs_context *isw)
+{
+ if (llist_add(&isw->list, &wb->switch_wbs_ctxs))
+ queue_work(isw_wq, &wb->switch_work);
+}
+
/**
* inode_switch_wbs - change the wb association of an inode
* @inode: target inode
@@ -585,6 +623,7 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
struct backing_dev_info *bdi = inode_to_bdi(inode);
struct cgroup_subsys_state *memcg_css;
struct inode_switch_wbs_context *isw;
+ struct bdi_writeback *new_wb = NULL;
/* noop if seems to be already in progress */
if (inode->i_state & I_WB_SWITCH)
@@ -609,40 +648,34 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
if (!memcg_css)
goto out_free;
- isw->new_wb = wb_get_create(bdi, memcg_css, GFP_ATOMIC);
+ new_wb = wb_get_create(bdi, memcg_css, GFP_ATOMIC);
css_put(memcg_css);
- if (!isw->new_wb)
+ if (!new_wb)
goto out_free;
- if (!inode_prepare_wbs_switch(inode, isw->new_wb))
+ if (!inode_prepare_wbs_switch(inode, new_wb))
goto out_free;
isw->inodes[0] = inode;
- /*
- * In addition to synchronizing among switchers, I_WB_SWITCH tells
- * the RCU protected stat update paths to grab the i_page
- * lock so that stat transfer can synchronize against them.
- * Let's continue after I_WB_SWITCH is guaranteed to be visible.
- */
- INIT_RCU_WORK(&isw->work, inode_switch_wbs_work_fn);
- queue_rcu_work(isw_wq, &isw->work);
+ wb_queue_isw(new_wb, isw);
return;
out_free:
atomic_dec(&isw_nr_in_flight);
- if (isw->new_wb)
- wb_put(isw->new_wb);
+ if (new_wb)
+ wb_put(new_wb);
kfree(isw);
}
-static bool isw_prepare_wbs_switch(struct inode_switch_wbs_context *isw,
+static bool isw_prepare_wbs_switch(struct bdi_writeback *new_wb,
+ struct inode_switch_wbs_context *isw,
struct list_head *list, int *nr)
{
struct inode *inode;
list_for_each_entry(inode, list, i_io_list) {
- if (!inode_prepare_wbs_switch(inode, isw->new_wb))
+ if (!inode_prepare_wbs_switch(inode, new_wb))
continue;
isw->inodes[*nr] = inode;
@@ -666,6 +699,7 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb)
{
struct cgroup_subsys_state *memcg_css;
struct inode_switch_wbs_context *isw;
+ struct bdi_writeback *new_wb;
int nr;
bool restart = false;
@@ -678,12 +712,12 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb)
for (memcg_css = wb->memcg_css->parent; memcg_css;
memcg_css = memcg_css->parent) {
- isw->new_wb = wb_get_create(wb->bdi, memcg_css, GFP_KERNEL);
- if (isw->new_wb)
+ new_wb = wb_get_create(wb->bdi, memcg_css, GFP_KERNEL);
+ if (new_wb)
break;
}
- if (unlikely(!isw->new_wb))
- isw->new_wb = &wb->bdi->wb; /* wb_get() is noop for bdi's wb */
+ if (unlikely(!new_wb))
+ new_wb = &wb->bdi->wb; /* wb_get() is noop for bdi's wb */
nr = 0;
spin_lock(&wb->list_lock);
@@ -695,27 +729,21 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb)
* bandwidth restrictions, as writeback of inode metadata is not
* accounted for.
*/
- restart = isw_prepare_wbs_switch(isw, &wb->b_attached, &nr);
+ restart = isw_prepare_wbs_switch(new_wb, isw, &wb->b_attached, &nr);
if (!restart)
- restart = isw_prepare_wbs_switch(isw, &wb->b_dirty_time, &nr);
+ restart = isw_prepare_wbs_switch(new_wb, isw, &wb->b_dirty_time,
+ &nr);
spin_unlock(&wb->list_lock);
/* no attached inodes? bail out */
if (nr == 0) {
atomic_dec(&isw_nr_in_flight);
- wb_put(isw->new_wb);
+ wb_put(new_wb);
kfree(isw);
return restart;
}
- /*
- * In addition to synchronizing among switchers, I_WB_SWITCH tells
- * the RCU protected stat update paths to grab the i_page
- * lock so that stat transfer can synchronize against them.
- * Let's continue after I_WB_SWITCH is guaranteed to be visible.
- */
- INIT_RCU_WORK(&isw->work, inode_switch_wbs_work_fn);
- queue_rcu_work(isw_wq, &isw->work);
+ wb_queue_isw(new_wb, isw);
return restart;
}
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 2ad261082bba..c5c9d89c73ed 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -152,6 +152,10 @@ struct bdi_writeback {
struct list_head blkcg_node; /* anchored at blkcg->cgwb_list */
struct list_head b_attached; /* attached inodes, protected by list_lock */
struct list_head offline_node; /* anchored at offline_cgwbs */
+ struct work_struct switch_work; /* work used to perform inode switching
+ * to this wb */
+ struct llist_head switch_wbs_ctxs; /* queued contexts for
+ * writeback switching */
union {
struct work_struct release_work;
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index a2848d731a46..15a4bc4ab819 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -265,6 +265,8 @@ static inline void wbc_init_bio(struct writeback_control *wbc, struct bio *bio)
bio_associate_blkg_from_css(bio, wbc->wb->blkcg_css);
}
+void inode_switch_wbs_work_fn(struct work_struct *work);
+
#else /* CONFIG_CGROUP_WRITEBACK */
static inline void inode_attach_wb(struct inode *inode, struct folio *folio)
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 783904d8c5ef..0beaca6bacf7 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -633,6 +633,7 @@ static void cgwb_release_workfn(struct work_struct *work)
wb_exit(wb);
bdi_put(bdi);
WARN_ON_ONCE(!list_empty(&wb->b_attached));
+ WARN_ON_ONCE(work_pending(&wb->switch_work));
call_rcu(&wb->rcu, cgwb_free_rcu);
}
@@ -709,6 +710,8 @@ static int cgwb_create(struct backing_dev_info *bdi,
wb->memcg_css = memcg_css;
wb->blkcg_css = blkcg_css;
INIT_LIST_HEAD(&wb->b_attached);
+ INIT_WORK(&wb->switch_work, inode_switch_wbs_work_fn);
+ init_llist_head(&wb->switch_wbs_ctxs);
INIT_WORK(&wb->release_work, cgwb_release_workfn);
set_bit(WB_registered, &wb->state);
bdi_get(bdi);
@@ -839,6 +842,8 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi)
if (!ret) {
bdi->wb.memcg_css = &root_mem_cgroup->css;
bdi->wb.blkcg_css = blkcg_root_css;
+ INIT_WORK(&bdi->wb.switch_work, inode_switch_wbs_work_fn);
+ init_llist_head(&bdi->wb.switch_wbs_ctxs);
}
return ret;
}
--
2.51.0
* [PATCH v2 2/4] writeback: Avoid softlockup when switching many inodes
From: Jan Kara @ 2025-09-12 10:38 UTC
To: Christian Brauner; +Cc: linux-fsdevel, Tejun Heo, Jan Kara
process_inode_switch_wbs_work() can be switching over 100 inodes to a
different cgroup. Since switching an inode requires counting all dirty
and under-writeback pages in its address space, this can take a
significant amount of time. Add the possibility to reschedule after
processing each inode to avoid softlockups.
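
The shape of the fix, as a hedged userspace sketch: pthread mutexes
stand in for the two spinlocks and sched_yield() for cond_resched().
Unlike the patch, which only drops the locks when need_resched() is
set, the sketch yields between every two items for brevity; all names
here are made up:

#include <pthread.h>
#include <sched.h>

/* Take both locks in address order, mirroring the old_wb < new_wb ordering
 * in process_inode_switch_wbs_work() that avoids ABBA deadlock. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
        if (a < b) {
                pthread_mutex_lock(a);
                pthread_mutex_lock(b);
        } else {
                pthread_mutex_lock(b);
                pthread_mutex_lock(a);
        }
}

/* Process a batch under two locks, dropping both to yield between items
 * instead of holding them across the whole batch. */
static void process_batch(pthread_mutex_t *old_lock, pthread_mutex_t *new_lock,
                          int *items, int nr)
{
        int i;

        lock_pair(old_lock, new_lock);
        for (i = 0; i < nr; i++) {
                items[i] = 0;                   /* "switch" one item */
                if (i + 1 < nr) {               /* more to do: yield point */
                        pthread_mutex_unlock(new_lock);
                        pthread_mutex_unlock(old_lock);
                        sched_yield();          /* cond_resched() stand-in */
                        lock_pair(old_lock, new_lock);
                }
        }
        pthread_mutex_unlock(new_lock);
        pthread_mutex_unlock(old_lock);
}

int main(void)
{
        pthread_mutex_t l1 = PTHREAD_MUTEX_INITIALIZER;
        pthread_mutex_t l2 = PTHREAD_MUTEX_INITIALIZER;
        int items[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };

        process_batch(&l1, &l2, items, 8);
        return 0;
}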
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/fs-writeback.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index f2265aa9b4c2..40b42c385b55 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -500,6 +500,7 @@ static void process_inode_switch_wbs_work(struct bdi_writeback *new_wb,
*/
down_read(&bdi->wb_switch_rwsem);
+ inodep = isw->inodes;
/*
* By the time control reaches here, RCU grace period has passed
* since I_WB_SWITCH assertion and all wb stat update transactions
@@ -510,6 +511,7 @@ static void process_inode_switch_wbs_work(struct bdi_writeback *new_wb,
* gives us exclusion against all wb related operations on @inode
* including IO list manipulations and stat updates.
*/
+relock:
if (old_wb < new_wb) {
spin_lock(&old_wb->list_lock);
spin_lock_nested(&new_wb->list_lock, SINGLE_DEPTH_NESTING);
@@ -518,10 +520,17 @@ static void process_inode_switch_wbs_work(struct bdi_writeback *new_wb,
spin_lock_nested(&old_wb->list_lock, SINGLE_DEPTH_NESTING);
}
- for (inodep = isw->inodes; *inodep; inodep++) {
+ while (*inodep) {
WARN_ON_ONCE((*inodep)->i_wb != old_wb);
if (inode_do_switch_wbs(*inodep, old_wb, new_wb))
nr_switched++;
+ inodep++;
+ if (*inodep && need_resched()) {
+ spin_unlock(&new_wb->list_lock);
+ spin_unlock(&old_wb->list_lock);
+ cond_resched();
+ goto relock;
+ }
}
spin_unlock(&new_wb->list_lock);
--
2.51.0
* [PATCH v2 3/4] writeback: Avoid excessively long inode switching times
From: Jan Kara @ 2025-09-12 10:38 UTC
To: Christian Brauner; +Cc: linux-fsdevel, Tejun Heo, Jan Kara
With the lazytime mount option enabled we can be switching many dirty
inodes to the parent cgroup on cgroup exit. The numbers observed in
practice when the systemd slice of a large cron job exits can easily
reach hundreds of thousands or millions. However, the logic in
inode_do_switch_wbs() which sorts the inode into the appropriate place
in the b_dirty list of the target wb has linear complexity in the number
of dirty inodes, so the overall time complexity of switching all the
inodes is quadratic. This leads to workers being pegged for hours,
consuming 100% of the CPU while switching inodes to the parent wb.
Simple reproducer of the issue:
FILES=10000
# Filesystem mounted with lazytime mount option
MNT=/mnt/
echo "Creating files and switching timestamps"
for (( j = 0; j < 50; j ++ )); do
mkdir $MNT/dir$j
for (( i = 0; i < $FILES; i++ )); do
echo "foo" >$MNT/dir$j/file$i
done
touch -a -t 202501010000 $MNT/dir$j/file*
done
wait
echo "Syncing and flushing"
sync
echo 3 >/proc/sys/vm/drop_caches
echo "Reading all files from a cgroup"
mkdir /sys/fs/cgroup/unified/mycg1 || exit
echo $$ >/sys/fs/cgroup/unified/mycg1/cgroup.procs || exit
for (( j = 0; j < 50; j ++ )); do
cat /mnt/dir$j/file* >/dev/null &
done
wait
echo "Switching wbs"
# Now rmdir the cgroup after the script exits
We need to maintain the b_dirty list ordering to keep writeback happy,
so instead of sorting the inode into the appropriate place, just append
it at the end of the list and clobber its dirtied_time_when. This may
result in inode writeback starting later after the cgroup switch, but
cgroup switches are rare so it shouldn't matter much. Since the cgroup
had write access to the inode, there are no practical concerns about
possible DoS issues.
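
Back-of-the-envelope numbers behind the quadratic blow-up (illustrative
arithmetic only, assuming a sorted insertion scans half the list on
average):

#include <stdio.h>

int main(void)
{
        unsigned long n;

        for (n = 1000; n <= 1000000; n *= 10) {
                /* old: each insert scans ~n/2 entries -> ~n^2/2 visits total;
                 * new: tail append is O(1) per inode -> n visits total */
                unsigned long long sorted = (unsigned long long)n * n / 2;

                printf("N=%8lu  sorted insert ~%15llu visits, append %lu\n",
                       n, sorted, n);
        }
        return 0;
}

At N = 1,000,000 the old scheme is on the order of 5 * 10^11 list-node
visits, consistent with the hours of pegged CPU described above, while
appending stays linear.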
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/fs-writeback.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 40b42c385b55..22fe313ae0d3 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -445,22 +445,23 @@ static bool inode_do_switch_wbs(struct inode *inode,
* Transfer to @new_wb's IO list if necessary. If the @inode is dirty,
* the specific list @inode was on is ignored and the @inode is put on
* ->b_dirty which is always correct including from ->b_dirty_time.
- * The transfer preserves @inode->dirtied_when ordering. If the @inode
- * was clean, it means it was on the b_attached list, so move it onto
- * the b_attached list of @new_wb.
+ * If the @inode was clean, it means it was on the b_attached list, so
+ * move it onto the b_attached list of @new_wb.
*/
if (!list_empty(&inode->i_io_list)) {
inode->i_wb = new_wb;
if (inode->i_state & I_DIRTY_ALL) {
- struct inode *pos;
-
- list_for_each_entry(pos, &new_wb->b_dirty, i_io_list)
- if (time_after_eq(inode->dirtied_when,
- pos->dirtied_when))
- break;
+ /*
+ * We need to keep b_dirty list sorted by
+ * dirtied_time_when. However properly sorting the
+ * inode in the list gets too expensive when switching
+ * many inodes. So just attach inode at the end of the
+ * dirty list and clobber the dirtied_time_when.
+ */
+ inode->dirtied_time_when = jiffies;
inode_io_list_move_locked(inode, new_wb,
- pos->i_io_list.prev);
+ &new_wb->b_dirty);
} else {
inode_cgwb_move_to_attached(inode, new_wb);
}
--
2.51.0
* [PATCH v2 4/4] writeback: Add tracepoint to track pending inode switches
From: Jan Kara @ 2025-09-12 10:38 UTC
To: Christian Brauner; +Cc: linux-fsdevel, Tejun Heo, Jan Kara
Add a trace_inode_switch_wbs_queue tracepoint to provide insight into
how many inodes are queued to switch their bdi_writeback structure.
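
For reference, once applied the event can be enabled like any other
tracepoint through tracefs. A possible session (assuming tracefs is
mounted at /sys/kernel/tracing; the output line below is made up for
illustration):

cd /sys/kernel/tracing
echo 1 > events/writeback/inode_switch_wbs_queue/enable
cat trace_pipe
# e.g.: inode_switch_wbs_queue: bdi 259:0: old_cgroup_ino=1234 new_cgroup_ino=1 count=317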
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/fs-writeback.c | 2 ++
include/trace/events/writeback.h | 29 +++++++++++++++++++++++++++++
2 files changed, 31 insertions(+)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 22fe313ae0d3..fad8ddfa622b 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -668,6 +668,7 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
isw->inodes[0] = inode;
+ trace_inode_switch_wbs_queue(inode->i_wb, new_wb, 1);
wb_queue_isw(new_wb, isw);
return;
@@ -753,6 +754,7 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb)
return restart;
}
+ trace_inode_switch_wbs_queue(wb, new_wb, nr);
wb_queue_isw(new_wb, isw);
return restart;
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 1e23919c0da9..c08aff044e80 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -213,6 +213,35 @@ TRACE_EVENT(inode_foreign_history,
)
);
+TRACE_EVENT(inode_switch_wbs_queue,
+
+ TP_PROTO(struct bdi_writeback *old_wb, struct bdi_writeback *new_wb,
+ unsigned int count),
+
+ TP_ARGS(old_wb, new_wb, count),
+
+ TP_STRUCT__entry(
+ __array(char, name, 32)
+ __field(ino_t, old_cgroup_ino)
+ __field(ino_t, new_cgroup_ino)
+ __field(unsigned int, count)
+ ),
+
+ TP_fast_assign(
+ strscpy_pad(__entry->name, bdi_dev_name(old_wb->bdi), 32);
+ __entry->old_cgroup_ino = __trace_wb_assign_cgroup(old_wb);
+ __entry->new_cgroup_ino = __trace_wb_assign_cgroup(new_wb);
+ __entry->count = count;
+ ),
+
+ TP_printk("bdi %s: old_cgroup_ino=%lu new_cgroup_ino=%lu count=%u",
+ __entry->name,
+ (unsigned long)__entry->old_cgroup_ino,
+ (unsigned long)__entry->new_cgroup_ino,
+ __entry->count
+ )
+);
+
TRACE_EVENT(inode_switch_wbs,
TP_PROTO(struct inode *inode, struct bdi_writeback *old_wb,
--
2.51.0
* Re: [PATCH v2 1/4] writeback: Avoid contention on wb->list_lock when switching inodes
From: Tejun Heo @ 2025-09-12 16:15 UTC
To: Jan Kara; +Cc: Christian Brauner, linux-fsdevel
Hello,
On Fri, Sep 12, 2025 at 12:38:35PM +0200, Jan Kara wrote:
> There can be multiple inode switch works trying to switch inodes to or
> from the same wb. This can happen in particular if some cgroup which
> owns many (thousands of) inodes exits and we need to switch them all.
> In this case several inode_switch_wbs_work_fn() instances will just be
> spinning on the same wb->list_lock while only one of them makes forward
> progress. This wastes CPU cycles and quickly leads to softlockup reports
> and an unusable system.
>
> Instead of running several inode_switch_wbs_work_fn() instances in
> parallel, all switching to the same wb and contending on wb->list_lock,
> run just one work item per wb and manage a queue of isw items switching
> to this wb.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
Generally looks great to me, but
> +void inode_switch_wbs_work_fn(struct work_struct *work)
> +{
> + struct bdi_writeback *new_wb = container_of(work, struct bdi_writeback,
> + switch_work);
> + struct inode_switch_wbs_context *isw, *next_isw;
> + struct llist_node *list;
> +
> + /*
> + * Grab our reference to wb so that it cannot get freed under us
> + * after we process all the isw items.
> + */
> + wb_get(new_wb);
Shouldn't this ref be put at the end of the function?
> + while (1) {
> + list = llist_del_all(&new_wb->switch_wbs_ctxs);
> + /* Nothing to do? */
> + if (!list) {
> + wb_put(new_wb);
> + return;
> + }
> + /*
> + * In addition to synchronizing among switchers, I_WB_SWITCH
> + * tells the RCU protected stat update paths to grab the i_page
> + * lock so that stat transfer can synchronize against them.
> + * Let's continue after I_WB_SWITCH is guaranteed to be
> + * visible.
> + */
> + synchronize_rcu();
> +
> + llist_for_each_entry_safe(isw, next_isw, list, list)
> + process_inode_switch_wbs_work(new_wb, isw);
> + }
> +}
Thanks.
--
tejun
* Re: [PATCH v2 1/4] writeback: Avoid contention on wb->list_lock when switching inodes
From: Tejun Heo @ 2025-09-12 16:20 UTC
To: Jan Kara; +Cc: Christian Brauner, linux-fsdevel
Also, a nit:
> -static void inode_switch_wbs_work_fn(struct work_struct *work)
> +static void process_inode_switch_wbs_work(struct bdi_writeback *new_wb,
> + struct inode_switch_wbs_context *isw)
Maybe just process_inode_switch_wbs()? It's a bit odd to remove "fn"
without the "work" part, as those two together were saying it was a
function for a work_struct.
Thanks.
--
tejun
* Re: [PATCH v2 1/4] writeback: Avoid contention on wb->list_lock when switching inodes
From: Jan Kara @ 2025-09-12 16:39 UTC
To: Tejun Heo; +Cc: Jan Kara, Christian Brauner, linux-fsdevel
On Fri 12-09-25 06:15:49, Tejun Heo wrote:
> Hello,
>
> On Fri, Sep 12, 2025 at 12:38:35PM +0200, Jan Kara wrote:
> > There can be multiple inode switch works trying to switch inodes to or
> > from the same wb. This can happen in particular if some cgroup which
> > owns many (thousands of) inodes exits and we need to switch them all.
> > In this case several inode_switch_wbs_work_fn() instances will just be
> > spinning on the same wb->list_lock while only one of them makes forward
> > progress. This wastes CPU cycles and quickly leads to softlockup reports
> > and an unusable system.
> >
> > Instead of running several inode_switch_wbs_work_fn() instances in
> > parallel, all switching to the same wb and contending on wb->list_lock,
> > run just one work item per wb and manage a queue of isw items switching
> > to this wb.
> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
>
> Generally looks great to me, but
>
> > +void inode_switch_wbs_work_fn(struct work_struct *work)
> > +{
> > + struct bdi_writeback *new_wb = container_of(work, struct bdi_writeback,
> > + switch_work);
> > + struct inode_switch_wbs_context *isw, *next_isw;
> > + struct llist_node *list;
> > +
> > + /*
> > + * Grab our reference to wb so that it cannot get freed under us
> > + * after we process all the isw items.
> > + */
> > + wb_get(new_wb);
>
> Shouldn't this ref be put at the end of the function?
It is put:
> > + while (1) {
> > + list = llist_del_all(&new_wb->switch_wbs_ctxs);
> > + /* Nothing to do? */
> > + if (!list) {
> > + wb_put(new_wb);
^^^^ here
There's no other way to exit the function... But I can put a 'break' here
and do the wb_put() at the end of the function. That will likely be less
subtle.
Honza
> > + return;
> > + }
> > + /*
> > + * In addition to synchronizing among switchers, I_WB_SWITCH
> > + * tells the RCU protected stat update paths to grab the i_page
> > + * lock so that stat transfer can synchronize against them.
> > + * Let's continue after I_WB_SWITCH is guaranteed to be
> > + * visible.
> > + */
> > + synchronize_rcu();
> > +
> > + llist_for_each_entry_safe(isw, next_isw, list, list)
> > + process_inode_switch_wbs_work(new_wb, isw);
> > + }
> > +}
>
> Thanks.
>
> --
> tejun
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/4] writeback: Avoid contention on wb->list_lock when switching inodes
From: Tejun Heo @ 2025-09-12 16:52 UTC
To: Jan Kara; +Cc: Christian Brauner, linux-fsdevel
On Fri, Sep 12, 2025 at 06:39:28PM +0200, Jan Kara wrote:
> > Shouldn't this ref be put at the end of the function?
>
> It is put:
>
> > > + while (1) {
> > > + list = llist_del_all(&new_wb->switch_wbs_ctxs);
> > > + /* Nothing to do? */
> > > + if (!list) {
> > > + wb_put(new_wb);
> ^^^^ here
> There's no other way to exit the function... But I can put a 'break' here
> and do the wb_put() at the end of the function. That will likely be less
> subtle.
Ah, sorry about missing that. Yeah, maybe better to put it outside the
loop to make it clearer.
Acked-by: Tejun Heo <tj@kernel.org>
Thanks.
--
tejun
* Re: [PATCH v2 0/4] writeback: Avoid lockups when switching inodes
From: Christian Brauner @ 2025-09-15 12:50 UTC
To: Jan Kara; +Cc: Christian Brauner, linux-fsdevel, Tejun Heo
On Fri, 12 Sep 2025 12:38:34 +0200, Jan Kara wrote:
> This patch series addresses lockups reported by users when a systemd unit
> that has read lots of files from a filesystem mounted with the lazytime
> mount option exits. See patch 3 for more details and the reproducer.
>
> There are two main problems when switching many inodes between wbs:
>
> 1) Multiple workers are spawned to do the switching, but they all contend
> on the same wb->list_lock, making the parallelism pointless and just
> wasting time.
>
> [...]
Applied to the vfs-6.18.writeback branch of the vfs/vfs.git tree.
Patches in the vfs-6.18.writeback branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.18.writeback
[1/4] writeback: Avoid contention on wb->list_lock when switching inodes
https://git.kernel.org/vfs/vfs/c/67c312b4e9bf
[2/4] writeback: Avoid softlockup when switching many inodes
https://git.kernel.org/vfs/vfs/c/a29997d9fe7e
[3/4] writeback: Avoid excessively long inode switching times
https://git.kernel.org/vfs/vfs/c/897113876f46
[4/4] writeback: Add tracepoint to track pending inode switches
https://git.kernel.org/vfs/vfs/c/dd5f65bc09d4
* Re: [PATCH v2 0/4] writeback: Avoid lockups when switching inodes
From: Jan Kara @ 2025-09-15 15:13 UTC
To: Christian Brauner; +Cc: Jan Kara, linux-fsdevel, Tejun Heo
[-- Attachment #1: Type: text/plain, Size: 1407 bytes --]
On Mon 15-09-25 14:50:46, Christian Brauner wrote:
> On Fri, 12 Sep 2025 12:38:34 +0200, Jan Kara wrote:
> > This patch series addresses lockups reported by users when a systemd unit
> > that has read lots of files from a filesystem mounted with the lazytime
> > mount option exits. See patch 3 for more details and the reproducer.
> >
> > There are two main problems when switching many inodes between wbs:
> >
> > 1) Multiple workers are spawned to do the switching, but they all contend
> > on the same wb->list_lock, making the parallelism pointless and just
> > wasting time.
> >
> > [...]
>
> Applied to the vfs-6.18.writeback branch of the vfs/vfs.git tree.
> Patches in the vfs-6.18.writeback branch should appear in linux-next soon.
>
> Please report any outstanding bugs that were missed during review in a
> new review to the original patch series allowing us to drop it.
>
> It's encouraged to provide Acked-bys and Reviewed-bys even though the
> patch has now been applied. If possible patch trailers will be updated.
>
> Note that commit hashes shown below are subject to change due to rebase,
> trailer updates or similar. If in doubt, please check the listed branch.
Thanks Christian! I'm attaching a new version of patch 1/4 which
addresses Tejun's minor comments. It would be nice if you could replace
it in your tree. Thanks.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
[-- Attachment #2: 0001-writeback-Avoid-contention-on-wb-list_lock-when-swit.patch --]
[-- Type: text/x-patch, Size: 10024 bytes --]
From c88933809024ebc8ad164aebe02237186614a25f Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Wed, 9 Apr 2025 17:12:59 +0200
Subject: [PATCH] writeback: Avoid contention on wb->list_lock when switching
inodes
There can be multiple inode switch works trying to switch inodes to or
from the same wb. This can happen in particular if some cgroup which
owns many (thousands of) inodes exits and we need to switch them all.
In this case several inode_switch_wbs_work_fn() instances will just be
spinning on the same wb->list_lock while only one of them makes forward
progress. This wastes CPU cycles and quickly leads to softlockup reports
and an unusable system.

Instead of running several inode_switch_wbs_work_fn() instances in
parallel, all switching to the same wb and contending on wb->list_lock,
run just one work item per wb and manage a queue of isw items switching
to this wb.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/fs-writeback.c | 99 ++++++++++++++++++++------------
include/linux/backing-dev-defs.h | 4 ++
include/linux/writeback.h | 2 +
mm/backing-dev.c | 5 ++
4 files changed, 74 insertions(+), 36 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index a07b8cf73ae2..e87612a40cb0 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -368,7 +368,8 @@ static struct bdi_writeback *inode_to_wb_and_lock_list(struct inode *inode)
}
struct inode_switch_wbs_context {
- struct rcu_work work;
+ /* List of queued switching contexts for the wb */
+ struct llist_node list;
/*
* Multiple inodes can be switched at once. The switching procedure
@@ -378,7 +379,6 @@ struct inode_switch_wbs_context {
* array embedded into struct inode_switch_wbs_context. Otherwise
* an inode could be left in a non-consistent state.
*/
- struct bdi_writeback *new_wb;
struct inode *inodes[];
};
@@ -486,13 +486,11 @@ static bool inode_do_switch_wbs(struct inode *inode,
return switched;
}
-static void inode_switch_wbs_work_fn(struct work_struct *work)
+static void process_inode_switch_wbs(struct bdi_writeback *new_wb,
+ struct inode_switch_wbs_context *isw)
{
- struct inode_switch_wbs_context *isw =
- container_of(to_rcu_work(work), struct inode_switch_wbs_context, work);
struct backing_dev_info *bdi = inode_to_bdi(isw->inodes[0]);
struct bdi_writeback *old_wb = isw->inodes[0]->i_wb;
- struct bdi_writeback *new_wb = isw->new_wb;
unsigned long nr_switched = 0;
struct inode **inodep;
@@ -543,6 +541,38 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
atomic_dec(&isw_nr_in_flight);
}
+void inode_switch_wbs_work_fn(struct work_struct *work)
+{
+ struct bdi_writeback *new_wb = container_of(work, struct bdi_writeback,
+ switch_work);
+ struct inode_switch_wbs_context *isw, *next_isw;
+ struct llist_node *list;
+
+ /*
+ * Grab our reference to wb so that it cannot get freed under us
+ * after we process all the isw items.
+ */
+ wb_get(new_wb);
+ while (1) {
+ list = llist_del_all(&new_wb->switch_wbs_ctxs);
+ /* Nothing to do? */
+ if (!list)
+ break;
+ /*
+ * In addition to synchronizing among switchers, I_WB_SWITCH
+ * tells the RCU protected stat update paths to grab the i_page
+ * lock so that stat transfer can synchronize against them.
+ * Let's continue after I_WB_SWITCH is guaranteed to be
+ * visible.
+ */
+ synchronize_rcu();
+
+ llist_for_each_entry_safe(isw, next_isw, list, list)
+ process_inode_switch_wbs(new_wb, isw);
+ }
+ wb_put(new_wb);
+}
+
static bool inode_prepare_wbs_switch(struct inode *inode,
struct bdi_writeback *new_wb)
{
@@ -572,6 +602,13 @@ static bool inode_prepare_wbs_switch(struct inode *inode,
return true;
}
+static void wb_queue_isw(struct bdi_writeback *wb,
+ struct inode_switch_wbs_context *isw)
+{
+ if (llist_add(&isw->list, &wb->switch_wbs_ctxs))
+ queue_work(isw_wq, &wb->switch_work);
+}
+
/**
* inode_switch_wbs - change the wb association of an inode
* @inode: target inode
@@ -585,6 +622,7 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
struct backing_dev_info *bdi = inode_to_bdi(inode);
struct cgroup_subsys_state *memcg_css;
struct inode_switch_wbs_context *isw;
+ struct bdi_writeback *new_wb = NULL;
/* noop if seems to be already in progress */
if (inode->i_state & I_WB_SWITCH)
@@ -609,40 +647,34 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
if (!memcg_css)
goto out_free;
- isw->new_wb = wb_get_create(bdi, memcg_css, GFP_ATOMIC);
+ new_wb = wb_get_create(bdi, memcg_css, GFP_ATOMIC);
css_put(memcg_css);
- if (!isw->new_wb)
+ if (!new_wb)
goto out_free;
- if (!inode_prepare_wbs_switch(inode, isw->new_wb))
+ if (!inode_prepare_wbs_switch(inode, new_wb))
goto out_free;
isw->inodes[0] = inode;
- /*
- * In addition to synchronizing among switchers, I_WB_SWITCH tells
- * the RCU protected stat update paths to grab the i_page
- * lock so that stat transfer can synchronize against them.
- * Let's continue after I_WB_SWITCH is guaranteed to be visible.
- */
- INIT_RCU_WORK(&isw->work, inode_switch_wbs_work_fn);
- queue_rcu_work(isw_wq, &isw->work);
+ wb_queue_isw(new_wb, isw);
return;
out_free:
atomic_dec(&isw_nr_in_flight);
- if (isw->new_wb)
- wb_put(isw->new_wb);
+ if (new_wb)
+ wb_put(new_wb);
kfree(isw);
}
-static bool isw_prepare_wbs_switch(struct inode_switch_wbs_context *isw,
+static bool isw_prepare_wbs_switch(struct bdi_writeback *new_wb,
+ struct inode_switch_wbs_context *isw,
struct list_head *list, int *nr)
{
struct inode *inode;
list_for_each_entry(inode, list, i_io_list) {
- if (!inode_prepare_wbs_switch(inode, isw->new_wb))
+ if (!inode_prepare_wbs_switch(inode, new_wb))
continue;
isw->inodes[*nr] = inode;
@@ -666,6 +698,7 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb)
{
struct cgroup_subsys_state *memcg_css;
struct inode_switch_wbs_context *isw;
+ struct bdi_writeback *new_wb;
int nr;
bool restart = false;
@@ -678,12 +711,12 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb)
for (memcg_css = wb->memcg_css->parent; memcg_css;
memcg_css = memcg_css->parent) {
- isw->new_wb = wb_get_create(wb->bdi, memcg_css, GFP_KERNEL);
- if (isw->new_wb)
+ new_wb = wb_get_create(wb->bdi, memcg_css, GFP_KERNEL);
+ if (new_wb)
break;
}
- if (unlikely(!isw->new_wb))
- isw->new_wb = &wb->bdi->wb; /* wb_get() is noop for bdi's wb */
+ if (unlikely(!new_wb))
+ new_wb = &wb->bdi->wb; /* wb_get() is noop for bdi's wb */
nr = 0;
spin_lock(&wb->list_lock);
@@ -695,27 +728,21 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb)
* bandwidth restrictions, as writeback of inode metadata is not
* accounted for.
*/
- restart = isw_prepare_wbs_switch(isw, &wb->b_attached, &nr);
+ restart = isw_prepare_wbs_switch(new_wb, isw, &wb->b_attached, &nr);
if (!restart)
- restart = isw_prepare_wbs_switch(isw, &wb->b_dirty_time, &nr);
+ restart = isw_prepare_wbs_switch(new_wb, isw, &wb->b_dirty_time,
+ &nr);
spin_unlock(&wb->list_lock);
/* no attached inodes? bail out */
if (nr == 0) {
atomic_dec(&isw_nr_in_flight);
- wb_put(isw->new_wb);
+ wb_put(new_wb);
kfree(isw);
return restart;
}
- /*
- * In addition to synchronizing among switchers, I_WB_SWITCH tells
- * the RCU protected stat update paths to grab the i_page
- * lock so that stat transfer can synchronize against them.
- * Let's continue after I_WB_SWITCH is guaranteed to be visible.
- */
- INIT_RCU_WORK(&isw->work, inode_switch_wbs_work_fn);
- queue_rcu_work(isw_wq, &isw->work);
+ wb_queue_isw(new_wb, isw);
return restart;
}
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 2ad261082bba..c5c9d89c73ed 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -152,6 +152,10 @@ struct bdi_writeback {
struct list_head blkcg_node; /* anchored at blkcg->cgwb_list */
struct list_head b_attached; /* attached inodes, protected by list_lock */
struct list_head offline_node; /* anchored at offline_cgwbs */
+ struct work_struct switch_work; /* work used to perform inode switching
+ * to this wb */
+ struct llist_head switch_wbs_ctxs; /* queued contexts for
+ * writeback switching */
union {
struct work_struct release_work;
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index a2848d731a46..15a4bc4ab819 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -265,6 +265,8 @@ static inline void wbc_init_bio(struct writeback_control *wbc, struct bio *bio)
bio_associate_blkg_from_css(bio, wbc->wb->blkcg_css);
}
+void inode_switch_wbs_work_fn(struct work_struct *work);
+
#else /* CONFIG_CGROUP_WRITEBACK */
static inline void inode_attach_wb(struct inode *inode, struct folio *folio)
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 783904d8c5ef..0beaca6bacf7 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -633,6 +633,7 @@ static void cgwb_release_workfn(struct work_struct *work)
wb_exit(wb);
bdi_put(bdi);
WARN_ON_ONCE(!list_empty(&wb->b_attached));
+ WARN_ON_ONCE(work_pending(&wb->switch_work));
call_rcu(&wb->rcu, cgwb_free_rcu);
}
@@ -709,6 +710,8 @@ static int cgwb_create(struct backing_dev_info *bdi,
wb->memcg_css = memcg_css;
wb->blkcg_css = blkcg_css;
INIT_LIST_HEAD(&wb->b_attached);
+ INIT_WORK(&wb->switch_work, inode_switch_wbs_work_fn);
+ init_llist_head(&wb->switch_wbs_ctxs);
INIT_WORK(&wb->release_work, cgwb_release_workfn);
set_bit(WB_registered, &wb->state);
bdi_get(bdi);
@@ -839,6 +842,8 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi)
if (!ret) {
bdi->wb.memcg_css = &root_mem_cgroup->css;
bdi->wb.blkcg_css = blkcg_root_css;
+ INIT_WORK(&bdi->wb.switch_work, inode_switch_wbs_work_fn);
+ init_llist_head(&bdi->wb.switch_wbs_ctxs);
}
return ret;
}
--
2.51.0
* Re: [PATCH v2 0/4] writeback: Avoid lockups when switching inodes
From: Christian Brauner @ 2025-09-19 11:09 UTC
To: Jan Kara; +Cc: linux-fsdevel, Tejun Heo
On Mon, Sep 15, 2025 at 05:13:17PM +0200, Jan Kara wrote:
> On Mon 15-09-25 14:50:46, Christian Brauner wrote:
> > On Fri, 12 Sep 2025 12:38:34 +0200, Jan Kara wrote:
> > > This patch series addresses lockups reported by users when a systemd unit
> > > that has read lots of files from a filesystem mounted with the lazytime
> > > mount option exits. See patch 3 for more details and the reproducer.
> > >
> > > There are two main problems when switching many inodes between wbs:
> > >
> > > 1) Multiple workers are spawned to do the switching, but they all contend
> > > on the same wb->list_lock, making the parallelism pointless and just
> > > wasting time.
> > >
> > > [...]
> >
> > Applied to the vfs-6.18.writeback branch of the vfs/vfs.git tree.
> > Patches in the vfs-6.18.writeback branch should appear in linux-next soon.
> >
> > Please report any outstanding bugs that were missed during review in a
> > new review to the original patch series allowing us to drop it.
> >
> > It's encouraged to provide Acked-bys and Reviewed-bys even though the
> > patch has now been applied. If possible patch trailers will be updated.
> >
> > Note that commit hashes shown below are subject to change due to rebase,
> > trailer updates or similar. If in doubt, please check the listed branch.
>
> Thanks Christian! I'm attaching a new version of the patch 1/4 which
> addresses Tejun's minor comments. It would be nice if you can replace it in
> your tree. Thanks.
Absolutely! No problem.