* [RFC 0/1] writeback: add sysfs to config the number of writeback contexts
@ 2025-08-25 12:29 ` wangyufei
2025-08-25 12:29 ` [RFC 1/1] " wangyufei
From: wangyufei @ 2025-08-25 12:29 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Matthew Wilcox (Oracle),
Stephen Rothwell, wangyufei, open list,
open list:MEMORY MANAGEMENT - MISC, open list:PAGE CACHE
Cc: kundan.kumar, anuj20.g, hch, bernd, djwong, jack, linux-kernel,
linux-mm, linux-fsdevel, opensource.kernel
Hi everyone,
We've been interested in this patch about parallelizing writeback [1]
and have been following its discussion and development. Our testing in
several application scenarios on mobile devices has shown significant
performance improvements.
Currently, we're focusing on how the number of writeback contexts impacts
performance on different filesystems and storage workloads. We noticed
the previous discussion about making the number of writeback contexts an
opt-in configuration to adapt to different filesystems [2]. At present, it
can only be set via a sysfs interface at system initialization. We'd like
to discuss the possibility of supporting dynamic runtime configuration of
the number of writeback contexts.
We have developed a mechanism that allows the number of writeback contexts
to be configured at runtime via a sysfs interface. To configure, use:
echo <nr_wb_ctx> > /sys/class/bdi/<dev>/nwritebacks.
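For illustration only (the bdi name 254:48 below is a placeholder; substitute
the bdi of the device under test), growing the number of contexts looks like:

  # cat /sys/class/bdi/254:48/nwritebacks
  8
  # echo 16 > /sys/class/bdi/254:48/nwritebacks
  # cat /sys/class/bdi/254:48/nwritebacks
  16

The initial value of 8 here is just an example; by default the count matches
the number of CPUs.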
Our implementation supports *increasing* the number of writeback contexts.
This is achieved by dynamically allocating new writeback contexts, replacing
the existing bdi->wb_ctx_arr and updating bdi->nr_wb_ctx. We have not yet
solved the problem of safely *reducing* bdi->nr_wb_ctx.
Several challenges remain:
- How should we safely handle ongoing I/Os when contexts are removed?
- What is the correct way to migrate pending writeback tasks and related
resources to other writeback contexts?
- Should this be a per-device or global setting?
We're sharing this early implementation to gather feedback on:
1. Is runtime configurability of writeback contexts a worthwhile goal?
2. How should we handle synchronization and migration when dynamically
changing bdi->nr_wb_ctx, particularly when removing active writeback
contexts?
3. Are there better tests that could validate the stability of this
approach? (One possible sketch follows below.)
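As one possible stability test (only a sketch; the device name, mount point
and values below are placeholders rather than an actual run), the resize path
could be exercised while buffered writeback is in flight:

  # buffered random writes in the background
  fio --directory=/mnt --name=stress --bs=4k --rw=randwrite \
      --ioengine=io_uring --numjobs=8 --size=450M --direct=0 \
      --time_based=1 --runtime=300 &

  # grow nwritebacks while writeback is active; values must stay at or
  # above the current count, since only increasing is supported
  for n in 16 32 64 128; do
          echo $n > /sys/class/bdi/254:48/nwritebacks
          sleep 10
  done
  wait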
We look forward to feedback and suggestions for further improvements.
[1] Parallelizing filesystem writeback:
https://lore.kernel.org/linux-fsdevel/20250529111504.89912-1-kundan.kumar@samsung.com/
[2] Discussion on configuring the number of writeback contexts:
https://lore.kernel.org/linux-fsdevel/20250609040056.GA26101@lst.de/
wangyufei (1):
writeback: add sysfs to config the number of writeback contexts
include/linux/backing-dev.h | 3 ++
mm/backing-dev.c | 59 +++++++++++++++++++++++++++++++++++
mm/page-writeback.c | 61 +++++++++++++++++++++++++++++++++++++
3 files changed, 123 insertions(+)
--
2.39.0
* [RFC 1/1] writeback: add sysfs to config the number of writeback contexts
2025-08-25 12:29 ` [RFC 0/1] writeback: add sysfs to config the number of writeback contexts wangyufei
@ 2025-08-25 12:29 ` wangyufei
2025-08-25 14:46 ` [RFC 0/1] " David Hildenbrand
2025-08-29 8:59 ` Kundan Kumar
From: wangyufei @ 2025-08-25 12:29 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Matthew Wilcox (Oracle),
Stephen Rothwell, wangyufei, open list,
open list:MEMORY MANAGEMENT - MISC, open list:PAGE CACHE
Cc: kundan.kumar, anuj20.g, hch, bernd, djwong, jack, linux-kernel,
linux-mm, linux-fsdevel, opensource.kernel
The number of writeback contexts is set to the number of CPUs by
default. To test the impact of the number of writeback contexts
on the writeback performance of filesystems, we introduce a sysfs
interface 'nwritebacks' for adjusting bdi->wb_ctx_arr at runtime.
Currently, only increasing bdi->nr_wb_ctx is supported; support
for reducing it is still under development.
Signed-off-by: wangyufei <wangyufei@vivo.com>
---
include/linux/backing-dev.h | 3 ++
mm/backing-dev.c | 59 +++++++++++++++++++++++++++++++++++
mm/page-writeback.c | 61 +++++++++++++++++++++++++++++++++++++
3 files changed, 123 insertions(+)
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 30a812fbd..c59578c25 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -112,6 +112,7 @@ int bdi_set_max_ratio_no_scale(struct backing_dev_info *bdi, unsigned int max_ra
int bdi_set_min_bytes(struct backing_dev_info *bdi, u64 min_bytes);
int bdi_set_max_bytes(struct backing_dev_info *bdi, u64 max_bytes);
int bdi_set_strict_limit(struct backing_dev_info *bdi, unsigned int strict_limit);
+int bdi_set_nwritebacks(struct backing_dev_info *bdi, int nwritebacks);
/*
* Flags in backing_dev_info::capability
@@ -128,6 +129,8 @@ int bdi_set_strict_limit(struct backing_dev_info *bdi, unsigned int strict_limit
extern struct backing_dev_info noop_backing_dev_info;
int bdi_init(struct backing_dev_info *bdi);
+int bdi_wb_ctx_init(struct backing_dev_info *bdi, struct bdi_writeback_ctx *bdi_wb_ctx);
+void bdi_wb_ctx_exit(struct backing_dev_info *bdi, struct bdi_writeback_ctx *bdi_wb_ctx);
/**
* writeback_in_progress - determine whether there is writeback in progress
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index a5b44dd79..44b24c1e4 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -469,6 +469,34 @@ static ssize_t strict_limit_show(struct device *dev,
}
static DEVICE_ATTR_RW(strict_limit);
+static ssize_t nwritebacks_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct backing_dev_info *bdi = dev_get_drvdata(dev);
+
+ return sysfs_emit(buf, "%d\n", bdi->nr_wb_ctx);
+}
+
+static ssize_t nwritebacks_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t count)
+{
+ struct backing_dev_info *bdi = dev_get_drvdata(dev);
+ int nr;
+ ssize_t ret;
+
+ ret = kstrtoint(buf, 10, &nr);
+ if (ret < 0)
+ return ret;
+
+ ret = bdi_set_nwritebacks(bdi, nr);
+ if (!ret)
+ ret = count;
+
+ return ret;
+}
+static DEVICE_ATTR_RW(nwritebacks);
+
static struct attribute *bdi_dev_attrs[] = {
&dev_attr_read_ahead_kb.attr,
&dev_attr_min_ratio.attr,
@@ -479,6 +507,7 @@ static struct attribute *bdi_dev_attrs[] = {
&dev_attr_max_bytes.attr,
&dev_attr_stable_pages_required.attr,
&dev_attr_strict_limit.attr,
+ &dev_attr_nwritebacks.attr,
NULL,
};
ATTRIBUTE_GROUPS(bdi_dev);
@@ -1004,6 +1033,22 @@ static int __init cgwb_init(void)
}
subsys_initcall(cgwb_init);
+int bdi_wb_ctx_init(struct backing_dev_info *bdi, struct bdi_writeback_ctx *bdi_wb_ctx)
+{
+ int ret;
+
+ INIT_RADIX_TREE(&bdi_wb_ctx->cgwb_tree, GFP_ATOMIC);
+ mutex_init(&bdi->cgwb_release_mutex);
+ init_rwsem(&bdi_wb_ctx->wb_switch_rwsem);
+
+ ret = wb_init(&bdi_wb_ctx->wb, bdi_wb_ctx, bdi, GFP_KERNEL);
+ if (!ret) {
+ bdi_wb_ctx->wb.memcg_css = &root_mem_cgroup->css;
+ bdi_wb_ctx->wb.blkcg_css = blkcg_root_css;
+ }
+ return ret;
+}
+
#else /* CONFIG_CGROUP_WRITEBACK */
static int cgwb_bdi_init(struct backing_dev_info *bdi)
@@ -1292,3 +1337,17 @@ const char *bdi_dev_name(struct backing_dev_info *bdi)
return bdi->dev_name;
}
EXPORT_SYMBOL_GPL(bdi_dev_name);
+
+int bdi_wb_ctx_init(struct backing_dev_info *bdi, struct bdi_writeback_ctx *bdi_wb_ctx)
+{
+ return wb_init(&bdi_wb_ctx->wb, bdi_wb_ctx, bdi, GFP_KERNEL);
+}
+
+void bdi_wb_ctx_exit(struct backing_dev_info *bdi, struct bdi_writeback_ctx *bdi_wb_ctx)
+{
+ wb_shutdown(&bdi_wb_ctx->wb);
+ cgwb_bdi_unregister(bdi, bdi_wb_ctx);
+
+ WARN_ON_ONCE(test_bit(WB_registered, &bdi_wb_ctx->wb.state));
+ wb_exit(&bdi_wb_ctx->wb);
+}
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 6f283a777..87c77004b 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -740,6 +740,59 @@ static int __bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ra
return ret;
}
+static int __bdi_set_wb_ctx(struct backing_dev_info *bdi, int nwritebacks)
+{
+ struct bdi_writeback_ctx **new_ctx_arr, **old_ctx_arr;
+ int i, ret;
+
+ new_ctx_arr = kcalloc(nwritebacks, sizeof(struct bdi_writeback_ctx *), GFP_KERNEL);
+ if (!new_ctx_arr)
+ return -ENOMEM;
+
+ for (i = 0; i < min(bdi->nr_wb_ctx, nwritebacks); i++)
+ new_ctx_arr[i] = bdi->wb_ctx_arr[i];
+
+ for (i = bdi->nr_wb_ctx; i < nwritebacks; i++) {
+ new_ctx_arr[i] = (struct bdi_writeback_ctx *)
+ kzalloc(sizeof(struct bdi_writeback_ctx), GFP_KERNEL);
+ if (!new_ctx_arr[i]) {
+ pr_err("Failed to allocate %d", i);
+ while (--i >= bdi->nr_wb_ctx)
+ kfree(new_ctx_arr[i]);
+ kfree(new_ctx_arr);
+ return -ENOMEM;
+ }
+ INIT_LIST_HEAD(&new_ctx_arr[i]->wb_list);
+ init_waitqueue_head(&new_ctx_arr[i]->wb_waitq);
+ }
+
+ for (i = bdi->nr_wb_ctx; i < nwritebacks; i++) {
+ ret = bdi_wb_ctx_init(bdi, new_ctx_arr[i]);
+ if (ret) {
+ while (--i >= bdi->nr_wb_ctx) {
+ bdi_wb_ctx_exit(bdi, new_ctx_arr[i]);
+ kfree(new_ctx_arr[i]);
+ }
+ kfree(new_ctx_arr);
+ return ret;
+ }
+ list_add_tail_rcu(&new_ctx_arr[i]->wb.bdi_node, &new_ctx_arr[i]->wb_list);
+ set_bit(WB_registered, &new_ctx_arr[i]->wb.state);
+ }
+
+ /* Ensure new contexts are fully initialized before publishing the array */
+ smp_wmb();
+
+ old_ctx_arr = bdi->wb_ctx_arr;
+ spin_lock_bh(&bdi_lock);
+ bdi->wb_ctx_arr = new_ctx_arr;
+ bdi->nr_wb_ctx = nwritebacks;
+ spin_unlock_bh(&bdi_lock);
+
+ kfree(old_ctx_arr);
+ return 0;
+}
+
int bdi_set_min_ratio_no_scale(struct backing_dev_info *bdi, unsigned int min_ratio)
{
return __bdi_set_min_ratio(bdi, min_ratio);
@@ -818,6 +871,14 @@ int bdi_set_strict_limit(struct backing_dev_info *bdi, unsigned int strict_limit
return 0;
}
+int bdi_set_nwritebacks(struct backing_dev_info *bdi, int nwritebacks)
+{
+ if (nwritebacks < bdi->nr_wb_ctx)
+ return -EINVAL;
+
+ return __bdi_set_wb_ctx(bdi, nwritebacks);
+}
+
static unsigned long dirty_freerun_ceiling(unsigned long thresh,
unsigned long bg_thresh)
{
--
2.39.0
* Re: [RFC 0/1] writeback: add sysfs to config the number of writeback contexts
2025-08-25 12:29 ` [RFC 0/1] writeback: add sysfs to config the number of writeback contexts wangyufei
2025-08-25 12:29 ` [RFC 1/1] " wangyufei
@ 2025-08-25 14:46 ` David Hildenbrand
2025-08-25 16:15 ` Matthew Wilcox
2025-08-29 8:59 ` Kundan Kumar
From: David Hildenbrand @ 2025-08-25 14:46 UTC (permalink / raw)
To: wangyufei, Andrew Morton, Lorenzo Stoakes, Liam R. Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Matthew Wilcox (Oracle), Stephen Rothwell, open list,
open list:MEMORY MANAGEMENT - MISC, open list:PAGE CACHE
Cc: kundan.kumar, anuj20.g, hch, bernd, djwong, jack,
opensource.kernel
On 25.08.25 14:29, wangyufei wrote:
> Hi everyone,
>
> We've been interested in this patch about parallelizing writeback [1]
> and have been following its discussion and development. Our testing in
> several application scenarios on mobile devices has shown significant
> performance improvements.
>
> Currently, we're focusing on how the number of writeback contexts impacts
> performance on different filesystems and storage workloads. We noticed
> the previous discussion about making the number of writeback contexts an
> opt-in configuration to adapt to different filesystems [2]. At present, it
> can only be set via a sysfs interface at system initialization. We'd like
> to discuss the possibility of supporting dynamic runtime configuration of
> the number of writeback contexts.
>
> We have developed a mechanism that allows the number of writeback contexts
> to be configured at runtime via a sysfs interface. To configure, use:
> echo <nr_wb_ctx> > /sys/class/bdi/<dev>/nwritebacks.
What's the target use case for updating it dynamically?
If it's mostly for debugging/testing (find out what works, what
doesn't), it might better go into debugfs or just be carried out of tree.
If it's about setting a sane default based on specific filesystems, maybe
it could be optimized from within the kernel, without the need to expose
this to an admin?
--
Cheers
David / dhildenb
* Re: [RFC 0/1] writeback: add sysfs to config the number of writeback contexts
2025-08-25 14:46 ` [RFC 0/1] " David Hildenbrand
@ 2025-08-25 16:15 ` Matthew Wilcox
From: Matthew Wilcox @ 2025-08-25 16:15 UTC (permalink / raw)
To: David Hildenbrand
Cc: wangyufei, Andrew Morton, Lorenzo Stoakes, Liam R. Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Stephen Rothwell, open list, open list:MEMORY MANAGEMENT - MISC,
open list:PAGE CACHE, kundan.kumar, anuj20.g, hch, bernd, djwong,
jack, opensource.kernel
On Mon, Aug 25, 2025 at 04:46:46PM +0200, David Hildenbrand wrote:
> On 25.08.25 14:29, wangyufei wrote:
> > Hi everyone,
> >
> > We've been interested in this patch about parallelizing writeback [1]
> > and have been following its discussion and development. Our testing in
> > several application scenarios on mobile devices has shown significant
> > performance improvements.
> >
> > Currently, we're focusing on how the number of writeback contexts impacts
> > performance on different filesystems and storage workloads. We noticed
> > the previous discussion about making the number of writeback contexts an
> > opt-in configuration to adapt to different filesystems [2]. At present, it
> > can only be set via a sysfs interface at system initialization. We'd like
> > to discuss the possibility of supporting dynamic runtime configuration of
> > the number of writeback contexts.
> >
> > We have developed a mechanism that allows the number of writeback contexts
> > to be configured at runtime via a sysfs interface. To configure, use:
> > echo <nr_wb_ctx> > /sys/class/bdi/<dev>/nwritebacks.
>
> What's the target use case for updating it dynamically?
>
> If it's mostly for debugging/testing (find out what works, what doesn't), it
> might better go into debugfs or just be carried out of tree.
>
> If it's about setting a sane default based on specific filesystems, maybe it
> could be optimized from within the kernel, without the need to expose this
> to an admin?
I was assuming that this patch is for people who are experimenting to
gather data more effectively. I'd NAK it being included, but it's good
to have it out on the list so other people don't have to reinvent it.
* Re: [RFC 0/1] writeback: add sysfs to config the number of writeback contexts
2025-08-25 12:29 ` [RFC 0/1] writeback: add sysfs to config the number of writeback contexts wangyufei
2025-08-25 12:29 ` [RFC 1/1] " wangyufei
2025-08-25 14:46 ` [RFC 0/1] " David Hildenbrand
@ 2025-08-29 8:59 ` Kundan Kumar
2025-09-02 11:19 ` wangyufei
From: Kundan Kumar @ 2025-08-29 8:59 UTC (permalink / raw)
To: wangyufei, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Matthew Wilcox (Oracle),
Stephen Rothwell, open list, open list:MEMORY MANAGEMENT - MISC,
open list:PAGE CACHE
Cc: anuj20.g, hch, bernd, djwong, jack, opensource.kernel
On 8/25/2025 5:59 PM, wangyufei wrote:
> Hi everyone,
>
> We've been interested in this patch about parallelizing writeback [1]
> and have been following its discussion and development. Our testing in
> several application scenarios on mobile devices has shown significant
> performance improvements.
>
Hi,
Thanks for sharing this work.
Could you clarify a few details about your test setup?
- Which filesystem did you run these experiments on?
- What were the specifics of the workload (number of threads, block size,
I/O size)?
- If you are using fio, could you please share the fio command?
- How much RAM was available on the test system?
- Can you share the performance improvement numbers you observed?
That would help in understanding the impact of parallel writeback.
I made similar modifications to dynamically configure the number of
writeback threads in this experimental patch. Refer to patches 14 and 15:
https://lore.kernel.org/all/20250807045706.2848-1-kundan.kumar@samsung.com/
The key difference is that this change also enables a reduction in the
number of writeback threads.
Thanks,
Kundan
* Re: [RFC 0/1] writeback: add sysfs to config the number of writeback contexts
2025-08-29 8:59 ` Kundan Kumar
@ 2025-09-02 11:19 ` wangyufei
From: wangyufei @ 2025-09-02 11:19 UTC (permalink / raw)
To: Kundan Kumar, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Matthew Wilcox (Oracle),
Stephen Rothwell, open list, open list:MEMORY MANAGEMENT - MISC,
open list:PAGE CACHE
Cc: anuj20.g, hch, bernd, djwong, jack, opensource.kernel
On 8/29/2025 4:59 PM, Kundan Kumar wrote:
> On 8/25/2025 5:59 PM, wangyufei wrote:
>> Hi everyone,
>>
>> We've been interested in this patch about parallelizing writeback [1]
>> and have been following its discussion and development. Our testing in
>> several application scenarios on mobile devices has shown significant
>> performance improvements.
>>
> Hi,
>
> Thanks for sharing this work.
>
> Could you clarify a few details about your test setup?
>
> - Which filesystem did you run these experiments on?
> - What were the specifics of the workload (number of threads, block size,
> I/O size)?
> - If you are using fio, could you please share the fio command?
> - How much RAM was available on the test system?
> - Can you share the performance improvement numbers you observed?
>
> That would help in understanding the impact of parallel writeback.
Hi Kundan,
Most of the time we tested this patch on mobile devices. The test
platform setup is as follows:
- filesystem: F2FS
- system config:
  Number of CPUs = 8
  System RAM = 11G
- workload & fio: We used the same fio command as mentioned in your patch.
  fio command line:
  fio --directory=/mnt --name=test --bs=4k --iodepth=1024 --rw=randwrite
  --ioengine=io_uring --time_based=1 --runtime=60 --numjobs=8 --size=450M
  --direct=0 --eta-interval=1 --eta-newline=1 --group_reporting
- Performance gains:
  Base F2FS               : 973 MiB/s
  Parallel Writeback F2FS : 1237 MiB/s (+27%)
>
> I made similar modifications to dynamically configure the number of
> writeback threads in this experimental patch. Refer to patches 14 and 15:
> https://lore.kernel.org/all/20250807045706.2848-1-kundan.kumar@samsung.com/
> The key difference is that this change also enables a reduction in the
> number of writeback threads.
Thanks for sharing the patch. I have a few questions:
- The current approach freezes the filesystem and reallocates all
writeback_ctx structures. Could this introduce latency? In some cases, I
think the existing bdi_writeback_ctx structures could be reused instead.
- Are there other use cases for dynamic thread tuning besides
initialization and testing?
- What methods do you use to test the stability of this feature?
Finally, are there any remaining problems to be solved, or optimization
directions worth discussing, for parallelizing filesystem writeback?
Thanks,
yufei