From: Kemeng Shi <shikemeng@huaweicloud.com>
To: Brian Foster <bfoster@redhat.com>
Cc: akpm@linux-foundation.org, willy@infradead.org, jack@suse.cz,
tj@kernel.org, dsterba@suse.com, mjguzik@gmail.com,
dhowells@redhat.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v2 3/6] writeback: support retrieving per group debug writeback stats of bdi
Date: Wed, 3 Apr 2024 16:49:42 +0800 [thread overview]
Message-ID: <e3816f9c-0f29-a0e4-8ad8-a6acf82a06ad@huaweicloud.com> (raw)
In-Reply-To: <Zga937dR5UgtSVaz@bfoster>
on 3/29/2024 9:10 PM, Brian Foster wrote:
> On Wed, Mar 27, 2024 at 11:57:48PM +0800, Kemeng Shi wrote:
>> Add /sys/kernel/debug/bdi/xxx/wb_stats to show per group writeback stats
>> of bdi.
>>
>
> Hi Kemeng,
Hello Brian,
>
> Just a few random thoughts/comments..
>
>> Following domain hierarchy is tested:
>> global domain (320G)
>> / \
>> cgroup domain1(10G) cgroup domain2(10G)
>> | |
>> bdi wb1 wb2
>>
>> /* per wb writeback info of bdi is collected */
>> cat /sys/kernel/debug/bdi/252:16/wb_stats
>> WbCgIno: 1
>> WbWriteback: 0 kB
>> WbReclaimable: 0 kB
>> WbDirtyThresh: 0 kB
>> WbDirtied: 0 kB
>> WbWritten: 0 kB
>> WbWriteBandwidth: 102400 kBps
>> b_dirty: 0
>> b_io: 0
>> b_more_io: 0
>> b_dirty_time: 0
>> state: 1
>
> Maybe some whitespace or something between entries would improve
> readability?
Sure, I will add a whitespace in next version.
>
>> WbCgIno: 4094
>> WbWriteback: 54432 kB
>> WbReclaimable: 766080 kB
>> WbDirtyThresh: 3094760 kB
>> WbDirtied: 1656480 kB
>> WbWritten: 837088 kB
>> WbWriteBandwidth: 132772 kBps
>> b_dirty: 1
>> b_io: 1
>> b_more_io: 0
>> b_dirty_time: 0
>> state: 7
>> WbCgIno: 4135
>> WbWriteback: 15232 kB
>> WbReclaimable: 786688 kB
>> WbDirtyThresh: 2909984 kB
>> WbDirtied: 1482656 kB
>> WbWritten: 681408 kB
>> WbWriteBandwidth: 124848 kBps
>> b_dirty: 0
>> b_io: 1
>> b_more_io: 0
>> b_dirty_time: 0
>> state: 7
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>> include/linux/writeback.h | 1 +
>> mm/backing-dev.c | 88 +++++++++++++++++++++++++++++++++++++++
>> mm/page-writeback.c | 19 +++++++++
>> 3 files changed, 108 insertions(+)
>>
> ...
>> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
>> index 8daf950e6855..e3953db7d88d 100644
>> --- a/mm/backing-dev.c
>> +++ b/mm/backing-dev.c
>> @@ -103,6 +103,91 @@ static void collect_wb_stats(struct wb_stats *stats,
>> }
>>
>> #ifdef CONFIG_CGROUP_WRITEBACK
> ...
>> +static int cgwb_debug_stats_show(struct seq_file *m, void *v)
>> +{
>> + struct backing_dev_info *bdi;
>> + unsigned long background_thresh;
>> + unsigned long dirty_thresh;
>> + struct bdi_writeback *wb;
>> + struct wb_stats stats;
>> +
>> + rcu_read_lock();
>> + bdi = lookup_bdi(m);
>> + if (!bdi) {
>> + rcu_read_unlock();
>> + return -EEXIST;
>> + }
>> +
>> + global_dirty_limits(&background_thresh, &dirty_thresh);
>> +
>> + list_for_each_entry_rcu(wb, &bdi->wb_list, bdi_node) {
>> + memset(&stats, 0, sizeof(stats));
>> + stats.dirty_thresh = dirty_thresh;
>
> If you did something like the following here, wouldn't that also zero
> the rest of the structure?
>
> struct wb_stats stats = { .dirty_thresh = dirty_thresh };
>
Suer, will do it in next version.
>> + collect_wb_stats(&stats, wb);
>> +
>
> Also, similar question as before on whether you'd want to check
> WB_registered or something here..
Still prefer to keep full debug info and user could filter out on
demand.
>
>> + if (mem_cgroup_wb_domain(wb) == NULL) {
>> + wb_stats_show(m, wb, &stats);
>> + continue;
>> + }
>
> Can you explain what this logic is about? Is the cgwb_calc_thresh()
> thing not needed in this case? A comment might help for those less
> familiar with the implementation details.
If mem_cgroup_wb_domain(wb) is NULL, then it's bdi->wb, otherwise,
it's wb in cgroup. For bdi->wb, there is no need to do wb_tryget
and cgwb_calc_thresh. Will add some comment in next version.
>
> BTW, I'm also wondering if something like the following is correct
> and/or roughly equivalent:
>
> list_for_each_*(wb, ...) {
> struct wb_stats stats = ...;
>
> if (!wb_tryget(wb))
> continue;
>
> collect_wb_stats(&stats, wb);
>
> /*
> * Extra wb_thresh magic. Drop rcu lock because ... . We
> * can do so here because we have a ref.
> */
> if (mem_cgroup_wb_domain(wb)) {
> rcu_read_unlock();
> stats.wb_thresh = min(stats.wb_thresh, cgwb_calc_thresh(wb));
> rcu_read_lock();
> }
>
> wb_stats_show(m, wb, &stats)
> wb_put(wb);
> }
It's correct as wb_tryget to bdi->wb has no harm. I have considered
to do it in this way, I change my mind to do it in new way for
two reason:
1. Put code handling wb in cgroup more tight which could be easier
to maintain.
2. Rmove extra wb_tryget/wb_put for wb in bdi.
Would this make sense to you?
>
>> +
>> + /*
>> + * cgwb_release will destroy wb->memcg_completions which
>> + * will be ued in cgwb_calc_thresh. Use wb_tryget to prevent
>> + * memcg_completions destruction from cgwb_release.
>> + */
>> + if (!wb_tryget(wb))
>> + continue;
>> +
>> + rcu_read_unlock();
>> + /* cgwb_calc_thresh may sleep in cgroup_rstat_flush */
>> + stats.wb_thresh = min(stats.wb_thresh, cgwb_calc_thresh(wb));
>> + wb_stats_show(m, wb, &stats);
>> + rcu_read_lock();
>> + wb_put(wb);
>> + }
>> + rcu_read_unlock();
>> +
>> + return 0;
>> +}
>> +DEFINE_SHOW_ATTRIBUTE(cgwb_debug_stats);
>> +
>> +static void cgwb_debug_register(struct backing_dev_info *bdi)
>> +{
>> + debugfs_create_file("wb_stats", 0444, bdi->debug_dir, bdi,
>> + &cgwb_debug_stats_fops);
>> +}
>> +
>> static void bdi_collect_stats(struct backing_dev_info *bdi,
>> struct wb_stats *stats)
>> {
>> @@ -117,6 +202,8 @@ static void bdi_collect_stats(struct backing_dev_info *bdi,
>> {
>> collect_wb_stats(stats, &bdi->wb);
>> }
>> +
>> +static inline void cgwb_debug_register(struct backing_dev_info *bdi) { }
>
> Could we just create the wb_stats file regardless of whether cgwb is
> enabled? Obviously theres only one wb in the !CGWB case and it's
> somewhat duplicative with the bdi stats file, but that seems harmless if
> the same code can be reused..? Maybe there's also a small argument for
> dropping the state info from the bdi stats file and moving it to
> wb_stats.In backing-dev.c, there are a lot "#ifdef CGWB .. #else .. #endif" to
avoid unneed extra cost when CGWB is not enabled.
I think it's better to avoid extra cost from wb_stats when CGWB is not
enabled. For now, we only save cpu cost to create and destroy wb_stats
and save memory cost to record debugfs file, we could save more in
future when wb_stats records more debug info.
Move state info from bdi stats to wb_stats make senses to me. The only
concern would be compatibility problem. I will add a new patch to this
to make this more noticeable and easier to revert.
Thanks a lot for review!
Kemeng
>
> Brian
>
>> #endif
>>
>> static int bdi_debug_stats_show(struct seq_file *m, void *v)
>> @@ -182,6 +269,7 @@ static void bdi_debug_register(struct backing_dev_info *bdi, const char *name)
>>
>> debugfs_create_file("stats", 0444, bdi->debug_dir, bdi,
>> &bdi_debug_stats_fops);
>> + cgwb_debug_register(bdi);
>> }
>>
>> static void bdi_debug_unregister(struct backing_dev_info *bdi)
>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>> index 0e20467367fe..3724c7525316 100644
>> --- a/mm/page-writeback.c
>> +++ b/mm/page-writeback.c
>> @@ -893,6 +893,25 @@ unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh)
>> return __wb_calc_thresh(&gdtc, thresh);
>> }
>>
>> +unsigned long cgwb_calc_thresh(struct bdi_writeback *wb)
>> +{
>> + struct dirty_throttle_control gdtc = { GDTC_INIT_NO_WB };
>> + struct dirty_throttle_control mdtc = { MDTC_INIT(wb, &gdtc) };
>> + unsigned long filepages, headroom, writeback;
>> +
>> + gdtc.avail = global_dirtyable_memory();
>> + gdtc.dirty = global_node_page_state(NR_FILE_DIRTY) +
>> + global_node_page_state(NR_WRITEBACK);
>> +
>> + mem_cgroup_wb_stats(wb, &filepages, &headroom,
>> + &mdtc.dirty, &writeback);
>> + mdtc.dirty += writeback;
>> + mdtc_calc_avail(&mdtc, filepages, headroom);
>> + domain_dirty_limits(&mdtc);
>> +
>> + return __wb_calc_thresh(&mdtc, mdtc.thresh);
>> +}
>> +
>> /*
>> * setpoint - dirty 3
>> * f(dirty) := 1.0 + (----------------)
>> --
>> 2.30.0
>>
>
>
next prev parent reply other threads:[~2024-04-03 8:49 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-27 15:57 [PATCH v2 0/6] Improve visibility of writeback Kemeng Shi
2024-03-27 15:57 ` [PATCH v2 1/6] writeback: protect race between bdi release and bdi_debug_stats_show Kemeng Shi
2024-03-28 17:53 ` Brian Foster
2024-04-03 2:16 ` Kemeng Shi
2024-03-27 15:57 ` [PATCH v2 2/6] writeback: collect stats of all wb of bdi in bdi_debug_stats_show Kemeng Shi
2024-03-29 13:04 ` Brian Foster
2024-04-03 7:49 ` Kemeng Shi
2024-03-27 15:57 ` [PATCH v2 3/6] writeback: support retrieving per group debug writeback stats of bdi Kemeng Shi
2024-03-29 13:10 ` Brian Foster
2024-04-03 8:49 ` Kemeng Shi [this message]
2024-04-03 15:04 ` Brian Foster
2024-04-04 9:07 ` Jan Kara
2024-04-07 3:13 ` Kemeng Shi
2024-04-07 2:48 ` Kemeng Shi
2024-03-27 15:57 ` [PATCH v2 4/6] writeback: add wb_monitor.py script to monitor writeback info on bdi Kemeng Shi
2024-03-27 15:57 ` [PATCH v2 5/6] writeback: rename nr_reclaimable to nr_dirty in balance_dirty_pages Kemeng Shi
2024-03-27 15:57 ` [PATCH v2 6/6] writeback: define GDTC_INIT_NO_WB to null Kemeng Shi
2024-03-27 17:40 ` [PATCH v2 0/6] Improve visibility of writeback Andrew Morton
2024-03-28 1:59 ` Kemeng Shi
2024-03-28 8:23 ` Kemeng Shi
2024-03-28 19:15 ` Kent Overstreet
2024-03-28 19:23 ` Andrew Morton
2024-03-28 19:36 ` Kent Overstreet
2024-03-28 19:24 ` Kent Overstreet
2024-03-28 19:31 ` Tejun Heo
2024-03-28 19:40 ` Kent Overstreet
2024-03-28 19:46 ` Tejun Heo
2024-03-28 19:55 ` Kent Overstreet
2024-03-28 20:13 ` Tejun Heo
2024-03-28 20:22 ` Kent Overstreet
2024-03-28 20:46 ` Tejun Heo
2024-03-28 20:53 ` Kent Overstreet
2024-04-03 16:27 ` Jan Kara
2024-04-03 18:44 ` Tejun Heo
2024-04-03 19:06 ` Kent Overstreet
2024-04-03 19:21 ` Tejun Heo
2024-04-03 22:24 ` Kent Overstreet
2024-04-03 6:56 ` Kemeng Shi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e3816f9c-0f29-a0e4-8ad8-a6acf82a06ad@huaweicloud.com \
--to=shikemeng@huaweicloud.com \
--cc=akpm@linux-foundation.org \
--cc=bfoster@redhat.com \
--cc=dhowells@redhat.com \
--cc=dsterba@suse.com \
--cc=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mjguzik@gmail.com \
--cc=tj@kernel.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).