From: Brian Foster <bfoster@redhat.com>
To: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: akpm@linux-foundation.org, tj@kernel.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
willy@infradead.org, jack@suse.cz, dsterba@suse.com,
mjguzik@gmail.com, dhowells@redhat.com, peterz@infradead.org
Subject: Re: [PATCH 1/6] writeback: collect stats of all wb of bdi in bdi_debug_stats_show
Date: Thu, 21 Mar 2024 08:10:11 -0400 [thread overview]
Message-ID: <Zfwjo_ZQH_LFZ1Rc@bfoster> (raw)
In-Reply-To: <3d08c249-1b12-f82b-2662-a6fa2b888011@huaweicloud.com>
On Thu, Mar 21, 2024 at 11:44:40AM +0800, Kemeng Shi wrote:
>
>
> on 3/20/2024 9:21 PM, Brian Foster wrote:
> > On Wed, Mar 20, 2024 at 07:02:17PM +0800, Kemeng Shi wrote:
> >> /sys/kernel/debug/bdi/xxx/stats is supposed to show writeback information
> >> of whole bdi, but only writeback information of bdi in root cgroup is
> >> collected. So writeback information in non-root cgroup are missing now.
> >> To be more specific, considering following case:
> >>
> >> /* create writeback cgroup */
> >> cd /sys/fs/cgroup
> >> echo "+memory +io" > cgroup.subtree_control
> >> mkdir group1
> >> cd group1
> >> echo $$ > cgroup.procs
> >> /* do writeback in cgroup */
> >> fio -name test -filename=/dev/vdb ...
> >> /* get writeback info of bdi */
> >> cat /sys/kernel/debug/bdi/xxx/stats
> >> The cat result unexpectedly implies that there is no writeback on target
> >> bdi.
> >>
> >> Fix this by collecting stats of all wb in bdi instead of only wb in
> >> root cgroup.
> >>
> >> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> >> ---
> >> mm/backing-dev.c | 93 ++++++++++++++++++++++++++++++++++++------------
> >> 1 file changed, 70 insertions(+), 23 deletions(-)
> >>
> >> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> >> index 5f2be8c8df11..788702b6c5dd 100644
> >> --- a/mm/backing-dev.c
> >> +++ b/mm/backing-dev.c
> > ...
> >> @@ -46,31 +59,65 @@ static void bdi_debug_init(void)
> >> bdi_debug_root = debugfs_create_dir("bdi", NULL);
> >> }
> >>
> > ...
> >> +#ifdef CONFIG_CGROUP_WRITEBACK
> >> +static void bdi_collect_stats(struct backing_dev_info *bdi,
> >> + struct wb_stats *stats)
> >> +{
> >> + struct bdi_writeback *wb;
> >> +
> >> + /* protect wb from release */
> >> + mutex_lock(&bdi->cgwb_release_mutex);
> >> + list_for_each_entry(wb, &bdi->wb_list, bdi_node)
> >> + collect_wb_stats(stats, wb);
> >> + mutex_unlock(&bdi->cgwb_release_mutex);
> >> +}
> >> +#else
> >> +static void bdi_collect_stats(struct backing_dev_info *bdi,
> >> + struct wb_stats *stats)
> >> +{
> >> + collect_wb_stats(stats, &bdi->wb);
> >> +}
> >> +#endif
> >> +
> >
> > I'm not familiar enough with the cgwb code to say for sure (and I'd
> > probably wait for more high level feedback before worrying too much
> > about this), but do we need the ifdef here just to iterate ->wb_list?
> >>From looking at the code, it appears bdi->wb ends up on the list while
> > the bdi is registered for both cases, so that distinction seems
> > unnecessary. WRT to wb release protection, I wonder if this could use a
> Currently, we have ifdef trying to remove unnecessary cost when
> CONFIG_CGROUP_WRITEBACK is not enabled, see defination of cgwb_bdi_register
> and cgwb_remove_from_bdi_list for example. So I try to define bdi_collect_stats
> in similar way.
> > combination of rcu_read_lock()/list_for_each_safe() and wb_tryget() on
> > each wb before collecting its stats..? See how bdi_split_work_to_wbs()
> > works, for example.
> The combination of rcu_read_lock()/list_for_each_safe() and wb_tryget()
> should work fine.
> With ifdef, bdi_collect_stats takes no extra cost when CONFIG_CGROUP_WRITEBACK
> is not enabled and is consistent with existing code style, so I still prefer
> this way. Yes, The extra cost is not a big deal as it only exists in debug mode,
> so it's acceptable to use the suggested combination in next version if you are
> still strongly aganst this.
>
Ok. I also previously missed that there are two implementations of
bdi_split_work_to_wbs() based on CGROUP_WRITEBACK. It seems reasonable
enough to me to follow that precedent for the !CGROUP_WRITEBACK case.
It still seems to make more sense to me to walk the list similar to how
bdi_split_work_to_wbs() does for the CGROUP_WRITEBACK enabled case. Do
you agree?
Brian
> >
> > Also I see a patch conflict/compile error on patch 2 due to
> > __wb_calc_thresh() only taking one parameter in my tree. What's the
> > baseline commit for this series?
> >
> Sorry for missing this, this seris is based on another patchset [1] which is still
> under review.
> Look forward to your reply!
>
> Thansk
> Kemeng
>
> [1] https://lore.kernel.org/lkml/20240123183332.876854-1-shikemeng@huaweicloud.com/T/#mc6455784a63d0f8aa1a2f5aff325abcdf9336b76
>
> > Brian
> >
> >> +static int bdi_debug_stats_show(struct seq_file *m, void *v)
> >> +{
> >> + struct backing_dev_info *bdi = m->private;
> >> + unsigned long background_thresh;
> >> + unsigned long dirty_thresh;
> >> + struct wb_stats stats;
> >> + unsigned long tot_bw;
> >> +
> >> global_dirty_limits(&background_thresh, &dirty_thresh);
> >> - wb_thresh = wb_calc_thresh(wb, dirty_thresh);
> >> +
> >> + memset(&stats, 0, sizeof(stats));
> >> + stats.dirty_thresh = dirty_thresh;
> >> + bdi_collect_stats(bdi, &stats);
> >> +
> >> + tot_bw = atomic_long_read(&bdi->tot_write_bandwidth);
> >>
> >> seq_printf(m,
> >> "BdiWriteback: %10lu kB\n"
> >> @@ -87,18 +134,18 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
> >> "b_dirty_time: %10lu\n"
> >> "bdi_list: %10u\n"
> >> "state: %10lx\n",
> >> - (unsigned long) K(wb_stat(wb, WB_WRITEBACK)),
> >> - (unsigned long) K(wb_stat(wb, WB_RECLAIMABLE)),
> >> - K(wb_thresh),
> >> + K(stats.nr_writeback),
> >> + K(stats.nr_reclaimable),
> >> + K(stats.wb_thresh),
> >> K(dirty_thresh),
> >> K(background_thresh),
> >> - (unsigned long) K(wb_stat(wb, WB_DIRTIED)),
> >> - (unsigned long) K(wb_stat(wb, WB_WRITTEN)),
> >> - (unsigned long) K(wb->write_bandwidth),
> >> - nr_dirty,
> >> - nr_io,
> >> - nr_more_io,
> >> - nr_dirty_time,
> >> + K(stats.nr_dirtied),
> >> + K(stats.nr_written),
> >> + K(tot_bw),
> >> + stats.nr_dirty,
> >> + stats.nr_io,
> >> + stats.nr_more_io,
> >> + stats.nr_dirty_time,
> >> !list_empty(&bdi->bdi_list), bdi->wb.state);
> >>
> >> return 0;
> >> --
> >> 2.30.0
> >>
> >>
> >
> >
>
next prev parent reply other threads:[~2024-03-21 12:08 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-20 11:02 [PATCH 0/6] Improve visibility of writeback Kemeng Shi
2024-03-20 11:02 ` [PATCH 1/6] writeback: collect stats of all wb of bdi in bdi_debug_stats_show Kemeng Shi
2024-03-20 13:21 ` Brian Foster
2024-03-21 3:44 ` Kemeng Shi
2024-03-21 12:10 ` Brian Foster [this message]
2024-03-22 7:32 ` Kemeng Shi
2024-03-21 18:06 ` Jan Kara
2024-03-22 7:51 ` Kemeng Shi
2024-03-22 11:58 ` Brian Foster
2024-03-26 13:16 ` Kemeng Shi
2024-03-20 11:02 ` [PATCH 2/6] writeback: support retrieving per group debug writeback stats of bdi Kemeng Shi
2024-03-20 15:01 ` Tejun Heo
2024-03-21 3:45 ` Kemeng Shi
2024-03-26 12:24 ` Jan Kara
2024-03-26 13:26 ` Kemeng Shi
2024-03-20 11:02 ` [PATCH 3/6] workqueue: remove unnecessary import and function in wq_monitor.py Kemeng Shi
2024-03-20 15:03 ` Tejun Heo
2024-03-21 6:08 ` Kemeng Shi
2024-03-20 11:02 ` [PATCH 4/6] writeback: add wb_monitor.py script to monitor writeback info on bdi Kemeng Shi
2024-03-20 15:12 ` Tejun Heo
2024-03-21 6:22 ` Kemeng Shi
2024-03-20 11:02 ` [PATCH 5/6] writeback: rename nr_reclaimable to nr_dirty in balance_dirty_pages Kemeng Shi
2024-03-26 12:27 ` Jan Kara
2024-03-20 11:02 ` [PATCH 6/6] writeback: remove unneeded GDTC_INIT_NO_WB Kemeng Shi
2024-03-20 15:15 ` Tejun Heo
2024-03-21 7:12 ` Kemeng Shi
2024-03-25 20:26 ` Tejun Heo
2024-03-26 13:17 ` Kemeng Shi
2024-03-27 9:33 ` Jan Kara
2024-03-28 1:49 ` Kemeng Shi
2024-04-02 13:53 ` Jan Kara
2024-04-03 8:50 ` Kemeng Shi
2024-03-26 12:35 ` Jan Kara
2024-03-26 13:30 ` Kemeng Shi
2024-03-20 15:20 ` [PATCH 0/6] Improve visibility of writeback Tejun Heo
2024-03-20 17:22 ` Jan Kara
2024-03-21 8:12 ` Kemeng Shi
2024-03-21 18:07 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zfwjo_ZQH_LFZ1Rc@bfoster \
--to=bfoster@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=dhowells@redhat.com \
--cc=dsterba@suse.com \
--cc=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mjguzik@gmail.com \
--cc=peterz@infradead.org \
--cc=shikemeng@huaweicloud.com \
--cc=tj@kernel.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.