From: Brian Foster <bfoster@redhat.com>
To: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: akpm@linux-foundation.org, willy@infradead.org, jack@suse.cz,
tj@kernel.org, dsterba@suse.com, mjguzik@gmail.com,
dhowells@redhat.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v2 3/6] writeback: support retrieving per group debug writeback stats of bdi
Date: Wed, 3 Apr 2024 11:04:58 -0400 [thread overview]
Message-ID: <Zg1wGvTeQxjqjYUG@bfoster> (raw)
In-Reply-To: <e3816f9c-0f29-a0e4-8ad8-a6acf82a06ad@huaweicloud.com>
On Wed, Apr 03, 2024 at 04:49:42PM +0800, Kemeng Shi wrote:
>
>
> on 3/29/2024 9:10 PM, Brian Foster wrote:
> > On Wed, Mar 27, 2024 at 11:57:48PM +0800, Kemeng Shi wrote:
> >> Add /sys/kernel/debug/bdi/xxx/wb_stats to show per group writeback stats
> >> of bdi.
> >>
> >
> > Hi Kemeng,
> Hello Brian,
> >
> > Just a few random thoughts/comments..
> >
> >> Following domain hierarchy is tested:
> >> global domain (320G)
> >> / \
> >> cgroup domain1(10G) cgroup domain2(10G)
> >> | |
> >> bdi wb1 wb2
> >>
> >> /* per wb writeback info of bdi is collected */
> >> cat /sys/kernel/debug/bdi/252:16/wb_stats
> >> WbCgIno: 1
> >> WbWriteback: 0 kB
> >> WbReclaimable: 0 kB
> >> WbDirtyThresh: 0 kB
> >> WbDirtied: 0 kB
> >> WbWritten: 0 kB
> >> WbWriteBandwidth: 102400 kBps
> >> b_dirty: 0
> >> b_io: 0
> >> b_more_io: 0
> >> b_dirty_time: 0
> >> state: 1
> >
> > Maybe some whitespace or something between entries would improve
> > readability?
> Sure, I will add a whitespace in next version.
> >
> >> WbCgIno: 4094
> >> WbWriteback: 54432 kB
> >> WbReclaimable: 766080 kB
> >> WbDirtyThresh: 3094760 kB
> >> WbDirtied: 1656480 kB
> >> WbWritten: 837088 kB
> >> WbWriteBandwidth: 132772 kBps
> >> b_dirty: 1
> >> b_io: 1
> >> b_more_io: 0
> >> b_dirty_time: 0
> >> state: 7
> >> WbCgIno: 4135
> >> WbWriteback: 15232 kB
> >> WbReclaimable: 786688 kB
> >> WbDirtyThresh: 2909984 kB
> >> WbDirtied: 1482656 kB
> >> WbWritten: 681408 kB
> >> WbWriteBandwidth: 124848 kBps
> >> b_dirty: 0
> >> b_io: 1
> >> b_more_io: 0
> >> b_dirty_time: 0
> >> state: 7
> >>
> >> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> >> ---
> >> include/linux/writeback.h | 1 +
> >> mm/backing-dev.c | 88 +++++++++++++++++++++++++++++++++++++++
> >> mm/page-writeback.c | 19 +++++++++
> >> 3 files changed, 108 insertions(+)
> >>
> > ...
> >> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> >> index 8daf950e6855..e3953db7d88d 100644
> >> --- a/mm/backing-dev.c
> >> +++ b/mm/backing-dev.c
> >> @@ -103,6 +103,91 @@ static void collect_wb_stats(struct wb_stats *stats,
> >> }
> >>
> >> #ifdef CONFIG_CGROUP_WRITEBACK
> > ...
> >> +static int cgwb_debug_stats_show(struct seq_file *m, void *v)
> >> +{
> >> + struct backing_dev_info *bdi;
> >> + unsigned long background_thresh;
> >> + unsigned long dirty_thresh;
> >> + struct bdi_writeback *wb;
> >> + struct wb_stats stats;
> >> +
> >> + rcu_read_lock();
> >> + bdi = lookup_bdi(m);
> >> + if (!bdi) {
> >> + rcu_read_unlock();
> >> + return -EEXIST;
> >> + }
> >> +
> >> + global_dirty_limits(&background_thresh, &dirty_thresh);
> >> +
> >> + list_for_each_entry_rcu(wb, &bdi->wb_list, bdi_node) {
> >> + memset(&stats, 0, sizeof(stats));
> >> + stats.dirty_thresh = dirty_thresh;
> >
> > If you did something like the following here, wouldn't that also zero
> > the rest of the structure?
> >
> > struct wb_stats stats = { .dirty_thresh = dirty_thresh };
> >
> Suer, will do it in next version.
> >> + collect_wb_stats(&stats, wb);
> >> +
> >
> > Also, similar question as before on whether you'd want to check
> > WB_registered or something here..
> Still prefer to keep full debug info and user could filter out on
> demand.
Ok. I was more wondering if that was needed for correctness. If not,
then that seems fair enough to me.
> >
> >> + if (mem_cgroup_wb_domain(wb) == NULL) {
> >> + wb_stats_show(m, wb, &stats);
> >> + continue;
> >> + }
> >
> > Can you explain what this logic is about? Is the cgwb_calc_thresh()
> > thing not needed in this case? A comment might help for those less
> > familiar with the implementation details.
> If mem_cgroup_wb_domain(wb) is NULL, then it's bdi->wb, otherwise,
> it's wb in cgroup. For bdi->wb, there is no need to do wb_tryget
> and cgwb_calc_thresh. Will add some comment in next version.
> >
> > BTW, I'm also wondering if something like the following is correct
> > and/or roughly equivalent:
> >
> > list_for_each_*(wb, ...) {
> > struct wb_stats stats = ...;
> >
> > if (!wb_tryget(wb))
> > continue;
> >
> > collect_wb_stats(&stats, wb);
> >
> > /*
> > * Extra wb_thresh magic. Drop rcu lock because ... . We
> > * can do so here because we have a ref.
> > */
> > if (mem_cgroup_wb_domain(wb)) {
> > rcu_read_unlock();
> > stats.wb_thresh = min(stats.wb_thresh, cgwb_calc_thresh(wb));
> > rcu_read_lock();
> > }
> >
> > wb_stats_show(m, wb, &stats)
> > wb_put(wb);
> > }
> It's correct as wb_tryget to bdi->wb has no harm. I have considered
> to do it in this way, I change my mind to do it in new way for
> two reason:
> 1. Put code handling wb in cgroup more tight which could be easier
> to maintain.
> 2. Rmove extra wb_tryget/wb_put for wb in bdi.
> Would this make sense to you?
Ok, well assuming it is correct the above logic is a bit more simple and
readable to me. I think you'd just need to fill in the comment around
the wb_thresh thing rather than i.e. having to explain we don't need to
ref bdi->wb even though it doesn't seem to matter.
I kind of feel the same on the wb_stats file thing below just because it
seems more consistent and available if wb_stats eventually grows more
wb-specific data.
That said, this is subjective and not hugely important so I don't insist
on either point. Maybe wait a bit and see if Jan or Tejun or somebody
has any thoughts..? If nobody else expresses explicit preference then
I'm good with it either way.
Brian
> >
> >> +
> >> + /*
> >> + * cgwb_release will destroy wb->memcg_completions which
> >> + * will be ued in cgwb_calc_thresh. Use wb_tryget to prevent
> >> + * memcg_completions destruction from cgwb_release.
> >> + */
> >> + if (!wb_tryget(wb))
> >> + continue;
> >> +
> >> + rcu_read_unlock();
> >> + /* cgwb_calc_thresh may sleep in cgroup_rstat_flush */
> >> + stats.wb_thresh = min(stats.wb_thresh, cgwb_calc_thresh(wb));
> >> + wb_stats_show(m, wb, &stats);
> >> + rcu_read_lock();
> >> + wb_put(wb);
> >> + }
> >> + rcu_read_unlock();
> >> +
> >> + return 0;
> >> +}
> >> +DEFINE_SHOW_ATTRIBUTE(cgwb_debug_stats);
> >> +
> >> +static void cgwb_debug_register(struct backing_dev_info *bdi)
> >> +{
> >> + debugfs_create_file("wb_stats", 0444, bdi->debug_dir, bdi,
> >> + &cgwb_debug_stats_fops);
> >> +}
> >> +
> >> static void bdi_collect_stats(struct backing_dev_info *bdi,
> >> struct wb_stats *stats)
> >> {
> >> @@ -117,6 +202,8 @@ static void bdi_collect_stats(struct backing_dev_info *bdi,
> >> {
> >> collect_wb_stats(stats, &bdi->wb);
> >> }
> >> +
> >> +static inline void cgwb_debug_register(struct backing_dev_info *bdi) { }
> >
> > Could we just create the wb_stats file regardless of whether cgwb is
> > enabled? Obviously theres only one wb in the !CGWB case and it's
> > somewhat duplicative with the bdi stats file, but that seems harmless if
> > the same code can be reused..? Maybe there's also a small argument for
> > dropping the state info from the bdi stats file and moving it to
> > wb_stats.In backing-dev.c, there are a lot "#ifdef CGWB .. #else .. #endif" to
> avoid unneed extra cost when CGWB is not enabled.
> I think it's better to avoid extra cost from wb_stats when CGWB is not
> enabled. For now, we only save cpu cost to create and destroy wb_stats
> and save memory cost to record debugfs file, we could save more in
> future when wb_stats records more debug info.
> Move state info from bdi stats to wb_stats make senses to me. The only
> concern would be compatibility problem. I will add a new patch to this
> to make this more noticeable and easier to revert.
> Thanks a lot for review!
>
> Kemeng
> >
> > Brian
> >
> >> #endif
> >>
> >> static int bdi_debug_stats_show(struct seq_file *m, void *v)
> >> @@ -182,6 +269,7 @@ static void bdi_debug_register(struct backing_dev_info *bdi, const char *name)
> >>
> >> debugfs_create_file("stats", 0444, bdi->debug_dir, bdi,
> >> &bdi_debug_stats_fops);
> >> + cgwb_debug_register(bdi);
> >> }
> >>
> >> static void bdi_debug_unregister(struct backing_dev_info *bdi)
> >> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> >> index 0e20467367fe..3724c7525316 100644
> >> --- a/mm/page-writeback.c
> >> +++ b/mm/page-writeback.c
> >> @@ -893,6 +893,25 @@ unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh)
> >> return __wb_calc_thresh(&gdtc, thresh);
> >> }
> >>
> >> +unsigned long cgwb_calc_thresh(struct bdi_writeback *wb)
> >> +{
> >> + struct dirty_throttle_control gdtc = { GDTC_INIT_NO_WB };
> >> + struct dirty_throttle_control mdtc = { MDTC_INIT(wb, &gdtc) };
> >> + unsigned long filepages, headroom, writeback;
> >> +
> >> + gdtc.avail = global_dirtyable_memory();
> >> + gdtc.dirty = global_node_page_state(NR_FILE_DIRTY) +
> >> + global_node_page_state(NR_WRITEBACK);
> >> +
> >> + mem_cgroup_wb_stats(wb, &filepages, &headroom,
> >> + &mdtc.dirty, &writeback);
> >> + mdtc.dirty += writeback;
> >> + mdtc_calc_avail(&mdtc, filepages, headroom);
> >> + domain_dirty_limits(&mdtc);
> >> +
> >> + return __wb_calc_thresh(&mdtc, mdtc.thresh);
> >> +}
> >> +
> >> /*
> >> * setpoint - dirty 3
> >> * f(dirty) := 1.0 + (----------------)
> >> --
> >> 2.30.0
> >>
> >
> >
>
next prev parent reply other threads:[~2024-04-03 15:03 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-27 15:57 [PATCH v2 0/6] Improve visibility of writeback Kemeng Shi
2024-03-27 15:57 ` [PATCH v2 1/6] writeback: protect race between bdi release and bdi_debug_stats_show Kemeng Shi
2024-03-28 17:53 ` Brian Foster
2024-04-03 2:16 ` Kemeng Shi
2024-03-27 15:57 ` [PATCH v2 2/6] writeback: collect stats of all wb of bdi in bdi_debug_stats_show Kemeng Shi
2024-03-29 13:04 ` Brian Foster
2024-04-03 7:49 ` Kemeng Shi
2024-03-27 15:57 ` [PATCH v2 3/6] writeback: support retrieving per group debug writeback stats of bdi Kemeng Shi
2024-03-29 13:10 ` Brian Foster
2024-04-03 8:49 ` Kemeng Shi
2024-04-03 15:04 ` Brian Foster [this message]
2024-04-04 9:07 ` Jan Kara
2024-04-07 3:13 ` Kemeng Shi
2024-04-07 2:48 ` Kemeng Shi
2024-03-27 15:57 ` [PATCH v2 4/6] writeback: add wb_monitor.py script to monitor writeback info on bdi Kemeng Shi
2024-03-27 15:57 ` [PATCH v2 5/6] writeback: rename nr_reclaimable to nr_dirty in balance_dirty_pages Kemeng Shi
2024-03-27 15:57 ` [PATCH v2 6/6] writeback: define GDTC_INIT_NO_WB to null Kemeng Shi
2024-03-27 17:40 ` [PATCH v2 0/6] Improve visibility of writeback Andrew Morton
2024-03-28 1:59 ` Kemeng Shi
2024-03-28 8:23 ` Kemeng Shi
2024-03-28 19:15 ` Kent Overstreet
2024-03-28 19:23 ` Andrew Morton
2024-03-28 19:36 ` Kent Overstreet
2024-03-28 19:24 ` Kent Overstreet
2024-03-28 19:31 ` Tejun Heo
2024-03-28 19:40 ` Kent Overstreet
2024-03-28 19:46 ` Tejun Heo
2024-03-28 19:55 ` Kent Overstreet
2024-03-28 20:13 ` Tejun Heo
2024-03-28 20:22 ` Kent Overstreet
2024-03-28 20:46 ` Tejun Heo
2024-03-28 20:53 ` Kent Overstreet
2024-04-03 16:27 ` Jan Kara
2024-04-03 18:44 ` Tejun Heo
2024-04-03 19:06 ` Kent Overstreet
2024-04-03 19:21 ` Tejun Heo
2024-04-03 22:24 ` Kent Overstreet
2024-04-03 6:56 ` Kemeng Shi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zg1wGvTeQxjqjYUG@bfoster \
--to=bfoster@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=dhowells@redhat.com \
--cc=dsterba@suse.com \
--cc=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mjguzik@gmail.com \
--cc=shikemeng@huaweicloud.com \
--cc=tj@kernel.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.