From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: JP Kobryn <inwardvessel@gmail.com>
Cc: tj@kernel.org, shakeel.butt@linux.dev, mhocko@kernel.org,
hannes@cmpxchg.org, akpm@linux-foundation.org,
linux-mm@kvack.org, cgroups@vger.kernel.org,
kernel-team@meta.com
Subject: Re: [PATCH 1/4 v2] cgroup: move cgroup_rstat from cgroup to cgroup_subsys_state
Date: Sat, 1 Mar 2025 01:25:03 +0000 [thread overview]
Message-ID: <Z8Jh7-lN_qltU7WD@google.com> (raw)
In-Reply-To: <bd45e4df-266e-4b67-abd5-680808a40d4f@gmail.com>
On Fri, Feb 28, 2025 at 05:06:23PM -0800, JP Kobryn wrote:
[..]
> >
> > > cgroup_idr_replace(&ss->css_idr, NULL, css->id);
> > > if (ss->css_released)
> > [..]
> > > @@ -6188,6 +6186,9 @@ int __init cgroup_init(void)
> > > css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2,
> > > GFP_KERNEL);
> > > BUG_ON(css->id < 0);
> > > +
> > > + if (css->ss && css->ss->css_rstat_flush)
> > > + BUG_ON(cgroup_rstat_init(css));
> >
> > Why do we need this call here? We already call cgroup_rstat_init() in
> > cgroup_init_subsys(). IIUC for subsystems with ss->early_init, we will
> > have already called cgroup_init_subsys() in cgroup_init_early().
> >
> > Did I miss something?
>
> Hmmm it's a good question. cgroup_rstat_init() is deferred in the same
> way that cgroup_idr_alloc() is. So for ss with early_init == true,
> cgroup_rstat_init() is not called during cgroup_early_init().
Oh I didn't realize that the call here is only when early_init == false.
I think we need a comment to clarify that cgroup_idr_alloc() and
cgroup_rstat_init() are not called in cgroup_init_subsys() when
early_init == true, and hence need to be called in cgroup_init().
Or maybe organize the code in a way to make this more obvious (put them
in a helper with a descriptive name? idk).
>
> Is it safe to call alloc_percpu() during early boot? If so, the
> deferral can be removed and cgroup_rstat_init() can be called in one
> place.
I don't think so. cgroup_init_early() is called before
setup_per_cpu_areas().
>
> >
> > > } else {
> > > cgroup_init_subsys(ss, false);
> > > }
> > [..]
> > > @@ -300,27 +306,25 @@ static inline void __cgroup_rstat_unlock(struct cgroup *cgrp, int cpu_in_loop)
> > > }
> > > /* see cgroup_rstat_flush() */
> > > -static void cgroup_rstat_flush_locked(struct cgroup *cgrp)
> > > +static void cgroup_rstat_flush_locked(struct cgroup_subsys_state *css)
> > > __releases(&cgroup_rstat_lock) __acquires(&cgroup_rstat_lock)
> > > {
> > > + struct cgroup *cgrp = css->cgroup;
> > > int cpu;
> > > lockdep_assert_held(&cgroup_rstat_lock);
> > > for_each_possible_cpu(cpu) {
> > > - struct cgroup *pos = cgroup_rstat_updated_list(cgrp, cpu);
> > > + struct cgroup_subsys_state *pos;
> > > + pos = cgroup_rstat_updated_list(css, cpu);
> > > for (; pos; pos = pos->rstat_flush_next) {
> > > - struct cgroup_subsys_state *css;
> > > + if (!pos->ss)
> > > + cgroup_base_stat_flush(pos->cgroup, cpu);
> > > + else
> > > + pos->ss->css_rstat_flush(pos, cpu);
> > > - cgroup_base_stat_flush(pos, cpu);
> > > - bpf_rstat_flush(pos, cgroup_parent(pos), cpu);
> > > -
> > > - rcu_read_lock();
> > > - list_for_each_entry_rcu(css, &pos->rstat_css_list,
> > > - rstat_css_node)
> > > - css->ss->css_rstat_flush(css, cpu);
> > > - rcu_read_unlock();
> > > + bpf_rstat_flush(pos->cgroup, cgroup_parent(pos->cgroup), cpu);
> >
> > We should call bpf_rstat_flush() only if (!pos->ss) as well, right?
> > Otherwise we will call BPF rstat flush whenever any subsystem is
> > flushed.
> >
> > I guess it's because BPF can now pass any subsystem to
> > cgroup_rstat_flush(), and we don't keep track. I think it would be
> > better if we do not allow BPF programs to select a css and always make
> > them flush the self css.
> >
> > We can perhaps introduce a bpf_cgroup_rstat_flush() wrapper that takes
> > in a cgroup and passes cgroup->self internally to cgroup_rstat_flush().
>
> I'm fine with this if others are in agreement. A similar concept was
> done in v1.
Let's wait for Shakeel to chime in here since he suggested removing this
hook, but I am not sure if he intended to actually do it or not. Better
not to waste effort if this will be gone soon anyway.
>
> >
> > But if the plan is to remove the bpf_rstat_flush() call here soon then
> > it's probably not worth the hassle.
> >
> > Shakeel (and others), WDYT?
>
next prev parent reply other threads:[~2025-03-01 1:25 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-27 21:55 [PATCH 0/4 v2] cgroup: separate rstat trees inwardvessel
2025-02-27 21:55 ` [PATCH 1/4 v2] cgroup: move cgroup_rstat from cgroup to cgroup_subsys_state inwardvessel
2025-02-27 22:43 ` Shakeel Butt
2025-02-28 19:04 ` Yosry Ahmed
2025-03-01 1:06 ` JP Kobryn
2025-03-01 1:25 ` Yosry Ahmed [this message]
2025-03-01 1:30 ` JP Kobryn
2025-03-03 18:18 ` Shakeel Butt
2025-03-03 18:21 ` Yosry Ahmed
2025-03-03 15:20 ` Michal Koutný
2025-03-03 19:31 ` JP Kobryn
2025-02-27 21:55 ` [PATCH 2/4 v2] cgroup: rstat lock indirection inwardvessel
2025-03-03 15:21 ` Michal Koutný
2025-02-27 21:55 ` [PATCH 3/4 v2] cgroup: separate rstat locks for subsystems inwardvessel
2025-02-27 22:52 ` Shakeel Butt
2025-02-28 16:07 ` JP Kobryn
2025-02-28 17:37 ` JP Kobryn
2025-02-28 19:20 ` Yosry Ahmed
2025-03-06 21:47 ` JP Kobryn
2025-03-01 23:00 ` kernel test robot
2025-03-03 15:22 ` Michal Koutný
2025-03-03 18:29 ` Yosry Ahmed
2025-03-03 18:40 ` Shakeel Butt
2025-03-03 19:23 ` JP Kobryn
2025-03-03 19:39 ` Shakeel Butt
2025-03-03 19:50 ` Yosry Ahmed
2025-03-03 20:09 ` Shakeel Butt
2025-03-03 18:49 ` Michal Koutný
2025-03-10 17:59 ` JP Kobryn
2025-03-11 13:49 ` Michal Koutný
2025-03-06 21:36 ` JP Kobryn
2025-03-03 23:49 ` kernel test robot
2025-02-27 21:55 ` [PATCH 4/4 v2] cgroup: separate rstat list pointers from base stats inwardvessel
2025-02-27 23:01 ` Shakeel Butt
2025-02-28 20:33 ` Yosry Ahmed
2025-02-28 18:22 ` [PATCH 0/4 v2] cgroup: separate rstat trees Yosry Ahmed
2025-03-03 15:19 ` Michal Koutný
2025-03-06 1:07 ` JP Kobryn
2025-03-11 13:49 ` Michal Koutný
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z8Jh7-lN_qltU7WD@google.com \
--to=yosry.ahmed@linux.dev \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=hannes@cmpxchg.org \
--cc=inwardvessel@gmail.com \
--cc=kernel-team@meta.com \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=shakeel.butt@linux.dev \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.