From: Matt Bobrowski <mattbobrowski@google.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
ohn Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@fomichev.me>,
Jiri Olsa <jolsa@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Chuyi Zhou <zhouchuyi@bytedance.com>, Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH bpf-next 1/2] bpf: add new BPF_CGROUP_ITER_CHILDREN_ONLY control option
Date: Mon, 26 Jan 2026 09:03:49 +0000 [thread overview]
Message-ID: <aXct9XxD43eBZs-H@google.com> (raw)
In-Reply-To: <CAADnVQKHuj1snSOBh+VEJrcW82OW2zNOEd1=Dqe-m+kuOV7MWw@mail.gmail.com>
On Fri, Jan 23, 2026 at 09:17:03AM -0800, Alexei Starovoitov wrote:
> On Fri, Jan 23, 2026 at 3:06 AM Matt Bobrowski <mattbobrowski@google.com> wrote:
> >
> > On Thu, Jan 22, 2026 at 08:26:49PM -0800, Alexei Starovoitov wrote:
> > > On Wed, Jan 21, 2026 at 5:54 AM Matt Bobrowski <mattbobrowski@google.com> wrote:
> > > >
> > > > break;
> > > > + case BPF_CGROUP_ITER_CHILDREN_ONLY:
> > > > + kit->pos = css_next_child(kit->pos, kit->start);
> > > > + break;
> > >
> > > There are no users of css_next_child() outside of cgroup
> > > internals. It's a red flag for me.
> >
> > Hm, I can see the slight hesitation here. However, until somewhat
> > recently, the same argument could have been applied to functions like
> > css_next_descendant_post(), right?
> >
> > css_next_descendant_post() has rather recently started seeing usage in
> > other subsystems (block/, kernel/sched/, kernel/bpf), despite
> > remaining internal-only since its inception over a decade ago. Given
> > that css_next_descendant_post() continues to remain unexported, in all
> > fairness, I don't see css_next_child() being all that different in
> > this context.
> >
> > Would your stance change if Tejun agreed to mark css_next_child() as
> > exportable, or simply agreed to it being used from BPF cgroup
> > iterators?
> >
> > Both css_next_child() and css_for_each_child() have remained virtually
> > unchanged - and therefore inherently stable - since their
> > introduction. I should also mention that functions like
> > css_next_child() and css_next_descendant_post() were originally marked
> > as exported, and only later unmarked. This presumably was done as part
> > of a symbol cleanup, not because the functions were unstable
> > primitives.
> >
> > > I feel there is something wrong with the use case. It should have
> > > been handled by the primitives we have.
> >
> > Right, arguably it could be handled with the current set of
> > primitives, but it isn't nice and certainly not as efficient as I'd
> > like.
>
> Why is it? Just descdant_pre and break out of the loop if not immediate.
If I use a break when I hit a grandchild in pre-order traversal, I
will still terminate the loop early and miss all the subsequent
immediate children?
I do understand that using css_next_descendant_post() (without a
break) could allow me to logically filter for children in the
loop. However, my concern is purely about algorithmic complexity and
scalability.
css_next_descendant_post() performs a full subtree walk
(O(Descendants)). If I have a top-level cgroup with only a few
children but a massive hierarchy underneath (hundreds/thousands of
grandchildren), using the descendant based iterator requires me to
literally visit every single node in that subtree.
In contrast, iterating immediate children is just a linked-list walk
(O(Children)). For deep hierarchies, the performance difference is not
constant - it's orders of magnitude.
For example, if I have 5 children under a given root/parent with each
of those 5 children having 500 nested descedants each:
- A pre-order traversal would lead me to visiting 2505 nodes.
- A linear traversal would lead me to visting 5 nodes.
> The thought process should be to use what's available as much as possible.
Well, I get that and it was my initial idea. But, I quickly realized
that we have a gap in the current iterators capabilities, and this
patch is meant to address this gap.
> Adding UAPI for specific use cases isn't great.
> If it was a new kfunc that's one thing, but here you're touching uapi/bpf.h
> So acceptance bar is much much higher.
OK, but now that Tejun confirmed [0] that he's fine with exposing
css_next_child(), how does this weigh up?
[0] https://lore.kernel.org/bpf/aXPC_KMX_rvO14lR@slm.duckdns.org/
next prev parent reply other threads:[~2026-01-26 9:03 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-21 13:54 [PATCH bpf-next 1/2] bpf: add new BPF_CGROUP_ITER_CHILDREN_ONLY control option Matt Bobrowski
2026-01-21 13:54 ` [PATCH bpf-next 2/2] bpf/selftests: cover " Matt Bobrowski
2026-01-21 19:14 ` [PATCH bpf-next 1/2] bpf: add new " Song Liu
2026-01-22 12:31 ` Matt Bobrowski
2026-01-23 4:26 ` Alexei Starovoitov
2026-01-23 11:06 ` Matt Bobrowski
2026-01-23 17:17 ` Alexei Starovoitov
2026-01-26 9:03 ` Matt Bobrowski [this message]
2026-01-27 2:26 ` Alexei Starovoitov
2026-01-27 8:28 ` Matt Bobrowski
2026-01-23 18:50 ` Tejun Heo
2026-01-26 9:14 ` Matt Bobrowski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aXct9XxD43eBZs-H@google.com \
--to=mattbobrowski@google.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=martin.lau@linux.dev \
--cc=roman.gushchin@linux.dev \
--cc=sdf@fomichev.me \
--cc=song@kernel.org \
--cc=tj@kernel.org \
--cc=yonghong.song@linux.dev \
--cc=zhouchuyi@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox