From: Matt Bobrowski <mattbobrowski@google.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
ohn Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@fomichev.me>,
Jiri Olsa <jolsa@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Chuyi Zhou <zhouchuyi@bytedance.com>, Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH bpf-next 1/2] bpf: add new BPF_CGROUP_ITER_CHILDREN_ONLY control option
Date: Tue, 27 Jan 2026 08:28:31 +0000 [thread overview]
Message-ID: <aXh3L7BukANG6Dve@google.com> (raw)
In-Reply-To: <CAADnVQJXs1NinaahP-uKOrqKUHEZ4PgwOAAVpWkHDWkapBZ2YA@mail.gmail.com>
On Mon, Jan 26, 2026 at 06:26:55PM -0800, Alexei Starovoitov wrote:
> On Mon, Jan 26, 2026 at 1:03 AM Matt Bobrowski <mattbobrowski@google.com> wrote:
> >
> > On Fri, Jan 23, 2026 at 09:17:03AM -0800, Alexei Starovoitov wrote:
> > > On Fri, Jan 23, 2026 at 3:06 AM Matt Bobrowski <mattbobrowski@google.com> wrote:
> > > >
> > > > On Thu, Jan 22, 2026 at 08:26:49PM -0800, Alexei Starovoitov wrote:
> > > > > On Wed, Jan 21, 2026 at 5:54 AM Matt Bobrowski <mattbobrowski@google.com> wrote:
> > > > > >
> > > > > > break;
> > > > > > + case BPF_CGROUP_ITER_CHILDREN_ONLY:
> > > > > > + kit->pos = css_next_child(kit->pos, kit->start);
> > > > > > + break;
> > > > >
> > > > > There are no users of css_next_child() outside of cgroup
> > > > > internals. It's a red flag for me.
> > > >
> > > > Hm, I can see the slight hesitation here. However, until somewhat
> > > > recently, the same argument could have been applied to functions like
> > > > css_next_descendant_post(), right?
> > > >
> > > > css_next_descendant_post() has rather recently started seeing usage in
> > > > other subsystems (block/, kernel/sched/, kernel/bpf), despite
> > > > remaining internal-only since its inception over a decade ago. Given
> > > > that css_next_descendant_post() continues to remain unexported, in all
> > > > fairness, I don't see css_next_child() being all that different in
> > > > this context.
> > > >
> > > > Would your stance change if Tejun agreed to mark css_next_child() as
> > > > exportable, or simply agreed to it being used from BPF cgroup
> > > > iterators?
> > > >
> > > > Both css_next_child() and css_for_each_child() have remained virtually
> > > > unchanged - and therefore inherently stable - since their
> > > > introduction. I should also mention that functions like
> > > > css_next_child() and css_next_descendant_post() were originally marked
> > > > as exported, and only later unmarked. This presumably was done as part
> > > > of a symbol cleanup, not because the functions were unstable
> > > > primitives.
> > > >
> > > > > I feel there is something wrong with the use case. It should have
> > > > > been handled by the primitives we have.
> > > >
> > > > Right, arguably it could be handled with the current set of
> > > > primitives, but it isn't nice and certainly not as efficient as I'd
> > > > like.
> > >
> > > Why is it? Just descdant_pre and break out of the loop if not immediate.
> >
> > If I use a break when I hit a grandchild in pre-order traversal, I
> > will still terminate the loop early and miss all the subsequent
> > immediate children?
> >
> > I do understand that using css_next_descendant_post() (without a
> > break) could allow me to logically filter for children in the
> > loop. However, my concern is purely about algorithmic complexity and
> > scalability.
> >
> > css_next_descendant_post() performs a full subtree walk
> > (O(Descendants)). If I have a top-level cgroup with only a few
> > children but a massive hierarchy underneath (hundreds/thousands of
> > grandchildren), using the descendant based iterator requires me to
> > literally visit every single node in that subtree.
> >
> > In contrast, iterating immediate children is just a linked-list walk
> > (O(Children)). For deep hierarchies, the performance difference is not
> > constant - it's orders of magnitude.
> >
> > For example, if I have 5 children under a given root/parent with each
> > of those 5 children having 500 nested descedants each:
> > - A pre-order traversal would lead me to visiting 2505 nodes.
> > - A linear traversal would lead me to visting 5 nodes.
> >
> > > The thought process should be to use what's available as much as possible.
> >
> > Well, I get that and it was my initial idea. But, I quickly realized
> > that we have a gap in the current iterators capabilities, and this
> > patch is meant to address this gap.
> >
> > > Adding UAPI for specific use cases isn't great.
> > > If it was a new kfunc that's one thing, but here you're touching uapi/bpf.h
> > > So acceptance bar is much much higher.
> >
> > OK, but now that Tejun confirmed [0] that he's fine with exposing
> > css_next_child(), how does this weigh up?
>
> ok. let's add it.
> Maybe shorten the name to
> BPF_CGROUP_ITER_CHILDREN
> "_ONLY" looks unnecessary.
Sounds good, thank you. Shortening to BPF_CGROUP_ITER_CHILDREN is OK
too. Will send it out as part of v2.
next prev parent reply other threads:[~2026-01-27 8:28 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-21 13:54 [PATCH bpf-next 1/2] bpf: add new BPF_CGROUP_ITER_CHILDREN_ONLY control option Matt Bobrowski
2026-01-21 13:54 ` [PATCH bpf-next 2/2] bpf/selftests: cover " Matt Bobrowski
2026-01-21 19:14 ` [PATCH bpf-next 1/2] bpf: add new " Song Liu
2026-01-22 12:31 ` Matt Bobrowski
2026-01-23 4:26 ` Alexei Starovoitov
2026-01-23 11:06 ` Matt Bobrowski
2026-01-23 17:17 ` Alexei Starovoitov
2026-01-26 9:03 ` Matt Bobrowski
2026-01-27 2:26 ` Alexei Starovoitov
2026-01-27 8:28 ` Matt Bobrowski [this message]
2026-01-23 18:50 ` Tejun Heo
2026-01-26 9:14 ` Matt Bobrowski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aXh3L7BukANG6Dve@google.com \
--to=mattbobrowski@google.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=martin.lau@linux.dev \
--cc=roman.gushchin@linux.dev \
--cc=sdf@fomichev.me \
--cc=song@kernel.org \
--cc=tj@kernel.org \
--cc=yonghong.song@linux.dev \
--cc=zhouchuyi@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox