Re: [PATCH bpf-next 1/2] bpf: add new BPF_CGROUP_ITER_CHILDREN_ONLY control option

public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed

From: Matt Bobrowski <mattbobrowski@google.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
	Yonghong Song <yonghong.song@linux.dev>,
	ohn Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@fomichev.me>,
	Jiri Olsa <jolsa@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Chuyi Zhou <zhouchuyi@bytedance.com>, Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH bpf-next 1/2] bpf: add new BPF_CGROUP_ITER_CHILDREN_ONLY control option
Date: Tue, 27 Jan 2026 08:28:31 +0000	[thread overview]
Message-ID: <aXh3L7BukANG6Dve@google.com> (raw)
In-Reply-To: <CAADnVQJXs1NinaahP-uKOrqKUHEZ4PgwOAAVpWkHDWkapBZ2YA@mail.gmail.com>

On Mon, Jan 26, 2026 at 06:26:55PM -0800, Alexei Starovoitov wrote:
> On Mon, Jan 26, 2026 at 1:03 AM Matt Bobrowski <mattbobrowski@google.com> wrote:
> >
> > On Fri, Jan 23, 2026 at 09:17:03AM -0800, Alexei Starovoitov wrote:
> > > On Fri, Jan 23, 2026 at 3:06 AM Matt Bobrowski <mattbobrowski@google.com> wrote:
> > > >
> > > > On Thu, Jan 22, 2026 at 08:26:49PM -0800, Alexei Starovoitov wrote:
> > > > > On Wed, Jan 21, 2026 at 5:54 AM Matt Bobrowski <mattbobrowski@google.com> wrote:
> > > > > >
> > > > > >                 break;
> > > > > > +       case BPF_CGROUP_ITER_CHILDREN_ONLY:
> > > > > > +               kit->pos = css_next_child(kit->pos, kit->start);
> > > > > > +               break;
> > > > >
> > > > > There are no users of css_next_child() outside of cgroup
> > > > > internals. It's a red flag for me.
> > > >
> > > > Hm, I can see the slight hesitation here. However, until somewhat
> > > > recently, the same argument could have been applied to functions like
> > > > css_next_descendant_post(), right?
> > > >
> > > > css_next_descendant_post() has rather recently started seeing usage in
> > > > other subsystems (block/, kernel/sched/, kernel/bpf), despite
> > > > remaining internal-only since its inception over a decade ago. Given
> > > > that css_next_descendant_post() continues to remain unexported, in all
> > > > fairness, I don't see css_next_child() being all that different in
> > > > this context.
> > > >
> > > > Would your stance change if Tejun agreed to mark css_next_child() as
> > > > exportable, or simply agreed to it being used from BPF cgroup
> > > > iterators?
> > > >
> > > > Both css_next_child() and css_for_each_child() have remained virtually
> > > > unchanged - and therefore inherently stable - since their
> > > > introduction. I should also mention that functions like
> > > > css_next_child() and css_next_descendant_post() were originally marked
> > > > as exported, and only later unmarked. This presumably was done as part
> > > > of a symbol cleanup, not because the functions were unstable
> > > > primitives.
> > > >
> > > > > I feel there is something wrong with the use case. It should have
> > > > > been handled by the primitives we have.
> > > >
> > > > Right, arguably it could be handled with the current set of
> > > > primitives, but it isn't nice and certainly not as efficient as I'd
> > > > like.
> > >
> > > Why is it? Just descdant_pre and break out of the loop if not immediate.
> >
> > If I use a break when I hit a grandchild in pre-order traversal, I
> > will still terminate the loop early and miss all the subsequent
> > immediate children?
> >
> > I do understand that using css_next_descendant_post() (without a
> > break) could allow me to logically filter for children in the
> > loop. However, my concern is purely about algorithmic complexity and
> > scalability.
> >
> > css_next_descendant_post() performs a full subtree walk
> > (O(Descendants)). If I have a top-level cgroup with only a few
> > children but a massive hierarchy underneath (hundreds/thousands of
> > grandchildren), using the descendant based iterator requires me to
> > literally visit every single node in that subtree.
> >
> > In contrast, iterating immediate children is just a linked-list walk
> > (O(Children)). For deep hierarchies, the performance difference is not
> > constant - it's orders of magnitude.
> >
> > For example, if I have 5 children under a given root/parent with each
> > of those 5 children having 500 nested descedants each:
> > - A pre-order traversal would lead me to visiting 2505 nodes.
> > - A linear traversal would lead me to visting 5 nodes.
> >
> > > The thought process should be to use what's available as much as possible.
> >
> > Well, I get that and it was my initial idea. But, I quickly realized
> > that we have a gap in the current iterators capabilities, and this
> > patch is meant to address this gap.
> >
> > > Adding UAPI for specific use cases isn't great.
> > > If it was a new kfunc that's one thing, but here you're touching uapi/bpf.h
> > > So acceptance bar is much much higher.
> >
> > OK, but now that Tejun confirmed [0] that he's fine with exposing
> > css_next_child(), how does this weigh up?
> 
> ok. let's add it.
> Maybe shorten the name to
> BPF_CGROUP_ITER_CHILDREN
> "_ONLY" looks unnecessary.

Sounds good, thank you. Shortening to BPF_CGROUP_ITER_CHILDREN is OK
too. Will send it out as part of v2.

next prev parent reply	other threads:[~2026-01-27  8:28 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-21 13:54 [PATCH bpf-next 1/2] bpf: add new BPF_CGROUP_ITER_CHILDREN_ONLY control option Matt Bobrowski
2026-01-21 13:54 ` [PATCH bpf-next 2/2] bpf/selftests: cover " Matt Bobrowski
2026-01-21 19:14 ` [PATCH bpf-next 1/2] bpf: add new " Song Liu
2026-01-22 12:31   ` Matt Bobrowski
2026-01-23  4:26 ` Alexei Starovoitov
2026-01-23 11:06   ` Matt Bobrowski
2026-01-23 17:17     ` Alexei Starovoitov
2026-01-26  9:03       ` Matt Bobrowski
2026-01-27  2:26         ` Alexei Starovoitov
2026-01-27  8:28           ` Matt Bobrowski [this message]
2026-01-23 18:50     ` Tejun Heo
2026-01-26  9:14       ` Matt Bobrowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aXh3L7BukANG6Dve@google.com \
    --to=mattbobrowski@google.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=sdf@fomichev.me \
    --cc=song@kernel.org \
    --cc=tj@kernel.org \
    --cc=yonghong.song@linux.dev \
    --cc=zhouchuyi@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox