public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
From: Matt Bobrowski <mattbobrowski@google.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
	Yonghong Song <yonghong.song@linux.dev>,
	ohn Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@fomichev.me>,
	Jiri Olsa <jolsa@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Chuyi Zhou <zhouchuyi@bytedance.com>, Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH bpf-next 1/2] bpf: add new BPF_CGROUP_ITER_CHILDREN_ONLY control option
Date: Mon, 26 Jan 2026 09:03:49 +0000	[thread overview]
Message-ID: <aXct9XxD43eBZs-H@google.com> (raw)
In-Reply-To: <CAADnVQKHuj1snSOBh+VEJrcW82OW2zNOEd1=Dqe-m+kuOV7MWw@mail.gmail.com>

On Fri, Jan 23, 2026 at 09:17:03AM -0800, Alexei Starovoitov wrote:
> On Fri, Jan 23, 2026 at 3:06 AM Matt Bobrowski <mattbobrowski@google.com> wrote:
> >
> > On Thu, Jan 22, 2026 at 08:26:49PM -0800, Alexei Starovoitov wrote:
> > > On Wed, Jan 21, 2026 at 5:54 AM Matt Bobrowski <mattbobrowski@google.com> wrote:
> > > >
> > > >                 break;
> > > > +       case BPF_CGROUP_ITER_CHILDREN_ONLY:
> > > > +               kit->pos = css_next_child(kit->pos, kit->start);
> > > > +               break;
> > >
> > > There are no users of css_next_child() outside of cgroup
> > > internals. It's a red flag for me.
> >
> > Hm, I can see the slight hesitation here. However, until somewhat
> > recently, the same argument could have been applied to functions like
> > css_next_descendant_post(), right?
> >
> > css_next_descendant_post() has rather recently started seeing usage in
> > other subsystems (block/, kernel/sched/, kernel/bpf), despite
> > remaining internal-only since its inception over a decade ago. Given
> > that css_next_descendant_post() continues to remain unexported, in all
> > fairness, I don't see css_next_child() being all that different in
> > this context.
> >
> > Would your stance change if Tejun agreed to mark css_next_child() as
> > exportable, or simply agreed to it being used from BPF cgroup
> > iterators?
> >
> > Both css_next_child() and css_for_each_child() have remained virtually
> > unchanged - and therefore inherently stable - since their
> > introduction. I should also mention that functions like
> > css_next_child() and css_next_descendant_post() were originally marked
> > as exported, and only later unmarked. This presumably was done as part
> > of a symbol cleanup, not because the functions were unstable
> > primitives.
> >
> > > I feel there is something wrong with the use case. It should have
> > > been handled by the primitives we have.
> >
> > Right, arguably it could be handled with the current set of
> > primitives, but it isn't nice and certainly not as efficient as I'd
> > like.
> 
> Why is it? Just descdant_pre and break out of the loop if not immediate.

If I use a break when I hit a grandchild in pre-order traversal, I
will still terminate the loop early and miss all the subsequent
immediate children?

I do understand that using css_next_descendant_post() (without a
break) could allow me to logically filter for children in the
loop. However, my concern is purely about algorithmic complexity and
scalability.

css_next_descendant_post() performs a full subtree walk
(O(Descendants)). If I have a top-level cgroup with only a few
children but a massive hierarchy underneath (hundreds/thousands of
grandchildren), using the descendant based iterator requires me to
literally visit every single node in that subtree.

In contrast, iterating immediate children is just a linked-list walk
(O(Children)). For deep hierarchies, the performance difference is not
constant - it's orders of magnitude.

For example, if I have 5 children under a given root/parent with each
of those 5 children having 500 nested descedants each:
- A pre-order traversal would lead me to visiting 2505 nodes.
- A linear traversal would lead me to visting 5 nodes.

> The thought process should be to use what's available as much as possible.

Well, I get that and it was my initial idea. But, I quickly realized
that we have a gap in the current iterators capabilities, and this
patch is meant to address this gap.

> Adding UAPI for specific use cases isn't great.
> If it was a new kfunc that's one thing, but here you're touching uapi/bpf.h
> So acceptance bar is much much higher.

OK, but now that Tejun confirmed [0] that he's fine with exposing
css_next_child(), how does this weigh up?

[0] https://lore.kernel.org/bpf/aXPC_KMX_rvO14lR@slm.duckdns.org/

  reply	other threads:[~2026-01-26  9:03 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-21 13:54 [PATCH bpf-next 1/2] bpf: add new BPF_CGROUP_ITER_CHILDREN_ONLY control option Matt Bobrowski
2026-01-21 13:54 ` [PATCH bpf-next 2/2] bpf/selftests: cover " Matt Bobrowski
2026-01-21 19:14 ` [PATCH bpf-next 1/2] bpf: add new " Song Liu
2026-01-22 12:31   ` Matt Bobrowski
2026-01-23  4:26 ` Alexei Starovoitov
2026-01-23 11:06   ` Matt Bobrowski
2026-01-23 17:17     ` Alexei Starovoitov
2026-01-26  9:03       ` Matt Bobrowski [this message]
2026-01-27  2:26         ` Alexei Starovoitov
2026-01-27  8:28           ` Matt Bobrowski
2026-01-23 18:50     ` Tejun Heo
2026-01-26  9:14       ` Matt Bobrowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aXct9XxD43eBZs-H@google.com \
    --to=mattbobrowski@google.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=sdf@fomichev.me \
    --cc=song@kernel.org \
    --cc=tj@kernel.org \
    --cc=yonghong.song@linux.dev \
    --cc=zhouchuyi@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox