All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matt Bobrowski <mattbobrowski@google.com>
To: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Eduard Zingerman <eddyz87@gmail.com>,
	Yonghong Song <yonghong.song@linux.dev>,
	ohn Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@fomichev.me>,
	Jiri Olsa <jolsa@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Chuyi Zhou <zhouchuyi@bytedance.com>, Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH bpf-next 1/2] bpf: add new BPF_CGROUP_ITER_CHILDREN_ONLY control option
Date: Thu, 22 Jan 2026 12:31:44 +0000	[thread overview]
Message-ID: <aXIYsFwaov8SQczm@google.com> (raw)
In-Reply-To: <CAPhsuW6B27WYpkT-ZY1wgBUNQknM9pKZm+oGk+bM_p33mVPx_A@mail.gmail.com>

On Wed, Jan 21, 2026 at 11:14:10AM -0800, Song Liu wrote:
> On Wed, Jan 21, 2026 at 5:54 AM Matt Bobrowski <mattbobrowski@google.com> wrote:
> >
> > Currently, the BPF cgroup iterator supports walking descendants in
> > either pre-order (BPF_CGROUP_ITER_DESCENDANTS_PRE) or post-order
> > (BPF_CGROUP_ITER_DESCENDANTS_POST). These modes perform an exhaustive
> > depth-first search (DFS) of the hierarchy. In scenarios where a BPF
> > program may need to inspect only the direct children of a given parent
> > cgroup, a full DFS is unnecessarily expensive.
> >
> > This patch introduces a new BPF cgroup iterator control option,
> > BPF_CGROUP_ITER_CHILDREN_ONLY. This control option restricts the
> > traversal to the immediate children of a specified parent cgroup,
> > allowing for more targeted and efficient iteration, particularly when
> > exhaustive depth-first search (DFS) traversal is not required.
> >
> > Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
> 
> The code looks good to me.
>
> Some nitpick and some high level questions/ideas below

Sure, thank you for taking a look!

> >  enum bpf_cgroup_iter_order {
> >         BPF_CGROUP_ITER_ORDER_UNSPEC = 0,
> > -       BPF_CGROUP_ITER_SELF_ONLY,              /* process only a single object. */
> > -       BPF_CGROUP_ITER_DESCENDANTS_PRE,        /* walk descendants in pre-order. */
> > -       BPF_CGROUP_ITER_DESCENDANTS_POST,       /* walk descendants in post-order. */
> > -       BPF_CGROUP_ITER_ANCESTORS_UP,           /* walk ancestors upward. */
> > +       BPF_CGROUP_ITER_SELF_ONLY,              /* process only a single object. */
> > +       BPF_CGROUP_ITER_DESCENDANTS_PRE,        /* walk descendants in pre-order. */
> > +       BPF_CGROUP_ITER_DESCENDANTS_POST,       /* walk descendants in post-order. */
> > +       BPF_CGROUP_ITER_ANCESTORS_UP,           /* walk ancestors upward. */
> 
> Changes above seem unnecessary.

This is just noise, sorry. Will revert this once I send out v2.

> > +       /*
> > +        * Walks the immediate children of the specified parent
> > +        * cgroup_subsys_state. Unlike BPF_CGROUP_ITER_DESCENDANTS_PRE,
> > +        * BPF_CGROUP_ITER_DESCENDANTS_POST, and BPF_CGROUP_ITER_ANCESTORS_UP
> > +        * the iterator does not include the specified parent as one of the
> > +        * returned iterator elements.
> > +        */
> > +       BPF_CGROUP_ITER_CHILDREN_ONLY,
> >  };
> 
> [...]
> 
> > @@ -320,6 +332,7 @@ __bpf_kfunc int bpf_iter_css_new(struct bpf_iter_css *it,
> >         case BPF_CGROUP_ITER_DESCENDANTS_PRE:
> >         case BPF_CGROUP_ITER_DESCENDANTS_POST:
> >         case BPF_CGROUP_ITER_ANCESTORS_UP:
> > +       case BPF_CGROUP_ITER_CHILDREN_ONLY:
> >                 break;
> >         default:
> >                 return -EINVAL;
> > @@ -345,6 +358,9 @@ __bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *i
> >         case BPF_CGROUP_ITER_DESCENDANTS_POST:
> >                 kit->pos = css_next_descendant_post(kit->pos, kit->start);
> >                 break;
> > +       case BPF_CGROUP_ITER_CHILDREN_ONLY:
> > +               kit->pos = css_next_child(kit->pos, kit->start);
> > +               break;
> >         case BPF_CGROUP_ITER_ANCESTORS_UP:
> >                 kit->pos = kit->pos ? kit->pos->parent : kit->start;
> >         }
> 
> I wonder whether we can use the return values and/or the kfuncs to
> enable more flexible walks. For example, some return value means
> "do not walk more children, go back to the parent". This will enable
> use cases like "walk children and grandchildren nodes from here,
> but not any deeper". WDYT?

Some general control over whether the DFS traversal should continue in
a specific direction could be nice. That way you could reverse out of
a specific subtree/branch early without necessarily having to walk all
nodes under a given subtree/branch.

With the above said however, it's not really a control/semantic which
I'm after at this point. I'm purely interested in exhaustively
iterating children of a given parent before selectively deciding which
subtree/branch, and therefore parent, I'd like to explore next.

  reply	other threads:[~2026-01-22 12:31 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-21 13:54 [PATCH bpf-next 1/2] bpf: add new BPF_CGROUP_ITER_CHILDREN_ONLY control option Matt Bobrowski
2026-01-21 13:54 ` [PATCH bpf-next 2/2] bpf/selftests: cover " Matt Bobrowski
2026-01-21 19:14 ` [PATCH bpf-next 1/2] bpf: add new " Song Liu
2026-01-22 12:31   ` Matt Bobrowski [this message]
2026-01-23  4:26 ` Alexei Starovoitov
2026-01-23 11:06   ` Matt Bobrowski
2026-01-23 17:17     ` Alexei Starovoitov
2026-01-26  9:03       ` Matt Bobrowski
2026-01-27  2:26         ` Alexei Starovoitov
2026-01-27  8:28           ` Matt Bobrowski
2026-01-23 18:50     ` Tejun Heo
2026-01-26  9:14       ` Matt Bobrowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aXIYsFwaov8SQczm@google.com \
    --to=mattbobrowski@google.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eddyz87@gmail.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=sdf@fomichev.me \
    --cc=song@kernel.org \
    --cc=tj@kernel.org \
    --cc=yonghong.song@linux.dev \
    --cc=zhouchuyi@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.