Re: Subject: [PATCH bpf-next 2/3] bpf: drop KF_ACQUIRE flag on BPF kfunc bpf_get_root_mem_cgroup()

public inbox for bpf@vger.kernel.org

From: Roman Gushchin <roman.gushchin@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Matt Bobrowski <mattbobrowski@google.com>,
	 Alexei Starovoitov <ast@kernel.org>,
	 Daniel Borkmann <daniel@iogearbox.net>,
	 Andrii Nakryiko <andrii@kernel.org>,
	 Martin KaFai Lau <martin.lau@linux.dev>,
	Eduard Zingerman <eddyz87@gmail.com>,  Song Liu <song@kernel.org>,
	Yonghong Song <yonghong.song@linux.dev>,
	 John Fastabend <john.fastabend@gmail.com>,
	 KP Singh <kpsingh@kernel.org>,
	 Stanislav Fomichev <sdf@fomichev.me>,
	 Jiri Olsa <jolsa@kernel.org>,
	 Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	 bpf <bpf@vger.kernel.org>
Subject: Re: Subject: [PATCH bpf-next 2/3] bpf: drop KF_ACQUIRE flag on BPF kfunc bpf_get_root_mem_cgroup()
Date: Fri, 16 Jan 2026 13:18:02 -0800
Message-ID: <878qdx6yut.fsf@linux.dev>
In-Reply-To: <CAADnVQ+45MorO=pODKOEVXhpY1skVy1tPkkABPAxDJGx4vOijg@mail.gmail.com> (Alexei Starovoitov's message of "Fri, 16 Jan 2026 08:12:19 -0800")

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Fri, Jan 16, 2026 at 7:22 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>>
>> On Thu, Jan 15, 2026 at 11:55 PM Matt Bobrowski
>> <mattbobrowski@google.com> wrote:
>> >
>> > On Thu, Jan 15, 2026 at 08:54:42PM -0800, Roman Gushchin wrote:
>> > >
>> > > > With the BPF verifier now treating pointers to struct types returned
>> > > > from BPF kfuncs as implicitly trusted by default, there is no need for
>> > > > bpf_get_root_mem_cgroup() to be annotated with the KF_ACQUIRE flag.
>> > >
>> > > > bpf_get_root_mem_cgroup() does not acquire any references, but rather
>> > > > simply returns a NULL pointer or a pointer to a struct mem_cgroup
>> > > > object that is valid for the entire lifetime of the kernel.
>> > >
>> > > > This simplifies BPF programs using this kfunc by removing the
>> > > > requirement to pair the call with bpf_put_mem_cgroup().
>> > >
>> > > It's actually the opposite: having the get semantics (which is also
>> > > suggested by the name) allows to treat the root memory cgroup exactly
>> > > as any other. And it makes the code much simpler, otherwise you
>> > > need to have these ugly checks across the codebase:
>> > >       if (memcg != root_mem_cgroup)
>> > >               css_put(&memcg->css);
>> >
>> > I mean, you're certainly not forced to do this. But, I do also see
>> > what you mean.
>> >
>> > > This is why __all__ memcg && cgroup code follows this principle and
>> > > hides the special handling of the root memory cgroup within
>> > > css_get()/css_put().
>> > >
>> > > I wasn't cc'ed on this series, otherwise I'd nack this patch.
>> > > If the overhead of an extra kfunc call is a concern here (which I
>> > > doubt), we can introduce a non-acquire bpf_root_mem_cgroup()
>> > > version.
>> > >
>> > > And I strongly suggest to revert this change.
>> >
>> > Apologies, I honestly thought I did CC you on this series. Don't know
>> > what happened with that. Anyway, I'm totally OK with reverting this
>> > patch and keeping bpf_get_root_mem_cgroup() with KF_ACQUIRE
>> > semantics. bpf_get_root_mem_cgroup() was selected as it was the very
>> > first BPF kfunc that came to mind where implicit trusted pointer
>> > semantics should be applied by the BPF verifier.
>> >
>> > Notably, the follow up selftest patch [0] will also need to be
>> > reverted if so as it relies on bpf_get_root_mem_cgroup() without
>> > KF_ACQUIRE. We can probably
>> >
>> > [0] https://lore.kernel.org/bpf/20260113083949.2502978-2-mattbobrowski@google.com/T/#mfa14fb83b3350c25f961fd43dc4df9b25d00c5f5
>>
>> Instead of revert of two patches, let's revert one and replace
>> with test kfunc that 2nd patch can use.
>>
>> tbh I don't think it's a big deal in practice.
>> Kernel code working with cgroups might be different than bpf.
>> I'm not sure what was the use case for bpf_get_root_mem_cgroup().
>>
>> Roman,
>> please share your prototype bpf code for oom, so it's easier to see
>> why non-acquire semantics for bpf_get_root_mem_cgroup() are problematic.
>
> Actually, thinking more about it, bpf_get_root_mem_cgroup() should NOT have
> an acquire semantics, otherwise you cannot even implement:
>
> static inline bool mem_cgroup_is_root(struct mem_cgroup *memcg)
> {
>         return (memcg == root_mem_cgroup);
> }

You can check memcg->css.parent == NULL instead.
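
A minimal userspace sketch of that check, assuming simplified stand-in
structs (css_model and memcg_model are hypothetical names, not kernel
types) and assuming that only the root has a NULL parent:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structs. */
struct css_model {
	struct css_model *parent;	/* assumed NULL only for the root */
};

struct memcg_model {
	struct css_model css;
};

/* Root test that needs no root pointer, so nothing to get or put. */
static bool memcg_model_is_root(const struct memcg_model *memcg)
{
	return memcg->css.parent == NULL;
}
```

The comparison only reads a field of the memcg already in hand, so it
does not depend on how bpf_get_root_mem_cgroup() is annotated.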

>
> without ugliness:
>
> static inline bool bpf_mem_cgroup_is_root(struct mem_cgroup *memcg)
> {
>         struct mem_cgroup *root_memcg = bpf_get_root_mem_cgroup();
>         bool ret = memcg == root_memcg;
>
>         bpf_put_mem_cgroup(root_memcg);
>         return ret;
> }

Maybe we need both, but if root_mem_cgroup is handled differently, you
can't do a very natural thing like:

void some_func(struct mem_cgroup *subtree_root)
{
	struct mem_cgroup *memcg = subtree_root ? subtree_root : bpf_get_root_mem_cgroup();

	// iterate over subtree
}

or you can't pass a pointer (with a reference) to a function or a work
item on the assumption that it will drop the reference at the end.

Basically you can't easily mix the root_mem_cgroup pointer with normal
memcg pointers.
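
To illustrate the uniform handling, here is a rough userspace model. All
names in it are hypothetical and none are kernel APIs; the no_ref flag
only mimics the way css_get()/css_put() skip refcounting for the root:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct memcg_model {
	int refs;	/* outstanding references */
	bool no_ref;	/* root: get/put are no-ops */
};

static struct memcg_model root_model = { .refs = 0, .no_ref = true };

static struct memcg_model *model_get(struct memcg_model *memcg)
{
	if (memcg && !memcg->no_ref)
		memcg->refs++;
	return memcg;
}

static void model_put(struct memcg_model *memcg)
{
	if (memcg && !memcg->no_ref)
		memcg->refs--;
}

/* Acquire semantics: the caller always owns what it gets back. */
static struct memcg_model *model_get_root(void)
{
	return model_get(&root_model);
}

/* Returns the reference count left on the walked memcg afterwards. */
static int walk_subtree(struct memcg_model *subtree_root)
{
	struct memcg_model *memcg =
		subtree_root ? model_get(subtree_root) : model_get_root();

	/* ... iterate over the subtree ... */

	model_put(memcg);	/* one unconditional put, root or not */
	return memcg->refs;
}
```

With this shape the caller never compares against the root before
dropping the reference; the special case lives in get/put only.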

E.g. in my bpfoom case:

SEC("struct_ops.s/handle_out_of_memory")
int BPF_PROG(test_out_of_memory, struct oom_control *oc, struct bpf_struct_ops_link *link)
{
	struct task_struct *task;
	struct mem_cgroup *root_memcg = oc->memcg;
	struct mem_cgroup *memcg, *victim = NULL;
	struct cgroup_subsys_state *css_pos, *css;
	unsigned long usage, max_usage = 0;
	unsigned long pagecache = 0;
	int ret = 0;

	if (root_memcg)
		root_memcg = bpf_get_mem_cgroup(&root_memcg->css);
	else
		root_memcg = bpf_get_root_mem_cgroup();

	if (!root_memcg)
		return 0;

	css = &root_memcg->css;
	if (css && css->cgroup == link->cgroup)
		goto exit;

	bpf_rcu_read_lock();
	bpf_for_each(css, css_pos, &root_memcg->css, BPF_CGROUP_ITER_DESCENDANTS_POST) {
		if (css_pos->cgroup->nr_descendants + css_pos->cgroup->nr_dying_descendants)
			continue;

		memcg = bpf_get_mem_cgroup(css_pos);
		if (!memcg)
			continue;

                < ... >

		bpf_put_mem_cgroup(memcg);
	}
	bpf_rcu_read_unlock();

        < ... >

	bpf_put_mem_cgroup(victim);
exit:
	bpf_put_mem_cgroup(root_memcg);

	return ret;
}

--

How would you write it without get semantics?

Thanks!

Thread overview: 18+ messages
2026-01-13  8:39 [PATCH bpf-next 1/3] bpf: return PTR_TO_BTF_ID | PTR_TRUSTED from BPF kfuncs by default Matt Bobrowski
2026-01-13  8:39 ` [PATCH bpf-next 2/3] bpf: drop KF_ACQUIRE flag on BPF kfunc bpf_get_root_mem_cgroup() Matt Bobrowski
2026-01-13  9:25   ` Kumar Kartikeya Dwivedi
2026-01-16  4:54   ` Subject: " Roman Gushchin
2026-01-16  7:55     ` Matt Bobrowski
2026-01-16 15:22       ` Alexei Starovoitov
2026-01-16 16:12         ` Alexei Starovoitov
2026-01-16 21:18           ` Roman Gushchin [this message]
2026-01-20  1:29             ` Alexei Starovoitov
2026-01-20  6:52               ` Matt Bobrowski
2026-01-20  9:19                 ` Matt Bobrowski
2026-01-21  1:00               ` Roman Gushchin
2026-01-21  1:14                 ` Alexei Starovoitov
2026-01-21  9:05                   ` Matt Bobrowski
2026-01-13  8:39 ` [PATCH bpf-next 3/3] selftests/bpf: assert BPF kfunc default trusted pointer semantics Matt Bobrowski
2026-01-13  9:26   ` Kumar Kartikeya Dwivedi
2026-01-13  9:22 ` [PATCH bpf-next 1/3] bpf: return PTR_TO_BTF_ID | PTR_TRUSTED from BPF kfuncs by default Kumar Kartikeya Dwivedi
2026-01-14  3:30 ` patchwork-bot+netdevbpf
