From: Yonghong Song <yhs@meta.com>
To: Yosry Ahmed <yosryahmed@google.com>, Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
kernel-team@fb.com, KP Singh <kpsingh@kernel.org>,
Martin KaFai Lau <martin.lau@kernel.org>,
Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH bpf-next v2 2/6] bpf: Implement cgroup storage available to non-cgroup-attached bpf progs
Date: Fri, 21 Oct 2022 14:05:29 -0700 [thread overview]
Message-ID: <9abed364-39a3-b8c0-78ba-9d9ec45277ee@meta.com> (raw)
In-Reply-To: <CAJD7tkZb8vRWbBn5Z75MXf_g8tYTThYgkLXYKPUT0zzcRaK7+w@mail.gmail.com>
On 10/21/22 12:29 PM, Yosry Ahmed wrote:
> On Thu, Oct 20, 2022 at 3:13 PM Yonghong Song <yhs@fb.com> wrote:
>>
>> Similar to sk/inode/task storage, implement similar cgroup local storage.
>>
>> There already exists a local storage implementation for cgroup-attached
>> bpf programs. See map type BPF_MAP_TYPE_CGROUP_STORAGE and helper
>> bpf_get_local_storage(). But there are use cases such that non-cgroup
>> attached bpf progs wants to access cgroup local storage data. For example,
>> tc egress prog has access to sk and cgroup. It is possible to use
>> sk local storage to emulate cgroup local storage by storing data in socket.
>> But this is a waste as it could be lots of sockets belonging to a particular
>> cgroup. Alternatively, a separate map can be created with cgroup id as the key.
>> But this will introduce additional overhead to manipulate the new map.
>> A cgroup local storage, similar to existing sk/inode/task storage,
>> should help for this use case.
>>
>> The life-cycle of storage is managed with the life-cycle of the
>> cgroup struct. i.e. the storage is destroyed along with the owning cgroup
>> with a callback to the bpf_cgrp_storage_free when cgroup itself
>> is deleted.
>>
>> The userspace map operations can be done by using a cgroup fd as a key
>> passed to the lookup, update and delete operations.
>>
>> Typically, the following code is used to get the current cgroup:
>> struct task_struct *task = bpf_get_current_task_btf();
>> ... task->cgroups->dfl_cgrp ...
>> and in structure task_struct definition:
>> struct task_struct {
>> ....
>> struct css_set __rcu *cgroups;
>> ....
>> }
>> With sleepable program, accessing task->cgroups is not protected by rcu_read_lock.
>> So the current implementation only supports non-sleepable program and supporting
>> sleepable program will be the next step together with adding rcu_read_lock
>> protection for rcu tagged structures.
>>
>> Since map name BPF_MAP_TYPE_CGROUP_STORAGE has been used for old cgroup local
>> storage support, the new map name BPF_MAP_TYPE_CGRP_STORAGE is used
>> for cgroup storage available to non-cgroup-attached bpf programs. The old
>> cgroup storage supports bpf_get_local_storage() helper to get the cgroup data.
>> The new cgroup storage helper bpf_cgrp_storage_get() can provide similar
>> functionality. While old cgroup storage pre-allocates storage memory, the new
>> mechanism can also pre-allocate with a user space bpf_map_update_elem() call
>> to avoid potential run-time memory allocaiton failure.
>> Therefore, the new cgroup storage can provide all functionality w.r.t.
>> the old one. So in uapi bpf.h, the old BPF_MAP_TYPE_CGROUP_STORAGE is alias to
>> BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED to indicate the old cgroup storage can
>> be deprecated since the new one can provide the same functionality.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>> include/linux/bpf.h | 3 +
>> include/linux/bpf_types.h | 1 +
>> include/linux/cgroup-defs.h | 4 +
>> include/uapi/linux/bpf.h | 48 +++++-
>> kernel/bpf/Makefile | 2 +-
>> kernel/bpf/bpf_cgrp_storage.c | 276 +++++++++++++++++++++++++++++++++
>> kernel/bpf/helpers.c | 6 +
>> kernel/bpf/syscall.c | 3 +-
>> kernel/bpf/verifier.c | 13 +-
>> kernel/cgroup/cgroup.c | 4 +
>> kernel/trace/bpf_trace.c | 4 +
>> scripts/bpf_doc.py | 2 +
>> tools/include/uapi/linux/bpf.h | 48 +++++-
>> 13 files changed, 409 insertions(+), 5 deletions(-)
>> create mode 100644 kernel/bpf/bpf_cgrp_storage.c
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 9e7d46d16032..674da3129248 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -2045,6 +2045,7 @@ struct bpf_link *bpf_link_by_id(u32 id);
>>
>> const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id);
>> void bpf_task_storage_free(struct task_struct *task);
>> +void bpf_cgrp_storage_free(struct cgroup *cgroup);
>> bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog);
>> const struct btf_func_model *
>> bpf_jit_find_kfunc_model(const struct bpf_prog *prog,
>> @@ -2537,6 +2538,8 @@ extern const struct bpf_func_proto bpf_copy_from_user_task_proto;
>> extern const struct bpf_func_proto bpf_set_retval_proto;
>> extern const struct bpf_func_proto bpf_get_retval_proto;
>> extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto;
>> +extern const struct bpf_func_proto bpf_cgrp_storage_get_proto;
>> +extern const struct bpf_func_proto bpf_cgrp_storage_delete_proto;
>>
>> const struct bpf_func_proto *tracing_prog_func_proto(
>> enum bpf_func_id func_id, const struct bpf_prog *prog);
>> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
>> index 2c6a4f2562a7..f9d5aa62fed0 100644
>> --- a/include/linux/bpf_types.h
>> +++ b/include/linux/bpf_types.h
>> @@ -90,6 +90,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_CGROUP_ARRAY, cgroup_array_map_ops)
>> #ifdef CONFIG_CGROUP_BPF
>> BPF_MAP_TYPE(BPF_MAP_TYPE_CGROUP_STORAGE, cgroup_storage_map_ops)
>> BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE, cgroup_storage_map_ops)
>> +BPF_MAP_TYPE(BPF_MAP_TYPE_CGRP_STORAGE, cgrp_storage_map_ops)
>> #endif
>> BPF_MAP_TYPE(BPF_MAP_TYPE_HASH, htab_map_ops)
>> BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_HASH, htab_percpu_map_ops)
>> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
>> index 8f481d1b159a..4a72bc3a0a4e 100644
>> --- a/include/linux/cgroup-defs.h
>> +++ b/include/linux/cgroup-defs.h
>> @@ -504,6 +504,10 @@ struct cgroup {
>> /* Used to store internal freezer state */
>> struct cgroup_freezer_state freezer;
>>
>> +#ifdef CONFIG_CGROUP_BPF
>> + struct bpf_local_storage __rcu *bpf_cgrp_storage;
>> +#endif
>
> Why is this protected by CONFIG_CGROUP_BPF as opposed to
> CONFIG_CGROUPS && CONFIG_BPF_SYSCALL?
>
> It seems to me (and you also point this out in a different reply) that
> CONFIG_CGROUP_BPF is mostly used for bpf programs that attach to
> cgroups, right?
Right. It should be CONFIG_BPF_SYSCALL. My v1 actually used
CONFIG_BPF_SYSCALL and changed to CONFIG_CGROUP_BPF in v2 thinking
it should work, but kernel test bot complains. Will change back
to CONFIG_BPF_SYSCALL in v3.
>
> (same for the freeing site)
>
>> +
>> /* All ancestors including self */
>> struct cgroup *ancestors[];
>> };
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 17f61338f8f8..2d7f79bf3500 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -922,7 +922,14 @@ enum bpf_map_type {
>> BPF_MAP_TYPE_CPUMAP,
>> BPF_MAP_TYPE_XSKMAP,
>> BPF_MAP_TYPE_SOCKHASH,
>> - BPF_MAP_TYPE_CGROUP_STORAGE,
>> + BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
>> + /* BPF_MAP_TYPE_CGROUP_STORAGE is available to bpf programs attaching
>> + * to a cgroup. The newer BPF_MAP_TYPE_CGRP_STORAGE is available to
>> + * both cgroup-attached and other progs and supports all functionality
>> + * provided by BPF_MAP_TYPE_CGROUP_STORAGE. So mark
>> + * BPF_MAP_TYPE_CGROUP_STORAGE deprecated.
>> + */
>> + BPF_MAP_TYPE_CGROUP_STORAGE = BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
>> BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
>> BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
>> BPF_MAP_TYPE_QUEUE,
>> @@ -935,6 +942,7 @@ enum bpf_map_type {
>> BPF_MAP_TYPE_TASK_STORAGE,
>> BPF_MAP_TYPE_BLOOM_FILTER,
>> BPF_MAP_TYPE_USER_RINGBUF,
>> + BPF_MAP_TYPE_CGRP_STORAGE,
>> };
>>
[...]
next prev parent reply other threads:[~2022-10-21 21:06 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-20 22:12 [PATCH bpf-next v2 0/6] bpf: Implement cgroup local storage available to non-cgroup-attached bpf progs Yonghong Song
2022-10-20 22:13 ` [PATCH bpf-next v2 1/6] bpf: Make struct cgroup btf id global Yonghong Song
2022-10-20 22:13 ` [PATCH bpf-next v2 2/6] bpf: Implement cgroup storage available to non-cgroup-attached bpf progs Yonghong Song
2022-10-21 5:22 ` David Vernet
2022-10-21 5:26 ` David Vernet
2022-10-21 17:33 ` Yonghong Song
2022-10-21 19:57 ` David Vernet
2022-10-21 22:57 ` Yonghong Song
2022-10-22 3:02 ` David Vernet
2022-10-23 16:45 ` Yonghong Song
2022-10-23 21:14 ` David Vernet
[not found] ` <202210210932.nHqTyTmx-lkp@intel.com>
2022-10-21 16:51 ` Yonghong Song
2022-10-21 19:29 ` Yosry Ahmed
2022-10-21 21:05 ` Yonghong Song [this message]
2022-10-20 22:13 ` [PATCH bpf-next v2 3/6] libbpf: Support new cgroup local storage Yonghong Song
2022-10-21 23:10 ` Andrii Nakryiko
2022-10-22 0:32 ` Yonghong Song
2022-10-22 1:05 ` Tejun Heo
2022-10-20 22:13 ` [PATCH bpf-next v2 4/6] bpftool: " Yonghong Song
2022-10-20 22:13 ` [PATCH bpf-next v2 5/6] selftests/bpf: Add selftests for " Yonghong Song
2022-10-20 22:13 ` [PATCH bpf-next v2 6/6] docs/bpf: Add documentation for map type BPF_MAP_TYPE_CGRP_STROAGE Yonghong Song
2022-10-21 7:12 ` David Vernet
2022-10-21 17:46 ` Yonghong Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9abed364-39a3-b8c0-78ba-9d9ec45277ee@meta.com \
--to=yhs@meta.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=kernel-team@fb.com \
--cc=kpsingh@kernel.org \
--cc=martin.lau@kernel.org \
--cc=tj@kernel.org \
--cc=yhs@fb.com \
--cc=yosryahmed@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox