public inbox for bpf@vger.kernel.org
From: Yonghong Song <yhs@meta.com>
To: Yosry Ahmed <yosryahmed@google.com>, Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	kernel-team@fb.com, KP Singh <kpsingh@kernel.org>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH bpf-next v2 2/6] bpf: Implement cgroup storage available to non-cgroup-attached bpf progs
Date: Fri, 21 Oct 2022 14:05:29 -0700	[thread overview]
Message-ID: <9abed364-39a3-b8c0-78ba-9d9ec45277ee@meta.com> (raw)
In-Reply-To: <CAJD7tkZb8vRWbBn5Z75MXf_g8tYTThYgkLXYKPUT0zzcRaK7+w@mail.gmail.com>



On 10/21/22 12:29 PM, Yosry Ahmed wrote:
> On Thu, Oct 20, 2022 at 3:13 PM Yonghong Song <yhs@fb.com> wrote:
>>
>> Similar to sk/inode/task storage, implement cgroup local storage.
>>
>> There already exists a local storage implementation for cgroup-attached
>> bpf programs.  See map type BPF_MAP_TYPE_CGROUP_STORAGE and helper
>> bpf_get_local_storage(). But there are use cases where non-cgroup-attached
>> bpf progs want to access cgroup local storage data. For example, a
>> tc egress prog has access to both the sk and the cgroup. It is possible to
>> emulate cgroup local storage with sk local storage by storing the data in the
>> socket, but this is wasteful since there could be many sockets belonging to a
>> particular cgroup. Alternatively, a separate map could be created with the
>> cgroup id as the key, but this would introduce additional overhead to manipulate the new map.
>> A cgroup local storage, similar to existing sk/inode/task storage,
>> should help for this use case.
>>
>> The life-cycle of the storage is tied to the life-cycle of the
>> cgroup struct, i.e. the storage is destroyed together with the owning cgroup,
>> via a callback to bpf_cgrp_storage_free() when the cgroup itself
>> is deleted.
>>
>> The userspace map operations can be done by using a cgroup fd as a key
>> passed to the lookup, update and delete operations.
>>
>> Typically, the following code is used to get the current cgroup:
>>      struct task_struct *task = bpf_get_current_task_btf();
>>      ... task->cgroups->dfl_cgrp ...
>> and in structure task_struct definition:
>>      struct task_struct {
>>          ....
>>          struct css_set __rcu            *cgroups;
>>          ....
>>      }
>> In a sleepable program, accessing task->cgroups is not protected by
>> rcu_read_lock(), so the current implementation only supports non-sleepable
>> programs. Supporting sleepable programs, together with adding rcu_read_lock()
>> protection for rcu-tagged structures, will be the next step.
>>
>> Since the map name BPF_MAP_TYPE_CGROUP_STORAGE is already used by the old
>> cgroup local storage, the new map name BPF_MAP_TYPE_CGRP_STORAGE is used
>> for the cgroup storage available to non-cgroup-attached bpf programs. The old
>> cgroup storage supports the bpf_get_local_storage() helper to get the cgroup
>> data; the new cgroup storage helper bpf_cgrp_storage_get() provides similar
>> functionality. While the old cgroup storage pre-allocates storage memory, the
>> new mechanism can also pre-allocate, with a user space bpf_map_update_elem()
>> call, to avoid potential run-time memory allocation failures.
>> The new cgroup storage therefore provides all the functionality of the old
>> one. So in uapi bpf.h, the old BPF_MAP_TYPE_CGROUP_STORAGE is made an alias
>> of BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED to indicate that the old cgroup
>> storage can be deprecated, since the new one provides the same functionality.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   include/linux/bpf.h            |   3 +
>>   include/linux/bpf_types.h      |   1 +
>>   include/linux/cgroup-defs.h    |   4 +
>>   include/uapi/linux/bpf.h       |  48 +++++-
>>   kernel/bpf/Makefile            |   2 +-
>>   kernel/bpf/bpf_cgrp_storage.c  | 276 +++++++++++++++++++++++++++++++++
>>   kernel/bpf/helpers.c           |   6 +
>>   kernel/bpf/syscall.c           |   3 +-
>>   kernel/bpf/verifier.c          |  13 +-
>>   kernel/cgroup/cgroup.c         |   4 +
>>   kernel/trace/bpf_trace.c       |   4 +
>>   scripts/bpf_doc.py             |   2 +
>>   tools/include/uapi/linux/bpf.h |  48 +++++-
>>   13 files changed, 409 insertions(+), 5 deletions(-)
>>   create mode 100644 kernel/bpf/bpf_cgrp_storage.c
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 9e7d46d16032..674da3129248 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -2045,6 +2045,7 @@ struct bpf_link *bpf_link_by_id(u32 id);
>>
>>   const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id);
>>   void bpf_task_storage_free(struct task_struct *task);
>> +void bpf_cgrp_storage_free(struct cgroup *cgroup);
>>   bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog);
>>   const struct btf_func_model *
>>   bpf_jit_find_kfunc_model(const struct bpf_prog *prog,
>> @@ -2537,6 +2538,8 @@ extern const struct bpf_func_proto bpf_copy_from_user_task_proto;
>>   extern const struct bpf_func_proto bpf_set_retval_proto;
>>   extern const struct bpf_func_proto bpf_get_retval_proto;
>>   extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto;
>> +extern const struct bpf_func_proto bpf_cgrp_storage_get_proto;
>> +extern const struct bpf_func_proto bpf_cgrp_storage_delete_proto;
>>
>>   const struct bpf_func_proto *tracing_prog_func_proto(
>>     enum bpf_func_id func_id, const struct bpf_prog *prog);
>> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
>> index 2c6a4f2562a7..f9d5aa62fed0 100644
>> --- a/include/linux/bpf_types.h
>> +++ b/include/linux/bpf_types.h
>> @@ -90,6 +90,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_CGROUP_ARRAY, cgroup_array_map_ops)
>>   #ifdef CONFIG_CGROUP_BPF
>>   BPF_MAP_TYPE(BPF_MAP_TYPE_CGROUP_STORAGE, cgroup_storage_map_ops)
>>   BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE, cgroup_storage_map_ops)
>> +BPF_MAP_TYPE(BPF_MAP_TYPE_CGRP_STORAGE, cgrp_storage_map_ops)
>>   #endif
>>   BPF_MAP_TYPE(BPF_MAP_TYPE_HASH, htab_map_ops)
>>   BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_HASH, htab_percpu_map_ops)
>> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
>> index 8f481d1b159a..4a72bc3a0a4e 100644
>> --- a/include/linux/cgroup-defs.h
>> +++ b/include/linux/cgroup-defs.h
>> @@ -504,6 +504,10 @@ struct cgroup {
>>          /* Used to store internal freezer state */
>>          struct cgroup_freezer_state freezer;
>>
>> +#ifdef CONFIG_CGROUP_BPF
>> +       struct bpf_local_storage __rcu  *bpf_cgrp_storage;
>> +#endif
> 
> Why is this protected by CONFIG_CGROUP_BPF as opposed to
> CONFIG_CGROUPS && CONFIG_BPF_SYSCALL?
> 
> It seems to me (and you also point this out in a different reply) that
> CONFIG_CGROUP_BPF is mostly used for bpf programs that attach to
> cgroups, right?

Right. It should be CONFIG_BPF_SYSCALL. My v1 actually used
CONFIG_BPF_SYSCALL; I changed it to CONFIG_CGROUP_BPF in v2 thinking
that should work, but the kernel test bot complained. Will change back
to CONFIG_BPF_SYSCALL in v3.

> 
> (same for the freeing site)
> 
>> +
>>          /* All ancestors including self */
>>          struct cgroup *ancestors[];
>>   };
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 17f61338f8f8..2d7f79bf3500 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -922,7 +922,14 @@ enum bpf_map_type {
>>          BPF_MAP_TYPE_CPUMAP,
>>          BPF_MAP_TYPE_XSKMAP,
>>          BPF_MAP_TYPE_SOCKHASH,
>> -       BPF_MAP_TYPE_CGROUP_STORAGE,
>> +       BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
>> +       /* BPF_MAP_TYPE_CGROUP_STORAGE is available to bpf programs attaching
>> +        * to a cgroup. The newer BPF_MAP_TYPE_CGRP_STORAGE is available to
>> +        * both cgroup-attached and other progs and supports all functionality
>> +        * provided by BPF_MAP_TYPE_CGROUP_STORAGE. So mark
>> +        * BPF_MAP_TYPE_CGROUP_STORAGE deprecated.
>> +        */
>> +       BPF_MAP_TYPE_CGROUP_STORAGE = BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
>>          BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
>>          BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
>>          BPF_MAP_TYPE_QUEUE,
>> @@ -935,6 +942,7 @@ enum bpf_map_type {
>>          BPF_MAP_TYPE_TASK_STORAGE,
>>          BPF_MAP_TYPE_BLOOM_FILTER,
>>          BPF_MAP_TYPE_USER_RINGBUF,
>> +       BPF_MAP_TYPE_CGRP_STORAGE,
>>   };
>>
[...]


Thread overview: 23+ messages
2022-10-20 22:12 [PATCH bpf-next v2 0/6] bpf: Implement cgroup local storage available to non-cgroup-attached bpf progs Yonghong Song
2022-10-20 22:13 ` [PATCH bpf-next v2 1/6] bpf: Make struct cgroup btf id global Yonghong Song
2022-10-20 22:13 ` [PATCH bpf-next v2 2/6] bpf: Implement cgroup storage available to non-cgroup-attached bpf progs Yonghong Song
2022-10-21  5:22   ` David Vernet
2022-10-21  5:26     ` David Vernet
2022-10-21 17:33     ` Yonghong Song
2022-10-21 19:57       ` David Vernet
2022-10-21 22:57         ` Yonghong Song
2022-10-22  3:02           ` David Vernet
2022-10-23 16:45             ` Yonghong Song
2022-10-23 21:14               ` David Vernet
     [not found]   ` <202210210932.nHqTyTmx-lkp@intel.com>
2022-10-21 16:51     ` Yonghong Song
2022-10-21 19:29   ` Yosry Ahmed
2022-10-21 21:05     ` Yonghong Song [this message]
2022-10-20 22:13 ` [PATCH bpf-next v2 3/6] libbpf: Support new cgroup local storage Yonghong Song
2022-10-21 23:10   ` Andrii Nakryiko
2022-10-22  0:32     ` Yonghong Song
2022-10-22  1:05       ` Tejun Heo
2022-10-20 22:13 ` [PATCH bpf-next v2 4/6] bpftool: " Yonghong Song
2022-10-20 22:13 ` [PATCH bpf-next v2 5/6] selftests/bpf: Add selftests for " Yonghong Song
2022-10-20 22:13 ` [PATCH bpf-next v2 6/6] docs/bpf: Add documentation for map type BPF_MAP_TYPE_CGRP_STROAGE Yonghong Song
2022-10-21  7:12   ` David Vernet
2022-10-21 17:46     ` Yonghong Song
