From: Yonghong Song <yhs@meta.com>
To: sdf@google.com, Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
kernel-team@fb.com, KP Singh <kpsingh@kernel.org>,
Martin KaFai Lau <martin.lau@kernel.org>,
Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH bpf-next 2/5] bpf: Implement cgroup storage available to non-cgroup-attached bpf progs
Date: Mon, 17 Oct 2022 12:23:36 -0700 [thread overview]
Message-ID: <b997fb5e-ce9c-a693-cd6f-8c1405bbc13c@meta.com> (raw)
In-Reply-To: <Y02Yk8gUgVDuZR4Q@google.com>
On 10/17/22 11:01 AM, sdf@google.com wrote:
> On 10/13, Yonghong Song wrote:
>> Similar to sk/inode/task storage, implement cgroup local storage.
>
>> There already exists a local storage implementation for cgroup-attached
>> bpf programs; see map type BPF_MAP_TYPE_CGROUP_STORAGE and helper
>> bpf_get_local_storage(). But there are use cases where non-cgroup-attached
>> bpf progs want to access cgroup local storage data. For example, a tc
>> egress prog has access to both sk and cgroup. It is possible to use sk
>> local storage to emulate cgroup local storage by storing data in the
>> socket, but this is wasteful since there could be many sockets belonging
>> to a particular cgroup. Alternatively, a separate map could be created
>> with cgroup id as the key, but that would introduce additional overhead
>> to manipulate the new map. A cgroup local storage, similar to the
>> existing sk/inode/task storage, helps for this use case.
>
>> The life-cycle of the storage is tied to the life-cycle of the cgroup
>> struct, i.e. the storage is destroyed along with the owning cgroup via
>> a callback to bpf_local_cgroup_storage_free() when the cgroup itself
>> is deleted.
>
>> The userspace map operations can be done by using a cgroup fd as a key
>> passed to the lookup, update and delete operations.
>
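[As an aside, for readers following along: a minimal userspace sketch of
what the fd-keyed map operations could look like. This is hypothetical
and untested; it assumes a kernel with this patch applied and libbpf's
standard low-level map wrappers, and the struct/field names are made up
for illustration.]

```c
/* Hypothetical sketch: userspace lookup of the new cgroup local storage
 * map, keyed by a cgroup fd. Assumes a patched kernel and libbpf;
 * untested. struct cgrp_val and read_cgroup_storage() are illustrative
 * names, not part of the patch.
 */
#include <fcntl.h>
#include <unistd.h>
#include <bpf/bpf.h>

struct cgrp_val { long packets; long bytes; };

int read_cgroup_storage(int map_fd, const char *cgrp_path, struct cgrp_val *val)
{
	int cgrp_fd = open(cgrp_path, O_RDONLY | O_DIRECTORY);
	int err;

	if (cgrp_fd < 0)
		return -1;
	/* the cgroup fd itself is the key */
	err = bpf_map_lookup_elem(map_fd, &cgrp_fd, val);
	close(cgrp_fd);
	return err;
}
```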
>
> [..]
>
>> Since the map name BPF_MAP_TYPE_CGROUP_STORAGE is already used by the
>> old cgroup local storage support, the new map name
>> BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE is used for the cgroup storage
>> available to non-cgroup-attached bpf programs. The two helpers are named
>> bpf_cgroup_local_storage_get() and bpf_cgroup_local_storage_delete().
>
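[For reference, bpf prog side usage of the two helpers could look roughly
like the sketch below. Untested; it assumes this patch set, and the
tracepoint/map/program names are illustrative only.]

```c
/* Hypothetical sketch of bpf prog side usage of
 * bpf_cgroup_local_storage_get(); untested, assumes this series.
 * Needs vmlinux.h plus libbpf's bpf_helpers.h/bpf_tracing.h.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct {
	__uint(type, BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, long);
} cgrp_map SEC(".maps");

SEC("tp_btf/sys_enter")
int BPF_PROG(on_enter, struct pt_regs *regs, long id)
{
	struct task_struct *task = bpf_get_current_task_btf();
	long *ptr;

	/* count events per cgroup of the current task */
	ptr = bpf_cgroup_local_storage_get(&cgrp_map, task->cgroups->dfl_cgrp,
					   0, BPF_LOCAL_STORAGE_GET_F_CREATE);
	if (ptr)
		__sync_fetch_and_add(ptr, 1);
	return 0;
}

char _license[] SEC("license") = "GPL";
```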
> Have you considered doing something similar to 7d9c3427894f ("bpf: Make
> cgroup storages shared between programs on the same cgroup"), where
> the map changes its behavior depending on the key size (see the key_size
> checks in cgroup_storage_map_alloc)? Looks like sizeof(int) for fd can
> still be used, so we can, in theory, reuse the name..
>
> Pros:
> - no need for a new map name
>
> Cons:
> - existing BPF_MAP_TYPE_CGROUP_STORAGE is already messy; it might not be
> a good idea to add more stuff to it?
Thinking about it differently, I would rather reuse the same map name
(BPF_MAP_TYPE_CGROUP_STORAGE) but add a flag like
BPF_F_LOCAL_STORAGE_GENERIC. We could use map_extra as well, but I think
an explicit flag might be better.
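Roughly, the user side of the flag route might look like the sketch
below. This is purely hypothetical: neither BPF_F_LOCAL_STORAGE_GENERIC
nor the behavior exists today; it only illustrates the proposal.

```c
/* Purely hypothetical: reusing BPF_MAP_TYPE_CGROUP_STORAGE with a new
 * generic-local-storage flag instead of adding a new map type. The flag
 * BPF_F_LOCAL_STORAGE_GENERIC does not exist; illustration only.
 */
struct {
	__uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_LOCAL_STORAGE_GENERIC);
	__type(key, int);
	__type(value, long);
} cgrp_map SEC(".maps");
```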
>
> But, at the very least, should we also extend
> Documentation/bpf/map_cgroup_storage.rst to cover the new map? We've
> tried to keep some of the important details in there..
>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>> include/linux/bpf.h | 3 +
>> include/linux/bpf_types.h | 1 +
>> include/linux/cgroup-defs.h | 4 +
>> include/uapi/linux/bpf.h | 39 +++++
>> kernel/bpf/Makefile | 2 +-
>> kernel/bpf/bpf_cgroup_storage.c | 280 ++++++++++++++++++++++++++++++++
>> kernel/bpf/helpers.c | 6 +
>> kernel/bpf/syscall.c | 3 +-
>> kernel/bpf/verifier.c | 14 +-
>> kernel/cgroup/cgroup.c | 4 +
>> kernel/trace/bpf_trace.c | 4 +
>> scripts/bpf_doc.py | 2 +
>> tools/include/uapi/linux/bpf.h | 39 +++++
>> 13 files changed, 398 insertions(+), 3 deletions(-)
>> create mode 100644 kernel/bpf/bpf_cgroup_storage.c
>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 9e7d46d16032..1395a01c7f18 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -2045,6 +2045,7 @@ struct bpf_link *bpf_link_by_id(u32 id);
>
>> const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id);
>> void bpf_task_storage_free(struct task_struct *task);
>> +void bpf_local_cgroup_storage_free(struct cgroup *cgroup);
>> bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog);
>> const struct btf_func_model *
>> bpf_jit_find_kfunc_model(const struct bpf_prog *prog,
>> @@ -2537,6 +2538,8 @@ extern const struct bpf_func_proto bpf_copy_from_user_task_proto;
>> extern const struct bpf_func_proto bpf_set_retval_proto;
>> extern const struct bpf_func_proto bpf_get_retval_proto;
>> extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto;
>> +extern const struct bpf_func_proto bpf_cgroup_storage_get_proto;
>> +extern const struct bpf_func_proto bpf_cgroup_storage_delete_proto;
>
>> const struct bpf_func_proto *tracing_prog_func_proto(
>> enum bpf_func_id func_id, const struct bpf_prog *prog);
>> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
>> index 2c6a4f2562a7..7a0362d7a0aa 100644
>> --- a/include/linux/bpf_types.h
>> +++ b/include/linux/bpf_types.h
>> @@ -90,6 +90,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_CGROUP_ARRAY, cgroup_array_map_ops)
>> #ifdef CONFIG_CGROUP_BPF
>> BPF_MAP_TYPE(BPF_MAP_TYPE_CGROUP_STORAGE, cgroup_storage_map_ops)
>> BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE, cgroup_storage_map_ops)
>> +BPF_MAP_TYPE(BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE, cgroup_local_storage_map_ops)
>> #endif
>> BPF_MAP_TYPE(BPF_MAP_TYPE_HASH, htab_map_ops)
>> BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_HASH, htab_percpu_map_ops)
>> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
>> index 4bcf56b3491c..c6f4590dda68 100644
>> --- a/include/linux/cgroup-defs.h
>> +++ b/include/linux/cgroup-defs.h
>> @@ -504,6 +504,10 @@ struct cgroup {
>> /* Used to store internal freezer state */
>> struct cgroup_freezer_state freezer;
>
>> +#ifdef CONFIG_BPF_SYSCALL
>> + struct bpf_local_storage __rcu *bpf_cgroup_storage;
>> +#endif
>> +
>> /* ids of the ancestors at each level including self */
>> u64 ancestor_ids[];
>> };
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 17f61338f8f8..d918b4054297 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -935,6 +935,7 @@ enum bpf_map_type {
>> BPF_MAP_TYPE_TASK_STORAGE,
>> BPF_MAP_TYPE_BLOOM_FILTER,
>> BPF_MAP_TYPE_USER_RINGBUF,
>> + BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE,
>> };
>
>> /* Note that tracing related programs such as
>> @@ -5435,6 +5436,42 @@ union bpf_attr {
>> * **-E2BIG** if user-space has tried to publish a sample which is
>> * larger than the size of the ring buffer, or which cannot fit
>> * within a struct bpf_dynptr.
>> + *
>> + * void *bpf_cgroup_local_storage_get(struct bpf_map *map, struct cgroup *cgroup, void *value, u64 flags)
>> + * Description
>> + * Get a bpf_local_storage from the *cgroup*.
>> + *
>> + * Logically, it could be thought of as getting the value from
>> + * a *map* with *cgroup* as the **key**. From this
>> + * perspective, the usage is not much different from
>> + * **bpf_map_lookup_elem**\ (*map*, **&**\ *cgroup*) except this
>> + * helper enforces that the key must be a cgroup struct and the
>> + * map must be a **BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE**.
>> + *
>> + * Underneath, the value is stored locally at *cgroup* instead of
>> + * the *map*. The *map* is used as the bpf-local-storage
>> + * "type". The bpf-local-storage "type" (i.e. the *map*) is
>> + * searched against all bpf_local_storage residing at *cgroup*.
>> + *
>> + * An optional *flags* (**BPF_LOCAL_STORAGE_GET_F_CREATE**) can be
>> + * used such that a new bpf_local_storage will be
>> + * created if one does not exist. *value* can be used
>> + * together with **BPF_LOCAL_STORAGE_GET_F_CREATE** to specify
>> + * the initial value of a bpf_local_storage. If *value* is
>> + * **NULL**, the new bpf_local_storage will be zero initialized.
>> + * Return
>> + * A bpf_local_storage pointer is returned on success.
>> + *
>> + * **NULL** if not found or there was an error in adding
>> + * a new bpf_local_storage.
>> + *
>> + * long bpf_cgroup_local_storage_delete(struct bpf_map *map, struct cgroup *cgroup)
>> + * Description
>> + * Delete a bpf_local_storage from a *cgroup*.
>> + * Return
>> + * 0 on success.
>> + *
>> + * **-ENOENT** if the bpf_local_storage cannot be found.
>> */
>> #define ___BPF_FUNC_MAPPER(FN, ctx...) \
>> FN(unspec, 0, ##ctx) \
>> @@ -5647,6 +5684,8 @@ union bpf_attr {
>> FN(tcp_raw_check_syncookie_ipv6, 207, ##ctx) \
>> FN(ktime_get_tai_ns, 208, ##ctx) \
>> FN(user_ringbuf_drain, 209, ##ctx) \
>> + FN(cgroup_local_storage_get, 210, ##ctx) \
>> + FN(cgroup_local_storage_delete, 211, ##ctx) \
>> /* */
>
>> /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
>> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
>> index 341c94f208f4..b02693f51978 100644
>> --- a/kernel/bpf/Makefile
>> +++ b/kernel/bpf/Makefile
>> @@ -25,7 +25,7 @@ ifeq ($(CONFIG_PERF_EVENTS),y)
>> obj-$(CONFIG_BPF_SYSCALL) += stackmap.o
>> endif
>> ifeq ($(CONFIG_CGROUPS),y)
>> -obj-$(CONFIG_BPF_SYSCALL) += cgroup_iter.o
>> +obj-$(CONFIG_BPF_SYSCALL) += cgroup_iter.o bpf_cgroup_storage.o
>> endif
>> obj-$(CONFIG_CGROUP_BPF) += cgroup.o
>> ifeq ($(CONFIG_INET),y)
>> diff --git a/kernel/bpf/bpf_cgroup_storage.c b/kernel/bpf/bpf_cgroup_storage.c
>> new file mode 100644
>> index 000000000000..9974784822da
>> --- /dev/null
>> +++ b/kernel/bpf/bpf_cgroup_storage.c
>> @@ -0,0 +1,280 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
>> + */
>> +
>> +#include <linux/types.h>
>> +#include <linux/bpf.h>
>> +#include <linux/bpf_local_storage.h>
>> +#include <uapi/linux/btf.h>
>> +#include <linux/btf_ids.h>
>> +
>> +DEFINE_BPF_STORAGE_CACHE(cgroup_cache);
>> +
>> +static DEFINE_PER_CPU(int, bpf_cgroup_storage_busy);
>> +
>> +static void bpf_cgroup_storage_lock(void)
>> +{
>> + migrate_disable();
>> + this_cpu_inc(bpf_cgroup_storage_busy);
>> +}
>> +
>> +static void bpf_cgroup_storage_unlock(void)
>> +{
>> + this_cpu_dec(bpf_cgroup_storage_busy);
>> + migrate_enable();
>> +}
>> +
>> +static bool bpf_cgroup_storage_trylock(void)
>> +{
>> + migrate_disable();
>> + if (unlikely(this_cpu_inc_return(bpf_cgroup_storage_busy) != 1)) {
>> + this_cpu_dec(bpf_cgroup_storage_busy);
>> + migrate_enable();
>> + return false;
>> + }
>> + return true;
>> +}
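[Aside, for readers unfamiliar with the pattern: the busy counter makes
re-entry on the same CPU fail fast instead of deadlocking on the
storage's raw spinlock. A plain userspace C analogue of the idea, using
a thread-local counter in place of the kernel's per-CPU one, is sketched
below; illustrative only, not the kernel code.]

```c
/* Userspace analogue of bpf_cgroup_storage_trylock(): a recursion
 * guard. The kernel uses a per-CPU counter with migration disabled;
 * here a thread-local counter plays the same role for illustration.
 */
#include <stdbool.h>

static _Thread_local int storage_busy;

static bool storage_trylock(void)
{
	if (++storage_busy != 1) {
		/* already held on this thread: back off instead of deadlocking */
		--storage_busy;
		return false;
	}
	return true;
}

static void storage_unlock(void)
{
	--storage_busy;
}
```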
>
> Task storage has lock/unlock/trylock; inode storage doesn't; why does
> cgroup need it as well?
>
>> +static struct bpf_local_storage __rcu **cgroup_storage_ptr(void *owner)
>> +{
>> + struct cgroup *cg = owner;
>> +
>> + return &cg->bpf_cgroup_storage;
>> +}
>> +
>> +void bpf_local_cgroup_storage_free(struct cgroup *cgroup)
>> +{
>> + struct bpf_local_storage *local_storage;
>> + struct bpf_local_storage_elem *selem;
>> + bool free_cgroup_storage = false;
>> + struct hlist_node *n;
>> + unsigned long flags;
>> +
>> + rcu_read_lock();
>> + local_storage = rcu_dereference(cgroup->bpf_cgroup_storage);
>> + if (!local_storage) {
>> + rcu_read_unlock();
>> + return;
>> + }
>> +
>> + /* Neither the bpf_prog nor the bpf-map's syscall
>> + * could be modifying the local_storage->list now.
>> + * Thus, no elem can be added-to or deleted-from the
>> + * local_storage->list by the bpf_prog or by the bpf-map's syscall.
>> + *
>> + * It is racing with bpf_local_storage_map_free() alone
>> + * when unlinking elem from the local_storage->list and
>> + * the map's bucket->list.
>> + */
>> + bpf_cgroup_storage_lock();
>> + raw_spin_lock_irqsave(&local_storage->lock, flags);
>> + hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) {
>> + bpf_selem_unlink_map(selem);
>> + free_cgroup_storage =
>> + bpf_selem_unlink_storage_nolock(local_storage, selem, false, false);
>> + }
>> + raw_spin_unlock_irqrestore(&local_storage->lock, flags);
>> + bpf_cgroup_storage_unlock();
>> + rcu_read_unlock();
>> +
>> + /* free_cgroup_storage should always be true as long as
>> + * local_storage->list was non-empty.
>> + */
>> + if (free_cgroup_storage)
>> + kfree_rcu(local_storage, rcu);
>> +}
>
>> +static struct bpf_local_storage_data *
>> +cgroup_storage_lookup(struct cgroup *cgroup, struct bpf_map *map, bool cacheit_lockit)
>> +{
>> + struct bpf_local_storage *cgroup_storage;
>> + struct bpf_local_storage_map *smap;
>> +
>> + cgroup_storage = rcu_dereference_check(cgroup->bpf_cgroup_storage,
>> + bpf_rcu_lock_held());
>> + if (!cgroup_storage)
>> + return NULL;
>> +
>> + smap = (struct bpf_local_storage_map *)map;
>> + return bpf_local_storage_lookup(cgroup_storage, smap, cacheit_lockit);
>> +}
>> +
>> +static void *bpf_cgroup_storage_lookup_elem(struct bpf_map *map, void *key)
>> +{
>> + struct bpf_local_storage_data *sdata;
>> + struct cgroup *cgroup;
>> + int fd;
>> +
>> + fd = *(int *)key;
>> + cgroup = cgroup_get_from_fd(fd);
>> + if (IS_ERR(cgroup))
>> + return ERR_CAST(cgroup);
>> +
>> + bpf_cgroup_storage_lock();
>> + sdata = cgroup_storage_lookup(cgroup, map, true);
>> + bpf_cgroup_storage_unlock();
>> + cgroup_put(cgroup);
>> + return sdata ? sdata->data : NULL;
>> +}
>
> A lot of the above (free/lookup) seems to be copy-pasted from the task
> storage; any point in trying to generalize the common parts?
>
>> +static int bpf_cgroup_storage_update_elem(struct bpf_map *map, void *key,
>> + void *value, u64 map_flags)
>> +{
>> + struct bpf_local_storage_data *sdata;
>> + struct cgroup *cgroup;
>> + int err, fd;
>> +
>> + fd = *(int *)key;
>> + cgroup = cgroup_get_from_fd(fd);
>> + if (IS_ERR(cgroup))
>> + return PTR_ERR(cgroup);
>> +
>> + bpf_cgroup_storage_lock();
>> + sdata = bpf_local_storage_update(cgroup, (struct bpf_local_storage_map *)map,
>> + value, map_flags, GFP_ATOMIC);
>> + bpf_cgroup_storage_unlock();
>> + err = PTR_ERR_OR_ZERO(sdata);
>> + cgroup_put(cgroup);
>> + return err;
>> +}
>> +
>> +static int cgroup_storage_delete(struct cgroup *cgroup, struct bpf_map *map)
>> +{
>> + struct bpf_local_storage_data *sdata;
>> +
>> + sdata = cgroup_storage_lookup(cgroup, map, false);
>> + if (!sdata)
>> + return -ENOENT;
>> +
>> + bpf_selem_unlink(SELEM(sdata), true);
>> + return 0;
>> +}
>> +
>> +static int bpf_cgroup_storage_delete_elem(struct bpf_map *map, void *key)
>> +{
>> + struct cgroup *cgroup;
>> + int err, fd;
>> +
>> + fd = *(int *)key;
>> + cgroup = cgroup_get_from_fd(fd);
>> + if (IS_ERR(cgroup))
>> + return PTR_ERR(cgroup);
>> +
>> + bpf_cgroup_storage_lock();
>> + err = cgroup_storage_delete(cgroup, map);
>> + bpf_cgroup_storage_unlock();
>> + cgroup_put(cgroup);
>> + return err;
>> +}
>> +
>> +static int notsupp_get_next_key(struct bpf_map *map, void *key, void *next_key)
>> +{
>> + return -ENOTSUPP;
>> +}
>> +
>> +static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
>> +{
>> + struct bpf_local_storage_map *smap;
>> +
>> + smap = bpf_local_storage_map_alloc(attr);
>> + if (IS_ERR(smap))
>> + return ERR_CAST(smap);
>> +
>> + smap->cache_idx = bpf_local_storage_cache_idx_get(&cgroup_cache);
>> + return &smap->map;
>> +}
>> +
>> +static void cgroup_storage_map_free(struct bpf_map *map)
>> +{
>> + struct bpf_local_storage_map *smap;
>> +
>> + smap = (struct bpf_local_storage_map *)map;
>> + bpf_local_storage_cache_idx_free(&cgroup_cache, smap->cache_idx);
>> + bpf_local_storage_map_free(smap, NULL);
>> +}
>> +
>> +/* *gfp_flags* is a hidden argument provided by the verifier */
>> +BPF_CALL_5(bpf_cgroup_storage_get, struct bpf_map *, map, struct cgroup *, cgroup,
>> + void *, value, u64, flags, gfp_t, gfp_flags)
>> +{
>> + struct bpf_local_storage_data *sdata;
>> +
>> + WARN_ON_ONCE(!bpf_rcu_lock_held());
>> + if (flags & ~(BPF_LOCAL_STORAGE_GET_F_CREATE))
>> + return (unsigned long)NULL;
>> +
>> + if (!cgroup)
>> + return (unsigned long)NULL;
>> +
>> + if (!bpf_cgroup_storage_trylock())
>> + return (unsigned long)NULL;
>> +
>> + sdata = cgroup_storage_lookup(cgroup, map, true);
>> + if (sdata)
>> + goto unlock;
>> +
>> + /* only allocate new storage, when the cgroup is refcounted */
>> + if (!percpu_ref_is_dying(&cgroup->self.refcnt) &&
>> + (flags & BPF_LOCAL_STORAGE_GET_F_CREATE))
>> + sdata = bpf_local_storage_update(cgroup, (struct bpf_local_storage_map *)map,
>> + value, BPF_NOEXIST, gfp_flags);
>> +
>> +unlock:
>> + bpf_cgroup_storage_unlock();
>> + return IS_ERR_OR_NULL(sdata) ? (unsigned long)NULL : (unsigned long)sdata->data;
>> +}
>> +
>> +BPF_CALL_2(bpf_cgroup_storage_delete, struct bpf_map *, map, struct cgroup *, cgroup)
>> +{
>> + int ret;
>> +
>> + WARN_ON_ONCE(!bpf_rcu_lock_held());
>> + if (!cgroup)
>> + return -EINVAL;
>> +
>> + if (!bpf_cgroup_storage_trylock())
>> + return -EBUSY;
>> +
>> + ret = cgroup_storage_delete(cgroup, map);
>> + bpf_cgroup_storage_unlock();
>> + return ret;
>> +}
>> +
>> +BTF_ID_LIST_SINGLE(cgroup_storage_map_btf_ids, struct, bpf_local_storage_map)
>> +const struct bpf_map_ops cgroup_local_storage_map_ops = {
>> + .map_meta_equal = bpf_map_meta_equal,
>> + .map_alloc_check = bpf_local_storage_map_alloc_check,
>> + .map_alloc = cgroup_storage_map_alloc,
>> + .map_free = cgroup_storage_map_free,
>> + .map_get_next_key = notsupp_get_next_key,
>> + .map_lookup_elem = bpf_cgroup_storage_lookup_elem,
>> + .map_update_elem = bpf_cgroup_storage_update_elem,
>> + .map_delete_elem = bpf_cgroup_storage_delete_elem,
>> + .map_check_btf = bpf_local_storage_map_check_btf,
>> + .map_btf_id = &cgroup_storage_map_btf_ids[0],
>> + .map_owner_storage_ptr = cgroup_storage_ptr,
>> +};
>> +
>> +const struct bpf_func_proto bpf_cgroup_storage_get_proto = {
>> + .func = bpf_cgroup_storage_get,
>> + .gpl_only = false,
>> + .ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
>> + .arg1_type = ARG_CONST_MAP_PTR,
>> + .arg2_type = ARG_PTR_TO_BTF_ID,
>> + .arg2_btf_id = &bpf_cgroup_btf_id[0],
>> + .arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL,
>> + .arg4_type = ARG_ANYTHING,
>> +};
>> +
>> +const struct bpf_func_proto bpf_cgroup_storage_delete_proto = {
>> + .func = bpf_cgroup_storage_delete,
>> + .gpl_only = false,
>> + .ret_type = RET_INTEGER,
>> + .arg1_type = ARG_CONST_MAP_PTR,
>> + .arg2_type = ARG_PTR_TO_BTF_ID,
>> + .arg2_btf_id = &bpf_cgroup_btf_id[0],
>> +};
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index a6b04faed282..5c5bb08832ec 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -1663,6 +1663,12 @@ bpf_base_func_proto(enum bpf_func_id func_id)
>> return &bpf_dynptr_write_proto;
>> case BPF_FUNC_dynptr_data:
>> return &bpf_dynptr_data_proto;
>> +#ifdef CONFIG_CGROUPS
>> + case BPF_FUNC_cgroup_local_storage_get:
>> + return &bpf_cgroup_storage_get_proto;
>> + case BPF_FUNC_cgroup_local_storage_delete:
>> + return &bpf_cgroup_storage_delete_proto;
>> +#endif
>> default:
>> break;
>> }
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 7b373a5e861f..e53c7fae6e22 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -1016,7 +1016,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
>> map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE &&
>> map->map_type != BPF_MAP_TYPE_SK_STORAGE &&
>> map->map_type != BPF_MAP_TYPE_INODE_STORAGE &&
>> - map->map_type != BPF_MAP_TYPE_TASK_STORAGE)
>> + map->map_type != BPF_MAP_TYPE_TASK_STORAGE &&
>> + map->map_type != BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE)
>> return -ENOTSUPP;
>> if (map->spin_lock_off + sizeof(struct bpf_spin_lock) > map->value_size) {
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 6f6d2d511c06..f36f6a3c0d50 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -6360,6 +6360,11 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
>> func_id != BPF_FUNC_task_storage_delete)
>> goto error;
>> break;
>> + case BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE:
>> + if (func_id != BPF_FUNC_cgroup_local_storage_get &&
>> + func_id != BPF_FUNC_cgroup_local_storage_delete)
>> + goto error;
>> + break;
>> case BPF_MAP_TYPE_BLOOM_FILTER:
>> if (func_id != BPF_FUNC_map_peek_elem &&
>> func_id != BPF_FUNC_map_push_elem)
>> @@ -6472,6 +6477,11 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
>> if (map->map_type != BPF_MAP_TYPE_TASK_STORAGE)
>> goto error;
>> break;
>> + case BPF_FUNC_cgroup_local_storage_get:
>> + case BPF_FUNC_cgroup_local_storage_delete:
>> + if (map->map_type != BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE)
>> + goto error;
>> + break;
>> default:
>> break;
>> }
>> @@ -12713,6 +12723,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
>> case BPF_MAP_TYPE_INODE_STORAGE:
>> case BPF_MAP_TYPE_SK_STORAGE:
>> case BPF_MAP_TYPE_TASK_STORAGE:
>> + case BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE:
>> break;
>> default:
>> verbose(env,
>> @@ -14149,7 +14160,8 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>
>> if (insn->imm == BPF_FUNC_task_storage_get ||
>> insn->imm == BPF_FUNC_sk_storage_get ||
>> - insn->imm == BPF_FUNC_inode_storage_get) {
>> + insn->imm == BPF_FUNC_inode_storage_get ||
>> + insn->imm == BPF_FUNC_cgroup_local_storage_get) {
>> if (env->prog->aux->sleepable)
>> insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_KERNEL);
>> else
>> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
>> index 8ad2c267ff47..2fa2c950c7fb 100644
>> --- a/kernel/cgroup/cgroup.c
>> +++ b/kernel/cgroup/cgroup.c
>> @@ -985,6 +985,10 @@ void put_css_set_locked(struct css_set *cset)
>> put_css_set_locked(cset->dom_cset);
>> }
>
>> +#ifdef CONFIG_BPF_SYSCALL
>> + bpf_local_cgroup_storage_free(cset->dfl_cgrp);
>> +#endif
>> +
>> kfree_rcu(cset, rcu_head);
>> }
>
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index 688552df95ca..179adaae4a9f 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -1454,6 +1454,10 @@ bpf_tracing_func_proto(enum bpf_func_id
>> func_id, const struct bpf_prog *prog)
>> return &bpf_get_current_cgroup_id_proto;
>> case BPF_FUNC_get_current_ancestor_cgroup_id:
>> return &bpf_get_current_ancestor_cgroup_id_proto;
>> + case BPF_FUNC_cgroup_local_storage_get:
>> + return &bpf_cgroup_storage_get_proto;
>> + case BPF_FUNC_cgroup_local_storage_delete:
>> + return &bpf_cgroup_storage_delete_proto;
>> #endif
>> case BPF_FUNC_send_signal:
>> return &bpf_send_signal_proto;
>> diff --git a/scripts/bpf_doc.py b/scripts/bpf_doc.py
>> index c0e6690be82a..fdb0aff8cb5a 100755
>> --- a/scripts/bpf_doc.py
>> +++ b/scripts/bpf_doc.py
>> @@ -685,6 +685,7 @@ class PrinterHelpers(Printer):
>> 'struct udp6_sock',
>> 'struct unix_sock',
>> 'struct task_struct',
>> + 'struct cgroup',
>
>> 'struct __sk_buff',
>> 'struct sk_msg_md',
>> @@ -742,6 +743,7 @@ class PrinterHelpers(Printer):
>> 'struct udp6_sock',
>> 'struct unix_sock',
>> 'struct task_struct',
>> + 'struct cgroup',
>> 'struct path',
>> 'struct btf_ptr',
>> 'struct inode',
>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>> index 17f61338f8f8..d918b4054297 100644
>> --- a/tools/include/uapi/linux/bpf.h
>> +++ b/tools/include/uapi/linux/bpf.h
>> @@ -935,6 +935,7 @@ enum bpf_map_type {
>> BPF_MAP_TYPE_TASK_STORAGE,
>> BPF_MAP_TYPE_BLOOM_FILTER,
>> BPF_MAP_TYPE_USER_RINGBUF,
>> + BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE,
>> };
>
>> /* Note that tracing related programs such as
>> @@ -5435,6 +5436,42 @@ union bpf_attr {
>> * **-E2BIG** if user-space has tried to publish a sample which is
>> * larger than the size of the ring buffer, or which cannot fit
>> * within a struct bpf_dynptr.
>> + *
>> + * void *bpf_cgroup_local_storage_get(struct bpf_map *map, struct cgroup *cgroup, void *value, u64 flags)
>> + * Description
>> + * Get a bpf_local_storage from the *cgroup*.
>> + *
>> + * Logically, it could be thought of as getting the value from
>> + * a *map* with *cgroup* as the **key**. From this
>> + * perspective, the usage is not much different from
>> + * **bpf_map_lookup_elem**\ (*map*, **&**\ *cgroup*) except this
>> + * helper enforces that the key must be a cgroup struct and the
>> + * map must be a **BPF_MAP_TYPE_CGROUP_LOCAL_STORAGE**.
>> + *
>> + * Underneath, the value is stored locally at *cgroup* instead of
>> + * the *map*. The *map* is used as the bpf-local-storage
>> + * "type". The bpf-local-storage "type" (i.e. the *map*) is
>> + * searched against all bpf_local_storage residing at *cgroup*.
>> + *
>> + * An optional *flags* (**BPF_LOCAL_STORAGE_GET_F_CREATE**) can be
>> + * used such that a new bpf_local_storage will be
>> + * created if one does not exist. *value* can be used
>> + * together with **BPF_LOCAL_STORAGE_GET_F_CREATE** to specify
>> + * the initial value of a bpf_local_storage. If *value* is
>> + * **NULL**, the new bpf_local_storage will be zero initialized.
>> + * Return
>> + * A bpf_local_storage pointer is returned on success.
>> + *
>> + * **NULL** if not found or there was an error in adding
>> + * a new bpf_local_storage.
>> + *
>> + * long bpf_cgroup_local_storage_delete(struct bpf_map *map, struct cgroup *cgroup)
>> + * Description
>> + * Delete a bpf_local_storage from a *cgroup*.
>> + * Return
>> + * 0 on success.
>> + *
>> + * **-ENOENT** if the bpf_local_storage cannot be found.
>> */
>> #define ___BPF_FUNC_MAPPER(FN, ctx...) \
>> FN(unspec, 0, ##ctx) \
>> @@ -5647,6 +5684,8 @@ union bpf_attr {
>> FN(tcp_raw_check_syncookie_ipv6, 207, ##ctx) \
>> FN(ktime_get_tai_ns, 208, ##ctx) \
>> FN(user_ringbuf_drain, 209, ##ctx) \
>> + FN(cgroup_local_storage_get, 210, ##ctx) \
>> + FN(cgroup_local_storage_delete, 211, ##ctx) \
>> /* */
>
>> /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
>> --
>> 2.30.2
>
Thread overview: 38+ messages
2022-10-14 4:56 [PATCH bpf-next 0/5] bpf: Implement cgroup local storage available to non-cgroup-attached bpf progs Yonghong Song
2022-10-14 4:56 ` [PATCH bpf-next 1/5] bpf: Make struct cgroup btf id global Yonghong Song
2022-10-14 4:56 ` [PATCH bpf-next 2/5] bpf: Implement cgroup storage available to non-cgroup-attached bpf progs Yonghong Song
2022-10-17 18:01 ` sdf
2022-10-17 18:25 ` Yosry Ahmed
2022-10-17 18:43 ` Stanislav Fomichev
2022-10-17 18:47 ` Yosry Ahmed
2022-10-17 19:07 ` Stanislav Fomichev
2022-10-17 19:11 ` Yosry Ahmed
2022-10-17 19:26 ` Tejun Heo
2022-10-17 21:07 ` Martin KaFai Lau
2022-10-17 21:23 ` Yosry Ahmed
2022-10-17 23:55 ` Martin KaFai Lau
2022-10-18 0:47 ` Yosry Ahmed
2022-10-17 22:16 ` sdf
2022-10-18 0:52 ` Martin KaFai Lau
2022-10-18 5:59 ` Yonghong Song
2022-10-18 17:08 ` sdf
2022-10-18 17:17 ` Alexei Starovoitov
2022-10-18 18:08 ` Martin KaFai Lau
2022-10-18 18:11 ` Yosry Ahmed
2022-10-18 18:26 ` Yonghong Song
2022-10-18 23:12 ` Andrii Nakryiko
2022-10-17 20:15 ` Yonghong Song
2022-10-17 20:18 ` Yosry Ahmed
2022-10-17 20:13 ` Yonghong Song
2022-10-17 20:10 ` Yonghong Song
2022-10-17 20:14 ` Yosry Ahmed
2022-10-17 20:29 ` Yonghong Song
2022-10-17 19:23 ` Yonghong Song [this message]
2022-10-17 21:03 ` Stanislav Fomichev
2022-10-17 22:26 ` Martin KaFai Lau
2022-10-17 18:16 ` David Vernet
2022-10-17 19:45 ` Yonghong Song
2022-10-14 4:56 ` [PATCH bpf-next 3/5] libbpf: Support new cgroup local storage Yonghong Song
2022-10-14 4:56 ` [PATCH bpf-next 4/5] bpftool: " Yonghong Song
2022-10-17 10:26 ` Quentin Monnet
2022-10-14 4:56 ` [PATCH bpf-next 5/5] selftests/bpf: Add selftests for " Yonghong Song