From: Philo Lu <lulie@linux.alibaba.com>
To: Jiri Olsa <olsajiri@gmail.com>
Cc: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
john.fastabend@gmail.com, andrii@kernel.org,
martin.lau@linux.dev, song@kernel.org, yonghong.song@linux.dev,
kpsingh@kernel.org, sdf@google.com, haoluo@google.com,
rostedt@goodmis.org, mhiramat@kernel.org,
mathieu.desnoyers@efficios.com,
linux-trace-kernel@vger.kernel.org, xuanzhuo@linux.alibaba.com,
dust.li@linux.alibaba.com, alibuda@linux.alibaba.com,
guwen@linux.alibaba.com, hengqi@linux.alibaba.com,
shung-hsi.yu@suse.com
Subject: Re: [PATCH bpf-next 1/3] bpf: implement relay map basis
Date: Sat, 23 Dec 2023 10:54:47 +0800 [thread overview]
Message-ID: <a4794c5d-3e3f-4210-a26f-000ad68d6799@linux.alibaba.com> (raw)
In-Reply-To: <ZYWhFHLqwQDgI7OG@krava>
On 2023/12/22 22:45, Jiri Olsa wrote:
> On Fri, Dec 22, 2023 at 08:21:44PM +0800, Philo Lu wrote:
>
> SNIP
>
>> +/* bpf_attr is used as follows:
>> + * - key size: must be 0
>> + * - value size: value will be used as directory name by map_update_elem
>> + * (to create relay files). If passed as 0, it will be set to NAME_MAX as
>> + * default
>> + *
>> + * - max_entries: subbuf size
>> + * - map_extra: subbuf num, default as 8
>> + *
>> + * When alloc, we do not set up relay files considering dir_name conflicts.
>> + * Instead we use relay_late_setup_files() in map_update_elem(), and thus the
>> + * value is used as dir_name, and map->name is used as base_filename.
>> + */
>> +static struct bpf_map *relay_map_alloc(union bpf_attr *attr)
>> +{
>> + struct bpf_relay_map *rmap;
>> +
>> + if (unlikely(attr->map_flags & ~RELAY_CREATE_FLAG_MASK))
>> + return ERR_PTR(-EINVAL);
>> +
>> + /* key size must be 0 in relay map */
>> + if (unlikely(attr->key_size))
>> + return ERR_PTR(-EINVAL);
>> +
>> + if (unlikely(attr->value_size > NAME_MAX)) {
>> + pr_warn("value_size should be no more than %d\n", NAME_MAX);
>> + return ERR_PTR(-EINVAL);
>> + } else if (attr->value_size == 0)
>> + attr->value_size = NAME_MAX;
>
> the concept of no key with just value seems strange.. I never worked
> with relay channels, so perhaps stupid question: but why not have one
> relay channel for given key? having the debugfs like:
>
> /sys/kernel/debug/my_rmap/mychannel/<cpu>
>
Here, a relay map is actually a relay channel, which includes buffers
for all cpus. And I think 2 levels is enough when we use relay map in
`/sys/kernel/debug/`: <dir_name>/<map_name>[#cpu]. The `dir_name` is
necessary because user could use the same `map_name` in different bpf
programs, and we can use it as an additional tag to distinguish them.
The `dir_name` is set by user with relay_map_update_elem.
Here is an example. Assume we have 2 relay maps (rmap_a and rmap_b) and
2 cpus, the debugfs will be like:
```
/sys/kernel/debug/<dir_name1>/rmap_a0
/sys/kernel/debug/<dir_name1>/rmap_a1
/sys/kernel/debug/<dir_name2>/rmap_b0
/sys/kernel/debug/<dir_name2>/rmap_b1
```
So I think the key point here is that we just need one field to set the
`dir_name`, either key or value. I chose key as NULL because I think it
suggests "Normally map_update_elem should be invoked just once for a
relay map". But I think it okay to use key instead, and value as NULL.
>> +
>> + /* set default subbuf num */
>> + attr->map_extra = attr->map_extra & UINT_MAX;
>> + if (!attr->map_extra)
>> + attr->map_extra = 8;
>> +
>> + if (!attr->map_name || strlen(attr->map_name) == 0)
>
> attr->map_name is allways != NULL
>
>> + return ERR_PTR(-EINVAL);
>> +
>> + rmap = bpf_map_area_alloc(sizeof(*rmap), NUMA_NO_NODE);
>> + if (!rmap)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + bpf_map_init_from_attr(&rmap->map, attr);
>> +
>> + rmap->relay_cb.create_buf_file = create_buf_file_handler;
>> + rmap->relay_cb.remove_buf_file = remove_buf_file_handler;
>> + if (attr->map_flags & BPF_F_OVERWRITE)
>> + rmap->relay_cb.subbuf_start = subbuf_start_overwrite;
>> +
>> + rmap->relay_chan = relay_open(NULL, NULL,
>> + attr->max_entries, attr->map_extra,
>> + &rmap->relay_cb, NULL);
>
> wrong indentation
>
Got it. I will adjust it.
>> + if (!rmap->relay_chan)
>> + return ERR_PTR(-EINVAL);
>> +
>> + return &rmap->map;
>> +}
>> +
>> +static void relay_map_free(struct bpf_map *map)
>> +{
>> + struct bpf_relay_map *rmap;
>> +
>> + rmap = container_of(map, struct bpf_relay_map, map);
>> + relay_close(rmap->relay_chan);
>> + debugfs_remove_recursive(rmap->relay_chan->parent);
>> + kfree(rmap);
>
> should you use bpf_map_area_free instead?
>
Thanks for catching. Will fix it.
> jirka
>
>> +}
>> +
>> +static void *relay_map_lookup_elem(struct bpf_map *map, void *key)
>> +{
>> + return ERR_PTR(-EOPNOTSUPP);
>> +}
>> +
>> +static long relay_map_update_elem(struct bpf_map *map, void *key, void *value,
>> + u64 flags)
>> +{
>> + return -EOPNOTSUPP;
>> +}
>> +
>> +static long relay_map_delete_elem(struct bpf_map *map, void *key)
>> +{
>> + return -EOPNOTSUPP;
>> +}
>> +
>> +static int relay_map_get_next_key(struct bpf_map *map, void *key,
>> + void *next_key)
>> +{
>> + return -EOPNOTSUPP;
>> +}
>> +
>> +static u64 relay_map_mem_usage(const struct bpf_map *map)
>> +{
>> + struct bpf_relay_map *rmap;
>> + u64 usage = sizeof(struct bpf_relay_map);
>> +
>> + rmap = container_of(map, struct bpf_relay_map, map);
>> + usage += sizeof(struct rchan);
>> + usage += (sizeof(struct rchan_buf) + rmap->relay_chan->alloc_size)
>> + * num_online_cpus();
>> + return usage;
>> +}
>> +
>> +BTF_ID_LIST_SINGLE(relay_map_btf_ids, struct, bpf_relay_map)
>> +const struct bpf_map_ops relay_map_ops = {
>> + .map_meta_equal = bpf_map_meta_equal,
>> + .map_alloc = relay_map_alloc,
>> + .map_free = relay_map_free,
>> + .map_lookup_elem = relay_map_lookup_elem,
>> + .map_update_elem = relay_map_update_elem,
>> + .map_delete_elem = relay_map_delete_elem,
>> + .map_get_next_key = relay_map_get_next_key,
>> + .map_mem_usage = relay_map_mem_usage,
>> + .map_btf_id = &relay_map_btf_ids[0],
>> +};
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 1bf9805ee185..35ae54ac6736 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -1147,6 +1147,7 @@ static int map_create(union bpf_attr *attr)
>> }
>>
>> if (attr->map_type != BPF_MAP_TYPE_BLOOM_FILTER &&
>> + attr->map_type != BPF_MAP_TYPE_RELAY &&
>> attr->map_extra != 0)
>> return -EINVAL;
>>
>> --
>> 2.32.0.3.g01195cf9f
>>
next prev parent reply other threads:[~2023-12-23 2:54 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-22 12:21 [PATCH bpf-next 0/3] bpf: introduce BPF_MAP_TYPE_RELAY Philo Lu
2023-12-22 12:21 ` [PATCH bpf-next 1/3] bpf: implement relay map basis Philo Lu
2023-12-22 14:45 ` Jiri Olsa
2023-12-23 2:54 ` Philo Lu [this message]
2023-12-22 23:07 ` kernel test robot
2023-12-23 0:11 ` kernel test robot
2023-12-23 11:22 ` Hou Tao
2023-12-23 13:02 ` Philo Lu
2023-12-25 11:36 ` Philo Lu
2023-12-25 13:06 ` Hou Tao
2023-12-22 12:21 ` [PATCH bpf-next 2/3] bpf: implement map_update_elem to init relay file Philo Lu
2023-12-22 14:45 ` Jiri Olsa
2023-12-23 2:55 ` Philo Lu
2023-12-23 11:28 ` Hou Tao
2023-12-23 13:19 ` Philo Lu
2023-12-22 12:21 ` [PATCH bpf-next 3/3] bpf: introduce bpf_relay_output helper Philo Lu
2023-12-22 14:46 ` Jiri Olsa
2023-12-23 2:56 ` Philo Lu
2023-12-22 14:45 ` [PATCH bpf-next 0/3] bpf: introduce BPF_MAP_TYPE_RELAY Jiri Olsa
2023-12-23 2:57 ` Philo Lu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a4794c5d-3e3f-4210-a26f-000ad68d6799@linux.alibaba.com \
--to=lulie@linux.alibaba.com \
--cc=alibuda@linux.alibaba.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=dust.li@linux.alibaba.com \
--cc=guwen@linux.alibaba.com \
--cc=haoluo@google.com \
--cc=hengqi@linux.alibaba.com \
--cc=john.fastabend@gmail.com \
--cc=kpsingh@kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=martin.lau@linux.dev \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=olsajiri@gmail.com \
--cc=rostedt@goodmis.org \
--cc=sdf@google.com \
--cc=shung-hsi.yu@suse.com \
--cc=song@kernel.org \
--cc=xuanzhuo@linux.alibaba.com \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox