* [PATCH bpf-next 0/3] bpf: introduce BPF_MAP_TYPE_RELAY
@ 2023-12-22 12:21 Philo Lu
2023-12-22 12:21 ` [PATCH bpf-next 1/3] bpf: implement relay map basis Philo Lu
` (3 more replies)
0 siblings, 4 replies; 20+ messages in thread
From: Philo Lu @ 2023-12-22 12:21 UTC (permalink / raw)
To: bpf
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, rostedt, mhiramat,
mathieu.desnoyers, linux-trace-kernel, xuanzhuo, dust.li, alibuda,
guwen, hengqi, shung-hsi.yu
The patch set introduce a new type of map, BPF_MAP_TYPE_RELAY, based on
relay interface [0]. It provides a way for persistent and overwritable data
transfer.
As stated in [0], relay is a efficient method for log and data transfer.
And the interface is simple enough so that we can implement and use this
type of map with current map interfaces. Besides we need a new helper
bpf_relay_output to output data to user, similar with bpf_ringbuf_output.
We need this map because currently neither ringbuf nor perfbuf satisfies
the requirements of relatively long-term consistent tracing, where the bpf
program keeps writing into the buffer without any bundled reader, and the
buffer supports overwriting. For users, they just run the bpf program to
collect data, and are able to read as need. The detailed discussion can be
found at [1].
The buffer is exposed to users as per-cpu files in debugfs, supporting read
and mmap, and it is up to users how to formulate and read it, either
through a program with mmap or just `cat`. Specifically, the files are
created as "/sys/kerenl/debug/<dirname>/<mapname>#cpu", where the <dirname>
is defined with map_update_elem (Note that we do not need to implement
actual elem operators for relay_map).
If this map is acceptable, other parts including docs, libbpf support,
selftests, and benchmarks (if need) will be added in the following version.
[0]
https://github.com/torvalds/linux/blob/master/Documentation/filesystems/relay.rst
[1]
https://lore.kernel.org/bpf/20231219122850.433be151@gandalf.local.home/T/
Philo Lu (3):
bpf: implement relay map basis
bpf: implement map_update_elem to init relay file
bpf: introduce bpf_relay_output helper
include/linux/bpf.h | 1 +
include/linux/bpf_types.h | 3 +
include/uapi/linux/bpf.h | 17 +++
kernel/bpf/Makefile | 3 +
kernel/bpf/helpers.c | 4 +
kernel/bpf/relaymap.c | 213 ++++++++++++++++++++++++++++++++++++++
kernel/bpf/syscall.c | 1 +
kernel/bpf/verifier.c | 8 ++
kernel/trace/bpf_trace.c | 4 +
9 files changed, 254 insertions(+)
create mode 100644 kernel/bpf/relaymap.c
--
2.32.0.3.g01195cf9f
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH bpf-next 1/3] bpf: implement relay map basis
2023-12-22 12:21 [PATCH bpf-next 0/3] bpf: introduce BPF_MAP_TYPE_RELAY Philo Lu
@ 2023-12-22 12:21 ` Philo Lu
2023-12-22 14:45 ` Jiri Olsa
` (3 more replies)
2023-12-22 12:21 ` [PATCH bpf-next 2/3] bpf: implement map_update_elem to init relay file Philo Lu
` (2 subsequent siblings)
3 siblings, 4 replies; 20+ messages in thread
From: Philo Lu @ 2023-12-22 12:21 UTC (permalink / raw)
To: bpf
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, rostedt, mhiramat,
mathieu.desnoyers, linux-trace-kernel, xuanzhuo, dust.li, alibuda,
guwen, hengqi, shung-hsi.yu
BPF_MAP_TYPE_RELAY is implemented based on relay interface, which
creates per-cpu buffer to transfer data. Each buffer is essentially a
list of fix-sized sub-buffers, and is exposed to user space as files in
debugfs. attr->max_entries is used as subbuf size and attr->map_extra is
used as subbuf num. Currently, the default value of subbuf num is 8.
The data can be accessed by read or mmap via these files. For example,
if there are 2 cpus, files could be `/sys/kernel/debug/mydir/my_rmap0`
and `/sys/kernel/debug/mydir/my_rmap1`.
Buffer-only mode is used to create the relay map, which just allocates
the buffer without creating user-space files. Then user can setup the
files with map_update_elem, thus allowing user to define the directory
name in debugfs. map_update_elem is implemented in the following patch.
A new map flag named BPF_F_OVERWRITE is introduced to set overwrite mode
of relay map.
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
---
include/linux/bpf_types.h | 3 +
include/uapi/linux/bpf.h | 7 ++
kernel/bpf/Makefile | 3 +
kernel/bpf/relaymap.c | 157 ++++++++++++++++++++++++++++++++++++++
kernel/bpf/syscall.c | 1 +
5 files changed, 171 insertions(+)
create mode 100644 kernel/bpf/relaymap.c
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index fc0d6f32c687..c122d7b494c5 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -132,6 +132,9 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_USER_RINGBUF, user_ringbuf_map_ops)
+#ifdef CONFIG_RELAY
+BPF_MAP_TYPE(BPF_MAP_TYPE_RELAY, relay_map_ops)
+#endif
BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 754e68ca8744..143b75676bd3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -951,6 +951,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_BLOOM_FILTER,
BPF_MAP_TYPE_USER_RINGBUF,
BPF_MAP_TYPE_CGRP_STORAGE,
+ BPF_MAP_TYPE_RELAY,
};
/* Note that tracing related programs such as
@@ -1330,6 +1331,9 @@ enum {
/* Get path from provided FD in BPF_OBJ_PIN/BPF_OBJ_GET commands */
BPF_F_PATH_FD = (1U << 14),
+
+/* Enable overwrite for relay map */
+ BPF_F_OVERWRITE = (1U << 15),
};
/* Flags for BPF_PROG_QUERY. */
@@ -1401,6 +1405,9 @@ union bpf_attr {
* BPF_MAP_TYPE_BLOOM_FILTER - the lowest 4 bits indicate the
* number of hash functions (if 0, the bloom filter will default
* to using 5 hash functions).
+ *
+ * BPF_MAP_TYPE_RELAY - the lowest 32 bits indicate the number of
+ * relay subbufs (if 0, the number will be set to 8 by default).
*/
__u64 map_extra;
};
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index f526b7573e97..45b35bb0e572 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -10,6 +10,9 @@ obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o
obj-$(CONFIG_BPF_SYSCALL) += bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
+ifeq ($(CONFIG_RELAY),y)
+obj-$(CONFIG_BPF_SYSCALL) += relaymap.o
+endif
obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o
obj-$(CONFIG_BPF_SYSCALL) += disasm.o mprog.o
diff --git a/kernel/bpf/relaymap.c b/kernel/bpf/relaymap.c
new file mode 100644
index 000000000000..d0adc7f67758
--- /dev/null
+++ b/kernel/bpf/relaymap.c
@@ -0,0 +1,157 @@
+#include <linux/cpumask.h>
+#include <linux/debugfs.h>
+#include <linux/filter.h>
+#include <linux/relay.h>
+#include <linux/slab.h>
+#include <linux/bpf.h>
+#include <linux/err.h>
+
+#define RELAY_CREATE_FLAG_MASK (BPF_F_OVERWRITE)
+
+struct bpf_relay_map {
+ struct bpf_map map;
+ struct rchan *relay_chan;
+ struct rchan_callbacks relay_cb;
+};
+
+static struct dentry *create_buf_file_handler(const char *filename,
+ struct dentry *parent, umode_t mode,
+ struct rchan_buf *buf, int *is_global)
+{
+ /* Because we do relay_late_setup_files(), create_buf_file(NULL, NULL, ...)
+ * will be called by relay_open.
+ */
+ if (!filename)
+ return NULL;
+
+ return debugfs_create_file(filename, mode, parent, buf,
+ &relay_file_operations);
+}
+
+static int remove_buf_file_handler(struct dentry *dentry)
+{
+ debugfs_remove(dentry);
+ return 0;
+}
+
+/* For non-overwrite, use default subbuf_start cb */
+static int subbuf_start_overwrite(struct rchan_buf *buf, void *subbuf,
+ void *prev_subbuf, size_t prev_padding)
+{
+ return 1;
+}
+
+/* bpf_attr is used as follows:
+ * - key size: must be 0
+ * - value size: value will be used as directory name by map_update_elem
+ * (to create relay files). If passed as 0, it will be set to NAME_MAX as
+ * default
+ *
+ * - max_entries: subbuf size
+ * - map_extra: subbuf num, default as 8
+ *
+ * When alloc, we do not set up relay files considering dir_name conflicts.
+ * Instead we use relay_late_setup_files() in map_update_elem(), and thus the
+ * value is used as dir_name, and map->name is used as base_filename.
+ */
+static struct bpf_map *relay_map_alloc(union bpf_attr *attr)
+{
+ struct bpf_relay_map *rmap;
+
+ if (unlikely(attr->map_flags & ~RELAY_CREATE_FLAG_MASK))
+ return ERR_PTR(-EINVAL);
+
+ /* key size must be 0 in relay map */
+ if (unlikely(attr->key_size))
+ return ERR_PTR(-EINVAL);
+
+ if (unlikely(attr->value_size > NAME_MAX)) {
+ pr_warn("value_size should be no more than %d\n", NAME_MAX);
+ return ERR_PTR(-EINVAL);
+ } else if (attr->value_size == 0)
+ attr->value_size = NAME_MAX;
+
+ /* set default subbuf num */
+ attr->map_extra = attr->map_extra & UINT_MAX;
+ if (!attr->map_extra)
+ attr->map_extra = 8;
+
+ if (!attr->map_name || strlen(attr->map_name) == 0)
+ return ERR_PTR(-EINVAL);
+
+ rmap = bpf_map_area_alloc(sizeof(*rmap), NUMA_NO_NODE);
+ if (!rmap)
+ return ERR_PTR(-ENOMEM);
+
+ bpf_map_init_from_attr(&rmap->map, attr);
+
+ rmap->relay_cb.create_buf_file = create_buf_file_handler;
+ rmap->relay_cb.remove_buf_file = remove_buf_file_handler;
+ if (attr->map_flags & BPF_F_OVERWRITE)
+ rmap->relay_cb.subbuf_start = subbuf_start_overwrite;
+
+ rmap->relay_chan = relay_open(NULL, NULL,
+ attr->max_entries, attr->map_extra,
+ &rmap->relay_cb, NULL);
+ if (!rmap->relay_chan)
+ return ERR_PTR(-EINVAL);
+
+ return &rmap->map;
+}
+
+static void relay_map_free(struct bpf_map *map)
+{
+ struct bpf_relay_map *rmap;
+
+ rmap = container_of(map, struct bpf_relay_map, map);
+ relay_close(rmap->relay_chan);
+ debugfs_remove_recursive(rmap->relay_chan->parent);
+ kfree(rmap);
+}
+
+static void *relay_map_lookup_elem(struct bpf_map *map, void *key)
+{
+ return ERR_PTR(-EOPNOTSUPP);
+}
+
+static long relay_map_update_elem(struct bpf_map *map, void *key, void *value,
+ u64 flags)
+{
+ return -EOPNOTSUPP;
+}
+
+static long relay_map_delete_elem(struct bpf_map *map, void *key)
+{
+ return -EOPNOTSUPP;
+}
+
+static int relay_map_get_next_key(struct bpf_map *map, void *key,
+ void *next_key)
+{
+ return -EOPNOTSUPP;
+}
+
+static u64 relay_map_mem_usage(const struct bpf_map *map)
+{
+ struct bpf_relay_map *rmap;
+ u64 usage = sizeof(struct bpf_relay_map);
+
+ rmap = container_of(map, struct bpf_relay_map, map);
+ usage += sizeof(struct rchan);
+ usage += (sizeof(struct rchan_buf) + rmap->relay_chan->alloc_size)
+ * num_online_cpus();
+ return usage;
+}
+
+BTF_ID_LIST_SINGLE(relay_map_btf_ids, struct, bpf_relay_map)
+const struct bpf_map_ops relay_map_ops = {
+ .map_meta_equal = bpf_map_meta_equal,
+ .map_alloc = relay_map_alloc,
+ .map_free = relay_map_free,
+ .map_lookup_elem = relay_map_lookup_elem,
+ .map_update_elem = relay_map_update_elem,
+ .map_delete_elem = relay_map_delete_elem,
+ .map_get_next_key = relay_map_get_next_key,
+ .map_mem_usage = relay_map_mem_usage,
+ .map_btf_id = &relay_map_btf_ids[0],
+};
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1bf9805ee185..35ae54ac6736 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1147,6 +1147,7 @@ static int map_create(union bpf_attr *attr)
}
if (attr->map_type != BPF_MAP_TYPE_BLOOM_FILTER &&
+ attr->map_type != BPF_MAP_TYPE_RELAY &&
attr->map_extra != 0)
return -EINVAL;
--
2.32.0.3.g01195cf9f
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH bpf-next 2/3] bpf: implement map_update_elem to init relay file
2023-12-22 12:21 [PATCH bpf-next 0/3] bpf: introduce BPF_MAP_TYPE_RELAY Philo Lu
2023-12-22 12:21 ` [PATCH bpf-next 1/3] bpf: implement relay map basis Philo Lu
@ 2023-12-22 12:21 ` Philo Lu
2023-12-22 14:45 ` Jiri Olsa
2023-12-23 11:28 ` Hou Tao
2023-12-22 12:21 ` [PATCH bpf-next 3/3] bpf: introduce bpf_relay_output helper Philo Lu
2023-12-22 14:45 ` [PATCH bpf-next 0/3] bpf: introduce BPF_MAP_TYPE_RELAY Jiri Olsa
3 siblings, 2 replies; 20+ messages in thread
From: Philo Lu @ 2023-12-22 12:21 UTC (permalink / raw)
To: bpf
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, rostedt, mhiramat,
mathieu.desnoyers, linux-trace-kernel, xuanzhuo, dust.li, alibuda,
guwen, hengqi, shung-hsi.yu
map_update_elem is used to create relay files and bind them with the
relay channel, which is created with BPF_MAP_CREATE. This allows users
to set a custom directory name. It must be used with key=NULL and
flag=0.
Here is an example:
```
struct {
__uint(type, BPF_MAP_TYPE_RELAY);
__uint(max_entries, 4096);
} my_relay SEC(".maps");
...
char dir_name[] = "relay_test";
bpf_map_update_elem(map_fd, NULL, dir_name, 0);
```
Then, directory `/sys/kerenl/debug/relay_test` will be created, which
includes files of my_relay0...my_relay[#cpu]. Each represents a per-cpu
buffer with size 8 * 4096 B (there are 8 subbufs by default, each with
size 4096B).
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
---
kernel/bpf/relaymap.c | 32 +++++++++++++++++++++++++++++++-
1 file changed, 31 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/relaymap.c b/kernel/bpf/relaymap.c
index d0adc7f67758..588c8de0a4bd 100644
--- a/kernel/bpf/relaymap.c
+++ b/kernel/bpf/relaymap.c
@@ -117,7 +117,37 @@ static void *relay_map_lookup_elem(struct bpf_map *map, void *key)
static long relay_map_update_elem(struct bpf_map *map, void *key, void *value,
u64 flags)
{
- return -EOPNOTSUPP;
+ struct bpf_relay_map *rmap;
+ struct dentry *parent;
+ int err;
+
+ if (unlikely(flags))
+ return -EINVAL;
+
+ if (unlikely(key))
+ return -EINVAL;
+
+ rmap = container_of(map, struct bpf_relay_map, map);
+
+ /* The directory already exists */
+ if (rmap->relay_chan->has_base_filename)
+ return -EEXIST;
+
+ /* Setup relay files. Note that the directory name passed as value should
+ * not be longer than map->value_size, including the '\0' at the end.
+ */
+ ((char *)value)[map->value_size - 1] = '\0';
+ parent = debugfs_create_dir(value, NULL);
+ if (IS_ERR_OR_NULL(parent))
+ return PTR_ERR(parent);
+
+ err = relay_late_setup_files(rmap->relay_chan, map->name, parent);
+ if (err) {
+ debugfs_remove_recursive(parent);
+ return err;
+ }
+
+ return 0;
}
static long relay_map_delete_elem(struct bpf_map *map, void *key)
--
2.32.0.3.g01195cf9f
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH bpf-next 3/3] bpf: introduce bpf_relay_output helper
2023-12-22 12:21 [PATCH bpf-next 0/3] bpf: introduce BPF_MAP_TYPE_RELAY Philo Lu
2023-12-22 12:21 ` [PATCH bpf-next 1/3] bpf: implement relay map basis Philo Lu
2023-12-22 12:21 ` [PATCH bpf-next 2/3] bpf: implement map_update_elem to init relay file Philo Lu
@ 2023-12-22 12:21 ` Philo Lu
2023-12-22 14:46 ` Jiri Olsa
2023-12-22 14:45 ` [PATCH bpf-next 0/3] bpf: introduce BPF_MAP_TYPE_RELAY Jiri Olsa
3 siblings, 1 reply; 20+ messages in thread
From: Philo Lu @ 2023-12-22 12:21 UTC (permalink / raw)
To: bpf
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, rostedt, mhiramat,
mathieu.desnoyers, linux-trace-kernel, xuanzhuo, dust.li, alibuda,
guwen, hengqi, shung-hsi.yu
Like perfbuf/ringbuf, a helper is needed to write into the buffer, named
bpf_relay_output, whose usage is same as ringbuf. Note that it works only
after relay files are set, i.e., after calling map_update_elem for the
created relay map.
Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
---
include/linux/bpf.h | 1 +
include/uapi/linux/bpf.h | 10 ++++++++++
kernel/bpf/helpers.c | 4 ++++
kernel/bpf/relaymap.c | 26 ++++++++++++++++++++++++++
kernel/bpf/verifier.c | 8 ++++++++
kernel/trace/bpf_trace.c | 4 ++++
6 files changed, 53 insertions(+)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 7671530d6e4e..b177122369e6 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3095,6 +3095,7 @@ extern const struct bpf_func_proto bpf_get_retval_proto;
extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto;
extern const struct bpf_func_proto bpf_cgrp_storage_get_proto;
extern const struct bpf_func_proto bpf_cgrp_storage_delete_proto;
+extern const struct bpf_func_proto bpf_relay_output_proto;
const struct bpf_func_proto *tracing_prog_func_proto(
enum bpf_func_id func_id, const struct bpf_prog *prog);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 143b75676bd3..03c0c1953ba1 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5686,6 +5686,15 @@ union bpf_attr {
* 0 on success.
*
* **-ENOENT** if the bpf_local_storage cannot be found.
+ *
+ * long bpf_relay_output(void *map, void *data, u64 size, u64 flags)
+ * Description
+ * Copy *size* bytes from *data* into *map* of type BPF_MAP_TYPE_RELAY.
+ * Currently, the *flags* must be 0.
+ * Return
+ * 0 on success.
+ *
+ * **-ENOENT** if the relay base_file in debugfs cannot be found.
*/
#define ___BPF_FUNC_MAPPER(FN, ctx...) \
FN(unspec, 0, ##ctx) \
@@ -5900,6 +5909,7 @@ union bpf_attr {
FN(user_ringbuf_drain, 209, ##ctx) \
FN(cgrp_storage_get, 210, ##ctx) \
FN(cgrp_storage_delete, 211, ##ctx) \
+ FN(relay_output, 212, ##ctx) \
/* */
/* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index be72824f32b2..0c26e87ce729 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1720,6 +1720,10 @@ bpf_base_func_proto(enum bpf_func_id func_id)
return &bpf_ringbuf_discard_proto;
case BPF_FUNC_ringbuf_query:
return &bpf_ringbuf_query_proto;
+#ifdef CONFIG_RELAY
+ case BPF_FUNC_relay_output:
+ return &bpf_relay_output_proto;
+#endif
case BPF_FUNC_strncmp:
return &bpf_strncmp_proto;
case BPF_FUNC_strtol:
diff --git a/kernel/bpf/relaymap.c b/kernel/bpf/relaymap.c
index 588c8de0a4bd..f9e2e4a780df 100644
--- a/kernel/bpf/relaymap.c
+++ b/kernel/bpf/relaymap.c
@@ -173,6 +173,32 @@ static u64 relay_map_mem_usage(const struct bpf_map *map)
return usage;
}
+BPF_CALL_4(bpf_relay_output, struct bpf_map *, map, void *, data, u64, size,
+ u64, flags)
+{
+ struct bpf_relay_map *rmap;
+
+ /* not support any flag now */
+ if (unlikely(flags))
+ return -EINVAL;
+
+ rmap = container_of(map, struct bpf_relay_map, map);
+ if (!rmap->relay_chan->has_base_filename)
+ return -ENOENT;
+
+ relay_write(rmap->relay_chan, data, size);
+ return 0;
+}
+
+const struct bpf_func_proto bpf_relay_output_proto = {
+ .func = bpf_relay_output,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_CONST_MAP_PTR,
+ .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
+ .arg3_type = ARG_CONST_SIZE_OR_ZERO,
+ .arg4_type = ARG_ANYTHING,
+};
+
BTF_ID_LIST_SINGLE(relay_map_btf_ids, struct, bpf_relay_map)
const struct bpf_map_ops relay_map_ops = {
.map_meta_equal = bpf_map_meta_equal,
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f13008d27f35..8c8287d6ae18 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8800,6 +8800,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
if (func_id != BPF_FUNC_user_ringbuf_drain)
goto error;
break;
+ case BPF_MAP_TYPE_RELAY:
+ if (func_id != BPF_FUNC_relay_output)
+ goto error;
+ break;
case BPF_MAP_TYPE_STACK_TRACE:
if (func_id != BPF_FUNC_get_stackid)
goto error;
@@ -8932,6 +8936,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
if (map->map_type != BPF_MAP_TYPE_USER_RINGBUF)
goto error;
break;
+ case BPF_FUNC_relay_output:
+ if (map->map_type != BPF_MAP_TYPE_RELAY)
+ goto error;
+ break;
case BPF_FUNC_get_stackid:
if (map->map_type != BPF_MAP_TYPE_STACK_TRACE)
goto error;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 7ac6c52b25eb..5b13553c6232 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1594,6 +1594,10 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_ringbuf_discard_proto;
case BPF_FUNC_ringbuf_query:
return &bpf_ringbuf_query_proto;
+#ifdef CONFIG_RELAY
+ case BPF_FUNC_relay_output:
+ return &bpf_relay_output_proto;
+#endif
case BPF_FUNC_jiffies64:
return &bpf_jiffies64_proto;
case BPF_FUNC_get_task_stack:
--
2.32.0.3.g01195cf9f
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH bpf-next 0/3] bpf: introduce BPF_MAP_TYPE_RELAY
2023-12-22 12:21 [PATCH bpf-next 0/3] bpf: introduce BPF_MAP_TYPE_RELAY Philo Lu
` (2 preceding siblings ...)
2023-12-22 12:21 ` [PATCH bpf-next 3/3] bpf: introduce bpf_relay_output helper Philo Lu
@ 2023-12-22 14:45 ` Jiri Olsa
2023-12-23 2:57 ` Philo Lu
3 siblings, 1 reply; 20+ messages in thread
From: Jiri Olsa @ 2023-12-22 14:45 UTC (permalink / raw)
To: Philo Lu
Cc: bpf, ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, rostedt, mhiramat,
mathieu.desnoyers, linux-trace-kernel, xuanzhuo, dust.li, alibuda,
guwen, hengqi, shung-hsi.yu
On Fri, Dec 22, 2023 at 08:21:43PM +0800, Philo Lu wrote:
> The patch set introduce a new type of map, BPF_MAP_TYPE_RELAY, based on
> relay interface [0]. It provides a way for persistent and overwritable data
> transfer.
>
> As stated in [0], relay is a efficient method for log and data transfer.
> And the interface is simple enough so that we can implement and use this
> type of map with current map interfaces. Besides we need a new helper
> bpf_relay_output to output data to user, similar with bpf_ringbuf_output.
>
> We need this map because currently neither ringbuf nor perfbuf satisfies
> the requirements of relatively long-term consistent tracing, where the bpf
> program keeps writing into the buffer without any bundled reader, and the
> buffer supports overwriting. For users, they just run the bpf program to
> collect data, and are able to read as need. The detailed discussion can be
> found at [1].
>
> The buffer is exposed to users as per-cpu files in debugfs, supporting read
> and mmap, and it is up to users how to formulate and read it, either
> through a program with mmap or just `cat`. Specifically, the files are
> created as "/sys/kerenl/debug/<dirname>/<mapname>#cpu", where the <dirname>
> is defined with map_update_elem (Note that we do not need to implement
> actual elem operators for relay_map).
>
> If this map is acceptable, other parts including docs, libbpf support,
> selftests, and benchmarks (if need) will be added in the following version.
looks useful, selftests might be already helpful to see the usage
jirka
>
> [0]
> https://github.com/torvalds/linux/blob/master/Documentation/filesystems/relay.rst
> [1]
> https://lore.kernel.org/bpf/20231219122850.433be151@gandalf.local.home/T/
>
> Philo Lu (3):
> bpf: implement relay map basis
> bpf: implement map_update_elem to init relay file
> bpf: introduce bpf_relay_output helper
>
> include/linux/bpf.h | 1 +
> include/linux/bpf_types.h | 3 +
> include/uapi/linux/bpf.h | 17 +++
> kernel/bpf/Makefile | 3 +
> kernel/bpf/helpers.c | 4 +
> kernel/bpf/relaymap.c | 213 ++++++++++++++++++++++++++++++++++++++
> kernel/bpf/syscall.c | 1 +
> kernel/bpf/verifier.c | 8 ++
> kernel/trace/bpf_trace.c | 4 +
> 9 files changed, 254 insertions(+)
> create mode 100644 kernel/bpf/relaymap.c
>
> --
> 2.32.0.3.g01195cf9f
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH bpf-next 1/3] bpf: implement relay map basis
2023-12-22 12:21 ` [PATCH bpf-next 1/3] bpf: implement relay map basis Philo Lu
@ 2023-12-22 14:45 ` Jiri Olsa
2023-12-23 2:54 ` Philo Lu
2023-12-22 23:07 ` kernel test robot
` (2 subsequent siblings)
3 siblings, 1 reply; 20+ messages in thread
From: Jiri Olsa @ 2023-12-22 14:45 UTC (permalink / raw)
To: Philo Lu
Cc: bpf, ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, rostedt, mhiramat,
mathieu.desnoyers, linux-trace-kernel, xuanzhuo, dust.li, alibuda,
guwen, hengqi, shung-hsi.yu
On Fri, Dec 22, 2023 at 08:21:44PM +0800, Philo Lu wrote:
SNIP
> +/* bpf_attr is used as follows:
> + * - key size: must be 0
> + * - value size: value will be used as directory name by map_update_elem
> + * (to create relay files). If passed as 0, it will be set to NAME_MAX as
> + * default
> + *
> + * - max_entries: subbuf size
> + * - map_extra: subbuf num, default as 8
> + *
> + * When alloc, we do not set up relay files considering dir_name conflicts.
> + * Instead we use relay_late_setup_files() in map_update_elem(), and thus the
> + * value is used as dir_name, and map->name is used as base_filename.
> + */
> +static struct bpf_map *relay_map_alloc(union bpf_attr *attr)
> +{
> + struct bpf_relay_map *rmap;
> +
> + if (unlikely(attr->map_flags & ~RELAY_CREATE_FLAG_MASK))
> + return ERR_PTR(-EINVAL);
> +
> + /* key size must be 0 in relay map */
> + if (unlikely(attr->key_size))
> + return ERR_PTR(-EINVAL);
> +
> + if (unlikely(attr->value_size > NAME_MAX)) {
> + pr_warn("value_size should be no more than %d\n", NAME_MAX);
> + return ERR_PTR(-EINVAL);
> + } else if (attr->value_size == 0)
> + attr->value_size = NAME_MAX;
the concept of no key with just value seems strange.. I never worked
with relay channels, so perhaps stupid question: but why not have one
relay channel for given key? having the debugfs like:
/sys/kernel/debug/my_rmap/mychannel/<cpu>
> +
> + /* set default subbuf num */
> + attr->map_extra = attr->map_extra & UINT_MAX;
> + if (!attr->map_extra)
> + attr->map_extra = 8;
> +
> + if (!attr->map_name || strlen(attr->map_name) == 0)
attr->map_name is allways != NULL
> + return ERR_PTR(-EINVAL);
> +
> + rmap = bpf_map_area_alloc(sizeof(*rmap), NUMA_NO_NODE);
> + if (!rmap)
> + return ERR_PTR(-ENOMEM);
> +
> + bpf_map_init_from_attr(&rmap->map, attr);
> +
> + rmap->relay_cb.create_buf_file = create_buf_file_handler;
> + rmap->relay_cb.remove_buf_file = remove_buf_file_handler;
> + if (attr->map_flags & BPF_F_OVERWRITE)
> + rmap->relay_cb.subbuf_start = subbuf_start_overwrite;
> +
> + rmap->relay_chan = relay_open(NULL, NULL,
> + attr->max_entries, attr->map_extra,
> + &rmap->relay_cb, NULL);
wrong indentation
> + if (!rmap->relay_chan)
> + return ERR_PTR(-EINVAL);
> +
> + return &rmap->map;
> +}
> +
> +static void relay_map_free(struct bpf_map *map)
> +{
> + struct bpf_relay_map *rmap;
> +
> + rmap = container_of(map, struct bpf_relay_map, map);
> + relay_close(rmap->relay_chan);
> + debugfs_remove_recursive(rmap->relay_chan->parent);
> + kfree(rmap);
should you use bpf_map_area_free instead?
jirka
> +}
> +
> +static void *relay_map_lookup_elem(struct bpf_map *map, void *key)
> +{
> + return ERR_PTR(-EOPNOTSUPP);
> +}
> +
> +static long relay_map_update_elem(struct bpf_map *map, void *key, void *value,
> + u64 flags)
> +{
> + return -EOPNOTSUPP;
> +}
> +
> +static long relay_map_delete_elem(struct bpf_map *map, void *key)
> +{
> + return -EOPNOTSUPP;
> +}
> +
> +static int relay_map_get_next_key(struct bpf_map *map, void *key,
> + void *next_key)
> +{
> + return -EOPNOTSUPP;
> +}
> +
> +static u64 relay_map_mem_usage(const struct bpf_map *map)
> +{
> + struct bpf_relay_map *rmap;
> + u64 usage = sizeof(struct bpf_relay_map);
> +
> + rmap = container_of(map, struct bpf_relay_map, map);
> + usage += sizeof(struct rchan);
> + usage += (sizeof(struct rchan_buf) + rmap->relay_chan->alloc_size)
> + * num_online_cpus();
> + return usage;
> +}
> +
> +BTF_ID_LIST_SINGLE(relay_map_btf_ids, struct, bpf_relay_map)
> +const struct bpf_map_ops relay_map_ops = {
> + .map_meta_equal = bpf_map_meta_equal,
> + .map_alloc = relay_map_alloc,
> + .map_free = relay_map_free,
> + .map_lookup_elem = relay_map_lookup_elem,
> + .map_update_elem = relay_map_update_elem,
> + .map_delete_elem = relay_map_delete_elem,
> + .map_get_next_key = relay_map_get_next_key,
> + .map_mem_usage = relay_map_mem_usage,
> + .map_btf_id = &relay_map_btf_ids[0],
> +};
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 1bf9805ee185..35ae54ac6736 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1147,6 +1147,7 @@ static int map_create(union bpf_attr *attr)
> }
>
> if (attr->map_type != BPF_MAP_TYPE_BLOOM_FILTER &&
> + attr->map_type != BPF_MAP_TYPE_RELAY &&
> attr->map_extra != 0)
> return -EINVAL;
>
> --
> 2.32.0.3.g01195cf9f
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH bpf-next 2/3] bpf: implement map_update_elem to init relay file
2023-12-22 12:21 ` [PATCH bpf-next 2/3] bpf: implement map_update_elem to init relay file Philo Lu
@ 2023-12-22 14:45 ` Jiri Olsa
2023-12-23 2:55 ` Philo Lu
2023-12-23 11:28 ` Hou Tao
1 sibling, 1 reply; 20+ messages in thread
From: Jiri Olsa @ 2023-12-22 14:45 UTC (permalink / raw)
To: Philo Lu
Cc: bpf, ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, rostedt, mhiramat,
mathieu.desnoyers, linux-trace-kernel, xuanzhuo, dust.li, alibuda,
guwen, hengqi, shung-hsi.yu
On Fri, Dec 22, 2023 at 08:21:45PM +0800, Philo Lu wrote:
> map_update_elem is used to create relay files and bind them with the
> relay channel, which is created with BPF_MAP_CREATE. This allows users
> to set a custom directory name. It must be used with key=NULL and
> flag=0.
>
> Here is an example:
> ```
> struct {
> __uint(type, BPF_MAP_TYPE_RELAY);
> __uint(max_entries, 4096);
> } my_relay SEC(".maps");
> ...
> char dir_name[] = "relay_test";
> bpf_map_update_elem(map_fd, NULL, dir_name, 0);
> ```
>
> Then, directory `/sys/kerenl/debug/relay_test` will be created, which
> includes files of my_relay0...my_relay[#cpu]. Each represents a per-cpu
> buffer with size 8 * 4096 B (there are 8 subbufs by default, each with
> size 4096B).
>
> Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
> ---
> kernel/bpf/relaymap.c | 32 +++++++++++++++++++++++++++++++-
> 1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/relaymap.c b/kernel/bpf/relaymap.c
> index d0adc7f67758..588c8de0a4bd 100644
> --- a/kernel/bpf/relaymap.c
> +++ b/kernel/bpf/relaymap.c
> @@ -117,7 +117,37 @@ static void *relay_map_lookup_elem(struct bpf_map *map, void *key)
> static long relay_map_update_elem(struct bpf_map *map, void *key, void *value,
> u64 flags)
> {
> - return -EOPNOTSUPP;
> + struct bpf_relay_map *rmap;
> + struct dentry *parent;
> + int err;
> +
> + if (unlikely(flags))
> + return -EINVAL;
> +
> + if (unlikely(key))
> + return -EINVAL;
> +
> + rmap = container_of(map, struct bpf_relay_map, map);
> +
> + /* The directory already exists */
> + if (rmap->relay_chan->has_base_filename)
> + return -EEXIST;
> +
> + /* Setup relay files. Note that the directory name passed as value should
> + * not be longer than map->value_size, including the '\0' at the end.
> + */
> + ((char *)value)[map->value_size - 1] = '\0';
> + parent = debugfs_create_dir(value, NULL);
> + if (IS_ERR_OR_NULL(parent))
> + return PTR_ERR(parent);
> +
> + err = relay_late_setup_files(rmap->relay_chan, map->name, parent);
> + if (err) {
> + debugfs_remove_recursive(parent);
> + return err;
> + }
looks like this patch could be moved to the previous one?
jirka
> +
> + return 0;
> }
>
> static long relay_map_delete_elem(struct bpf_map *map, void *key)
> --
> 2.32.0.3.g01195cf9f
>
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH bpf-next 3/3] bpf: introduce bpf_relay_output helper
2023-12-22 12:21 ` [PATCH bpf-next 3/3] bpf: introduce bpf_relay_output helper Philo Lu
@ 2023-12-22 14:46 ` Jiri Olsa
2023-12-23 2:56 ` Philo Lu
0 siblings, 1 reply; 20+ messages in thread
From: Jiri Olsa @ 2023-12-22 14:46 UTC (permalink / raw)
To: Philo Lu
Cc: bpf, ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, rostedt, mhiramat,
mathieu.desnoyers, linux-trace-kernel, xuanzhuo, dust.li, alibuda,
guwen, hengqi, shung-hsi.yu
On Fri, Dec 22, 2023 at 08:21:46PM +0800, Philo Lu wrote:
> Like perfbuf/ringbuf, a helper is needed to write into the buffer, named
> bpf_relay_output, whose usage is same as ringbuf. Note that it works only
> after relay files are set, i.e., after calling map_update_elem for the
> created relay map.
we can't add helpers anymore, this will need to be kfunc
check functions marked with __bpf_kfunc as examples
jirka
>
> Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
> ---
> include/linux/bpf.h | 1 +
> include/uapi/linux/bpf.h | 10 ++++++++++
> kernel/bpf/helpers.c | 4 ++++
> kernel/bpf/relaymap.c | 26 ++++++++++++++++++++++++++
> kernel/bpf/verifier.c | 8 ++++++++
> kernel/trace/bpf_trace.c | 4 ++++
> 6 files changed, 53 insertions(+)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 7671530d6e4e..b177122369e6 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -3095,6 +3095,7 @@ extern const struct bpf_func_proto bpf_get_retval_proto;
> extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto;
> extern const struct bpf_func_proto bpf_cgrp_storage_get_proto;
> extern const struct bpf_func_proto bpf_cgrp_storage_delete_proto;
> +extern const struct bpf_func_proto bpf_relay_output_proto;
>
> const struct bpf_func_proto *tracing_prog_func_proto(
> enum bpf_func_id func_id, const struct bpf_prog *prog);
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 143b75676bd3..03c0c1953ba1 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5686,6 +5686,15 @@ union bpf_attr {
> * 0 on success.
> *
> * **-ENOENT** if the bpf_local_storage cannot be found.
> + *
> + * long bpf_relay_output(void *map, void *data, u64 size, u64 flags)
> + * Description
> + * Copy *size* bytes from *data* into *map* of type BPF_MAP_TYPE_RELAY.
> + * Currently, the *flags* must be 0.
> + * Return
> + * 0 on success.
> + *
> + * **-ENOENT** if the relay base_file in debugfs cannot be found.
> */
> #define ___BPF_FUNC_MAPPER(FN, ctx...) \
> FN(unspec, 0, ##ctx) \
> @@ -5900,6 +5909,7 @@ union bpf_attr {
> FN(user_ringbuf_drain, 209, ##ctx) \
> FN(cgrp_storage_get, 210, ##ctx) \
> FN(cgrp_storage_delete, 211, ##ctx) \
> + FN(relay_output, 212, ##ctx) \
> /* */
>
> /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index be72824f32b2..0c26e87ce729 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -1720,6 +1720,10 @@ bpf_base_func_proto(enum bpf_func_id func_id)
> return &bpf_ringbuf_discard_proto;
> case BPF_FUNC_ringbuf_query:
> return &bpf_ringbuf_query_proto;
> +#ifdef CONFIG_RELAY
> + case BPF_FUNC_relay_output:
> + return &bpf_relay_output_proto;
> +#endif
> case BPF_FUNC_strncmp:
> return &bpf_strncmp_proto;
> case BPF_FUNC_strtol:
> diff --git a/kernel/bpf/relaymap.c b/kernel/bpf/relaymap.c
> index 588c8de0a4bd..f9e2e4a780df 100644
> --- a/kernel/bpf/relaymap.c
> +++ b/kernel/bpf/relaymap.c
> @@ -173,6 +173,32 @@ static u64 relay_map_mem_usage(const struct bpf_map *map)
> return usage;
> }
>
> +BPF_CALL_4(bpf_relay_output, struct bpf_map *, map, void *, data, u64, size,
> + u64, flags)
> +{
> + struct bpf_relay_map *rmap;
> +
> + /* not support any flag now */
> + if (unlikely(flags))
> + return -EINVAL;
> +
> + rmap = container_of(map, struct bpf_relay_map, map);
> + if (!rmap->relay_chan->has_base_filename)
> + return -ENOENT;
> +
> + relay_write(rmap->relay_chan, data, size);
> + return 0;
> +}
> +
> +const struct bpf_func_proto bpf_relay_output_proto = {
> + .func = bpf_relay_output,
> + .ret_type = RET_INTEGER,
> + .arg1_type = ARG_CONST_MAP_PTR,
> + .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
> + .arg3_type = ARG_CONST_SIZE_OR_ZERO,
> + .arg4_type = ARG_ANYTHING,
> +};
> +
> BTF_ID_LIST_SINGLE(relay_map_btf_ids, struct, bpf_relay_map)
> const struct bpf_map_ops relay_map_ops = {
> .map_meta_equal = bpf_map_meta_equal,
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index f13008d27f35..8c8287d6ae18 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -8800,6 +8800,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
> if (func_id != BPF_FUNC_user_ringbuf_drain)
> goto error;
> break;
> + case BPF_MAP_TYPE_RELAY:
> + if (func_id != BPF_FUNC_relay_output)
> + goto error;
> + break;
> case BPF_MAP_TYPE_STACK_TRACE:
> if (func_id != BPF_FUNC_get_stackid)
> goto error;
> @@ -8932,6 +8936,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
> if (map->map_type != BPF_MAP_TYPE_USER_RINGBUF)
> goto error;
> break;
> + case BPF_FUNC_relay_output:
> + if (map->map_type != BPF_MAP_TYPE_RELAY)
> + goto error;
> + break;
> case BPF_FUNC_get_stackid:
> if (map->map_type != BPF_MAP_TYPE_STACK_TRACE)
> goto error;
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 7ac6c52b25eb..5b13553c6232 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -1594,6 +1594,10 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> return &bpf_ringbuf_discard_proto;
> case BPF_FUNC_ringbuf_query:
> return &bpf_ringbuf_query_proto;
> +#ifdef CONFIG_RELAY
> + case BPF_FUNC_relay_output:
> + return &bpf_relay_output_proto;
> +#endif
> case BPF_FUNC_jiffies64:
> return &bpf_jiffies64_proto;
> case BPF_FUNC_get_task_stack:
> --
> 2.32.0.3.g01195cf9f
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH bpf-next 1/3] bpf: implement relay map basis
2023-12-22 12:21 ` [PATCH bpf-next 1/3] bpf: implement relay map basis Philo Lu
2023-12-22 14:45 ` Jiri Olsa
@ 2023-12-22 23:07 ` kernel test robot
2023-12-23 0:11 ` kernel test robot
2023-12-23 11:22 ` Hou Tao
3 siblings, 0 replies; 20+ messages in thread
From: kernel test robot @ 2023-12-22 23:07 UTC (permalink / raw)
To: Philo Lu, bpf
Cc: llvm, oe-kbuild-all, ast, daniel, john.fastabend, andrii,
martin.lau, song, yonghong.song, kpsingh, sdf, haoluo, jolsa,
rostedt, mhiramat, mathieu.desnoyers, linux-trace-kernel,
xuanzhuo, dust.li, alibuda, guwen, hengqi, shung-hsi.yu
Hi Philo,
kernel test robot noticed the following build warnings:
[auto build test WARNING on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Philo-Lu/bpf-implement-relay-map-basis/20231222-204545
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20231222122146.65519-2-lulie%40linux.alibaba.com
patch subject: [PATCH bpf-next 1/3] bpf: implement relay map basis
config: arm-randconfig-001-20231223 (https://download.01.org/0day-ci/archive/20231223/202312230638.IYDD8JOz-lkp@intel.com/config)
compiler: clang version 18.0.0git (https://github.com/llvm/llvm-project d3ef86708241a3bee902615c190dead1638c4e09)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231223/202312230638.IYDD8JOz-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202312230638.IYDD8JOz-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> kernel/bpf/relaymap.c:79:13: warning: address of array 'attr->map_name' will always evaluate to 'true' [-Wpointer-bool-conversion]
79 | if (!attr->map_name || strlen(attr->map_name) == 0)
| ~~~~~~~^~~~~~~~
1 warning generated.
vim +79 kernel/bpf/relaymap.c
43
44 /* bpf_attr is used as follows:
45 * - key size: must be 0
46 * - value size: value will be used as directory name by map_update_elem
47 * (to create relay files). If passed as 0, it will be set to NAME_MAX as
48 * default
49 *
50 * - max_entries: subbuf size
51 * - map_extra: subbuf num, default as 8
52 *
53 * When alloc, we do not set up relay files considering dir_name conflicts.
54 * Instead we use relay_late_setup_files() in map_update_elem(), and thus the
55 * value is used as dir_name, and map->name is used as base_filename.
56 */
57 static struct bpf_map *relay_map_alloc(union bpf_attr *attr)
58 {
59 struct bpf_relay_map *rmap;
60
61 if (unlikely(attr->map_flags & ~RELAY_CREATE_FLAG_MASK))
62 return ERR_PTR(-EINVAL);
63
64 /* key size must be 0 in relay map */
65 if (unlikely(attr->key_size))
66 return ERR_PTR(-EINVAL);
67
68 if (unlikely(attr->value_size > NAME_MAX)) {
69 pr_warn("value_size should be no more than %d\n", NAME_MAX);
70 return ERR_PTR(-EINVAL);
71 } else if (attr->value_size == 0)
72 attr->value_size = NAME_MAX;
73
74 /* set default subbuf num */
75 attr->map_extra = attr->map_extra & UINT_MAX;
76 if (!attr->map_extra)
77 attr->map_extra = 8;
78
> 79 if (!attr->map_name || strlen(attr->map_name) == 0)
80 return ERR_PTR(-EINVAL);
81
82 rmap = bpf_map_area_alloc(sizeof(*rmap), NUMA_NO_NODE);
83 if (!rmap)
84 return ERR_PTR(-ENOMEM);
85
86 bpf_map_init_from_attr(&rmap->map, attr);
87
88 rmap->relay_cb.create_buf_file = create_buf_file_handler;
89 rmap->relay_cb.remove_buf_file = remove_buf_file_handler;
90 if (attr->map_flags & BPF_F_OVERWRITE)
91 rmap->relay_cb.subbuf_start = subbuf_start_overwrite;
92
93 rmap->relay_chan = relay_open(NULL, NULL,
94 attr->max_entries, attr->map_extra,
95 &rmap->relay_cb, NULL);
96 if (!rmap->relay_chan)
97 return ERR_PTR(-EINVAL);
98
99 return &rmap->map;
100 }
101
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH bpf-next 1/3] bpf: implement relay map basis
2023-12-22 12:21 ` [PATCH bpf-next 1/3] bpf: implement relay map basis Philo Lu
2023-12-22 14:45 ` Jiri Olsa
2023-12-22 23:07 ` kernel test robot
@ 2023-12-23 0:11 ` kernel test robot
2023-12-23 11:22 ` Hou Tao
3 siblings, 0 replies; 20+ messages in thread
From: kernel test robot @ 2023-12-23 0:11 UTC (permalink / raw)
To: Philo Lu, bpf
Cc: oe-kbuild-all, ast, daniel, john.fastabend, andrii, martin.lau,
song, yonghong.song, kpsingh, sdf, haoluo, jolsa, rostedt,
mhiramat, mathieu.desnoyers, linux-trace-kernel, xuanzhuo,
dust.li, alibuda, guwen, hengqi, shung-hsi.yu
Hi Philo,
kernel test robot noticed the following build warnings:
[auto build test WARNING on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Philo-Lu/bpf-implement-relay-map-basis/20231222-204545
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20231222122146.65519-2-lulie%40linux.alibaba.com
patch subject: [PATCH bpf-next 1/3] bpf: implement relay map basis
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20231223/202312230850.NQeIZXj6-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231223/202312230850.NQeIZXj6-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202312230850.NQeIZXj6-lkp@intel.com/
All warnings (new ones prefixed by >>):
kernel/bpf/relaymap.c: In function 'relay_map_alloc':
>> kernel/bpf/relaymap.c:79:13: warning: the comparison will always evaluate as 'true' for the address of 'map_name' will never be NULL [-Waddress]
79 | if (!attr->map_name || strlen(attr->map_name) == 0)
| ^
In file included from include/linux/bpf.h:7,
from include/linux/filter.h:9,
from kernel/bpf/relaymap.c:3:
include/uapi/linux/bpf.h:1394:25: note: 'map_name' declared here
1394 | char map_name[BPF_OBJ_NAME_LEN];
| ^~~~~~~~
vim +79 kernel/bpf/relaymap.c
43
44 /* bpf_attr is used as follows:
45 * - key size: must be 0
46 * - value size: value will be used as directory name by map_update_elem
47 * (to create relay files). If passed as 0, it will be set to NAME_MAX as
48 * default
49 *
50 * - max_entries: subbuf size
51 * - map_extra: subbuf num, default as 8
52 *
53 * When alloc, we do not set up relay files considering dir_name conflicts.
54 * Instead we use relay_late_setup_files() in map_update_elem(), and thus the
55 * value is used as dir_name, and map->name is used as base_filename.
56 */
57 static struct bpf_map *relay_map_alloc(union bpf_attr *attr)
58 {
59 struct bpf_relay_map *rmap;
60
61 if (unlikely(attr->map_flags & ~RELAY_CREATE_FLAG_MASK))
62 return ERR_PTR(-EINVAL);
63
64 /* key size must be 0 in relay map */
65 if (unlikely(attr->key_size))
66 return ERR_PTR(-EINVAL);
67
68 if (unlikely(attr->value_size > NAME_MAX)) {
69 pr_warn("value_size should be no more than %d\n", NAME_MAX);
70 return ERR_PTR(-EINVAL);
71 } else if (attr->value_size == 0)
72 attr->value_size = NAME_MAX;
73
74 /* set default subbuf num */
75 attr->map_extra = attr->map_extra & UINT_MAX;
76 if (!attr->map_extra)
77 attr->map_extra = 8;
78
> 79 if (!attr->map_name || strlen(attr->map_name) == 0)
80 return ERR_PTR(-EINVAL);
81
82 rmap = bpf_map_area_alloc(sizeof(*rmap), NUMA_NO_NODE);
83 if (!rmap)
84 return ERR_PTR(-ENOMEM);
85
86 bpf_map_init_from_attr(&rmap->map, attr);
87
88 rmap->relay_cb.create_buf_file = create_buf_file_handler;
89 rmap->relay_cb.remove_buf_file = remove_buf_file_handler;
90 if (attr->map_flags & BPF_F_OVERWRITE)
91 rmap->relay_cb.subbuf_start = subbuf_start_overwrite;
92
93 rmap->relay_chan = relay_open(NULL, NULL,
94 attr->max_entries, attr->map_extra,
95 &rmap->relay_cb, NULL);
96 if (!rmap->relay_chan)
97 return ERR_PTR(-EINVAL);
98
99 return &rmap->map;
100 }
101
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH bpf-next 1/3] bpf: implement relay map basis
2023-12-22 14:45 ` Jiri Olsa
@ 2023-12-23 2:54 ` Philo Lu
0 siblings, 0 replies; 20+ messages in thread
From: Philo Lu @ 2023-12-23 2:54 UTC (permalink / raw)
To: Jiri Olsa
Cc: bpf, ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, rostedt, mhiramat,
mathieu.desnoyers, linux-trace-kernel, xuanzhuo, dust.li, alibuda,
guwen, hengqi, shung-hsi.yu
On 2023/12/22 22:45, Jiri Olsa wrote:
> On Fri, Dec 22, 2023 at 08:21:44PM +0800, Philo Lu wrote:
>
> SNIP
>
>> +/* bpf_attr is used as follows:
>> + * - key size: must be 0
>> + * - value size: value will be used as directory name by map_update_elem
>> + * (to create relay files). If passed as 0, it will be set to NAME_MAX as
>> + * default
>> + *
>> + * - max_entries: subbuf size
>> + * - map_extra: subbuf num, default as 8
>> + *
>> + * When alloc, we do not set up relay files considering dir_name conflicts.
>> + * Instead we use relay_late_setup_files() in map_update_elem(), and thus the
>> + * value is used as dir_name, and map->name is used as base_filename.
>> + */
>> +static struct bpf_map *relay_map_alloc(union bpf_attr *attr)
>> +{
>> + struct bpf_relay_map *rmap;
>> +
>> + if (unlikely(attr->map_flags & ~RELAY_CREATE_FLAG_MASK))
>> + return ERR_PTR(-EINVAL);
>> +
>> + /* key size must be 0 in relay map */
>> + if (unlikely(attr->key_size))
>> + return ERR_PTR(-EINVAL);
>> +
>> + if (unlikely(attr->value_size > NAME_MAX)) {
>> + pr_warn("value_size should be no more than %d\n", NAME_MAX);
>> + return ERR_PTR(-EINVAL);
>> + } else if (attr->value_size == 0)
>> + attr->value_size = NAME_MAX;
>
> the concept of no key with just value seems strange.. I never worked
> with relay channels, so perhaps stupid question: but why not have one
> relay channel for given key? having the debugfs like:
>
> /sys/kernel/debug/my_rmap/mychannel/<cpu>
>
Here, a relay map is actually a relay channel, which includes buffers
for all cpus. And I think 2 levels is enough when we use relay map in
`/sys/kernel/debug/`: <dir_name>/<map_name>[#cpu]. The `dir_name` is
necessary because user could use the same `map_name` in different bpf
programs, and we can use it as an additional tag to distinguish them.
The `dir_name` is set by user with relay_map_update_elem.
Here is an example. Assume we have 2 relay maps (rmap_a and rmap_b) and
2 cpus, the debugfs will be like:
```
/sys/kernel/debug/<dir_name1>/rmap_a0
/sys/kernel/debug/<dir_name1>/rmap_a1
/sys/kernel/debug/<dir_name2>/rmap_b0
/sys/kernel/debug/<dir_name2>/rmap_b1
```
So I think the key point here is that we just need one field to set the
`dir_name`, either key or value. I chose key as NULL because I think it
suggests "Normally map_update_elem should be invoked just once for a
relay map". But I think it okay to use key instead, and value as NULL.
>> +
>> + /* set default subbuf num */
>> + attr->map_extra = attr->map_extra & UINT_MAX;
>> + if (!attr->map_extra)
>> + attr->map_extra = 8;
>> +
>> + if (!attr->map_name || strlen(attr->map_name) == 0)
>
> attr->map_name is allways != NULL
>
>> + return ERR_PTR(-EINVAL);
>> +
>> + rmap = bpf_map_area_alloc(sizeof(*rmap), NUMA_NO_NODE);
>> + if (!rmap)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + bpf_map_init_from_attr(&rmap->map, attr);
>> +
>> + rmap->relay_cb.create_buf_file = create_buf_file_handler;
>> + rmap->relay_cb.remove_buf_file = remove_buf_file_handler;
>> + if (attr->map_flags & BPF_F_OVERWRITE)
>> + rmap->relay_cb.subbuf_start = subbuf_start_overwrite;
>> +
>> + rmap->relay_chan = relay_open(NULL, NULL,
>> + attr->max_entries, attr->map_extra,
>> + &rmap->relay_cb, NULL);
>
> wrong indentation
>
Got it. I will adjust it.
>> + if (!rmap->relay_chan)
>> + return ERR_PTR(-EINVAL);
>> +
>> + return &rmap->map;
>> +}
>> +
>> +static void relay_map_free(struct bpf_map *map)
>> +{
>> + struct bpf_relay_map *rmap;
>> +
>> + rmap = container_of(map, struct bpf_relay_map, map);
>> + relay_close(rmap->relay_chan);
>> + debugfs_remove_recursive(rmap->relay_chan->parent);
>> + kfree(rmap);
>
> should you use bpf_map_area_free instead?
>
Thanks for catching. Will fix it.
> jirka
>
>> +}
>> +
>> +static void *relay_map_lookup_elem(struct bpf_map *map, void *key)
>> +{
>> + return ERR_PTR(-EOPNOTSUPP);
>> +}
>> +
>> +static long relay_map_update_elem(struct bpf_map *map, void *key, void *value,
>> + u64 flags)
>> +{
>> + return -EOPNOTSUPP;
>> +}
>> +
>> +static long relay_map_delete_elem(struct bpf_map *map, void *key)
>> +{
>> + return -EOPNOTSUPP;
>> +}
>> +
>> +static int relay_map_get_next_key(struct bpf_map *map, void *key,
>> + void *next_key)
>> +{
>> + return -EOPNOTSUPP;
>> +}
>> +
>> +static u64 relay_map_mem_usage(const struct bpf_map *map)
>> +{
>> + struct bpf_relay_map *rmap;
>> + u64 usage = sizeof(struct bpf_relay_map);
>> +
>> + rmap = container_of(map, struct bpf_relay_map, map);
>> + usage += sizeof(struct rchan);
>> + usage += (sizeof(struct rchan_buf) + rmap->relay_chan->alloc_size)
>> + * num_online_cpus();
>> + return usage;
>> +}
>> +
>> +BTF_ID_LIST_SINGLE(relay_map_btf_ids, struct, bpf_relay_map)
>> +const struct bpf_map_ops relay_map_ops = {
>> + .map_meta_equal = bpf_map_meta_equal,
>> + .map_alloc = relay_map_alloc,
>> + .map_free = relay_map_free,
>> + .map_lookup_elem = relay_map_lookup_elem,
>> + .map_update_elem = relay_map_update_elem,
>> + .map_delete_elem = relay_map_delete_elem,
>> + .map_get_next_key = relay_map_get_next_key,
>> + .map_mem_usage = relay_map_mem_usage,
>> + .map_btf_id = &relay_map_btf_ids[0],
>> +};
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 1bf9805ee185..35ae54ac6736 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -1147,6 +1147,7 @@ static int map_create(union bpf_attr *attr)
>> }
>>
>> if (attr->map_type != BPF_MAP_TYPE_BLOOM_FILTER &&
>> + attr->map_type != BPF_MAP_TYPE_RELAY &&
>> attr->map_extra != 0)
>> return -EINVAL;
>>
>> --
>> 2.32.0.3.g01195cf9f
>>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH bpf-next 2/3] bpf: implement map_update_elem to init relay file
2023-12-22 14:45 ` Jiri Olsa
@ 2023-12-23 2:55 ` Philo Lu
0 siblings, 0 replies; 20+ messages in thread
From: Philo Lu @ 2023-12-23 2:55 UTC (permalink / raw)
To: Jiri Olsa
Cc: bpf, ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, rostedt, mhiramat,
mathieu.desnoyers, linux-trace-kernel, xuanzhuo, dust.li, alibuda,
guwen, hengqi, shung-hsi.yu
On 2023/12/22 22:45, Jiri Olsa wrote:
> On Fri, Dec 22, 2023 at 08:21:45PM +0800, Philo Lu wrote:
>> map_update_elem is used to create relay files and bind them with the
>> relay channel, which is created with BPF_MAP_CREATE. This allows users
>> to set a custom directory name. It must be used with key=NULL and
>> flag=0.
>>
>> Here is an example:
>> ```
>> struct {
>> __uint(type, BPF_MAP_TYPE_RELAY);
>> __uint(max_entries, 4096);
>> } my_relay SEC(".maps");
>> ...
>> char dir_name[] = "relay_test";
>> bpf_map_update_elem(map_fd, NULL, dir_name, 0);
>> ```
>>
>> Then, directory `/sys/kerenl/debug/relay_test` will be created, which
>> includes files of my_relay0...my_relay[#cpu]. Each represents a per-cpu
>> buffer with size 8 * 4096 B (there are 8 subbufs by default, each with
>> size 4096B).
>>
>> Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
>> ---
>> kernel/bpf/relaymap.c | 32 +++++++++++++++++++++++++++++++-
>> 1 file changed, 31 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/relaymap.c b/kernel/bpf/relaymap.c
>> index d0adc7f67758..588c8de0a4bd 100644
>> --- a/kernel/bpf/relaymap.c
>> +++ b/kernel/bpf/relaymap.c
>> @@ -117,7 +117,37 @@ static void *relay_map_lookup_elem(struct bpf_map *map, void *key)
>> static long relay_map_update_elem(struct bpf_map *map, void *key, void *value,
>> u64 flags)
>> {
>> - return -EOPNOTSUPP;
>> + struct bpf_relay_map *rmap;
>> + struct dentry *parent;
>> + int err;
>> +
>> + if (unlikely(flags))
>> + return -EINVAL;
>> +
>> + if (unlikely(key))
>> + return -EINVAL;
>> +
>> + rmap = container_of(map, struct bpf_relay_map, map);
>> +
>> + /* The directory already exists */
>> + if (rmap->relay_chan->has_base_filename)
>> + return -EEXIST;
>> +
>> + /* Setup relay files. Note that the directory name passed as value should
>> + * not be longer than map->value_size, including the '\0' at the end.
>> + */
>> + ((char *)value)[map->value_size - 1] = '\0';
>> + parent = debugfs_create_dir(value, NULL);
>> + if (IS_ERR_OR_NULL(parent))
>> + return PTR_ERR(parent);
>> +
>> + err = relay_late_setup_files(rmap->relay_chan, map->name, parent);
>> + if (err) {
>> + debugfs_remove_recursive(parent);
>> + return err;
>> + }
>
> looks like this patch could be moved to the previous one?
>
OK, will do it.
> jirka
>
>> +
>> + return 0;
>> }
>>
>> static long relay_map_delete_elem(struct bpf_map *map, void *key)
>> --
>> 2.32.0.3.g01195cf9f
>>
>>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH bpf-next 3/3] bpf: introduce bpf_relay_output helper
2023-12-22 14:46 ` Jiri Olsa
@ 2023-12-23 2:56 ` Philo Lu
0 siblings, 0 replies; 20+ messages in thread
From: Philo Lu @ 2023-12-23 2:56 UTC (permalink / raw)
To: Jiri Olsa
Cc: bpf, ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, rostedt, mhiramat,
mathieu.desnoyers, linux-trace-kernel, xuanzhuo, dust.li, alibuda,
guwen, hengqi, shung-hsi.yu
On 2023/12/22 22:46, Jiri Olsa wrote:
> On Fri, Dec 22, 2023 at 08:21:46PM +0800, Philo Lu wrote:
>> Like perfbuf/ringbuf, a helper is needed to write into the buffer, named
>> bpf_relay_output, whose usage is same as ringbuf. Note that it works only
>> after relay files are set, i.e., after calling map_update_elem for the
>> created relay map.
>
> we can't add helpers anymore, this will need to be kfunc
> check functions marked with __bpf_kfunc as examples
>
Got it, I will change it to kfunc in the next version.
Thanks.
> jirka
>
>>
>> Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
>> ---
>> include/linux/bpf.h | 1 +
>> include/uapi/linux/bpf.h | 10 ++++++++++
>> kernel/bpf/helpers.c | 4 ++++
>> kernel/bpf/relaymap.c | 26 ++++++++++++++++++++++++++
>> kernel/bpf/verifier.c | 8 ++++++++
>> kernel/trace/bpf_trace.c | 4 ++++
>> 6 files changed, 53 insertions(+)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 7671530d6e4e..b177122369e6 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -3095,6 +3095,7 @@ extern const struct bpf_func_proto bpf_get_retval_proto;
>> extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto;
>> extern const struct bpf_func_proto bpf_cgrp_storage_get_proto;
>> extern const struct bpf_func_proto bpf_cgrp_storage_delete_proto;
>> +extern const struct bpf_func_proto bpf_relay_output_proto;
>>
>> const struct bpf_func_proto *tracing_prog_func_proto(
>> enum bpf_func_id func_id, const struct bpf_prog *prog);
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 143b75676bd3..03c0c1953ba1 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -5686,6 +5686,15 @@ union bpf_attr {
>> * 0 on success.
>> *
>> * **-ENOENT** if the bpf_local_storage cannot be found.
>> + *
>> + * long bpf_relay_output(void *map, void *data, u64 size, u64 flags)
>> + * Description
>> + * Copy *size* bytes from *data* into *map* of type BPF_MAP_TYPE_RELAY.
>> + * Currently, the *flags* must be 0.
>> + * Return
>> + * 0 on success.
>> + *
>> + * **-ENOENT** if the relay base_file in debugfs cannot be found.
>> */
>> #define ___BPF_FUNC_MAPPER(FN, ctx...) \
>> FN(unspec, 0, ##ctx) \
>> @@ -5900,6 +5909,7 @@ union bpf_attr {
>> FN(user_ringbuf_drain, 209, ##ctx) \
>> FN(cgrp_storage_get, 210, ##ctx) \
>> FN(cgrp_storage_delete, 211, ##ctx) \
>> + FN(relay_output, 212, ##ctx) \
>> /* */
>>
>> /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index be72824f32b2..0c26e87ce729 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -1720,6 +1720,10 @@ bpf_base_func_proto(enum bpf_func_id func_id)
>> return &bpf_ringbuf_discard_proto;
>> case BPF_FUNC_ringbuf_query:
>> return &bpf_ringbuf_query_proto;
>> +#ifdef CONFIG_RELAY
>> + case BPF_FUNC_relay_output:
>> + return &bpf_relay_output_proto;
>> +#endif
>> case BPF_FUNC_strncmp:
>> return &bpf_strncmp_proto;
>> case BPF_FUNC_strtol:
>> diff --git a/kernel/bpf/relaymap.c b/kernel/bpf/relaymap.c
>> index 588c8de0a4bd..f9e2e4a780df 100644
>> --- a/kernel/bpf/relaymap.c
>> +++ b/kernel/bpf/relaymap.c
>> @@ -173,6 +173,32 @@ static u64 relay_map_mem_usage(const struct bpf_map *map)
>> return usage;
>> }
>>
>> +BPF_CALL_4(bpf_relay_output, struct bpf_map *, map, void *, data, u64, size,
>> + u64, flags)
>> +{
>> + struct bpf_relay_map *rmap;
>> +
>> + /* not support any flag now */
>> + if (unlikely(flags))
>> + return -EINVAL;
>> +
>> + rmap = container_of(map, struct bpf_relay_map, map);
>> + if (!rmap->relay_chan->has_base_filename)
>> + return -ENOENT;
>> +
>> + relay_write(rmap->relay_chan, data, size);
>> + return 0;
>> +}
>> +
>> +const struct bpf_func_proto bpf_relay_output_proto = {
>> + .func = bpf_relay_output,
>> + .ret_type = RET_INTEGER,
>> + .arg1_type = ARG_CONST_MAP_PTR,
>> + .arg2_type = ARG_PTR_TO_MEM | MEM_RDONLY,
>> + .arg3_type = ARG_CONST_SIZE_OR_ZERO,
>> + .arg4_type = ARG_ANYTHING,
>> +};
>> +
>> BTF_ID_LIST_SINGLE(relay_map_btf_ids, struct, bpf_relay_map)
>> const struct bpf_map_ops relay_map_ops = {
>> .map_meta_equal = bpf_map_meta_equal,
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index f13008d27f35..8c8287d6ae18 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -8800,6 +8800,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
>> if (func_id != BPF_FUNC_user_ringbuf_drain)
>> goto error;
>> break;
>> + case BPF_MAP_TYPE_RELAY:
>> + if (func_id != BPF_FUNC_relay_output)
>> + goto error;
>> + break;
>> case BPF_MAP_TYPE_STACK_TRACE:
>> if (func_id != BPF_FUNC_get_stackid)
>> goto error;
>> @@ -8932,6 +8936,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
>> if (map->map_type != BPF_MAP_TYPE_USER_RINGBUF)
>> goto error;
>> break;
>> + case BPF_FUNC_relay_output:
>> + if (map->map_type != BPF_MAP_TYPE_RELAY)
>> + goto error;
>> + break;
>> case BPF_FUNC_get_stackid:
>> if (map->map_type != BPF_MAP_TYPE_STACK_TRACE)
>> goto error;
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index 7ac6c52b25eb..5b13553c6232 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -1594,6 +1594,10 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>> return &bpf_ringbuf_discard_proto;
>> case BPF_FUNC_ringbuf_query:
>> return &bpf_ringbuf_query_proto;
>> +#ifdef CONFIG_RELAY
>> + case BPF_FUNC_relay_output:
>> + return &bpf_relay_output_proto;
>> +#endif
>> case BPF_FUNC_jiffies64:
>> return &bpf_jiffies64_proto;
>> case BPF_FUNC_get_task_stack:
>> --
>> 2.32.0.3.g01195cf9f
>>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH bpf-next 0/3] bpf: introduce BPF_MAP_TYPE_RELAY
2023-12-22 14:45 ` [PATCH bpf-next 0/3] bpf: introduce BPF_MAP_TYPE_RELAY Jiri Olsa
@ 2023-12-23 2:57 ` Philo Lu
0 siblings, 0 replies; 20+ messages in thread
From: Philo Lu @ 2023-12-23 2:57 UTC (permalink / raw)
To: Jiri Olsa
Cc: bpf, ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, rostedt, mhiramat,
mathieu.desnoyers, linux-trace-kernel, xuanzhuo, dust.li, alibuda,
guwen, hengqi, shung-hsi.yu
On 2023/12/22 22:45, Jiri Olsa wrote:
> On Fri, Dec 22, 2023 at 08:21:43PM +0800, Philo Lu wrote:
>> The patch set introduce a new type of map, BPF_MAP_TYPE_RELAY, based on
>> relay interface [0]. It provides a way for persistent and overwritable data
>> transfer.
>>
>> As stated in [0], relay is a efficient method for log and data transfer.
>> And the interface is simple enough so that we can implement and use this
>> type of map with current map interfaces. Besides we need a new helper
>> bpf_relay_output to output data to user, similar with bpf_ringbuf_output.
>>
>> We need this map because currently neither ringbuf nor perfbuf satisfies
>> the requirements of relatively long-term consistent tracing, where the bpf
>> program keeps writing into the buffer without any bundled reader, and the
>> buffer supports overwriting. For users, they just run the bpf program to
>> collect data, and are able to read as need. The detailed discussion can be
>> found at [1].
>>
>> The buffer is exposed to users as per-cpu files in debugfs, supporting read
>> and mmap, and it is up to users how to formulate and read it, either
>> through a program with mmap or just `cat`. Specifically, the files are
>> created as "/sys/kerenl/debug/<dirname>/<mapname>#cpu", where the <dirname>
>> is defined with map_update_elem (Note that we do not need to implement
>> actual elem operators for relay_map).
>>
>> If this map is acceptable, other parts including docs, libbpf support,
>> selftests, and benchmarks (if need) will be added in the following version.
>
> looks useful, selftests might be already helpful to see the usage
>
Ok, I will add selftests in the next version.
Thanks.
> jirka
>
>>
>> [0]
>> https://github.com/torvalds/linux/blob/master/Documentation/filesystems/relay.rst
>> [1]
>> https://lore.kernel.org/bpf/20231219122850.433be151@gandalf.local.home/T/
>>
>> Philo Lu (3):
>> bpf: implement relay map basis
>> bpf: implement map_update_elem to init relay file
>> bpf: introduce bpf_relay_output helper
>>
>> include/linux/bpf.h | 1 +
>> include/linux/bpf_types.h | 3 +
>> include/uapi/linux/bpf.h | 17 +++
>> kernel/bpf/Makefile | 3 +
>> kernel/bpf/helpers.c | 4 +
>> kernel/bpf/relaymap.c | 213 ++++++++++++++++++++++++++++++++++++++
>> kernel/bpf/syscall.c | 1 +
>> kernel/bpf/verifier.c | 8 ++
>> kernel/trace/bpf_trace.c | 4 +
>> 9 files changed, 254 insertions(+)
>> create mode 100644 kernel/bpf/relaymap.c
>>
>> --
>> 2.32.0.3.g01195cf9f
>>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH bpf-next 1/3] bpf: implement relay map basis
2023-12-22 12:21 ` [PATCH bpf-next 1/3] bpf: implement relay map basis Philo Lu
` (2 preceding siblings ...)
2023-12-23 0:11 ` kernel test robot
@ 2023-12-23 11:22 ` Hou Tao
2023-12-23 13:02 ` Philo Lu
3 siblings, 1 reply; 20+ messages in thread
From: Hou Tao @ 2023-12-23 11:22 UTC (permalink / raw)
To: Philo Lu, bpf
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, rostedt, mhiramat,
mathieu.desnoyers, linux-trace-kernel, xuanzhuo, dust.li, alibuda,
guwen, hengqi, shung-hsi.yu
Hi,
On 12/22/2023 8:21 PM, Philo Lu wrote:
> BPF_MAP_TYPE_RELAY is implemented based on relay interface, which
> creates per-cpu buffer to transfer data. Each buffer is essentially a
> list of fix-sized sub-buffers, and is exposed to user space as files in
> debugfs. attr->max_entries is used as subbuf size and attr->map_extra is
> used as subbuf num. Currently, the default value of subbuf num is 8.
>
> The data can be accessed by read or mmap via these files. For example,
> if there are 2 cpus, files could be `/sys/kernel/debug/mydir/my_rmap0`
> and `/sys/kernel/debug/mydir/my_rmap1`.
>
> Buffer-only mode is used to create the relay map, which just allocates
> the buffer without creating user-space files. Then user can setup the
> files with map_update_elem, thus allowing user to define the directory
> name in debugfs. map_update_elem is implemented in the following patch.
>
> A new map flag named BPF_F_OVERWRITE is introduced to set overwrite mode
> of relay map.
Beside adding a new map type, could we consider only use kfuncs to
support the creation of rchan and the write of rchan ? I think
bpf_cpumask will be a good reference.
>
> Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
> ---
> include/linux/bpf_types.h | 3 +
> include/uapi/linux/bpf.h | 7 ++
> kernel/bpf/Makefile | 3 +
> kernel/bpf/relaymap.c | 157 ++++++++++++++++++++++++++++++++++++++
> kernel/bpf/syscall.c | 1 +
> 5 files changed, 171 insertions(+)
> create mode 100644 kernel/bpf/relaymap.c
>
SNIP
> diff --git a/kernel/bpf/relaymap.c b/kernel/bpf/relaymap.c
> new file mode 100644
> index 000000000000..d0adc7f67758
> --- /dev/null
> +++ b/kernel/bpf/relaymap.c
> @@ -0,0 +1,157 @@
> +#include <linux/cpumask.h>
> +#include <linux/debugfs.h>
> +#include <linux/filter.h>
> +#include <linux/relay.h>
> +#include <linux/slab.h>
> +#include <linux/bpf.h>
> +#include <linux/err.h>
> +
> +#define RELAY_CREATE_FLAG_MASK (BPF_F_OVERWRITE)
> +
> +struct bpf_relay_map {
> + struct bpf_map map;
> + struct rchan *relay_chan;
> + struct rchan_callbacks relay_cb;
> +};
It seems that there is no need to add relay_cb in bpf_relay_map. We
could define two kinds of rchan_callbacks: one for non-overwrite mode
and another one for overwrite mode.
> +
> +static struct dentry *create_buf_file_handler(const char *filename,
> + struct dentry *parent, umode_t mode,
> + struct rchan_buf *buf, int *is_global)
> +{
> + /* Because we do relay_late_setup_files(), create_buf_file(NULL, NULL, ...)
> + * will be called by relay_open.
> + */
> + if (!filename)
> + return NULL;
> +
> + return debugfs_create_file(filename, mode, parent, buf,
> + &relay_file_operations);
> +}
> +
> +static int remove_buf_file_handler(struct dentry *dentry)
> +{
> + debugfs_remove(dentry);
> + return 0;
> +}
> +
> +/* For non-overwrite, use default subbuf_start cb */
> +static int subbuf_start_overwrite(struct rchan_buf *buf, void *subbuf,
> + void *prev_subbuf, size_t prev_padding)
> +{
> + return 1;
> +}
> +
> +/* bpf_attr is used as follows:
> + * - key size: must be 0
> + * - value size: value will be used as directory name by map_update_elem
> + * (to create relay files). If passed as 0, it will be set to NAME_MAX as
> + * default
> + *
> + * - max_entries: subbuf size
> + * - map_extra: subbuf num, default as 8
> + *
> + * When alloc, we do not set up relay files considering dir_name conflicts.
> + * Instead we use relay_late_setup_files() in map_update_elem(), and thus the
> + * value is used as dir_name, and map->name is used as base_filename.
> + */
> +static struct bpf_map *relay_map_alloc(union bpf_attr *attr)
> +{
> + struct bpf_relay_map *rmap;
> +
> + if (unlikely(attr->map_flags & ~RELAY_CREATE_FLAG_MASK))
> + return ERR_PTR(-EINVAL);
> +
> + /* key size must be 0 in relay map */
> + if (unlikely(attr->key_size))
> + return ERR_PTR(-EINVAL);
> +
> + if (unlikely(attr->value_size > NAME_MAX)) {
> + pr_warn("value_size should be no more than %d\n", NAME_MAX);
> + return ERR_PTR(-EINVAL);
> + } else if (attr->value_size == 0)
> + attr->value_size = NAME_MAX;
> +
> + /* set default subbuf num */
> + attr->map_extra = attr->map_extra & UINT_MAX;
Should we reject invalid map_extra and return -EINVAL instead ?
> + if (!attr->map_extra)
> + attr->map_extra = 8;
> +
> + if (!attr->map_name || strlen(attr->map_name) == 0)
> + return ERR_PTR(-EINVAL);
> +
> + rmap = bpf_map_area_alloc(sizeof(*rmap), NUMA_NO_NODE);
> + if (!rmap)
> + return ERR_PTR(-ENOMEM);
> +
> + bpf_map_init_from_attr(&rmap->map, attr);
> +
> + rmap->relay_cb.create_buf_file = create_buf_file_handler;
> + rmap->relay_cb.remove_buf_file = remove_buf_file_handler;
> + if (attr->map_flags & BPF_F_OVERWRITE)
> + rmap->relay_cb.subbuf_start = subbuf_start_overwrite;
> +
> + rmap->relay_chan = relay_open(NULL, NULL,
> + attr->max_entries, attr->map_extra,
> + &rmap->relay_cb, NULL);
> + if (!rmap->relay_chan)
> + return ERR_PTR(-EINVAL);
Need to free rmap.
> +
> + return &rmap->map;
> +}
> +
> +static void relay_map_free(struct bpf_map *map)
> +{
> + struct bpf_relay_map *rmap;
> +
> + rmap = container_of(map, struct bpf_relay_map, map);
> + relay_close(rmap->relay_chan);
> + debugfs_remove_recursive(rmap->relay_chan->parent);
rmap->relay_chan may be freed by relay_close(), so it is not safe to
dereference relay_chan->parent here. And is debugfs_remove_recursive()
necessary here ?
> + kfree(rmap);
> +}
> +
> +static void *relay_map_lookup_elem(struct bpf_map *map, void *key)
> +{
> + return ERR_PTR(-EOPNOTSUPP);
> +}
> +
> +static long relay_map_update_elem(struct bpf_map *map, void *key, void *value,
> + u64 flags)
> +{
> + return -EOPNOTSUPP;
> +}
> +
> +static long relay_map_delete_elem(struct bpf_map *map, void *key)
> +{
> + return -EOPNOTSUPP;
> +}
> +
> +static int relay_map_get_next_key(struct bpf_map *map, void *key,
> + void *next_key)
> +{
> + return -EOPNOTSUPP;
> +}
> +
> +static u64 relay_map_mem_usage(const struct bpf_map *map)
> +{
> + struct bpf_relay_map *rmap;
> + u64 usage = sizeof(struct bpf_relay_map);
> +
> + rmap = container_of(map, struct bpf_relay_map, map);
> + usage += sizeof(struct rchan);
> + usage += (sizeof(struct rchan_buf) + rmap->relay_chan->alloc_size)
> + * num_online_cpus();
> + return usage;
> +}
> +
> +BTF_ID_LIST_SINGLE(relay_map_btf_ids, struct, bpf_relay_map)
> +const struct bpf_map_ops relay_map_ops = {
> + .map_meta_equal = bpf_map_meta_equal,
> + .map_alloc = relay_map_alloc,
> + .map_free = relay_map_free,
> + .map_lookup_elem = relay_map_lookup_elem,
> + .map_update_elem = relay_map_update_elem,
> + .map_delete_elem = relay_map_delete_elem,
> + .map_get_next_key = relay_map_get_next_key,
> + .map_mem_usage = relay_map_mem_usage,
> + .map_btf_id = &relay_map_btf_ids[0],
> +};
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH bpf-next 2/3] bpf: implement map_update_elem to init relay file
2023-12-22 12:21 ` [PATCH bpf-next 2/3] bpf: implement map_update_elem to init relay file Philo Lu
2023-12-22 14:45 ` Jiri Olsa
@ 2023-12-23 11:28 ` Hou Tao
2023-12-23 13:19 ` Philo Lu
1 sibling, 1 reply; 20+ messages in thread
From: Hou Tao @ 2023-12-23 11:28 UTC (permalink / raw)
To: Philo Lu, bpf
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, rostedt, mhiramat,
mathieu.desnoyers, linux-trace-kernel, xuanzhuo, dust.li, alibuda,
guwen, hengqi, shung-hsi.yu
Hi,
On 12/22/2023 8:21 PM, Philo Lu wrote:
> map_update_elem is used to create relay files and bind them with the
> relay channel, which is created with BPF_MAP_CREATE. This allows users
> to set a custom directory name. It must be used with key=NULL and
> flag=0.
>
> Here is an example:
> ```
> struct {
> __uint(type, BPF_MAP_TYPE_RELAY);
> __uint(max_entries, 4096);
> } my_relay SEC(".maps");
> ...
> char dir_name[] = "relay_test";
> bpf_map_update_elem(map_fd, NULL, dir_name, 0);
> ```
>
> Then, directory `/sys/kerenl/debug/relay_test` will be created, which
> includes files of my_relay0...my_relay[#cpu]. Each represents a per-cpu
> buffer with size 8 * 4096 B (there are 8 subbufs by default, each with
> size 4096B).
It is a little weird. Because the name of the relay file is
$debug_fs_root/$value_name/${map_name}xxx. Could we update it to
$debug_fs_root/$map_name/$value_name/xxx instead ?
> Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
> ---
> kernel/bpf/relaymap.c | 32 +++++++++++++++++++++++++++++++-
> 1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/relaymap.c b/kernel/bpf/relaymap.c
> index d0adc7f67758..588c8de0a4bd 100644
> --- a/kernel/bpf/relaymap.c
> +++ b/kernel/bpf/relaymap.c
> @@ -117,7 +117,37 @@ static void *relay_map_lookup_elem(struct bpf_map *map, void *key)
> static long relay_map_update_elem(struct bpf_map *map, void *key, void *value,
> u64 flags)
> {
> - return -EOPNOTSUPP;
> + struct bpf_relay_map *rmap;
> + struct dentry *parent;
> + int err;
> +
> + if (unlikely(flags))
> + return -EINVAL;
> +
> + if (unlikely(key))
> + return -EINVAL;
> +
> + rmap = container_of(map, struct bpf_relay_map, map);
> +
Lock is needed here, because .map_update_elem can be invoked concurrently.
> + /* The directory already exists */
> + if (rmap->relay_chan->has_base_filename)
> + return -EEXIST;
> +
> + /* Setup relay files. Note that the directory name passed as value should
> + * not be longer than map->value_size, including the '\0' at the end.
> + */
> + ((char *)value)[map->value_size - 1] = '\0';
> + parent = debugfs_create_dir(value, NULL);
> + if (IS_ERR_OR_NULL(parent))
> + return PTR_ERR(parent);
> +
> + err = relay_late_setup_files(rmap->relay_chan, map->name, parent);
> + if (err) {
> + debugfs_remove_recursive(parent);
> + return err;
> + }
> +
> + return 0;
> }
>
> static long relay_map_delete_elem(struct bpf_map *map, void *key)
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH bpf-next 1/3] bpf: implement relay map basis
2023-12-23 11:22 ` Hou Tao
@ 2023-12-23 13:02 ` Philo Lu
2023-12-25 11:36 ` Philo Lu
0 siblings, 1 reply; 20+ messages in thread
From: Philo Lu @ 2023-12-23 13:02 UTC (permalink / raw)
To: Hou Tao, bpf
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, rostedt, mhiramat,
mathieu.desnoyers, linux-trace-kernel, xuanzhuo, dust.li, alibuda,
guwen, hengqi, shung-hsi.yu
On 2023/12/23 19:22, Hou Tao wrote:
> Hi,
>
> On 12/22/2023 8:21 PM, Philo Lu wrote:
>> BPF_MAP_TYPE_RELAY is implemented based on relay interface, which
>> creates per-cpu buffer to transfer data. Each buffer is essentially a
>> list of fix-sized sub-buffers, and is exposed to user space as files in
>> debugfs. attr->max_entries is used as subbuf size and attr->map_extra is
>> used as subbuf num. Currently, the default value of subbuf num is 8.
>>
>> The data can be accessed by read or mmap via these files. For example,
>> if there are 2 cpus, files could be `/sys/kernel/debug/mydir/my_rmap0`
>> and `/sys/kernel/debug/mydir/my_rmap1`.
>>
>> Buffer-only mode is used to create the relay map, which just allocates
>> the buffer without creating user-space files. Then user can setup the
>> files with map_update_elem, thus allowing user to define the directory
>> name in debugfs. map_update_elem is implemented in the following patch.
>>
>> A new map flag named BPF_F_OVERWRITE is introduced to set overwrite mode
>> of relay map.
>
> Beside adding a new map type, could we consider only use kfuncs to
> support the creation of rchan and the write of rchan ? I think
> bpf_cpumask will be a good reference.
This is a good question. TBH, I have thought of implement it with
helpers (I'm not very familiar with kfuncs, but I think they could be
similar?), but I was stumped by how to close the channel. We can create
a relay channel, hold it with a map, but it could be difficult for the
bpf program to close the channel with relay_close(). And I think it
could be the difference compared with bpf_cpumask.
>>
>> Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
>> ---
>> include/linux/bpf_types.h | 3 +
>> include/uapi/linux/bpf.h | 7 ++
>> kernel/bpf/Makefile | 3 +
>> kernel/bpf/relaymap.c | 157 ++++++++++++++++++++++++++++++++++++++
>> kernel/bpf/syscall.c | 1 +
>> 5 files changed, 171 insertions(+)
>> create mode 100644 kernel/bpf/relaymap.c
>>
>
> SNIP
>> diff --git a/kernel/bpf/relaymap.c b/kernel/bpf/relaymap.c
>> new file mode 100644
>> index 000000000000..d0adc7f67758
>> --- /dev/null
>> +++ b/kernel/bpf/relaymap.c
>> @@ -0,0 +1,157 @@
>> +#include <linux/cpumask.h>
>> +#include <linux/debugfs.h>
>> +#include <linux/filter.h>
>> +#include <linux/relay.h>
>> +#include <linux/slab.h>
>> +#include <linux/bpf.h>
>> +#include <linux/err.h>
>> +
>> +#define RELAY_CREATE_FLAG_MASK (BPF_F_OVERWRITE)
>> +
>> +struct bpf_relay_map {
>> + struct bpf_map map;
>> + struct rchan *relay_chan;
>> + struct rchan_callbacks relay_cb;
>> +};
>
> It seems that there is no need to add relay_cb in bpf_relay_map. We
> could define two kinds of rchan_callbacks: one for non-overwrite mode
> and another one for overwrite mode.
We need to store relay_cb, because the relay channel only uses a pointer
of struct rchan_callbacks. So I maintain it with the map, or is there a
better place to store it?
>> +
>> +static struct dentry *create_buf_file_handler(const char *filename,
>> + struct dentry *parent, umode_t mode,
>> + struct rchan_buf *buf, int *is_global)
>> +{
>> + /* Because we do relay_late_setup_files(), create_buf_file(NULL, NULL, ...)
>> + * will be called by relay_open.
>> + */
>> + if (!filename)
>> + return NULL;
>> +
>> + return debugfs_create_file(filename, mode, parent, buf,
>> + &relay_file_operations);
>> +}
>> +
>> +static int remove_buf_file_handler(struct dentry *dentry)
>> +{
>> + debugfs_remove(dentry);
>> + return 0;
>> +}
>> +
>> +/* For non-overwrite, use default subbuf_start cb */
>> +static int subbuf_start_overwrite(struct rchan_buf *buf, void *subbuf,
>> + void *prev_subbuf, size_t prev_padding)
>> +{
>> + return 1;
>> +}
>> +
>> +/* bpf_attr is used as follows:
>> + * - key size: must be 0
>> + * - value size: value will be used as directory name by map_update_elem
>> + * (to create relay files). If passed as 0, it will be set to NAME_MAX as
>> + * default
>> + *
>> + * - max_entries: subbuf size
>> + * - map_extra: subbuf num, default as 8
>> + *
>> + * When alloc, we do not set up relay files considering dir_name conflicts.
>> + * Instead we use relay_late_setup_files() in map_update_elem(), and thus the
>> + * value is used as dir_name, and map->name is used as base_filename.
>> + */
>> +static struct bpf_map *relay_map_alloc(union bpf_attr *attr)
>> +{
>> + struct bpf_relay_map *rmap;
>> +
>> + if (unlikely(attr->map_flags & ~RELAY_CREATE_FLAG_MASK))
>> + return ERR_PTR(-EINVAL);
>> +
>> + /* key size must be 0 in relay map */
>> + if (unlikely(attr->key_size))
>> + return ERR_PTR(-EINVAL);
>> +
>> + if (unlikely(attr->value_size > NAME_MAX)) {
>> + pr_warn("value_size should be no more than %d\n", NAME_MAX);
>> + return ERR_PTR(-EINVAL);
>> + } else if (attr->value_size == 0)
>> + attr->value_size = NAME_MAX;
>> +
>> + /* set default subbuf num */
>> + attr->map_extra = attr->map_extra & UINT_MAX;
>
> Should we reject invalid map_extra and return -EINVAL instead ?
Ack, will add a check here.
>> + if (!attr->map_extra)
>> + attr->map_extra = 8;
>> +
>> + if (!attr->map_name || strlen(attr->map_name) == 0)
>> + return ERR_PTR(-EINVAL);
>> +
>> + rmap = bpf_map_area_alloc(sizeof(*rmap), NUMA_NO_NODE);
>> + if (!rmap)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + bpf_map_init_from_attr(&rmap->map, attr);
>> +
>> + rmap->relay_cb.create_buf_file = create_buf_file_handler;
>> + rmap->relay_cb.remove_buf_file = remove_buf_file_handler;
>> + if (attr->map_flags & BPF_F_OVERWRITE)
>> + rmap->relay_cb.subbuf_start = subbuf_start_overwrite;
>> +
>> + rmap->relay_chan = relay_open(NULL, NULL,
>> + attr->max_entries, attr->map_extra,
>> + &rmap->relay_cb, NULL);
>> + if (!rmap->relay_chan)
>> + return ERR_PTR(-EINVAL);
>
> Need to free rmap.
Good catch. Will fix it.
>> +
>> + return &rmap->map;
>> +}
>> +
>> +static void relay_map_free(struct bpf_map *map)
>> +{
>> + struct bpf_relay_map *rmap;
>> +
>> + rmap = container_of(map, struct bpf_relay_map, map);
>> + relay_close(rmap->relay_chan);
>> + debugfs_remove_recursive(rmap->relay_chan->parent);
>
> rmap->relay_chan may be freed by relay_close(), so it is not safe to
> dereference relay_chan->parent here. And is debugfs_remove_recursive()
> necessary here ?
debugfs_remove_recursive is needed here, otherwise the directory will be
left, but the order is indeed incorrect here. I will move it before
relay_close, and check if rmap->relay_chan->parent is valid.
Thanks.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH bpf-next 2/3] bpf: implement map_update_elem to init relay file
2023-12-23 11:28 ` Hou Tao
@ 2023-12-23 13:19 ` Philo Lu
0 siblings, 0 replies; 20+ messages in thread
From: Philo Lu @ 2023-12-23 13:19 UTC (permalink / raw)
To: Hou Tao, bpf
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, rostedt, mhiramat,
mathieu.desnoyers, linux-trace-kernel, xuanzhuo, dust.li, alibuda,
guwen, hengqi, shung-hsi.yu
On 2023/12/23 19:28, Hou Tao wrote:
> Hi,
>
> On 12/22/2023 8:21 PM, Philo Lu wrote:
>> map_update_elem is used to create relay files and bind them with the
>> relay channel, which is created with BPF_MAP_CREATE. This allows users
>> to set a custom directory name. It must be used with key=NULL and
>> flag=0.
>>
>> Here is an example:
>> ```
>> struct {
>> __uint(type, BPF_MAP_TYPE_RELAY);
>> __uint(max_entries, 4096);
>> } my_relay SEC(".maps");
>> ...
>> char dir_name[] = "relay_test";
>> bpf_map_update_elem(map_fd, NULL, dir_name, 0);
>> ```
>>
>> Then, directory `/sys/kerenl/debug/relay_test` will be created, which
>> includes files of my_relay0...my_relay[#cpu]. Each represents a per-cpu
>> buffer with size 8 * 4096 B (there are 8 subbufs by default, each with
>> size 4096B).
>
> It is a little weird. Because the name of the relay file is
> $debug_fs_root/$value_name/${map_name}xxx. Could we update it to
> $debug_fs_root/$map_name/$value_name/xxx instead ?
I think a unique directory is enough for a relay map, so currently users
can use map_update_elem to set the directory name. Thus
$map_name/${value_name}xxx may be better than $map_name/$value_name/xxx.
As for whether map_name or value_name is better to be used as the
directory name, I think it more likely that different bpf programs share
a same map_name. So value_name is currently used.
>> Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
>> ---
>> kernel/bpf/relaymap.c | 32 +++++++++++++++++++++++++++++++-
>> 1 file changed, 31 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/relaymap.c b/kernel/bpf/relaymap.c
>> index d0adc7f67758..588c8de0a4bd 100644
>> --- a/kernel/bpf/relaymap.c
>> +++ b/kernel/bpf/relaymap.c
>> @@ -117,7 +117,37 @@ static void *relay_map_lookup_elem(struct bpf_map *map, void *key)
>> static long relay_map_update_elem(struct bpf_map *map, void *key, void *value,
>> u64 flags)
>> {
>> - return -EOPNOTSUPP;
>> + struct bpf_relay_map *rmap;
>> + struct dentry *parent;
>> + int err;
>> +
>> + if (unlikely(flags))
>> + return -EINVAL;
>> +
>> + if (unlikely(key))
>> + return -EINVAL;
>> +
>> + rmap = container_of(map, struct bpf_relay_map, map);
>> +
>
> Lock is needed here, because .map_update_elem can be invoked concurrently.
Got it. I will fix it in the next version.
Thanks.
>> + /* The directory already exists */
>> + if (rmap->relay_chan->has_base_filename)
>> + return -EEXIST;
>> +
>> + /* Setup relay files. Note that the directory name passed as value should
>> + * not be longer than map->value_size, including the '\0' at the end.
>> + */
>> + ((char *)value)[map->value_size - 1] = '\0';
>> + parent = debugfs_create_dir(value, NULL);
>> + if (IS_ERR_OR_NULL(parent))
>> + return PTR_ERR(parent);
>> +
>> + err = relay_late_setup_files(rmap->relay_chan, map->name, parent);
>> + if (err) {
>> + debugfs_remove_recursive(parent);
>> + return err;
>> + }
>> +
>> + return 0;
>> }
>>
>> static long relay_map_delete_elem(struct bpf_map *map, void *key)
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH bpf-next 1/3] bpf: implement relay map basis
2023-12-23 13:02 ` Philo Lu
@ 2023-12-25 11:36 ` Philo Lu
2023-12-25 13:06 ` Hou Tao
0 siblings, 1 reply; 20+ messages in thread
From: Philo Lu @ 2023-12-25 11:36 UTC (permalink / raw)
To: Hou Tao, bpf
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, rostedt, mhiramat,
mathieu.desnoyers, linux-trace-kernel, xuanzhuo, dust.li, alibuda,
guwen, hengqi, shung-hsi.yu
On 2023/12/23 21:02, Philo Lu wrote:
>
>
> On 2023/12/23 19:22, Hou Tao wrote:
>> Hi,
>>
>> On 12/22/2023 8:21 PM, Philo Lu wrote:
>>> BPF_MAP_TYPE_RELAY is implemented based on relay interface, which
>>> creates per-cpu buffer to transfer data. Each buffer is essentially a
>>> list of fix-sized sub-buffers, and is exposed to user space as files in
>>> debugfs. attr->max_entries is used as subbuf size and attr->map_extra is
>>> used as subbuf num. Currently, the default value of subbuf num is 8.
>>>
>>> The data can be accessed by read or mmap via these files. For example,
>>> if there are 2 cpus, files could be `/sys/kernel/debug/mydir/my_rmap0`
>>> and `/sys/kernel/debug/mydir/my_rmap1`.
>>>
>>> Buffer-only mode is used to create the relay map, which just allocates
>>> the buffer without creating user-space files. Then user can setup the
>>> files with map_update_elem, thus allowing user to define the directory
>>> name in debugfs. map_update_elem is implemented in the following patch.
>>>
>>> A new map flag named BPF_F_OVERWRITE is introduced to set overwrite mode
>>> of relay map.
>>
>> Beside adding a new map type, could we consider only use kfuncs to
>> support the creation of rchan and the write of rchan ? I think
>> bpf_cpumask will be a good reference.
>
> This is a good question. TBH, I have thought of implement it with
> helpers (I'm not very familiar with kfuncs, but I think they could be
> similar?), but I was stumped by how to close the channel. We can create
> a relay channel, hold it with a map, but it could be difficult for the
> bpf program to close the channel with relay_close(). And I think it
> could be the difference compared with bpf_cpumask.
I've learned more about kfunc and kptr, and find that kptr can be
automatically released with a given map. Then, it is technically
feasible to use relay interface with kfuncs. Specificially, creating a
relay channel and getting the pointer with kfunc, transferring it as a
kptr into a map, and then it lives with the map.
Though I'm not sure if this is better than map-based implementation, as
mostly it will be used with a map (I haven't thought of a case without a
map yet). And with kfunc, it will be a little more complex to create a
relay channel.
Thanks.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH bpf-next 1/3] bpf: implement relay map basis
2023-12-25 11:36 ` Philo Lu
@ 2023-12-25 13:06 ` Hou Tao
0 siblings, 0 replies; 20+ messages in thread
From: Hou Tao @ 2023-12-25 13:06 UTC (permalink / raw)
To: Philo Lu, bpf
Cc: ast, daniel, john.fastabend, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, rostedt, mhiramat,
mathieu.desnoyers, linux-trace-kernel, xuanzhuo, dust.li, alibuda,
guwen, hengqi, shung-hsi.yu
Hi,
On 12/25/2023 7:36 PM, Philo Lu wrote:
> On 2023/12/23 21:02, Philo Lu wrote:
>>
>>
>> On 2023/12/23 19:22, Hou Tao wrote:
>>> Hi,
>>>
>>> On 12/22/2023 8:21 PM, Philo Lu wrote:
>>>> BPF_MAP_TYPE_RELAY is implemented based on relay interface, which
>>>> creates per-cpu buffer to transfer data. Each buffer is essentially a
>>>> list of fix-sized sub-buffers, and is exposed to user space as
>>>> files in
>>>> debugfs. attr->max_entries is used as subbuf size and
>>>> attr->map_extra is
>>>> used as subbuf num. Currently, the default value of subbuf num is 8.
>>>>
>>>> The data can be accessed by read or mmap via these files. For example,
>>>> if there are 2 cpus, files could be `/sys/kernel/debug/mydir/my_rmap0`
>>>> and `/sys/kernel/debug/mydir/my_rmap1`.
>>>>
>>>> Buffer-only mode is used to create the relay map, which just allocates
>>>> the buffer without creating user-space files. Then user can setup the
>>>> files with map_update_elem, thus allowing user to define the directory
>>>> name in debugfs. map_update_elem is implemented in the following
>>>> patch.
>>>>
>>>> A new map flag named BPF_F_OVERWRITE is introduced to set overwrite
>>>> mode
>>>> of relay map.
>>>
>>> Beside adding a new map type, could we consider only use kfuncs to
>>> support the creation of rchan and the write of rchan ? I think
>>> bpf_cpumask will be a good reference.
>>
>> This is a good question. TBH, I have thought of implement it with
>> helpers (I'm not very familiar with kfuncs, but I think they could be
>> similar?), but I was stumped by how to close the channel. We can
>> create a relay channel, hold it with a map, but it could be difficult
>> for the bpf program to close the channel with relay_close(). And I
>> think it could be the difference compared with bpf_cpumask.
>
> I've learned more about kfunc and kptr, and find that kptr can be
> automatically released with a given map. Then, it is technically
> feasible to use relay interface with kfuncs. Specificially, creating a
> relay channel and getting the pointer with kfunc, transferring it as a
> kptr into a map, and then it lives with the map.
Yes. kptr of bpf_cpumask can be destroyed by the freeing of map or doing
it explicitly through bpf_kptr_xchg() and release kfunc.
>
> Though I'm not sure if this is better than map-based implementation,
> as mostly it will be used with a map (I haven't thought of a case
> without a map yet). And with kfunc, it will be a little more complex
> to create a relay channel.
>
The motivation for requesting to implement BPF_MAP_TYPE_REPLAY through
kfunc is that Alexei had expressed the tendency to freeze the stable map
type [1] and to implement new map type through kfunc. However we can let
the maintainers to decide which way is better and more acceptable.
[1]
https://lore.kernel.org/bpf/CAEf4BzZTYcGNVWL7gSPHCqao_Ehx_3P7YK6r+p_-hrvpE8fEvA@mail.gmail.com/T/
> Thanks.
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2023-12-25 13:06 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-22 12:21 [PATCH bpf-next 0/3] bpf: introduce BPF_MAP_TYPE_RELAY Philo Lu
2023-12-22 12:21 ` [PATCH bpf-next 1/3] bpf: implement relay map basis Philo Lu
2023-12-22 14:45 ` Jiri Olsa
2023-12-23 2:54 ` Philo Lu
2023-12-22 23:07 ` kernel test robot
2023-12-23 0:11 ` kernel test robot
2023-12-23 11:22 ` Hou Tao
2023-12-23 13:02 ` Philo Lu
2023-12-25 11:36 ` Philo Lu
2023-12-25 13:06 ` Hou Tao
2023-12-22 12:21 ` [PATCH bpf-next 2/3] bpf: implement map_update_elem to init relay file Philo Lu
2023-12-22 14:45 ` Jiri Olsa
2023-12-23 2:55 ` Philo Lu
2023-12-23 11:28 ` Hou Tao
2023-12-23 13:19 ` Philo Lu
2023-12-22 12:21 ` [PATCH bpf-next 3/3] bpf: introduce bpf_relay_output helper Philo Lu
2023-12-22 14:46 ` Jiri Olsa
2023-12-23 2:56 ` Philo Lu
2023-12-22 14:45 ` [PATCH bpf-next 0/3] bpf: introduce BPF_MAP_TYPE_RELAY Jiri Olsa
2023-12-23 2:57 ` Philo Lu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox