* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-15 21:16 ` [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task " Amery Hung
@ 2025-05-16 18:57 ` Alexei Starovoitov
2025-05-16 20:41 ` Amery Hung
2025-05-19 20:04 ` Tejun Heo
` (2 subsequent siblings)
3 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2025-05-16 18:57 UTC (permalink / raw)
To: Amery Hung
Cc: bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
Tejun Heo, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Kernel Team
On Thu, May 15, 2025 at 2:16 PM Amery Hung <ameryhung@gmail.com> wrote:
>
> Task local data defines an abstract storage type for storing task-
> specific data (TLD). This patch provides user space and bpf
> implementation as header-only libraries for accessing task local data.
>
> Task local data is a bpf task local storage map with two UPTRs:
> 1) u_tld_metadata, shared by all tasks of the same process, consists of
> the total count of TLDs and an array of metadata of TLDs. A metadata of
> a TLD comprises the size and the name. The name is used to identify a
> specific TLD in bpf 2) data is memory for storing TLDs specific to the
> task.
>
> The following are the basic task local data API:
>
> User space BPF
> Create key tld_create_key() -
> Fetch key - tld_fetch_key()
> Get data tld_get_data() tld_get_data()
>
> A TLD is first created by the user space with tld_create_key(). First,
> it goes through the metadata array to check if the TLD can be added.
> The total TLD size needs to fit into a page (limited by UPTR), and no
> two TLDs can have the same name. It also calculates the offset, the next
> available space in u_tld_data, by summing sizes of TLDs. If the TLD
> can be added, it increases the count using cmpxchg as there may be other
> concurrent tld_create_key(). After a successful cmpxchg, the last
> metadata slot now belongs to the calling thread and will be updated.
> tld_create_key() returns the offset encapsulated as a opaque object key
> to prevent user misuse.
>
> Then user space can pass the key to tld_get_data() to get a pointer
> to the TLD. The pointer will remain valid for the lifetime of the
> thread.
>
> BPF programs also locate TLDs with the keys. This is done by calling
> tld_fetch_key() with the name of the TLD. Similar to tld_create_key(),
> it scans through metadata array, compare the name of TLDs and compute
> the offset. Once found, the offset is also returned as a key, which
> can be passed to the bpf version of tld_get_data() to retrieve a
> pointer to the TLD.
>
> User space task local data library uses a light way approach to ensure
> thread safety (i.e., atomic operation + compiler and memory barriers).
> While a metadata is being updated, other threads may also try to read it.
> To prevent them from seeing incomplete data, metadata::size is used to
> signal the completion of the update, where 0 means the update is still
> ongoing. Threads will wait until seeing a non-zero size to read a
> metadata. Acquire/release order is required for metadata::size to
> prevent hardware reordering. For example, moving store to metadata::name
> after store to metadata::size or moving load from metadata::name before
> load from metadata::size.
>
> Signed-off-by: Amery Hung <ameryhung@gmail.com>
> ---
> .../bpf/prog_tests/task_local_data.h | 263 ++++++++++++++++++
> .../selftests/bpf/progs/task_local_data.bpf.h | 220 +++++++++++++++
> 2 files changed, 483 insertions(+)
> create mode 100644 tools/testing/selftests/bpf/prog_tests/task_local_data.h
> create mode 100644 tools/testing/selftests/bpf/progs/task_local_data.bpf.h
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_data.h b/tools/testing/selftests/bpf/prog_tests/task_local_data.h
> new file mode 100644
> index 000000000000..ec43ea59267c
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/task_local_data.h
> @@ -0,0 +1,263 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __TASK_LOCAL_DATA_H
> +#define __TASK_LOCAL_DATA_H
> +
> +#include <fcntl.h>
> +#include <errno.h>
> +#include <sched.h>
> +#include <stddef.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <sys/syscall.h>
> +#include <sys/types.h>
> +
> +#include <bpf/bpf.h>
> +
> +#ifndef PIDFD_THREAD
> +#define PIDFD_THREAD O_EXCL
> +#endif
> +
> +#define PAGE_SIZE 4096
> +
> +#ifndef __round_mask
> +#define __round_mask(x, y) ((__typeof__(x))((y)-1))
> +#endif
> +#ifndef round_up
> +#define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)
> +#endif
> +
> +#ifndef READ_ONCE
> +#define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
> +#endif
> +
> +#ifndef WRITE_ONCE
> +#define WRITE_ONCE(x, val) ((*(volatile typeof(x) *)&(x)) = (val))
> +#endif
> +
> +#define TLD_DATA_SIZE PAGE_SIZE
wrap it with #ifndef ?
So that application can #define something smaller.
> +#define TLD_DATA_CNT 63
> +#define TLD_NAME_LEN 62
> +
> +typedef struct {
> + __s16 off;
> +} tld_key_t;
> +
> +struct tld_metadata {
> + char name[TLD_NAME_LEN];
> + __u16 size;
> +};
> +
> +struct u_tld_metadata {
> + __u8 cnt;
> + __u8 padding[63];
> + struct tld_metadata metadata[TLD_DATA_CNT];
> +};
> +
> +struct u_tld_data {
> + char data[TLD_DATA_SIZE];
> +};
> +
> +struct tld_map_value {
> + struct u_tld_data *data;
> + struct u_tld_metadata *metadata;
> +};
> +
> +struct u_tld_metadata *tld_metadata_p __attribute__((weak));
> +__thread struct u_tld_data *tld_data_p __attribute__((weak));
> +
> +static int __tld_init_metadata(int map_fd)
> +{
> + struct u_tld_metadata *new_metadata;
> + struct tld_map_value map_val;
> + int task_fd = 0, err;
> +
> + task_fd = syscall(SYS_pidfd_open, getpid(), 0);
this task_fd and task_fd in __tld_init_data() are two different things.
Please name them differently.
> + if (task_fd < 0) {
> + err = -errno;
> + goto out;
> + }
> +
> + new_metadata = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
> + if (!new_metadata) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + memset(new_metadata, 0, PAGE_SIZE);
> +
> + map_val.data = NULL;
> + map_val.metadata = new_metadata;
> +
> + /*
> + * bpf_map_update_elem(.., pid_fd,..., BPF_NOEXIST) guarantees that
> + * only one tld_create_key() can update tld_metadata_p.
> + */
> + err = bpf_map_update_elem(map_fd, &task_fd, &map_val, BPF_NOEXIST);
> + if (err) {
> + free(new_metadata);
> + if (err == -EEXIST || err == -EAGAIN)
> + err = 0;
> + goto out;
> + }
> +
> + WRITE_ONCE(tld_metadata_p, new_metadata);
> +out:
> + if (task_fd > 0)
>=
> + close(task_fd);
> + return err;
> +}
> +
> +static int __tld_init_data(int map_fd)
> +{
> + struct u_tld_data *new_data = NULL;
> + struct tld_map_value map_val;
> + int err, task_fd = 0;
> +
> + task_fd = syscall(SYS_pidfd_open, gettid(), PIDFD_THREAD);
> + if (task_fd < 0) {
> + err = -errno;
> + goto out;
> + }
> +
> + new_data = aligned_alloc(PAGE_SIZE, TLD_DATA_SIZE);
probably roundup_power_of_two(TLD_DATA_SIZE) ?
Don't know about libc, but musl implementation of aligned_alloc()
is naive. It allocs align+size.
So it's a memory waste.
> + if (!new_data) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + map_val.data = new_data;
> + map_val.metadata = READ_ONCE(tld_metadata_p);
> +
> + err = bpf_map_update_elem(map_fd, &task_fd, &map_val, 0);
> + if (err) {
> + free(new_data);
> + goto out;
> + }
> +
> + tld_data_p = new_data;
> +out:
> + if (task_fd > 0)
>=
> + close(task_fd);
> + return err;
> +}
> +
> +/**
> + * tld_create_key() - Create a key associated with a TLD.
> + *
> + * @map_fd: A file descriptor of the underlying task local storage map,
> + * tld_data_map
> + * @name: The name the TLD will be associated with
> + * @size: Size of the TLD
> + *
> + * Returns an opaque object key. Use tld_key_is_err() or tld_key_err_or_zero() to
> + * check if the key creation succeed. Pass to tld_get_data() to get a pointer to
> + * the TLD. bpf programs can also fetch the same key by name.
> + */
> +__attribute__((unused))
> +static tld_key_t tld_create_key(int map_fd, const char *name, size_t size)
> +{
> + int err, i, cnt, sz, off = 0;
> +
> + if (!READ_ONCE(tld_metadata_p)) {
> + err = __tld_init_metadata(map_fd);
> + if (err)
> + return (tld_key_t) {.off = err};
> + }
> +
> + if (!tld_data_p) {
> + err = __tld_init_data(map_fd);
> + if (err)
> + return (tld_key_t) {.off = err};
> + }
> +
> + size = round_up(size, 8);
why roundup ? and to 8 in particular?
If user space wants byte size keys, why not let it?
> +
> + for (i = 0; i < TLD_DATA_CNT; i++) {
> +retry:
> + cnt = __atomic_load_n(&tld_metadata_p->cnt, __ATOMIC_RELAXED);
Instead of explicit __atomic builtins use _Atomic __u8 cnt;
?
> + if (i < cnt) {
> + /*
> + * Pending tld_create_key() uses size to signal if the metadata has
> + * been fully updated.
> + */
> + while (!(sz = __atomic_load_n(&tld_metadata_p->metadata[i].size,
> + __ATOMIC_ACQUIRE)))
> + sched_yield();
> +
> + if (!strncmp(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN))
> + return (tld_key_t) {.off = -EEXIST};
> +
> + off += sz;
> + continue;
> + }
> +
> + if (off + size > TLD_DATA_SIZE)
> + return (tld_key_t) {.off = -E2BIG};
> +
> + /*
> + * Only one tld_create_key() can increase the current cnt by one and
> + * takes the latest available slot. Other threads will check again if a new
> + * TLD can still be added, and then compete for the new slot after the
> + * succeeding thread update the size.
> + */
> + if (!__atomic_compare_exchange_n(&tld_metadata_p->cnt, &cnt, cnt + 1, true,
> + __ATOMIC_RELAXED, __ATOMIC_RELAXED))
weak and relaxed/relaxed ?
That's unusual.
I don't know what it is supposed to do.
I think weak=false and __ATOMIC_ACQUIRE, __ATOMIC_RELAXED
would look as expected.
> + goto retry;
> +
> + strncpy(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN);
> + __atomic_store_n(&tld_metadata_p->metadata[i].size, size, __ATOMIC_RELEASE);
> + return (tld_key_t) {.off = off};
> + }
> +
> + return (tld_key_t) {.off = -ENOSPC};
> +}
> +
> +__attribute__((unused))
> +static inline bool tld_key_is_err(tld_key_t key)
> +{
> + return key.off < 0;
> +}
> +
> +__attribute__((unused))
> +static inline int tld_key_err_or_zero(tld_key_t key)
> +{
> + return tld_key_is_err(key) ? key.off : 0;
> +}
> +
> +/**
> + * tld_get_data() - Gets a pointer to the TLD associated with the key.
> + *
> + * @map_fd: A file descriptor of the underlying task local storage map,
> + * tld_data_map
> + * @key: A key object returned by tld_create_key().
> + *
> + * Returns a pointer to the TLD if the key is valid; NULL if no key has been
> + * added, not enough memory for TLD for this thread, or the key is invalid.
> + *
> + * Threads that call tld_get_data() must call tld_free() on exit to prevent
> + * memory leak.
> + */
> +__attribute__((unused))
> +static void *tld_get_data(int map_fd, tld_key_t key)
> +{
> + if (!READ_ONCE(tld_metadata_p))
> + return NULL;
> +
> + if (!tld_data_p && __tld_init_data(map_fd))
> + return NULL;
Why call it again?
tld_create_key() should have done it, no?
> +
> + return tld_data_p->data + key.off;
> +}
> +
> +/**
> + * tld_free() - Frees task local data memory of the calling thread
> + */
> +__attribute__((unused))
> +static void tld_free(void)
> +{
> + if (tld_data_p)
> + free(tld_data_p);
> +}
Since this .h allocates tld_metadata_p, it probably needs
a helper to free it too.
> +
> +#endif /* __TASK_LOCAL_DATA_H */
> diff --git a/tools/testing/selftests/bpf/progs/task_local_data.bpf.h b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> new file mode 100644
> index 000000000000..5f48e408a5e5
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> @@ -0,0 +1,220 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __TASK_LOCAL_DATA_BPF_H
> +#define __TASK_LOCAL_DATA_BPF_H
> +
> +/*
> + * Task local data is a library that facilitates sharing per-task data
> + * between user space and bpf programs.
> + *
> + *
> + * PREREQUISITE
> + *
> + * A TLD, an entry of data in task local data, first needs to be created by the
> + * user space. This is done by calling user space API, tld_create_key(), with
> + * the name of the TLD and the size.
> + *
> + * tld_key_t prio, in_cs;
> + *
> + * prio = tld_create_key("priority", sizeof(int));
> + * in_cs = tld_create_key("in_critical_section", sizeof(bool));
> + *
> + * A key associated with the TLD, which has an opaque type tld_key_t, will be
> + * returned. It can be used to get a pointer to the TLD in the user space by
> + * calling tld_get_data().
> + *
> + *
> + * USAGE
> + *
> + * Similar to user space, bpf programs locate a TLD using the same key.
> + * tld_fetch_key() allows bpf programs to retrieve a key created in the user
> + * space by name, which is specified in the second argument of tld_create_key().
> + * tld_fetch_key() additionally will cache the key in a task local storage map,
> + * tld_key_map, to avoid performing costly string comparisons every time when
> + * accessing a TLD. This requires the developer to define the map value type of
> + * tld_key_map, struct tld_keys. It only needs to contain keys used by bpf
> + * programs in the compilation unit.
> + *
> + * struct tld_keys {
> + * tld_key_t prio;
> + * tld_key_t in_cs;
> + * };
> + *
> + * Then, for every new task, a bpf program will fetch and cache keys once and
> + * for all. This should be done ideally in a non-critical path (e.g., in
> + * sched_ext_ops::init_task).
> + *
> + * struct tld_object tld_obj;
> + *
> + * err = tld_object_init(task, &tld);
> + * if (err)
> + * return 0;
> + *
> + * tld_fetch_key(&tld_obj, "priority", prio);
> + * tld_fetch_key(&tld_obj, "in_critical_section", in_cs);
> + *
> + * Note that, the first argument of tld_fetch_key() is a pointer to tld_object.
> + * It should be declared as a stack variable and initialized via tld_object_init().
> + *
> + * Finally, just like user space programs, bpf programs can get a pointer to a
> + * TLD by calling tld_get_data(), with cached keys.
> + *
> + * int *p;
> + *
> + * p = tld_get_data(&tld_obj, prio, sizeof(int));
> + * if (p)
> + * // do something depending on *p
> + */
> +#include <errno.h>
> +#include <bpf/bpf_helpers.h>
> +
> +#define TLD_DATA_SIZE __PAGE_SIZE
> +#define TLD_DATA_CNT 63
> +#define TLD_NAME_LEN 62
> +
> +typedef struct {
> + __s16 off;
> +} tld_key_t;
> +
> +struct u_tld_data *dummy_data;
> +struct u_tld_metadata *dummy_metadata;
> +
> +struct tld_metadata {
> + char name[TLD_NAME_LEN];
> + __u16 size;
> +};
> +
> +struct u_tld_metadata {
> + __u8 cnt;
> + __u8 padding[63];
> + struct tld_metadata metadata[TLD_DATA_CNT];
> +};
> +
> +struct u_tld_data {
> + char data[TLD_DATA_SIZE];
> +};
> +
> +struct tld_map_value {
> + struct u_tld_data __uptr *data;
> + struct u_tld_metadata __uptr *metadata;
> +};
> +
> +struct tld_object {
> + struct tld_map_value *data_map;
> + struct tld_keys *key_map;
> +};
> +
> +/*
> + * Map value of tld_key_map for caching keys. Must be defined by the developer.
> + * Members should be tld_key_t and passed to the 3rd argument of tld_fetch_key().
> + */
> +struct tld_keys;
> +
> +struct {
> + __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
> + __uint(map_flags, BPF_F_NO_PREALLOC);
> + __type(key, int);
> + __type(value, struct tld_map_value);
> +} tld_data_map SEC(".maps");
> +
> +struct {
> + __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
> + __uint(map_flags, BPF_F_NO_PREALLOC);
> + __type(key, int);
> + __type(value, struct tld_keys);
> +} tld_key_map SEC(".maps");
> +
> +/**
> + * tld_object_init() - Initializes a tld_object.
> + *
> + * @task: The task_struct of the target task
> + * @tld_obj: A pointer to a tld_object to be initialized
> + *
> + * Returns 0 on success; -ENODATA if the task has no TLD; -ENOMEM if the creation
> + * of tld_key_map fails
> + */
> +__attribute__((unused))
> +static int tld_object_init(struct task_struct *task, struct tld_object *tld_obj)
> +{
> + tld_obj->data_map = bpf_task_storage_get(&tld_data_map, task, 0, 0);
> + if (!tld_obj->data_map)
> + return -ENODATA;
> +
> + tld_obj->key_map = bpf_task_storage_get(&tld_key_map, task, 0,
> + BPF_LOCAL_STORAGE_GET_F_CREATE);
> + if (!tld_obj->key_map)
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +/**
> + * tld_fetch_key() - Fetches the key to a TLD by name. The key has to be created
> + * by user space first with tld_create_key().
> + *
> + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
> + * @name: The name of the key associated with a TLD
> + * @key: The key in struct tld_keys to be saved to
> + *
> + * Returns a positive integer on success; 0 otherwise
> + */
> +#define tld_fetch_key(tld_obj, name, key) \
> + ({ \
> + (tld_obj)->key_map->key.off = __tld_fetch_key(tld_obj, name); \
> + })
> +
> +__attribute__((unused))
> +static u16 __tld_fetch_key(struct tld_object *tld_obj, const char *name)
> +{
> + int i, meta_off, cnt;
> + void *metadata, *nm, *sz;
> + u16 off = 0;
> +
> + if (!tld_obj->data_map || !tld_obj->data_map->metadata)
> + return 0;
> +
> + cnt = tld_obj->data_map->metadata->cnt;
> + metadata = tld_obj->data_map->metadata->metadata;
> +
> + bpf_for(i, 0, cnt) {
> + meta_off = i * sizeof(struct tld_metadata);
> + if (meta_off > TLD_DATA_SIZE - offsetof(struct u_tld_metadata, metadata)
> + - sizeof(struct tld_metadata))
> + break;
> +
> + nm = metadata + meta_off + offsetof(struct tld_metadata, name);
> + sz = metadata + meta_off + offsetof(struct tld_metadata, size);
> +
> + /*
> + * Reserve 0 for uninitialized keys. Increase the offset of a valid key
> + * by one and subtract it later in tld_get_data().
> + */
> + if (!bpf_strncmp(nm, TLD_NAME_LEN, name))
> + return off + 1;
I think all this +1, -1 dance is doing is helping to catch an
error when tld_get_data() is called without tld_fetch_key().
I feel this is too defensive.
Let tld_fetch_key() return proper negative error code.
> +
> + off += *(u16 *)sz;
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * tld_get_data() - Retrieves a pointer to the TLD associated with the key.
> + *
> + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
> + * @key: The key of a TLD saved in tld_maps
> + * @size: The size of the TLD. Must be a known constant value
> + *
> + * Returns a pointer to the TLD data associated with the key; NULL if the key
> + * is not valid or the size is too big
> + */
> +#define tld_get_data(tld_obj, key, size) \
> + __tld_get_data(tld_obj, (tld_obj)->key_map->key.off - 1, size)
> +
> +__attribute__((unused))
> +__always_inline void *__tld_get_data(struct tld_object *tld_obj, u32 off, u32 size)
> +{
> + return (tld_obj->data_map->data && off >= 0 && off < TLD_DATA_SIZE - size) ?
> + (void *)tld_obj->data_map->data + off : NULL;
If we require users to call tld_fetch_key() first and check for errors
tld_get_data() can be faster:
#define tld_get_data(tld_obj, key)
(void *)tld_obj->data_map->data + tld_obj->key_map->key.off
I wouldn't bother with extra checks, especially for size.
Bigger question.. can we cache the whole pointer for each key ?
and then
#define tld_get_data(tld_obj, key) ld_obj->key_map->key
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-16 18:57 ` Alexei Starovoitov
@ 2025-05-16 20:41 ` Amery Hung
2025-05-16 22:22 ` Alexei Starovoitov
0 siblings, 1 reply; 17+ messages in thread
From: Amery Hung @ 2025-05-16 20:41 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
Tejun Heo, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Kernel Team
On Fri, May 16, 2025 at 2:57 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, May 15, 2025 at 2:16 PM Amery Hung <ameryhung@gmail.com> wrote:
> >
> > Task local data defines an abstract storage type for storing task-
> > specific data (TLD). This patch provides user space and bpf
> > implementation as header-only libraries for accessing task local data.
> >
> > Task local data is a bpf task local storage map with two UPTRs:
> > 1) u_tld_metadata, shared by all tasks of the same process, consists of
> > the total count of TLDs and an array of metadata of TLDs. A metadata of
> > a TLD comprises the size and the name. The name is used to identify a
> > specific TLD in bpf 2) data is memory for storing TLDs specific to the
> > task.
> >
> > The following are the basic task local data API:
> >
> > User space BPF
> > Create key tld_create_key() -
> > Fetch key - tld_fetch_key()
> > Get data tld_get_data() tld_get_data()
> >
> > A TLD is first created by the user space with tld_create_key(). First,
> > it goes through the metadata array to check if the TLD can be added.
> > The total TLD size needs to fit into a page (limited by UPTR), and no
> > two TLDs can have the same name. It also calculates the offset, the next
> > available space in u_tld_data, by summing sizes of TLDs. If the TLD
> > can be added, it increases the count using cmpxchg as there may be other
> > concurrent tld_create_key(). After a successful cmpxchg, the last
> > metadata slot now belongs to the calling thread and will be updated.
> > tld_create_key() returns the offset encapsulated as a opaque object key
> > to prevent user misuse.
> >
> > Then user space can pass the key to tld_get_data() to get a pointer
> > to the TLD. The pointer will remain valid for the lifetime of the
> > thread.
> >
> > BPF programs also locate TLDs with the keys. This is done by calling
> > tld_fetch_key() with the name of the TLD. Similar to tld_create_key(),
> > it scans through metadata array, compare the name of TLDs and compute
> > the offset. Once found, the offset is also returned as a key, which
> > can be passed to the bpf version of tld_get_data() to retrieve a
> > pointer to the TLD.
> >
> > User space task local data library uses a light way approach to ensure
> > thread safety (i.e., atomic operation + compiler and memory barriers).
> > While a metadata is being updated, other threads may also try to read it.
> > To prevent them from seeing incomplete data, metadata::size is used to
> > signal the completion of the update, where 0 means the update is still
> > ongoing. Threads will wait until seeing a non-zero size to read a
> > metadata. Acquire/release order is required for metadata::size to
> > prevent hardware reordering. For example, moving store to metadata::name
> > after store to metadata::size or moving load from metadata::name before
> > load from metadata::size.
> >
> > Signed-off-by: Amery Hung <ameryhung@gmail.com>
> > ---
> > .../bpf/prog_tests/task_local_data.h | 263 ++++++++++++++++++
> > .../selftests/bpf/progs/task_local_data.bpf.h | 220 +++++++++++++++
> > 2 files changed, 483 insertions(+)
> > create mode 100644 tools/testing/selftests/bpf/prog_tests/task_local_data.h
> > create mode 100644 tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_data.h b/tools/testing/selftests/bpf/prog_tests/task_local_data.h
> > new file mode 100644
> > index 000000000000..ec43ea59267c
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/task_local_data.h
> > @@ -0,0 +1,263 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef __TASK_LOCAL_DATA_H
> > +#define __TASK_LOCAL_DATA_H
> > +
> > +#include <fcntl.h>
> > +#include <errno.h>
> > +#include <sched.h>
> > +#include <stddef.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <unistd.h>
> > +#include <sys/syscall.h>
> > +#include <sys/types.h>
> > +
> > +#include <bpf/bpf.h>
> > +
> > +#ifndef PIDFD_THREAD
> > +#define PIDFD_THREAD O_EXCL
> > +#endif
> > +
> > +#define PAGE_SIZE 4096
> > +
> > +#ifndef __round_mask
> > +#define __round_mask(x, y) ((__typeof__(x))((y)-1))
> > +#endif
> > +#ifndef round_up
> > +#define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)
> > +#endif
> > +
> > +#ifndef READ_ONCE
> > +#define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
> > +#endif
> > +
> > +#ifndef WRITE_ONCE
> > +#define WRITE_ONCE(x, val) ((*(volatile typeof(x) *)&(x)) = (val))
> > +#endif
> > +
> > +#define TLD_DATA_SIZE PAGE_SIZE
>
> wrap it with #ifndef ?
>
> So that application can #define something smaller.
>
> > +#define TLD_DATA_CNT 63
> > +#define TLD_NAME_LEN 62
> > +
> > +typedef struct {
> > + __s16 off;
> > +} tld_key_t;
> > +
> > +struct tld_metadata {
> > + char name[TLD_NAME_LEN];
> > + __u16 size;
> > +};
> > +
> > +struct u_tld_metadata {
> > + __u8 cnt;
> > + __u8 padding[63];
> > + struct tld_metadata metadata[TLD_DATA_CNT];
> > +};
> > +
> > +struct u_tld_data {
> > + char data[TLD_DATA_SIZE];
> > +};
> > +
> > +struct tld_map_value {
> > + struct u_tld_data *data;
> > + struct u_tld_metadata *metadata;
> > +};
> > +
> > +struct u_tld_metadata *tld_metadata_p __attribute__((weak));
> > +__thread struct u_tld_data *tld_data_p __attribute__((weak));
> > +
> > +static int __tld_init_metadata(int map_fd)
> > +{
> > + struct u_tld_metadata *new_metadata;
> > + struct tld_map_value map_val;
> > + int task_fd = 0, err;
> > +
> > + task_fd = syscall(SYS_pidfd_open, getpid(), 0);
>
> this task_fd and task_fd in __tld_init_data() are two different things.
> Please name them differently.
I will rename them to tid_fd and pid_fd.
>
> > + if (task_fd < 0) {
> > + err = -errno;
> > + goto out;
> > + }
> > +
> > + new_metadata = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
> > + if (!new_metadata) {
> > + err = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + memset(new_metadata, 0, PAGE_SIZE);
> > +
> > + map_val.data = NULL;
> > + map_val.metadata = new_metadata;
> > +
> > + /*
> > + * bpf_map_update_elem(.., pid_fd,..., BPF_NOEXIST) guarantees that
> > + * only one tld_create_key() can update tld_metadata_p.
> > + */
> > + err = bpf_map_update_elem(map_fd, &task_fd, &map_val, BPF_NOEXIST);
> > + if (err) {
> > + free(new_metadata);
> > + if (err == -EEXIST || err == -EAGAIN)
> > + err = 0;
> > + goto out;
> > + }
> > +
> > + WRITE_ONCE(tld_metadata_p, new_metadata);
> > +out:
> > + if (task_fd > 0)
>
> >=
>
> > + close(task_fd);
> > + return err;
> > +}
> > +
> > +static int __tld_init_data(int map_fd)
> > +{
> > + struct u_tld_data *new_data = NULL;
> > + struct tld_map_value map_val;
> > + int err, task_fd = 0;
> > +
> > + task_fd = syscall(SYS_pidfd_open, gettid(), PIDFD_THREAD);
> > + if (task_fd < 0) {
> > + err = -errno;
> > + goto out;
> > + }
> > +
> > + new_data = aligned_alloc(PAGE_SIZE, TLD_DATA_SIZE);
>
> probably roundup_power_of_two(TLD_DATA_SIZE) ?
> Don't know about libc, but musl implementation of aligned_alloc()
> is naive. It allocs align+size.
> So it's a memory waste.
>
I will do roundup_power_of_two for the size in the next respin, and
also check libc implementation.
> > + if (!new_data) {
> > + err = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + map_val.data = new_data;
> > + map_val.metadata = READ_ONCE(tld_metadata_p);
> > +
> > + err = bpf_map_update_elem(map_fd, &task_fd, &map_val, 0);
> > + if (err) {
> > + free(new_data);
> > + goto out;
> > + }
> > +
> > + tld_data_p = new_data;
> > +out:
> > + if (task_fd > 0)
>
> >=
>
> > + close(task_fd);
> > + return err;
> > +}
> > +
> > +/**
> > + * tld_create_key() - Create a key associated with a TLD.
> > + *
> > + * @map_fd: A file descriptor of the underlying task local storage map,
> > + * tld_data_map
> > + * @name: The name the TLD will be associated with
> > + * @size: Size of the TLD
> > + *
> > + * Returns an opaque object key. Use tld_key_is_err() or tld_key_err_or_zero() to
> > + * check if the key creation succeed. Pass to tld_get_data() to get a pointer to
> > + * the TLD. bpf programs can also fetch the same key by name.
> > + */
> > +__attribute__((unused))
> > +static tld_key_t tld_create_key(int map_fd, const char *name, size_t size)
> > +{
> > + int err, i, cnt, sz, off = 0;
> > +
> > + if (!READ_ONCE(tld_metadata_p)) {
> > + err = __tld_init_metadata(map_fd);
> > + if (err)
> > + return (tld_key_t) {.off = err};
> > + }
> > +
> > + if (!tld_data_p) {
> > + err = __tld_init_data(map_fd);
> > + if (err)
> > + return (tld_key_t) {.off = err};
> > + }
> > +
> > + size = round_up(size, 8);
>
> why roundup ? and to 8 in particular?
> If user space wants byte size keys, why not let it?
I will remove it. This was to prevent breaking using TLD in atomic
operations, but it should be very unlikely as they are thread
specific.
>
> > +
> > + for (i = 0; i < TLD_DATA_CNT; i++) {
> > +retry:
> > + cnt = __atomic_load_n(&tld_metadata_p->cnt, __ATOMIC_RELAXED);
>
> Instead of explicit __atomic builtins use _Atomic __u8 cnt;
> ?
>
Got it. I will use builtins in stdatomic.h.
> > + if (i < cnt) {
> > + /*
> > + * Pending tld_create_key() uses size to signal if the metadata has
> > + * been fully updated.
> > + */
> > + while (!(sz = __atomic_load_n(&tld_metadata_p->metadata[i].size,
> > + __ATOMIC_ACQUIRE)))
> > + sched_yield();
> > +
> > + if (!strncmp(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN))
> > + return (tld_key_t) {.off = -EEXIST};
> > +
> > + off += sz;
> > + continue;
> > + }
> > +
> > + if (off + size > TLD_DATA_SIZE)
> > + return (tld_key_t) {.off = -E2BIG};
> > +
> > + /*
> > + * Only one tld_create_key() can increase the current cnt by one and
> > + * takes the latest available slot. Other threads will check again if a new
> > + * TLD can still be added, and then compete for the new slot after the
> > + * succeeding thread update the size.
> > + */
> > + if (!__atomic_compare_exchange_n(&tld_metadata_p->cnt, &cnt, cnt + 1, true,
> > + __ATOMIC_RELAXED, __ATOMIC_RELAXED))
>
> weak and relaxed/relaxed ?
I can't see reordering issue with cnt so I choose to use relax. I can
change to strong acq/rel just to be safe.
> That's unusual.
> I don't know what it is supposed to do.
> I think weak=false and __ATOMIC_ACQUIRE, __ATOMIC_RELAXED
> would look as expected.
>
Do you mean weak=false and __ATOMIC_RELAXED, __ATOMIC_ACQUIRE?
> > + goto retry;
> > +
> > + strncpy(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN);
> > + __atomic_store_n(&tld_metadata_p->metadata[i].size, size, __ATOMIC_RELEASE);
> > + return (tld_key_t) {.off = off};
> > + }
> > +
> > + return (tld_key_t) {.off = -ENOSPC};
> > +}
> > +
> > +__attribute__((unused))
> > +static inline bool tld_key_is_err(tld_key_t key)
> > +{
> > + return key.off < 0;
> > +}
> > +
> > +__attribute__((unused))
> > +static inline int tld_key_err_or_zero(tld_key_t key)
> > +{
> > + return tld_key_is_err(key) ? key.off : 0;
> > +}
> > +
> > +/**
> > + * tld_get_data() - Gets a pointer to the TLD associated with the key.
> > + *
> > + * @map_fd: A file descriptor of the underlying task local storage map,
> > + * tld_data_map
> > + * @key: A key object returned by tld_create_key().
> > + *
> > + * Returns a pointer to the TLD if the key is valid; NULL if no key has been
> > + * added, not enough memory for TLD for this thread, or the key is invalid.
> > + *
> > + * Threads that call tld_get_data() must call tld_free() on exit to prevent
> > + * memory leak.
> > + */
> > +__attribute__((unused))
> > +static void *tld_get_data(int map_fd, tld_key_t key)
> > +{
> > + if (!READ_ONCE(tld_metadata_p))
> > + return NULL;
> > +
> > + if (!tld_data_p && __tld_init_data(map_fd))
> > + return NULL;
>
> Why call it again?
> tld_create_key() should have done it, no?
>
A TLD is created by calling tld_create_key() once. Then, threads may
call tld_get_data() to get their thread-specific TLD. So it is
possible for a thread to call tld_get_data() with tld_data_p==NULL.
> > +
> > + return tld_data_p->data + key.off;
> > +}
> > +
> > +/**
> > + * tld_free() - Frees task local data memory of the calling thread
> > + */
> > +__attribute__((unused))
> > +static void tld_free(void)
> > +{
> > + if (tld_data_p)
> > + free(tld_data_p);
> > +}
>
> Since this .h allocates tld_metadata_p, it probably needs
> a helper to free it too.
>
> > +
> > +#endif /* __TASK_LOCAL_DATA_H */
> > diff --git a/tools/testing/selftests/bpf/progs/task_local_data.bpf.h b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> > new file mode 100644
> > index 000000000000..5f48e408a5e5
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> > @@ -0,0 +1,220 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef __TASK_LOCAL_DATA_BPF_H
> > +#define __TASK_LOCAL_DATA_BPF_H
> > +
> > +/*
> > + * Task local data is a library that facilitates sharing per-task data
> > + * between user space and bpf programs.
> > + *
> > + *
> > + * PREREQUISITE
> > + *
> > + * A TLD, an entry of data in task local data, first needs to be created by the
> > + * user space. This is done by calling user space API, tld_create_key(), with
> > + * the name of the TLD and the size.
> > + *
> > + * tld_key_t prio, in_cs;
> > + *
> > + * prio = tld_create_key("priority", sizeof(int));
> > + * in_cs = tld_create_key("in_critical_section", sizeof(bool));
> > + *
> > + * A key associated with the TLD, which has an opaque type tld_key_t, will be
> > + * returned. It can be used to get a pointer to the TLD in the user space by
> > + * calling tld_get_data().
> > + *
> > + *
> > + * USAGE
> > + *
> > + * Similar to user space, bpf programs locate a TLD using the same key.
> > + * tld_fetch_key() allows bpf programs to retrieve a key created in the user
> > + * space by name, which is specified in the second argument of tld_create_key().
> > + * tld_fetch_key() additionally will cache the key in a task local storage map,
> > + * tld_key_map, to avoid performing costly string comparisons every time when
> > + * accessing a TLD. This requires the developer to define the map value type of
> > + * tld_key_map, struct tld_keys. It only needs to contain keys used by bpf
> > + * programs in the compilation unit.
> > + *
> > + * struct tld_keys {
> > + * tld_key_t prio;
> > + * tld_key_t in_cs;
> > + * };
> > + *
> > + * Then, for every new task, a bpf program will fetch and cache keys once and
> > + * for all. This should be done ideally in a non-critical path (e.g., in
> > + * sched_ext_ops::init_task).
> > + *
> > + * struct tld_object tld_obj;
> > + *
> > + * err = tld_object_init(task, &tld);
> > + * if (err)
> > + * return 0;
> > + *
> > + * tld_fetch_key(&tld_obj, "priority", prio);
> > + * tld_fetch_key(&tld_obj, "in_critical_section", in_cs);
> > + *
> > + * Note that, the first argument of tld_fetch_key() is a pointer to tld_object.
> > + * It should be declared as a stack variable and initialized via tld_object_init().
> > + *
> > + * Finally, just like user space programs, bpf programs can get a pointer to a
> > + * TLD by calling tld_get_data(), with cached keys.
> > + *
> > + * int *p;
> > + *
> > + * p = tld_get_data(&tld_obj, prio, sizeof(int));
> > + * if (p)
> > + * // do something depending on *p
> > + */
> > +#include <errno.h>
> > +#include <bpf/bpf_helpers.h>
> > +
> > +#define TLD_DATA_SIZE __PAGE_SIZE
> > +#define TLD_DATA_CNT 63
> > +#define TLD_NAME_LEN 62
> > +
> > +typedef struct {
> > + __s16 off;
> > +} tld_key_t;
> > +
> > +struct u_tld_data *dummy_data;
> > +struct u_tld_metadata *dummy_metadata;
> > +
> > +struct tld_metadata {
> > + char name[TLD_NAME_LEN];
> > + __u16 size;
> > +};
> > +
> > +struct u_tld_metadata {
> > + __u8 cnt;
> > + __u8 padding[63];
> > + struct tld_metadata metadata[TLD_DATA_CNT];
> > +};
> > +
> > +struct u_tld_data {
> > + char data[TLD_DATA_SIZE];
> > +};
> > +
> > +struct tld_map_value {
> > + struct u_tld_data __uptr *data;
> > + struct u_tld_metadata __uptr *metadata;
> > +};
> > +
> > +struct tld_object {
> > + struct tld_map_value *data_map;
> > + struct tld_keys *key_map;
> > +};
> > +
> > +/*
> > + * Map value of tld_key_map for caching keys. Must be defined by the developer.
> > + * Members should be tld_key_t and passed to the 3rd argument of tld_fetch_key().
> > + */
> > +struct tld_keys;
> > +
> > +struct {
> > + __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
> > + __uint(map_flags, BPF_F_NO_PREALLOC);
> > + __type(key, int);
> > + __type(value, struct tld_map_value);
> > +} tld_data_map SEC(".maps");
> > +
> > +struct {
> > + __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
> > + __uint(map_flags, BPF_F_NO_PREALLOC);
> > + __type(key, int);
> > + __type(value, struct tld_keys);
> > +} tld_key_map SEC(".maps");
> > +
> > +/**
> > + * tld_object_init() - Initializes a tld_object.
> > + *
> > + * @task: The task_struct of the target task
> > + * @tld_obj: A pointer to a tld_object to be initialized
> > + *
> > + * Returns 0 on success; -ENODATA if the task has no TLD; -ENOMEM if the creation
> > + * of tld_key_map fails
> > + */
> > +__attribute__((unused))
> > +static int tld_object_init(struct task_struct *task, struct tld_object *tld_obj)
> > +{
> > + tld_obj->data_map = bpf_task_storage_get(&tld_data_map, task, 0, 0);
> > + if (!tld_obj->data_map)
> > + return -ENODATA;
> > +
> > + tld_obj->key_map = bpf_task_storage_get(&tld_key_map, task, 0,
> > + BPF_LOCAL_STORAGE_GET_F_CREATE);
> > + if (!tld_obj->key_map)
> > + return -ENOMEM;
> > +
> > + return 0;
> > +}
> > +
> > +/**
> > + * tld_fetch_key() - Fetches the key to a TLD by name. The key has to be created
> > + * by user space first with tld_create_key().
> > + *
> > + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
> > + * @name: The name of the key associated with a TLD
> > + * @key: The key in struct tld_keys to be saved to
> > + *
> > + * Returns a positive integer on success; 0 otherwise
> > + */
> > +#define tld_fetch_key(tld_obj, name, key) \
> > + ({ \
> > + (tld_obj)->key_map->key.off = __tld_fetch_key(tld_obj, name); \
> > + })
> > +
> > +__attribute__((unused))
> > +static u16 __tld_fetch_key(struct tld_object *tld_obj, const char *name)
> > +{
> > + int i, meta_off, cnt;
> > + void *metadata, *nm, *sz;
> > + u16 off = 0;
> > +
> > + if (!tld_obj->data_map || !tld_obj->data_map->metadata)
> > + return 0;
> > +
> > + cnt = tld_obj->data_map->metadata->cnt;
> > + metadata = tld_obj->data_map->metadata->metadata;
> > +
> > + bpf_for(i, 0, cnt) {
> > + meta_off = i * sizeof(struct tld_metadata);
> > + if (meta_off > TLD_DATA_SIZE - offsetof(struct u_tld_metadata, metadata)
> > + - sizeof(struct tld_metadata))
> > + break;
> > +
> > + nm = metadata + meta_off + offsetof(struct tld_metadata, name);
> > + sz = metadata + meta_off + offsetof(struct tld_metadata, size);
> > +
> > + /*
> > + * Reserve 0 for uninitialized keys. Increase the offset of a valid key
> > + * by one and subtract it later in tld_get_data().
> > + */
> > + if (!bpf_strncmp(nm, TLD_NAME_LEN, name))
> > + return off + 1;
>
> I think all this +1, -1 dance is doing is helping to catch an
> error when tld_get_data() is called without tld_fetch_key().
> I feel this is too defensive.
>
> Let tld_fetch_key() return proper negative error code.
>
I can certainly return negative error code.
But for the +1, -1 logic, I think is a simpler way to differentiate an
uninitialized key in tld_key_map from the first TLD (both key.off ==
0). After all, bpf programs can call tld_get_data() without
tld_fetch_key().
> > +
> > + off += *(u16 *)sz;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +/**
> > + * tld_get_data() - Retrieves a pointer to the TLD associated with the key.
> > + *
> > + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
> > + * @key: The key of a TLD saved in tld_maps
> > + * @size: The size of the TLD. Must be a known constant value
> > + *
> > + * Returns a pointer to the TLD data associated with the key; NULL if the key
> > + * is not valid or the size is too big
> > + */
> > +#define tld_get_data(tld_obj, key, size) \
> > + __tld_get_data(tld_obj, (tld_obj)->key_map->key.off - 1, size)
> > +
> > +__attribute__((unused))
> > +__always_inline void *__tld_get_data(struct tld_object *tld_obj, u32 off, u32 size)
> > +{
> > + return (tld_obj->data_map->data && off >= 0 && off < TLD_DATA_SIZE - size) ?
> > + (void *)tld_obj->data_map->data + off : NULL;
>
> If we require users to call tld_fetch_key() first and check for errors
> tld_get_data() can be faster:
> #define tld_get_data(tld_obj, key)
> (void *)tld_obj->data_map->data + tld_obj->key_map->key.off
>
tld_get_data() can be called in a bpf program without tld_fetch_key(),
so checking tld_obj->data_map->data is needed as the first load from
tld_obj->data_map->data yields a "mem_or_null".
I did try to save this uptr "mem" after null check to stack (e.g., in
a tld_object) so that we can save subsequent checks. However, the
compiler sometime will do a fresh load from map_ptr when reading
tld_obj->data_map->data. Might need some work or trick to make it
happen.
> I wouldn't bother with extra checks, especially for size.
>
It needs to be bound-checked. If tld_get_data() doesn't do it, bpf
programs have to do it before acceess. Otherwise:
; return (tld_obj->data_map->data && off >= 0) ? @ task_local_data.bpf.h:218
25: (bf) r3 = r1 ; R1_w=mem(sz=4096) R3_w=mem(sz=4096)
26: (0f) r3 += r2 ;
R2_w=scalar(smin=0,smax=umax=0xffffffff,smin32=0xffff7fff,smax32=32766,var_off=(0x0;
0xffffffff)) R3_w=mem(sz=4096,smin=0,smax=umax=0xffffffff,var_off=(0x0;
0xfffffff)
; test_value1 = *int_p; @ test_task_local_data.c:63
27: (61) r2 = *(u32 *)(r3 +0)
R3 unbounded memory access, make sure to bounds check any such access
> Bigger question.. can we cache the whole pointer for each key ?
> and then
> #define tld_get_data(tld_obj, key) ld_obj->key_map->key
Maybe define the member type of tld_key_map as __uptr and allow bpf
programs to update a uptr field with a valid uptr?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-16 20:41 ` Amery Hung
@ 2025-05-16 22:22 ` Alexei Starovoitov
2025-05-20 7:36 ` Amery Hung
0 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2025-05-16 22:22 UTC (permalink / raw)
To: Amery Hung
Cc: bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
Tejun Heo, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Kernel Team
On Fri, May 16, 2025 at 1:41 PM Amery Hung <ameryhung@gmail.com> wrote:
>
> > > + size = round_up(size, 8);
> >
> > why roundup ? and to 8 in particular?
> > If user space wants byte size keys, why not let it?
>
> I will remove it. This was to prevent breaking using TLD in atomic
> operations, but it should be very unlikely as they are thread
> specific.
You mean for a case where one part of the app (like a shared library)
is using u32, but the other is using u64 and doing atomic ops on it?
Make sense to align the off set by tld_create_key(),
but it can be done without rounding up all previous keys to 8.
63 bytes in the header are wasted. So use 2 as an offset.
A single cmpxchg 4 byte can update cnt+offset.
Actually why store size in each tld_metadata ?
Won't the logic will be simpler if it's an offset ?
bpf side tld_fetch_key() wouldn't need to count.
> > > + if (i < cnt) {
> > > + /*
> > > + * Pending tld_create_key() uses size to signal if the metadata has
> > > + * been fully updated.
> > > + */
> > > + while (!(sz = __atomic_load_n(&tld_metadata_p->metadata[i].size,
> > > + __ATOMIC_ACQUIRE)))
> > > + sched_yield();
> > > +
> > > + if (!strncmp(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN))
> > > + return (tld_key_t) {.off = -EEXIST};
> > > +
> > > + off += sz;
> > > + continue;
> > > + }
> > > +
> > > + if (off + size > TLD_DATA_SIZE)
> > > + return (tld_key_t) {.off = -E2BIG};
> > > +
> > > + /*
> > > + * Only one tld_create_key() can increase the current cnt by one and
> > > + * takes the latest available slot. Other threads will check again if a new
> > > + * TLD can still be added, and then compete for the new slot after the
> > > + * succeeding thread update the size.
> > > + */
> > > + if (!__atomic_compare_exchange_n(&tld_metadata_p->cnt, &cnt, cnt + 1, true,
> > > + __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> >
> > weak and relaxed/relaxed ?
>
> I can't see reordering issue with cnt so I choose to use relax. I can
> change to strong acq/rel just to be safe.
>
> > That's unusual.
> > I don't know what it is supposed to do.
> > I think weak=false and __ATOMIC_ACQUIRE, __ATOMIC_RELAXED
> > would look as expected.
> >
>
> Do you mean weak=false and __ATOMIC_RELAXED, __ATOMIC_ACQUIRE?
no idea. I just grepped the kernel and saw:
TEST_KERNEL_LOCKED(atomic_builtin_with_memorder,
__atomic_compare_exchange_n(flag, &v, 1, 0,
__ATOMIC_ACQUIRE, __ATOMIC_RELAXED),
__atomic_store_n(flag, 0, __ATOMIC_RELEASE));
TEST_KERNEL_LOCKED(atomic_builtin_wrong_memorder,
__atomic_compare_exchange_n(flag, &v, 1, 0,
__ATOMIC_RELAXED, __ATOMIC_RELAXED),
__atomic_store_n(flag, 0, __ATOMIC_RELAXED));
I'd just use __ATOMIC_SEQ_CST everywhere.
Speed is not important here.
>
> > > + goto retry;
> > > +
> > > + strncpy(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN);
> > > + __atomic_store_n(&tld_metadata_p->metadata[i].size, size, __ATOMIC_RELEASE);
> > > + return (tld_key_t) {.off = off};
> > > + }
> > > +
> > > + return (tld_key_t) {.off = -ENOSPC};
> > > +}
> > > +
> > > +__attribute__((unused))
> > > +static inline bool tld_key_is_err(tld_key_t key)
> > > +{
> > > + return key.off < 0;
> > > +}
> > > +
> > > +__attribute__((unused))
> > > +static inline int tld_key_err_or_zero(tld_key_t key)
> > > +{
> > > + return tld_key_is_err(key) ? key.off : 0;
> > > +}
> > > +
> > > +/**
> > > + * tld_get_data() - Gets a pointer to the TLD associated with the key.
> > > + *
> > > + * @map_fd: A file descriptor of the underlying task local storage map,
> > > + * tld_data_map
> > > + * @key: A key object returned by tld_create_key().
> > > + *
> > > + * Returns a pointer to the TLD if the key is valid; NULL if no key has been
> > > + * added, not enough memory for TLD for this thread, or the key is invalid.
> > > + *
> > > + * Threads that call tld_get_data() must call tld_free() on exit to prevent
> > > + * memory leak.
> > > + */
> > > +__attribute__((unused))
> > > +static void *tld_get_data(int map_fd, tld_key_t key)
> > > +{
> > > + if (!READ_ONCE(tld_metadata_p))
> > > + return NULL;
> > > +
> > > + if (!tld_data_p && __tld_init_data(map_fd))
> > > + return NULL;
> >
> > Why call it again?
> > tld_create_key() should have done it, no?
> >
>
> A TLD is created by calling tld_create_key() once. Then, threads may
> call tld_get_data() to get their thread-specific TLD. So it is
> possible for a thread to call tld_get_data() with tld_data_p==NULL.
I see. Please add a comment.
> > > +
> > > + return tld_data_p->data + key.off;
> > > +}
> > > +
> > > +/**
> > > + * tld_free() - Frees task local data memory of the calling thread
> > > + */
> > > +__attribute__((unused))
> > > +static void tld_free(void)
> > > +{
> > > + if (tld_data_p)
> > > + free(tld_data_p);
> > > +}
> >
> > Since this .h allocates tld_metadata_p, it probably needs
> > a helper to free it too.
> >
> > > +
> > > +#endif /* __TASK_LOCAL_DATA_H */
> > > diff --git a/tools/testing/selftests/bpf/progs/task_local_data.bpf.h b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> > > new file mode 100644
> > > index 000000000000..5f48e408a5e5
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> > > @@ -0,0 +1,220 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +#ifndef __TASK_LOCAL_DATA_BPF_H
> > > +#define __TASK_LOCAL_DATA_BPF_H
> > > +
> > > +/*
> > > + * Task local data is a library that facilitates sharing per-task data
> > > + * between user space and bpf programs.
> > > + *
> > > + *
> > > + * PREREQUISITE
> > > + *
> > > + * A TLD, an entry of data in task local data, first needs to be created by the
> > > + * user space. This is done by calling user space API, tld_create_key(), with
> > > + * the name of the TLD and the size.
> > > + *
> > > + * tld_key_t prio, in_cs;
> > > + *
> > > + * prio = tld_create_key("priority", sizeof(int));
> > > + * in_cs = tld_create_key("in_critical_section", sizeof(bool));
> > > + *
> > > + * A key associated with the TLD, which has an opaque type tld_key_t, will be
> > > + * returned. It can be used to get a pointer to the TLD in the user space by
> > > + * calling tld_get_data().
> > > + *
> > > + *
> > > + * USAGE
> > > + *
> > > + * Similar to user space, bpf programs locate a TLD using the same key.
> > > + * tld_fetch_key() allows bpf programs to retrieve a key created in the user
> > > + * space by name, which is specified in the second argument of tld_create_key().
> > > + * tld_fetch_key() additionally will cache the key in a task local storage map,
> > > + * tld_key_map, to avoid performing costly string comparisons every time when
> > > + * accessing a TLD. This requires the developer to define the map value type of
> > > + * tld_key_map, struct tld_keys. It only needs to contain keys used by bpf
> > > + * programs in the compilation unit.
> > > + *
> > > + * struct tld_keys {
> > > + * tld_key_t prio;
> > > + * tld_key_t in_cs;
> > > + * };
> > > + *
> > > + * Then, for every new task, a bpf program will fetch and cache keys once and
> > > + * for all. This should be done ideally in a non-critical path (e.g., in
> > > + * sched_ext_ops::init_task).
> > > + *
> > > + * struct tld_object tld_obj;
> > > + *
> > > + * err = tld_object_init(task, &tld);
> > > + * if (err)
> > > + * return 0;
> > > + *
> > > + * tld_fetch_key(&tld_obj, "priority", prio);
> > > + * tld_fetch_key(&tld_obj, "in_critical_section", in_cs);
> > > + *
> > > + * Note that, the first argument of tld_fetch_key() is a pointer to tld_object.
> > > + * It should be declared as a stack variable and initialized via tld_object_init().
> > > + *
> > > + * Finally, just like user space programs, bpf programs can get a pointer to a
> > > + * TLD by calling tld_get_data(), with cached keys.
> > > + *
> > > + * int *p;
> > > + *
> > > + * p = tld_get_data(&tld_obj, prio, sizeof(int));
> > > + * if (p)
> > > + * // do something depending on *p
> > > + */
> > > +#include <errno.h>
> > > +#include <bpf/bpf_helpers.h>
> > > +
> > > +#define TLD_DATA_SIZE __PAGE_SIZE
> > > +#define TLD_DATA_CNT 63
> > > +#define TLD_NAME_LEN 62
> > > +
> > > +typedef struct {
> > > + __s16 off;
> > > +} tld_key_t;
> > > +
> > > +struct u_tld_data *dummy_data;
> > > +struct u_tld_metadata *dummy_metadata;
> > > +
> > > +struct tld_metadata {
> > > + char name[TLD_NAME_LEN];
> > > + __u16 size;
> > > +};
> > > +
> > > +struct u_tld_metadata {
> > > + __u8 cnt;
> > > + __u8 padding[63];
> > > + struct tld_metadata metadata[TLD_DATA_CNT];
> > > +};
> > > +
> > > +struct u_tld_data {
> > > + char data[TLD_DATA_SIZE];
> > > +};
> > > +
> > > +struct tld_map_value {
> > > + struct u_tld_data __uptr *data;
> > > + struct u_tld_metadata __uptr *metadata;
> > > +};
> > > +
> > > +struct tld_object {
> > > + struct tld_map_value *data_map;
> > > + struct tld_keys *key_map;
> > > +};
> > > +
> > > +/*
> > > + * Map value of tld_key_map for caching keys. Must be defined by the developer.
> > > + * Members should be tld_key_t and passed to the 3rd argument of tld_fetch_key().
> > > + */
> > > +struct tld_keys;
> > > +
> > > +struct {
> > > + __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
> > > + __uint(map_flags, BPF_F_NO_PREALLOC);
> > > + __type(key, int);
> > > + __type(value, struct tld_map_value);
> > > +} tld_data_map SEC(".maps");
> > > +
> > > +struct {
> > > + __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
> > > + __uint(map_flags, BPF_F_NO_PREALLOC);
> > > + __type(key, int);
> > > + __type(value, struct tld_keys);
> > > +} tld_key_map SEC(".maps");
> > > +
> > > +/**
> > > + * tld_object_init() - Initializes a tld_object.
> > > + *
> > > + * @task: The task_struct of the target task
> > > + * @tld_obj: A pointer to a tld_object to be initialized
> > > + *
> > > + * Returns 0 on success; -ENODATA if the task has no TLD; -ENOMEM if the creation
> > > + * of tld_key_map fails
> > > + */
> > > +__attribute__((unused))
> > > +static int tld_object_init(struct task_struct *task, struct tld_object *tld_obj)
> > > +{
> > > + tld_obj->data_map = bpf_task_storage_get(&tld_data_map, task, 0, 0);
> > > + if (!tld_obj->data_map)
> > > + return -ENODATA;
> > > +
> > > + tld_obj->key_map = bpf_task_storage_get(&tld_key_map, task, 0,
> > > + BPF_LOCAL_STORAGE_GET_F_CREATE);
> > > + if (!tld_obj->key_map)
> > > + return -ENOMEM;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +/**
> > > + * tld_fetch_key() - Fetches the key to a TLD by name. The key has to be created
> > > + * by user space first with tld_create_key().
> > > + *
> > > + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
> > > + * @name: The name of the key associated with a TLD
> > > + * @key: The key in struct tld_keys to be saved to
> > > + *
> > > + * Returns a positive integer on success; 0 otherwise
> > > + */
> > > +#define tld_fetch_key(tld_obj, name, key) \
> > > + ({ \
> > > + (tld_obj)->key_map->key.off = __tld_fetch_key(tld_obj, name); \
> > > + })
> > > +
> > > +__attribute__((unused))
> > > +static u16 __tld_fetch_key(struct tld_object *tld_obj, const char *name)
> > > +{
> > > + int i, meta_off, cnt;
> > > + void *metadata, *nm, *sz;
> > > + u16 off = 0;
> > > +
> > > + if (!tld_obj->data_map || !tld_obj->data_map->metadata)
> > > + return 0;
> > > +
> > > + cnt = tld_obj->data_map->metadata->cnt;
> > > + metadata = tld_obj->data_map->metadata->metadata;
> > > +
> > > + bpf_for(i, 0, cnt) {
> > > + meta_off = i * sizeof(struct tld_metadata);
> > > + if (meta_off > TLD_DATA_SIZE - offsetof(struct u_tld_metadata, metadata)
> > > + - sizeof(struct tld_metadata))
> > > + break;
> > > +
> > > + nm = metadata + meta_off + offsetof(struct tld_metadata, name);
> > > + sz = metadata + meta_off + offsetof(struct tld_metadata, size);
> > > +
> > > + /*
> > > + * Reserve 0 for uninitialized keys. Increase the offset of a valid key
> > > + * by one and subtract it later in tld_get_data().
> > > + */
> > > + if (!bpf_strncmp(nm, TLD_NAME_LEN, name))
> > > + return off + 1;
> >
> > I think all this +1, -1 dance is doing is helping to catch an
> > error when tld_get_data() is called without tld_fetch_key().
> > I feel this is too defensive.
> >
> > Let tld_fetch_key() return proper negative error code.
> >
>
> I can certainly return negative error code.
>
> But for the +1, -1 logic, I think is a simpler way to differentiate an
> uninitialized key in tld_key_map from the first TLD (both key.off ==
> 0). After all, bpf programs can call tld_get_data() without
> tld_fetch_key().
I'm saying we don't need to catch this case.
progs should not call tld_get_data() without tld_fetch_key().
If they do, it's a bug.
>
> > > +
> > > + off += *(u16 *)sz;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +/**
> > > + * tld_get_data() - Retrieves a pointer to the TLD associated with the key.
> > > + *
> > > + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
> > > + * @key: The key of a TLD saved in tld_maps
> > > + * @size: The size of the TLD. Must be a known constant value
> > > + *
> > > + * Returns a pointer to the TLD data associated with the key; NULL if the key
> > > + * is not valid or the size is too big
> > > + */
> > > +#define tld_get_data(tld_obj, key, size) \
> > > + __tld_get_data(tld_obj, (tld_obj)->key_map->key.off - 1, size)
> > > +
> > > +__attribute__((unused))
> > > +__always_inline void *__tld_get_data(struct tld_object *tld_obj, u32 off, u32 size)
> > > +{
> > > + return (tld_obj->data_map->data && off >= 0 && off < TLD_DATA_SIZE - size) ?
> > > + (void *)tld_obj->data_map->data + off : NULL;
> >
> > If we require users to call tld_fetch_key() first and check for errors
> > tld_get_data() can be faster:
> > #define tld_get_data(tld_obj, key)
> > (void *)tld_obj->data_map->data + tld_obj->key_map->key.off
> >
>
> tld_get_data() can be called in a bpf program without tld_fetch_key(),
> so checking tld_obj->data_map->data is needed as the first load from
> tld_obj->data_map->data yields a "mem_or_null".
>
> I did try to save this uptr "mem" after null check to stack (e.g., in
> a tld_object) so that we can save subsequent checks. However, the
> compiler sometime will do a fresh load from map_ptr when reading
> tld_obj->data_map->data. Might need some work or trick to make it
> happen.
likely because you do tld_obj->data_map->data twice.
> > I wouldn't bother with extra checks, especially for size.
> >
>
> It needs to be bound-checked. If tld_get_data() doesn't do it, bpf
> programs have to do it before acceess. Otherwise:
>
> ; return (tld_obj->data_map->data && off >= 0) ? @ task_local_data.bpf.h:218
> 25: (bf) r3 = r1 ; R1_w=mem(sz=4096) R3_w=mem(sz=4096)
> 26: (0f) r3 += r2 ;
> R2_w=scalar(smin=0,smax=umax=0xffffffff,smin32=0xffff7fff,smax32=32766,var_off=(0x0;
> 0xffffffff)) R3_w=mem(sz=4096,smin=0,smax=umax=0xffffffff,var_off=(0x0;
> 0xfffffff)
> ; test_value1 = *int_p; @ test_task_local_data.c:63
> 27: (61) r2 = *(u32 *)(r3 +0)
> R3 unbounded memory access, make sure to bounds check any such access
That's easy to fix.
Then something like:
#define tld_get_data(tld_obj, key) \
({
void * data = tld_obj->data_map->data;
if (data)
data += tld_obj->key_map->key.off & (PAGE_SIZE - 1);
data;
})
size is really not needed. The verifier sees it as one page.
Bad bpf prog can write into the wrong key and the verifier cannot stop it.
> > Bigger question.. can we cache the whole pointer for each key ?
> > and then
> > #define tld_get_data(tld_obj, key) ld_obj->key_map->key
>
> Maybe define the member type of tld_key_map as __uptr and allow bpf
> programs to update a uptr field with a valid uptr?
yeah. That indeed gets complicated. Maybe it's possible with some
verifier changes, but let's not go there yet.
The tld_get_data() proposed above is speedy enough.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-16 22:22 ` Alexei Starovoitov
@ 2025-05-20 7:36 ` Amery Hung
2025-05-20 20:03 ` Alexei Starovoitov
0 siblings, 1 reply; 17+ messages in thread
From: Amery Hung @ 2025-05-20 7:36 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
Tejun Heo, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Kernel Team
On Fri, May 16, 2025 at 3:22 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, May 16, 2025 at 1:41 PM Amery Hung <ameryhung@gmail.com> wrote:
> >
> > > > + size = round_up(size, 8);
> > >
> > > why roundup ? and to 8 in particular?
> > > If user space wants byte size keys, why not let it?
> >
> > I will remove it. This was to prevent breaking using TLD in atomic
> > operations, but it should be very unlikely as they are thread
> > specific.
>
> You mean for a case where one part of the app (like a shared library)
> is using u32, but the other is using u64 and doing atomic ops on it?
>
> Make sense to align the off set by tld_create_key(),
> but it can be done without rounding up all previous keys to 8.
> 63 bytes in the header are wasted. So use 2 as an offset.
> A single cmpxchg 4 byte can update cnt+offset.
> Actually why store size in each tld_metadata ?
> Won't the logic will be simpler if it's an offset ?
> bpf side tld_fetch_key() wouldn't need to count.
>
I changed to size since metadata is initialized to 0 and size == 0 can
be used to signal pending metadata update, while 0 is a valid offset.
I will save offset in metadata in the next respin. Tejun suggested a
run-length key encoding, and there are other fields in the metadata
that can be used for the signaling.
> > > > + if (i < cnt) {
> > > > + /*
> > > > + * Pending tld_create_key() uses size to signal if the metadata has
> > > > + * been fully updated.
> > > > + */
> > > > + while (!(sz = __atomic_load_n(&tld_metadata_p->metadata[i].size,
> > > > + __ATOMIC_ACQUIRE)))
> > > > + sched_yield();
> > > > +
> > > > + if (!strncmp(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN))
> > > > + return (tld_key_t) {.off = -EEXIST};
> > > > +
> > > > + off += sz;
> > > > + continue;
> > > > + }
> > > > +
> > > > + if (off + size > TLD_DATA_SIZE)
> > > > + return (tld_key_t) {.off = -E2BIG};
> > > > +
> > > > + /*
> > > > + * Only one tld_create_key() can increase the current cnt by one and
> > > > + * takes the latest available slot. Other threads will check again if a new
> > > > + * TLD can still be added, and then compete for the new slot after the
> > > > + * succeeding thread update the size.
> > > > + */
> > > > + if (!__atomic_compare_exchange_n(&tld_metadata_p->cnt, &cnt, cnt + 1, true,
> > > > + __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> > >
> > > weak and relaxed/relaxed ?
> >
> > I can't see reordering issue with cnt so I choose to use relax. I can
> > change to strong acq/rel just to be safe.
> >
> > > That's unusual.
> > > I don't know what it is supposed to do.
> > > I think weak=false and __ATOMIC_ACQUIRE, __ATOMIC_RELAXED
> > > would look as expected.
> > >
> >
> > Do you mean weak=false and __ATOMIC_RELAXED, __ATOMIC_ACQUIRE?
>
> no idea. I just grepped the kernel and saw:
> TEST_KERNEL_LOCKED(atomic_builtin_with_memorder,
> __atomic_compare_exchange_n(flag, &v, 1, 0,
> __ATOMIC_ACQUIRE, __ATOMIC_RELAXED),
> __atomic_store_n(flag, 0, __ATOMIC_RELEASE));
> TEST_KERNEL_LOCKED(atomic_builtin_wrong_memorder,
> __atomic_compare_exchange_n(flag, &v, 1, 0,
> __ATOMIC_RELAXED, __ATOMIC_RELAXED),
> __atomic_store_n(flag, 0, __ATOMIC_RELAXED));
>
> I'd just use __ATOMIC_SEQ_CST everywhere.
> Speed is not important here.
Make sense. I will use __ATOMIC_SEQ_CST. Thanks for the suggestion.
>
> >
> > > > + goto retry;
> > > > +
> > > > + strncpy(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN);
> > > > + __atomic_store_n(&tld_metadata_p->metadata[i].size, size, __ATOMIC_RELEASE);
> > > > + return (tld_key_t) {.off = off};
> > > > + }
> > > > +
> > > > + return (tld_key_t) {.off = -ENOSPC};
> > > > +}
> > > > +
> > > > +__attribute__((unused))
> > > > +static inline bool tld_key_is_err(tld_key_t key)
> > > > +{
> > > > + return key.off < 0;
> > > > +}
> > > > +
> > > > +__attribute__((unused))
> > > > +static inline int tld_key_err_or_zero(tld_key_t key)
> > > > +{
> > > > + return tld_key_is_err(key) ? key.off : 0;
> > > > +}
> > > > +
> > > > +/**
> > > > + * tld_get_data() - Gets a pointer to the TLD associated with the key.
> > > > + *
> > > > + * @map_fd: A file descriptor of the underlying task local storage map,
> > > > + * tld_data_map
> > > > + * @key: A key object returned by tld_create_key().
> > > > + *
> > > > + * Returns a pointer to the TLD if the key is valid; NULL if no key has been
> > > > + * added, not enough memory for TLD for this thread, or the key is invalid.
> > > > + *
> > > > + * Threads that call tld_get_data() must call tld_free() on exit to prevent
> > > > + * memory leak.
> > > > + */
> > > > +__attribute__((unused))
> > > > +static void *tld_get_data(int map_fd, tld_key_t key)
> > > > +{
> > > > + if (!READ_ONCE(tld_metadata_p))
> > > > + return NULL;
> > > > +
> > > > + if (!tld_data_p && __tld_init_data(map_fd))
> > > > + return NULL;
> > >
> > > Why call it again?
> > > tld_create_key() should have done it, no?
> > >
> >
> > A TLD is created by calling tld_create_key() once. Then, threads may
> > call tld_get_data() to get their thread-specific TLD. So it is
> > possible for a thread to call tld_get_data() with tld_data_p==NULL.
>
> I see. Please add a comment.
I will explain it in the comment.
>
> > > > +
> > > > + return tld_data_p->data + key.off;
> > > > +}
> > > > +
> > > > +/**
> > > > + * tld_free() - Frees task local data memory of the calling thread
> > > > + */
> > > > +__attribute__((unused))
> > > > +static void tld_free(void)
> > > > +{
> > > > + if (tld_data_p)
> > > > + free(tld_data_p);
> > > > +}
> > >
> > > Since this .h allocates tld_metadata_p, it probably needs
> > > a helper to free it too.
> > >
> > > > +
> > > > +#endif /* __TASK_LOCAL_DATA_H */
> > > > diff --git a/tools/testing/selftests/bpf/progs/task_local_data.bpf.h b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> > > > new file mode 100644
> > > > index 000000000000..5f48e408a5e5
> > > > --- /dev/null
> > > > +++ b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> > > > @@ -0,0 +1,220 @@
> > > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > > +#ifndef __TASK_LOCAL_DATA_BPF_H
> > > > +#define __TASK_LOCAL_DATA_BPF_H
> > > > +
> > > > +/*
> > > > + * Task local data is a library that facilitates sharing per-task data
> > > > + * between user space and bpf programs.
> > > > + *
> > > > + *
> > > > + * PREREQUISITE
> > > > + *
> > > > + * A TLD, an entry of data in task local data, first needs to be created by the
> > > > + * user space. This is done by calling user space API, tld_create_key(), with
> > > > + * the name of the TLD and the size.
> > > > + *
> > > > + * tld_key_t prio, in_cs;
> > > > + *
> > > > + * prio = tld_create_key("priority", sizeof(int));
> > > > + * in_cs = tld_create_key("in_critical_section", sizeof(bool));
> > > > + *
> > > > + * A key associated with the TLD, which has an opaque type tld_key_t, will be
> > > > + * returned. It can be used to get a pointer to the TLD in the user space by
> > > > + * calling tld_get_data().
> > > > + *
> > > > + *
> > > > + * USAGE
> > > > + *
> > > > + * Similar to user space, bpf programs locate a TLD using the same key.
> > > > + * tld_fetch_key() allows bpf programs to retrieve a key created in the user
> > > > + * space by name, which is specified in the second argument of tld_create_key().
> > > > + * tld_fetch_key() additionally will cache the key in a task local storage map,
> > > > + * tld_key_map, to avoid performing costly string comparisons every time when
> > > > + * accessing a TLD. This requires the developer to define the map value type of
> > > > + * tld_key_map, struct tld_keys. It only needs to contain keys used by bpf
> > > > + * programs in the compilation unit.
> > > > + *
> > > > + * struct tld_keys {
> > > > + * tld_key_t prio;
> > > > + * tld_key_t in_cs;
> > > > + * };
> > > > + *
> > > > + * Then, for every new task, a bpf program will fetch and cache keys once and
> > > > + * for all. This should be done ideally in a non-critical path (e.g., in
> > > > + * sched_ext_ops::init_task).
> > > > + *
> > > > + * struct tld_object tld_obj;
> > > > + *
> > > > + * err = tld_object_init(task, &tld);
> > > > + * if (err)
> > > > + * return 0;
> > > > + *
> > > > + * tld_fetch_key(&tld_obj, "priority", prio);
> > > > + * tld_fetch_key(&tld_obj, "in_critical_section", in_cs);
> > > > + *
> > > > + * Note that, the first argument of tld_fetch_key() is a pointer to tld_object.
> > > > + * It should be declared as a stack variable and initialized via tld_object_init().
> > > > + *
> > > > + * Finally, just like user space programs, bpf programs can get a pointer to a
> > > > + * TLD by calling tld_get_data(), with cached keys.
> > > > + *
> > > > + * int *p;
> > > > + *
> > > > + * p = tld_get_data(&tld_obj, prio, sizeof(int));
> > > > + * if (p)
> > > > + * // do something depending on *p
> > > > + */
> > > > +#include <errno.h>
> > > > +#include <bpf/bpf_helpers.h>
> > > > +
> > > > +#define TLD_DATA_SIZE __PAGE_SIZE
> > > > +#define TLD_DATA_CNT 63
> > > > +#define TLD_NAME_LEN 62
> > > > +
> > > > +typedef struct {
> > > > + __s16 off;
> > > > +} tld_key_t;
> > > > +
> > > > +struct u_tld_data *dummy_data;
> > > > +struct u_tld_metadata *dummy_metadata;
> > > > +
> > > > +struct tld_metadata {
> > > > + char name[TLD_NAME_LEN];
> > > > + __u16 size;
> > > > +};
> > > > +
> > > > +struct u_tld_metadata {
> > > > + __u8 cnt;
> > > > + __u8 padding[63];
> > > > + struct tld_metadata metadata[TLD_DATA_CNT];
> > > > +};
> > > > +
> > > > +struct u_tld_data {
> > > > + char data[TLD_DATA_SIZE];
> > > > +};
> > > > +
> > > > +struct tld_map_value {
> > > > + struct u_tld_data __uptr *data;
> > > > + struct u_tld_metadata __uptr *metadata;
> > > > +};
> > > > +
> > > > +struct tld_object {
> > > > + struct tld_map_value *data_map;
> > > > + struct tld_keys *key_map;
> > > > +};
> > > > +
> > > > +/*
> > > > + * Map value of tld_key_map for caching keys. Must be defined by the developer.
> > > > + * Members should be tld_key_t and passed to the 3rd argument of tld_fetch_key().
> > > > + */
> > > > +struct tld_keys;
> > > > +
> > > > +struct {
> > > > + __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
> > > > + __uint(map_flags, BPF_F_NO_PREALLOC);
> > > > + __type(key, int);
> > > > + __type(value, struct tld_map_value);
> > > > +} tld_data_map SEC(".maps");
> > > > +
> > > > +struct {
> > > > + __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
> > > > + __uint(map_flags, BPF_F_NO_PREALLOC);
> > > > + __type(key, int);
> > > > + __type(value, struct tld_keys);
> > > > +} tld_key_map SEC(".maps");
> > > > +
> > > > +/**
> > > > + * tld_object_init() - Initializes a tld_object.
> > > > + *
> > > > + * @task: The task_struct of the target task
> > > > + * @tld_obj: A pointer to a tld_object to be initialized
> > > > + *
> > > > + * Returns 0 on success; -ENODATA if the task has no TLD; -ENOMEM if the creation
> > > > + * of tld_key_map fails
> > > > + */
> > > > +__attribute__((unused))
> > > > +static int tld_object_init(struct task_struct *task, struct tld_object *tld_obj)
> > > > +{
> > > > + tld_obj->data_map = bpf_task_storage_get(&tld_data_map, task, 0, 0);
> > > > + if (!tld_obj->data_map)
> > > > + return -ENODATA;
> > > > +
> > > > + tld_obj->key_map = bpf_task_storage_get(&tld_key_map, task, 0,
> > > > + BPF_LOCAL_STORAGE_GET_F_CREATE);
> > > > + if (!tld_obj->key_map)
> > > > + return -ENOMEM;
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +/**
> > > > + * tld_fetch_key() - Fetches the key to a TLD by name. The key has to be created
> > > > + * by user space first with tld_create_key().
> > > > + *
> > > > + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
> > > > + * @name: The name of the key associated with a TLD
> > > > + * @key: The key in struct tld_keys to be saved to
> > > > + *
> > > > + * Returns a positive integer on success; 0 otherwise
> > > > + */
> > > > +#define tld_fetch_key(tld_obj, name, key) \
> > > > + ({ \
> > > > + (tld_obj)->key_map->key.off = __tld_fetch_key(tld_obj, name); \
> > > > + })
> > > > +
> > > > +__attribute__((unused))
> > > > +static u16 __tld_fetch_key(struct tld_object *tld_obj, const char *name)
> > > > +{
> > > > + int i, meta_off, cnt;
> > > > + void *metadata, *nm, *sz;
> > > > + u16 off = 0;
> > > > +
> > > > + if (!tld_obj->data_map || !tld_obj->data_map->metadata)
> > > > + return 0;
> > > > +
> > > > + cnt = tld_obj->data_map->metadata->cnt;
> > > > + metadata = tld_obj->data_map->metadata->metadata;
> > > > +
> > > > + bpf_for(i, 0, cnt) {
> > > > + meta_off = i * sizeof(struct tld_metadata);
> > > > + if (meta_off > TLD_DATA_SIZE - offsetof(struct u_tld_metadata, metadata)
> > > > + - sizeof(struct tld_metadata))
> > > > + break;
> > > > +
> > > > + nm = metadata + meta_off + offsetof(struct tld_metadata, name);
> > > > + sz = metadata + meta_off + offsetof(struct tld_metadata, size);
> > > > +
> > > > + /*
> > > > + * Reserve 0 for uninitialized keys. Increase the offset of a valid key
> > > > + * by one and subtract it later in tld_get_data().
> > > > + */
> > > > + if (!bpf_strncmp(nm, TLD_NAME_LEN, name))
> > > > + return off + 1;
> > >
> > > I think all this +1, -1 dance is doing is helping to catch an
> > > error when tld_get_data() is called without tld_fetch_key().
> > > I feel this is too defensive.
> > >
> > > Let tld_fetch_key() return proper negative error code.
> > >
> >
> > I can certainly return negative error code.
> >
> > But for the +1, -1 logic, I think is a simpler way to differentiate an
> > uninitialized key in tld_key_map from the first TLD (both key.off ==
> > 0). After all, bpf programs can call tld_get_data() without
> > tld_fetch_key().
>
> I'm saying we don't need to catch this case.
> progs should not call tld_get_data() without tld_fetch_key().
> If they do, it's a bug.
>
Got it. I will document this in the comment of tld_get_data().
> >
> > > > +
> > > > + off += *(u16 *)sz;
> > > > + }
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +/**
> > > > + * tld_get_data() - Retrieves a pointer to the TLD associated with the key.
> > > > + *
> > > > + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
> > > > + * @key: The key of a TLD saved in tld_maps
> > > > + * @size: The size of the TLD. Must be a known constant value
> > > > + *
> > > > + * Returns a pointer to the TLD data associated with the key; NULL if the key
> > > > + * is not valid or the size is too big
> > > > + */
> > > > +#define tld_get_data(tld_obj, key, size) \
> > > > + __tld_get_data(tld_obj, (tld_obj)->key_map->key.off - 1, size)
> > > > +
> > > > +__attribute__((unused))
> > > > +__always_inline void *__tld_get_data(struct tld_object *tld_obj, u32 off, u32 size)
> > > > +{
> > > > + return (tld_obj->data_map->data && off >= 0 && off < TLD_DATA_SIZE - size) ?
> > > > + (void *)tld_obj->data_map->data + off : NULL;
> > >
> > > If we require users to call tld_fetch_key() first and check for errors
> > > tld_get_data() can be faster:
> > > #define tld_get_data(tld_obj, key)
> > > (void *)tld_obj->data_map->data + tld_obj->key_map->key.off
> > >
> >
> > tld_get_data() can be called in a bpf program without tld_fetch_key(),
> > so checking tld_obj->data_map->data is needed as the first load from
> > tld_obj->data_map->data yields a "mem_or_null".
> >
> > I did try to save this uptr "mem" after null check to stack (e.g., in
> > a tld_object) so that we can save subsequent checks. However, the
> > compiler sometime will do a fresh load from map_ptr when reading
> > tld_obj->data_map->data. Might need some work or trick to make it
> > happen.
>
> likely because you do tld_obj->data_map->data twice.
>
> > > I wouldn't bother with extra checks, especially for size.
> > >
> >
> > It needs to be bound-checked. If tld_get_data() doesn't do it, bpf
> > programs have to do it before acceess. Otherwise:
> >
> > ; return (tld_obj->data_map->data && off >= 0) ? @ task_local_data.bpf.h:218
> > 25: (bf) r3 = r1 ; R1_w=mem(sz=4096) R3_w=mem(sz=4096)
> > 26: (0f) r3 += r2 ;
> > R2_w=scalar(smin=0,smax=umax=0xffffffff,smin32=0xffff7fff,smax32=32766,var_off=(0x0;
> > 0xffffffff)) R3_w=mem(sz=4096,smin=0,smax=umax=0xffffffff,var_off=(0x0;
> > 0xfffffff)
> > ; test_value1 = *int_p; @ test_task_local_data.c:63
> > 27: (61) r2 = *(u32 *)(r3 +0)
> > R3 unbounded memory access, make sure to bounds check any such access
>
> That's easy to fix.
> Then something like:
> #define tld_get_data(tld_obj, key) \
> ({
> void * data = tld_obj->data_map->data;
> if (data)
> data += tld_obj->key_map->key.off & (PAGE_SIZE - 1);
> data;
> })
>
> size is really not needed. The verifier sees it as one page.
> Bad bpf prog can write into the wrong key and the verifier cannot stop it.
>
key.off is a variable offset, so the verifier may assume key.off ==
PAGE_SIZE - 1. If a bpf program tries to dereference a pointer
returned by the proposed tld_get_data() as an int * without bound
check, the verifier will still consider this a potential out-of-bound
access:
invalid access to memory, mem_size=4096 off=4095 size=4
I think if there needs to be a bound check anyways, hiding it
tld_get_data() makes the user written part less complex.
> > > Bigger question.. can we cache the whole pointer for each key ?
> > > and then
> > > #define tld_get_data(tld_obj, key) ld_obj->key_map->key
> >
> > Maybe define the member type of tld_key_map as __uptr and allow bpf
> > programs to update a uptr field with a valid uptr?
>
> yeah. That indeed gets complicated. Maybe it's possible with some
> verifier changes, but let's not go there yet.
> The tld_get_data() proposed above is speedy enough.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-20 7:36 ` Amery Hung
@ 2025-05-20 20:03 ` Alexei Starovoitov
0 siblings, 0 replies; 17+ messages in thread
From: Alexei Starovoitov @ 2025-05-20 20:03 UTC (permalink / raw)
To: Amery Hung
Cc: bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
Tejun Heo, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Kernel Team
On Tue, May 20, 2025 at 12:37 AM Amery Hung <ameryhung@gmail.com> wrote:
>
> > Then something like:
> > #define tld_get_data(tld_obj, key) \
> > ({
> > void * data = tld_obj->data_map->data;
> > if (data)
> > data += tld_obj->key_map->key.off & (PAGE_SIZE - 1);
> > data;
> > })
> >
> > size is really not needed. The verifier sees it as one page.
> > Bad bpf prog can write into the wrong key and the verifier cannot stop it.
> >
>
> key.off is a variable offset, so the verifier may assume key.off ==
> PAGE_SIZE - 1. If a bpf program tries to dereference a pointer
> returned by the proposed tld_get_data() as an int * without bound
> check, the verifier will still consider this a potential out-of-bound
> access:
>
> invalid access to memory, mem_size=4096 off=4095 size=4
>
> I think if there needs to be a bound check anyways, hiding it
> tld_get_data() makes the user written part less complex.
I see. Yeah off < TLD_DATA_SIZE - size check cannot be removed.
I was hoping to save an extra branch. oh well.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-15 21:16 ` [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task " Amery Hung
2025-05-16 18:57 ` Alexei Starovoitov
@ 2025-05-19 20:04 ` Tejun Heo
2025-05-20 6:44 ` Amery Hung
2025-05-20 22:57 ` Andrii Nakryiko
2025-05-22 8:36 ` Tony Ambardar
3 siblings, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2025-05-19 20:04 UTC (permalink / raw)
To: Amery Hung
Cc: bpf, netdev, alexei.starovoitov, andrii, daniel, memxor,
martin.lau, kernel-team
Hello,
On Thu, May 15, 2025 at 02:16:00PM -0700, Amery Hung wrote:
...
> +#define PAGE_SIZE 4096
This might conflict with other definitions. Looks like non-4k page sizes are
a lot more popular on arm. Would this be a problem?
> +static int __tld_init_metadata(int map_fd)
> +{
> + struct u_tld_metadata *new_metadata;
> + struct tld_map_value map_val;
> + int task_fd = 0, err;
> +
> + task_fd = syscall(SYS_pidfd_open, getpid(), 0);
> + if (task_fd < 0) {
> + err = -errno;
> + goto out;
> + }
> +
> + new_metadata = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
Is 4k size limit from UPTR? Is it still 4k on machines with >4k pages? If
this isn't a hard limit from UPTR, would it make sense to encode the size in
the header part of the metadata?
> +static int __tld_init_data(int map_fd)
> +{
> + struct u_tld_data *new_data = NULL;
> + struct tld_map_value map_val;
> + int err, task_fd = 0;
> +
> + task_fd = syscall(SYS_pidfd_open, gettid(), PIDFD_THREAD);
> + if (task_fd < 0) {
> + err = -errno;
> + goto out;
> + }
> +
> + new_data = aligned_alloc(PAGE_SIZE, TLD_DATA_SIZE);
Ditto.
Noob question. Does this means that each thread will map a 4k page no matter
how much data it actually uses?
> +__attribute__((unused))
> +static tld_key_t tld_create_key(int map_fd, const char *name, size_t size)
> +{
> + int err, i, cnt, sz, off = 0;
> +
> + if (!READ_ONCE(tld_metadata_p)) {
> + err = __tld_init_metadata(map_fd);
> + if (err)
> + return (tld_key_t) {.off = err};
> + }
> +
> + if (!tld_data_p) {
> + err = __tld_init_data(map_fd);
> + if (err)
> + return (tld_key_t) {.off = err};
> + }
> +
> + size = round_up(size, 8);
> +
> + for (i = 0; i < TLD_DATA_CNT; i++) {
> +retry:
> + cnt = __atomic_load_n(&tld_metadata_p->cnt, __ATOMIC_RELAXED);
> + if (i < cnt) {
> + /*
> + * Pending tld_create_key() uses size to signal if the metadata has
> + * been fully updated.
> + */
> + while (!(sz = __atomic_load_n(&tld_metadata_p->metadata[i].size,
> + __ATOMIC_ACQUIRE)))
> + sched_yield();
> +
> + if (!strncmp(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN))
> + return (tld_key_t) {.off = -EEXIST};
> +
> + off += sz;
> + continue;
> + }
> +
> + if (off + size > TLD_DATA_SIZE)
> + return (tld_key_t) {.off = -E2BIG};
> +
> + /*
> + * Only one tld_create_key() can increase the current cnt by one and
> + * takes the latest available slot. Other threads will check again if a new
> + * TLD can still be added, and then compete for the new slot after the
> + * succeeding thread update the size.
> + */
> + if (!__atomic_compare_exchange_n(&tld_metadata_p->cnt, &cnt, cnt + 1, true,
> + __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> + goto retry;
> +
> + strncpy(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN);
> + __atomic_store_n(&tld_metadata_p->metadata[i].size, size, __ATOMIC_RELEASE);
> + return (tld_key_t) {.off = off};
> + }
> +
> + return (tld_key_t) {.off = -ENOSPC};
> +}
This looks fine to me but I wonder whether run-length encoding the key
strings would be more efficient and less restrictive in terms of key length.
e.g.:
struct key {
u32 data_len;
u16 key_off;
u16 key_len;
};
struct metadata {
struct key keys[MAX_KEYS];
char key_strs[SOME_SIZE];
};
The logic can be mostly the same. The only difference would be that key
string is not inline. Determine winner in the creation path by compxchg'ing
on data_len, but set key_off and key_len only after key string is updated.
Losing on cmpxhcg or seeing an entry where key_len is zero means that that
one lost and should relax and retry. It can still use the same 4k metadata
page but will likely be able to allow more keys while also relaxing
restrictions on key length.
Hmm... maybe making the key string variably sized makes things difficult for
the BPF code. If so (or for any other reasons), please feel free to ignore
the above.
> +#endif /* __TASK_LOCAL_DATA_H */
> diff --git a/tools/testing/selftests/bpf/progs/task_local_data.bpf.h b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> new file mode 100644
> index 000000000000..5f48e408a5e5
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
...
> +/**
> + * tld_get_data() - Retrieves a pointer to the TLD associated with the key.
> + *
> + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
> + * @key: The key of a TLD saved in tld_maps
> + * @size: The size of the TLD. Must be a known constant value
> + *
> + * Returns a pointer to the TLD data associated with the key; NULL if the key
> + * is not valid or the size is too big
> + */
> +#define tld_get_data(tld_obj, key, size) \
> + __tld_get_data(tld_obj, (tld_obj)->key_map->key.off - 1, size)
> +
> +__attribute__((unused))
> +__always_inline void *__tld_get_data(struct tld_object *tld_obj, u32 off, u32 size)
> +{
> + return (tld_obj->data_map->data && off >= 0 && off < TLD_DATA_SIZE - size) ?
> + (void *)tld_obj->data_map->data + off : NULL;
> +}
Neat.
Generally looks great to me. The only thing I wonder is whether the data
area sizing can be determined at init time rather than fixed to 4k.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-19 20:04 ` Tejun Heo
@ 2025-05-20 6:44 ` Amery Hung
0 siblings, 0 replies; 17+ messages in thread
From: Amery Hung @ 2025-05-20 6:44 UTC (permalink / raw)
To: Tejun Heo
Cc: bpf, netdev, alexei.starovoitov, andrii, daniel, memxor,
martin.lau, kernel-team
On Mon, May 19, 2025 at 1:04 PM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Thu, May 15, 2025 at 02:16:00PM -0700, Amery Hung wrote:
> ...
> > +#define PAGE_SIZE 4096
>
> This might conflict with other definitions. Looks like non-4k page sizes are
> a lot more popular on arm. Would this be a problem?
>
> > +static int __tld_init_metadata(int map_fd)
> > +{
> > + struct u_tld_metadata *new_metadata;
> > + struct tld_map_value map_val;
> > + int task_fd = 0, err;
> > +
> > + task_fd = syscall(SYS_pidfd_open, getpid(), 0);
> > + if (task_fd < 0) {
> > + err = -errno;
> > + goto out;
> > + }
> > +
> > + new_metadata = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
>
> Is 4k size limit from UPTR? Is it still 4k on machines with >4k pages? If
> this isn't a hard limit from UPTR, would it make sense to encode the size in
> the header part of the metadata?
>
UPTR size limit is a page. I will make PAGE_SIZE arch dependent. For
metadata, since all threads of a process share one metadata page, I
think it is okay to make it a fixed size. For data, I think it makes
sense to encode it in the header of metadata.
> > +static int __tld_init_data(int map_fd)
> > +{
> > + struct u_tld_data *new_data = NULL;
> > + struct tld_map_value map_val;
> > + int err, task_fd = 0;
> > +
> > + task_fd = syscall(SYS_pidfd_open, gettid(), PIDFD_THREAD);
> > + if (task_fd < 0) {
> > + err = -errno;
> > + goto out;
> > + }
> > +
> > + new_data = aligned_alloc(PAGE_SIZE, TLD_DATA_SIZE);
>
> Ditto.
>
> Noob question. Does this means that each thread will map a 4k page no matter
> how much data it actually uses?
Unfortunately this is the case currently, but hey maybe we can make
data size dynamic
>
> > +__attribute__((unused))
> > +static tld_key_t tld_create_key(int map_fd, const char *name, size_t size)
> > +{
> > + int err, i, cnt, sz, off = 0;
> > +
> > + if (!READ_ONCE(tld_metadata_p)) {
> > + err = __tld_init_metadata(map_fd);
> > + if (err)
> > + return (tld_key_t) {.off = err};
> > + }
> > +
> > + if (!tld_data_p) {
> > + err = __tld_init_data(map_fd);
> > + if (err)
> > + return (tld_key_t) {.off = err};
> > + }
> > +
> > + size = round_up(size, 8);
> > +
> > + for (i = 0; i < TLD_DATA_CNT; i++) {
> > +retry:
> > + cnt = __atomic_load_n(&tld_metadata_p->cnt, __ATOMIC_RELAXED);
> > + if (i < cnt) {
> > + /*
> > + * Pending tld_create_key() uses size to signal if the metadata has
> > + * been fully updated.
> > + */
> > + while (!(sz = __atomic_load_n(&tld_metadata_p->metadata[i].size,
> > + __ATOMIC_ACQUIRE)))
> > + sched_yield();
> > +
> > + if (!strncmp(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN))
> > + return (tld_key_t) {.off = -EEXIST};
> > +
> > + off += sz;
> > + continue;
> > + }
> > +
> > + if (off + size > TLD_DATA_SIZE)
> > + return (tld_key_t) {.off = -E2BIG};
> > +
> > + /*
> > + * Only one tld_create_key() can increase the current cnt by one and
> > + * takes the latest available slot. Other threads will check again if a new
> > + * TLD can still be added, and then compete for the new slot after the
> > + * succeeding thread update the size.
> > + */
> > + if (!__atomic_compare_exchange_n(&tld_metadata_p->cnt, &cnt, cnt + 1, true,
> > + __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> > + goto retry;
> > +
> > + strncpy(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN);
> > + __atomic_store_n(&tld_metadata_p->metadata[i].size, size, __ATOMIC_RELEASE);
> > + return (tld_key_t) {.off = off};
> > + }
> > +
> > + return (tld_key_t) {.off = -ENOSPC};
> > +}
>
> This looks fine to me but I wonder whether run-length encoding the key
> strings would be more efficient and less restrictive in terms of key length.
> e.g.:
>
> struct key {
> u32 data_len;
> u16 key_off;
> u16 key_len;
> };
>
> struct metadata {
> struct key keys[MAX_KEYS];
> char key_strs[SOME_SIZE];
> };
>
> The logic can be mostly the same. The only difference would be that key
> string is not inline. Determine winner in the creation path by compxchg'ing
> on data_len, but set key_off and key_len only after key string is updated.
> Losing on cmpxhcg or seeing an entry where key_len is zero means that that
> one lost and should relax and retry. It can still use the same 4k metadata
> page but will likely be able to allow more keys while also relaxing
> restrictions on key length.
>
> Hmm... maybe making the key string variably sized makes things difficult for
> the BPF code. If so (or for any other reasons), please feel free to ignore
> the above.
I think this is a great suggestion. The current implementation may
waste spaces in metadata if a key does not use all 62 bytes. I don't
see an obvious obstacle in bpf. I will try to incorporate this in the
next respin.
>
> > +#endif /* __TASK_LOCAL_DATA_H */
> > diff --git a/tools/testing/selftests/bpf/progs/task_local_data.bpf.h b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> > new file mode 100644
> > index 000000000000..5f48e408a5e5
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> ...
> > +/**
> > + * tld_get_data() - Retrieves a pointer to the TLD associated with the key.
> > + *
> > + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
> > + * @key: The key of a TLD saved in tld_maps
> > + * @size: The size of the TLD. Must be a known constant value
> > + *
> > + * Returns a pointer to the TLD data associated with the key; NULL if the key
> > + * is not valid or the size is too big
> > + */
> > +#define tld_get_data(tld_obj, key, size) \
> > + __tld_get_data(tld_obj, (tld_obj)->key_map->key.off - 1, size)
> > +
> > +__attribute__((unused))
> > +__always_inline void *__tld_get_data(struct tld_object *tld_obj, u32 off, u32 size)
> > +{
> > + return (tld_obj->data_map->data && off >= 0 && off < TLD_DATA_SIZE - size) ?
> > + (void *)tld_obj->data_map->data + off : NULL;
> > +}
>
> Neat.
>
> Generally looks great to me. The only thing I wonder is whether the data
> area sizing can be determined at init time rather than fixed to 4k.
>
I think we can achieve it by first limiting tld_create_key() to the
init phase (i.e., only calling them in C/C++ constructor). Then,
tld_create_key() will not allocate memory for data. Instead, on the
first call to tld_get_data(), we freeze the size of the data area and
allocate the memory just enough or round up to the power of two.
For C, we can define a new macro API (e.g., tld_define_key()) that
generates a __attribute__((constructor)) function that in turn calls
tld_create_key().
> Thanks.
>
> --
> tejun
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-15 21:16 ` [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task " Amery Hung
2025-05-16 18:57 ` Alexei Starovoitov
2025-05-19 20:04 ` Tejun Heo
@ 2025-05-20 22:57 ` Andrii Nakryiko
2025-05-22 17:27 ` Amery Hung
2025-05-22 8:36 ` Tony Ambardar
3 siblings, 1 reply; 17+ messages in thread
From: Andrii Nakryiko @ 2025-05-20 22:57 UTC (permalink / raw)
To: Amery Hung
Cc: bpf, netdev, alexei.starovoitov, andrii, daniel, tj, memxor,
martin.lau, kernel-team
On Thu, May 15, 2025 at 2:16 PM Amery Hung <ameryhung@gmail.com> wrote:
>
> Task local data defines an abstract storage type for storing task-
> specific data (TLD). This patch provides user space and bpf
> implementation as header-only libraries for accessing task local data.
>
> Task local data is a bpf task local storage map with two UPTRs:
> 1) u_tld_metadata, shared by all tasks of the same process, consists of
> the total count of TLDs and an array of metadata of TLDs. A metadata of
> a TLD comprises the size and the name. The name is used to identify a
> specific TLD in bpf 2) data is memory for storing TLDs specific to the
> task.
>
> The following are the basic task local data API:
>
> User space BPF
> Create key tld_create_key() -
> Fetch key - tld_fetch_key()
> Get data tld_get_data() tld_get_data()
>
> A TLD is first created by the user space with tld_create_key(). First,
> it goes through the metadata array to check if the TLD can be added.
> The total TLD size needs to fit into a page (limited by UPTR), and no
> two TLDs can have the same name. It also calculates the offset, the next
> available space in u_tld_data, by summing sizes of TLDs. If the TLD
> can be added, it increases the count using cmpxchg as there may be other
> concurrent tld_create_key(). After a successful cmpxchg, the last
> metadata slot now belongs to the calling thread and will be updated.
> tld_create_key() returns the offset encapsulated as a opaque object key
> to prevent user misuse.
>
> Then user space can pass the key to tld_get_data() to get a pointer
> to the TLD. The pointer will remain valid for the lifetime of the
> thread.
>
> BPF programs also locate TLDs with the keys. This is done by calling
> tld_fetch_key() with the name of the TLD. Similar to tld_create_key(),
> it scans through metadata array, compare the name of TLDs and compute
> the offset. Once found, the offset is also returned as a key, which
> can be passed to the bpf version of tld_get_data() to retrieve a
> pointer to the TLD.
>
> User space task local data library uses a light way approach to ensure
> thread safety (i.e., atomic operation + compiler and memory barriers).
> While a metadata is being updated, other threads may also try to read it.
> To prevent them from seeing incomplete data, metadata::size is used to
> signal the completion of the update, where 0 means the update is still
> ongoing. Threads will wait until seeing a non-zero size to read a
> metadata. Acquire/release order is required for metadata::size to
> prevent hardware reordering. For example, moving store to metadata::name
> after store to metadata::size or moving load from metadata::name before
> load from metadata::size.
>
> Signed-off-by: Amery Hung <ameryhung@gmail.com>
> ---
> .../bpf/prog_tests/task_local_data.h | 263 ++++++++++++++++++
> .../selftests/bpf/progs/task_local_data.bpf.h | 220 +++++++++++++++
> 2 files changed, 483 insertions(+)
> create mode 100644 tools/testing/selftests/bpf/prog_tests/task_local_data.h
> create mode 100644 tools/testing/selftests/bpf/progs/task_local_data.bpf.h
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_data.h b/tools/testing/selftests/bpf/prog_tests/task_local_data.h
> new file mode 100644
> index 000000000000..ec43ea59267c
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/task_local_data.h
> @@ -0,0 +1,263 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __TASK_LOCAL_DATA_H
> +#define __TASK_LOCAL_DATA_H
> +
> +#include <fcntl.h>
> +#include <errno.h>
> +#include <sched.h>
> +#include <stddef.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <sys/syscall.h>
> +#include <sys/types.h>
> +
> +#include <bpf/bpf.h>
> +
> +#ifndef PIDFD_THREAD
> +#define PIDFD_THREAD O_EXCL
> +#endif
> +
> +#define PAGE_SIZE 4096
> +
> +#ifndef __round_mask
> +#define __round_mask(x, y) ((__typeof__(x))((y)-1))
> +#endif
> +#ifndef round_up
> +#define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)
> +#endif
> +
> +#ifndef READ_ONCE
> +#define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
> +#endif
> +
> +#ifndef WRITE_ONCE
> +#define WRITE_ONCE(x, val) ((*(volatile typeof(x) *)&(x)) = (val))
> +#endif
this is a lot of pollution of user applications with generic names...
consider TLD_ prefixing all of them?
> +
> +#define TLD_DATA_SIZE PAGE_SIZE
> +#define TLD_DATA_CNT 63
> +#define TLD_NAME_LEN 62
> +
> +typedef struct {
> + __s16 off;
> +} tld_key_t;
> +
> +struct tld_metadata {
> + char name[TLD_NAME_LEN];
> + __u16 size;
> +};
> +
> +struct u_tld_metadata {
> + __u8 cnt;
> + __u8 padding[63];
> + struct tld_metadata metadata[TLD_DATA_CNT];
> +};
> +
> +struct u_tld_data {
> + char data[TLD_DATA_SIZE];
> +};
> +
> +struct tld_map_value {
> + struct u_tld_data *data;
> + struct u_tld_metadata *metadata;
> +};
> +
> +struct u_tld_metadata *tld_metadata_p __attribute__((weak));
> +__thread struct u_tld_data *tld_data_p __attribute__((weak));
> +
> +static int __tld_init_metadata(int map_fd)
> +{
> + struct u_tld_metadata *new_metadata;
> + struct tld_map_value map_val;
> + int task_fd = 0, err;
> +
[...]
> +
> + map_val.data = new_data;
> + map_val.metadata = READ_ONCE(tld_metadata_p);
> +
> + err = bpf_map_update_elem(map_fd, &task_fd, &map_val, 0);
> + if (err) {
> + free(new_data);
> + goto out;
> + }
> +
> + tld_data_p = new_data;
> +out:
> + if (task_fd > 0)
task_fd can be zero, so >= 0 and init to -1; same in init_metadata
> + close(task_fd);
> + return err;
> +}
> +
> +/**
> + * tld_create_key() - Create a key associated with a TLD.
> + *
> + * @map_fd: A file descriptor of the underlying task local storage map,
> + * tld_data_map
> + * @name: The name the TLD will be associated with
> + * @size: Size of the TLD
> + *
> + * Returns an opaque object key. Use tld_key_is_err() or tld_key_err_or_zero() to
> + * check if the key creation succeed. Pass to tld_get_data() to get a pointer to
typo: succeeded
> + * the TLD. bpf programs can also fetch the same key by name.
> + */
> +__attribute__((unused))
> +static tld_key_t tld_create_key(int map_fd, const char *name, size_t size)
> +{
> + int err, i, cnt, sz, off = 0;
> +
> + if (!READ_ONCE(tld_metadata_p)) {
> + err = __tld_init_metadata(map_fd);
> + if (err)
> + return (tld_key_t) {.off = err};
> + }
> +
> + if (!tld_data_p) {
> + err = __tld_init_data(map_fd);
> + if (err)
> + return (tld_key_t) {.off = err};
> + }
> +
> + size = round_up(size, 8);
I'd record actual requested size, but internally round up to 8 where
necessary (see below)
> +
> + for (i = 0; i < TLD_DATA_CNT; i++) {
> +retry:
> + cnt = __atomic_load_n(&tld_metadata_p->cnt, __ATOMIC_RELAXED);
> + if (i < cnt) {
> + /*
> + * Pending tld_create_key() uses size to signal if the metadata has
> + * been fully updated.
> + */
> + while (!(sz = __atomic_load_n(&tld_metadata_p->metadata[i].size,
> + __ATOMIC_ACQUIRE)))
> + sched_yield();
> +
> + if (!strncmp(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN))
> + return (tld_key_t) {.off = -EEXIST};
do you check mismatched size for the same key? if not, should it be checked?
but if name and size matches, shouldn't this be a success instead of
-EEXIST error?...
> +
> + off += sz;
you should probably specify alignment guarantees explicitly and round
that up somewhere here, so that if you allocate bool and then u64, u64
is properly 8 byte aligned and internally you know that the size was 1
and 8? With BPF ringbuf we guarantee 8 byte alignment, and so far it
worked out great, so I'd just document 8 and go with that.
> + continue;
> + }
> +
> + if (off + size > TLD_DATA_SIZE)
> + return (tld_key_t) {.off = -E2BIG};
> +
> + /*
> + * Only one tld_create_key() can increase the current cnt by one and
> + * takes the latest available slot. Other threads will check again if a new
> + * TLD can still be added, and then compete for the new slot after the
> + * succeeding thread update the size.
> + */
> + if (!__atomic_compare_exchange_n(&tld_metadata_p->cnt, &cnt, cnt + 1, true,
> + __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> + goto retry;
> +
> + strncpy(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN);
> + __atomic_store_n(&tld_metadata_p->metadata[i].size, size, __ATOMIC_RELEASE);
> + return (tld_key_t) {.off = off};
> + }
> +
> + return (tld_key_t) {.off = -ENOSPC};
[...]
> + * USAGE
> + *
> + * Similar to user space, bpf programs locate a TLD using the same key.
> + * tld_fetch_key() allows bpf programs to retrieve a key created in the user
> + * space by name, which is specified in the second argument of tld_create_key().
> + * tld_fetch_key() additionally will cache the key in a task local storage map,
> + * tld_key_map, to avoid performing costly string comparisons every time when
> + * accessing a TLD. This requires the developer to define the map value type of
> + * tld_key_map, struct tld_keys. It only needs to contain keys used by bpf
> + * programs in the compilation unit.
> + *
> + * struct tld_keys {
> + * tld_key_t prio;
> + * tld_key_t in_cs;
> + * };
> + *
> + * Then, for every new task, a bpf program will fetch and cache keys once and
> + * for all. This should be done ideally in a non-critical path (e.g., in
> + * sched_ext_ops::init_task).
> + *
> + * struct tld_object tld_obj;
> + *
> + * err = tld_object_init(task, &tld);
tld_obj?
> + * if (err)
> + * return 0;
> + *
> + * tld_fetch_key(&tld_obj, "priority", prio);
> + * tld_fetch_key(&tld_obj, "in_critical_section", in_cs);
> + *
[...]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-20 22:57 ` Andrii Nakryiko
@ 2025-05-22 17:27 ` Amery Hung
0 siblings, 0 replies; 17+ messages in thread
From: Amery Hung @ 2025-05-22 17:27 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: bpf, netdev, alexei.starovoitov, andrii, daniel, tj, memxor,
martin.lau, kernel-team
On Tue, May 20, 2025 at 3:58 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, May 15, 2025 at 2:16 PM Amery Hung <ameryhung@gmail.com> wrote:
> >
> > Task local data defines an abstract storage type for storing task-
> > specific data (TLD). This patch provides user space and bpf
> > implementation as header-only libraries for accessing task local data.
> >
> > Task local data is a bpf task local storage map with two UPTRs:
> > 1) u_tld_metadata, shared by all tasks of the same process, consists of
> > the total count of TLDs and an array of metadata of TLDs. A metadata of
> > a TLD comprises the size and the name. The name is used to identify a
> > specific TLD in bpf 2) data is memory for storing TLDs specific to the
> > task.
> >
> > The following are the basic task local data API:
> >
> > User space BPF
> > Create key tld_create_key() -
> > Fetch key - tld_fetch_key()
> > Get data tld_get_data() tld_get_data()
> >
> > A TLD is first created by the user space with tld_create_key(). First,
> > it goes through the metadata array to check if the TLD can be added.
> > The total TLD size needs to fit into a page (limited by UPTR), and no
> > two TLDs can have the same name. It also calculates the offset, the next
> > available space in u_tld_data, by summing sizes of TLDs. If the TLD
> > can be added, it increases the count using cmpxchg as there may be other
> > concurrent tld_create_key(). After a successful cmpxchg, the last
> > metadata slot now belongs to the calling thread and will be updated.
> > tld_create_key() returns the offset encapsulated as a opaque object key
> > to prevent user misuse.
> >
> > Then user space can pass the key to tld_get_data() to get a pointer
> > to the TLD. The pointer will remain valid for the lifetime of the
> > thread.
> >
> > BPF programs also locate TLDs with the keys. This is done by calling
> > tld_fetch_key() with the name of the TLD. Similar to tld_create_key(),
> > it scans through metadata array, compare the name of TLDs and compute
> > the offset. Once found, the offset is also returned as a key, which
> > can be passed to the bpf version of tld_get_data() to retrieve a
> > pointer to the TLD.
> >
> > User space task local data library uses a light way approach to ensure
> > thread safety (i.e., atomic operation + compiler and memory barriers).
> > While a metadata is being updated, other threads may also try to read it.
> > To prevent them from seeing incomplete data, metadata::size is used to
> > signal the completion of the update, where 0 means the update is still
> > ongoing. Threads will wait until seeing a non-zero size to read a
> > metadata. Acquire/release order is required for metadata::size to
> > prevent hardware reordering. For example, moving store to metadata::name
> > after store to metadata::size or moving load from metadata::name before
> > load from metadata::size.
> >
> > Signed-off-by: Amery Hung <ameryhung@gmail.com>
> > ---
> > .../bpf/prog_tests/task_local_data.h | 263 ++++++++++++++++++
> > .../selftests/bpf/progs/task_local_data.bpf.h | 220 +++++++++++++++
> > 2 files changed, 483 insertions(+)
> > create mode 100644 tools/testing/selftests/bpf/prog_tests/task_local_data.h
> > create mode 100644 tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_data.h b/tools/testing/selftests/bpf/prog_tests/task_local_data.h
> > new file mode 100644
> > index 000000000000..ec43ea59267c
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/task_local_data.h
> > @@ -0,0 +1,263 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef __TASK_LOCAL_DATA_H
> > +#define __TASK_LOCAL_DATA_H
> > +
> > +#include <fcntl.h>
> > +#include <errno.h>
> > +#include <sched.h>
> > +#include <stddef.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <unistd.h>
> > +#include <sys/syscall.h>
> > +#include <sys/types.h>
> > +
> > +#include <bpf/bpf.h>
> > +
> > +#ifndef PIDFD_THREAD
> > +#define PIDFD_THREAD O_EXCL
> > +#endif
> > +
> > +#define PAGE_SIZE 4096
> > +
> > +#ifndef __round_mask
> > +#define __round_mask(x, y) ((__typeof__(x))((y)-1))
> > +#endif
> > +#ifndef round_up
> > +#define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)
> > +#endif
> > +
> > +#ifndef READ_ONCE
> > +#define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
> > +#endif
> > +
> > +#ifndef WRITE_ONCE
> > +#define WRITE_ONCE(x, val) ((*(volatile typeof(x) *)&(x)) = (val))
> > +#endif
>
> this is a lot of pollution of user applications with generic names...
> consider TLD_ prefixing all of them?
>
I will make the name more specific by adding TLD_ prefix.
> > +
> > +#define TLD_DATA_SIZE PAGE_SIZE
> > +#define TLD_DATA_CNT 63
> > +#define TLD_NAME_LEN 62
> > +
> > +typedef struct {
> > + __s16 off;
> > +} tld_key_t;
> > +
> > +struct tld_metadata {
> > + char name[TLD_NAME_LEN];
> > + __u16 size;
> > +};
> > +
> > +struct u_tld_metadata {
> > + __u8 cnt;
> > + __u8 padding[63];
> > + struct tld_metadata metadata[TLD_DATA_CNT];
> > +};
> > +
> > +struct u_tld_data {
> > + char data[TLD_DATA_SIZE];
> > +};
> > +
> > +struct tld_map_value {
> > + struct u_tld_data *data;
> > + struct u_tld_metadata *metadata;
> > +};
> > +
> > +struct u_tld_metadata *tld_metadata_p __attribute__((weak));
> > +__thread struct u_tld_data *tld_data_p __attribute__((weak));
> > +
> > +static int __tld_init_metadata(int map_fd)
> > +{
> > + struct u_tld_metadata *new_metadata;
> > + struct tld_map_value map_val;
> > + int task_fd = 0, err;
> > +
>
> [...]
>
> > +
> > + map_val.data = new_data;
> > + map_val.metadata = READ_ONCE(tld_metadata_p);
> > +
> > + err = bpf_map_update_elem(map_fd, &task_fd, &map_val, 0);
> > + if (err) {
> > + free(new_data);
> > + goto out;
> > + }
> > +
> > + tld_data_p = new_data;
> > +out:
> > + if (task_fd > 0)
>
> task_fd can be zero, so >= 0 and init to -1; same in init_metadata
Yeah. I will fix this in the next respin.
>
> > + close(task_fd);
> > + return err;
> > +}
> > +
> > +/**
> > + * tld_create_key() - Create a key associated with a TLD.
> > + *
> > + * @map_fd: A file descriptor of the underlying task local storage map,
> > + * tld_data_map
> > + * @name: The name the TLD will be associated with
> > + * @size: Size of the TLD
> > + *
> > + * Returns an opaque object key. Use tld_key_is_err() or tld_key_err_or_zero() to
> > + * check if the key creation succeed. Pass to tld_get_data() to get a pointer to
>
> typo: succeeded
Will fix the typo. Thanks
>
> > + * the TLD. bpf programs can also fetch the same key by name.
> > + */
> > +__attribute__((unused))
> > +static tld_key_t tld_create_key(int map_fd, const char *name, size_t size)
> > +{
> > + int err, i, cnt, sz, off = 0;
> > +
> > + if (!READ_ONCE(tld_metadata_p)) {
> > + err = __tld_init_metadata(map_fd);
> > + if (err)
> > + return (tld_key_t) {.off = err};
> > + }
> > +
> > + if (!tld_data_p) {
> > + err = __tld_init_data(map_fd);
> > + if (err)
> > + return (tld_key_t) {.off = err};
> > + }
> > +
> > + size = round_up(size, 8);
>
> I'd record actual requested size, but internally round up to 8 where
> necessary (see below)
>
> > +
> > + for (i = 0; i < TLD_DATA_CNT; i++) {
> > +retry:
> > + cnt = __atomic_load_n(&tld_metadata_p->cnt, __ATOMIC_RELAXED);
> > + if (i < cnt) {
> > + /*
> > + * Pending tld_create_key() uses size to signal if the metadata has
> > + * been fully updated.
> > + */
> > + while (!(sz = __atomic_load_n(&tld_metadata_p->metadata[i].size,
> > + __ATOMIC_ACQUIRE)))
> > + sched_yield();
> > +
> > + if (!strncmp(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN))
> > + return (tld_key_t) {.off = -EEXIST};
>
> do you check mismatched size for the same key? if not, should it be checked?
>
> but if name and size matches, shouldn't this be a success instead of
> -EEXIST error?...
>
I think users should only call tld_create_key() once for a TLD.
Returning an -EEXIST gives us a way to detect conflict. If users knows
that they will be calling tld_create_key() for a TLD in multiple
places, they could still do it by treating -EEXIST as success.
Therefore, no checking mismatching size here.
>
> > +
> > + off += sz;
>
> you should probably specify alignment guarantees explicitly and round
> that up somewhere here, so that if you allocate bool and then u64, u64
> is properly 8 byte aligned and internally you know that the size was 1
> and 8? With BPF ringbuf we guarantee 8 byte alignment, and so far it
> worked out great, so I'd just document 8 and go with that.
>
Thanks for the suggestion. I will document the alignment guarantee.
> > + continue;
> > + }
> > +
> > + if (off + size > TLD_DATA_SIZE)
> > + return (tld_key_t) {.off = -E2BIG};
> > +
> > + /*
> > + * Only one tld_create_key() can increase the current cnt by one and
> > + * takes the latest available slot. Other threads will check again if a new
> > + * TLD can still be added, and then compete for the new slot after the
> > + * succeeding thread update the size.
> > + */
> > + if (!__atomic_compare_exchange_n(&tld_metadata_p->cnt, &cnt, cnt + 1, true,
> > + __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> > + goto retry;
> > +
> > + strncpy(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN);
> > + __atomic_store_n(&tld_metadata_p->metadata[i].size, size, __ATOMIC_RELEASE);
> > + return (tld_key_t) {.off = off};
> > + }
> > +
> > + return (tld_key_t) {.off = -ENOSPC};
>
> [...]
>
> > + * USAGE
> > + *
> > + * Similar to user space, bpf programs locate a TLD using the same key.
> > + * tld_fetch_key() allows bpf programs to retrieve a key created in the user
> > + * space by name, which is specified in the second argument of tld_create_key().
> > + * tld_fetch_key() additionally will cache the key in a task local storage map,
> > + * tld_key_map, to avoid performing costly string comparisons every time when
> > + * accessing a TLD. This requires the developer to define the map value type of
> > + * tld_key_map, struct tld_keys. It only needs to contain keys used by bpf
> > + * programs in the compilation unit.
> > + *
> > + * struct tld_keys {
> > + * tld_key_t prio;
> > + * tld_key_t in_cs;
> > + * };
> > + *
> > + * Then, for every new task, a bpf program will fetch and cache keys once and
> > + * for all. This should be done ideally in a non-critical path (e.g., in
> > + * sched_ext_ops::init_task).
> > + *
> > + * struct tld_object tld_obj;
> > + *
> > + * err = tld_object_init(task, &tld);
>
> tld_obj?
Will fix the typo
>
> > + * if (err)
> > + * return 0;
> > + *
> > + * tld_fetch_key(&tld_obj, "priority", prio);
> > + * tld_fetch_key(&tld_obj, "in_critical_section", in_cs);
> > + *
>
> [...]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-15 21:16 ` [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task " Amery Hung
` (2 preceding siblings ...)
2025-05-20 22:57 ` Andrii Nakryiko
@ 2025-05-22 8:36 ` Tony Ambardar
2025-05-22 16:49 ` Amery Hung
3 siblings, 1 reply; 17+ messages in thread
From: Tony Ambardar @ 2025-05-22 8:36 UTC (permalink / raw)
To: Amery Hung
Cc: bpf, netdev, alexei.starovoitov, andrii, daniel, tj, memxor,
martin.lau, kernel-team
Hi Amery,
I'm trying out your series in an arm32 JIT testing env I'm working on.
On Thu, May 15, 2025 at 02:16:00PM -0700, Amery Hung wrote:
[...]
> diff --git a/tools/testing/selftests/bpf/progs/task_local_data.bpf.h b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> new file mode 100644
> index 000000000000..5f48e408a5e5
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> @@ -0,0 +1,220 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __TASK_LOCAL_DATA_BPF_H
> +#define __TASK_LOCAL_DATA_BPF_H
> +
> +/*
> + * Task local data is a library that facilitates sharing per-task data
> + * between user space and bpf programs.
> + *
> + *
> + * PREREQUISITE
> + *
> + * A TLD, an entry of data in task local data, first needs to be created by the
> + * user space. This is done by calling user space API, tld_create_key(), with
> + * the name of the TLD and the size.
> + *
> + * tld_key_t prio, in_cs;
> + *
> + * prio = tld_create_key("priority", sizeof(int));
> + * in_cs = tld_create_key("in_critical_section", sizeof(bool));
> + *
> + * A key associated with the TLD, which has an opaque type tld_key_t, will be
> + * returned. It can be used to get a pointer to the TLD in the user space by
> + * calling tld_get_data().
> + *
> + *
> + * USAGE
> + *
> + * Similar to user space, bpf programs locate a TLD using the same key.
> + * tld_fetch_key() allows bpf programs to retrieve a key created in the user
> + * space by name, which is specified in the second argument of tld_create_key().
> + * tld_fetch_key() additionally will cache the key in a task local storage map,
> + * tld_key_map, to avoid performing costly string comparisons every time when
> + * accessing a TLD. This requires the developer to define the map value type of
> + * tld_key_map, struct tld_keys. It only needs to contain keys used by bpf
> + * programs in the compilation unit.
> + *
> + * struct tld_keys {
> + * tld_key_t prio;
> + * tld_key_t in_cs;
> + * };
> + *
> + * Then, for every new task, a bpf program will fetch and cache keys once and
> + * for all. This should be done ideally in a non-critical path (e.g., in
> + * sched_ext_ops::init_task).
> + *
> + * struct tld_object tld_obj;
> + *
> + * err = tld_object_init(task, &tld);
> + * if (err)
> + * return 0;
> + *
> + * tld_fetch_key(&tld_obj, "priority", prio);
> + * tld_fetch_key(&tld_obj, "in_critical_section", in_cs);
> + *
> + * Note that, the first argument of tld_fetch_key() is a pointer to tld_object.
> + * It should be declared as a stack variable and initialized via tld_object_init().
> + *
> + * Finally, just like user space programs, bpf programs can get a pointer to a
> + * TLD by calling tld_get_data(), with cached keys.
> + *
> + * int *p;
> + *
> + * p = tld_get_data(&tld_obj, prio, sizeof(int));
> + * if (p)
> + * // do something depending on *p
> + */
> +#include <errno.h>
> +#include <bpf/bpf_helpers.h>
> +
> +#define TLD_DATA_SIZE __PAGE_SIZE
> +#define TLD_DATA_CNT 63
> +#define TLD_NAME_LEN 62
> +
> +typedef struct {
> + __s16 off;
> +} tld_key_t;
> +
> +struct u_tld_data *dummy_data;
> +struct u_tld_metadata *dummy_metadata;
I suspect I've overlooked something, but what are these 2 "dummy" globals
used for? The code builds OK without them, although I do see test errors
as noted below.
I'll also mention the only reason I noticed these is that "bpftool gen
skeleton" automatically maps these to user space, but results in an
ASSERT() failure during build on 32-bit targets due to lack of support,
so dropping them avoids that.
root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -w 0 -a task_local_data
test_task_local_data_basic:PASS:pthread_mutex_init 0 nsec
libbpf: prog 'task_init': BPF program load failed: -EACCES
libbpf: prog 'task_init': -- BEGIN PROG LOAD LOG --
arg#0 reference type('UNKNOWN ') size cannot be determined: -22
0: R1=ctx() R10=fp0
; task = bpf_get_current_task_btf(); @ test_task_local_data.c:31
0: (85) call bpf_get_current_task_btf#158 ; R0_w=trusted_ptr_task_struct()
1: (bf) r6 = r0 ; R0_w=trusted_ptr_task_struct() R6_w=trusted_ptr_task_struct()
; tld_obj->data_map = bpf_task_storage_get(&tld_data_map, task, 0, 0); @ task_local_data.bpf.h:135
2: (18) r1 = 0xc25ddc00 ; R1_w=map_ptr(map=tld_data_map,ks=4,vs=16)
4: (bf) r2 = r6 ; R2_w=trusted_ptr_task_struct() R6_w=trusted_ptr_task_struct()
5: (b7) r3 = 0 ; R3_w=0
6: (b7) r4 = 0 ; R4_w=0
7: (85) call bpf_task_storage_get#156 ; R0=map_value_or_null(id=1,map=tld_data_map,ks=4,vs=16)
8: (b4) w7 = 1 ; R7_w=1
9: (7b) *(u64 *)(r10 -16) = r0 ; R0=map_value_or_null(id=1,map=tld_data_map,ks=4,vs=16) R10=fp0 fp-16_w=map_value_or_null(id=1,map=tld_data_map,ks=4,vs=16)
; if (!tld_obj->data_map) @ task_local_data.bpf.h:136
10: (15) if r0 == 0x0 goto pc+37 ; R0=map_value(map=tld_data_map,ks=4,vs=16)
; tld_obj->key_map = bpf_task_storage_get(&tld_key_map, task, 0, @ task_local_data.bpf.h:139
11: (18) r1 = 0xc3ade000 ; R1_w=map_ptr(map=tld_key_map,ks=4,vs=6)
13: (bf) r2 = r6 ; R2_w=trusted_ptr_task_struct() R6=trusted_ptr_task_struct()
14: (b7) r3 = 0 ; R3_w=0
15: (b7) r4 = 1 ; R4_w=1
16: (85) call bpf_task_storage_get#156 ; R0=map_value_or_null(id=2,map=tld_key_map,ks=4,vs=6)
17: (bf) r6 = r0 ; R0=map_value_or_null(id=2,map=tld_key_map,ks=4,vs=6) R6_w=map_value_or_null(id=2,map=tld_key_map,ks=4,vs=6)
18: (7b) *(u64 *)(r10 -8) = r6 ; R6_w=map_value_or_null(id=2,map=tld_key_map,ks=4,vs=6) R10=fp0 fp-8_w=map_value_or_null(id=2,map=tld_key_map,ks=4,vs=6)
; @ task_local_data.bpf.h:0
19: (15) if r6 == 0x0 goto pc+28 ; R6_w=map_value(map=tld_key_map,ks=4,vs=6)
20: (bf) r1 = r10 ; R1_w=fp0 R10=fp0
; @ test_task_local_data.c:0
21: (07) r1 += -16 ; R1_w=fp-16
; if (!tld_fetch_key(&tld_obj, "value1", value1)) @ test_task_local_data.c:36
22: (18) r2 = 0xc2ddddd0 ; R2_w=map_value(map=.rodata.str1.1,ks=4,vs=30)
24: (85) call pc+25
caller:
R6_w=map_value(map=tld_key_map,ks=4,vs=6) R7=1 R10=fp0 fp-8_w=map_value(map=tld_key_map,ks=4,vs=6) fp-16=map_value(map=tld_data_map,ks=4,vs=16)
callee:
frame1: R1_w=fp[0]-16 R2_w=map_value(map=.rodata.str1.1,ks=4,vs=30) R10=fp0
50: frame1: R1_w=fp[0]-16 R2_w=map_value(map=.rodata.str1.1,ks=4,vs=30) R10=fp0
; static u16 __tld_fetch_key(struct tld_object *tld_obj, const char *name) @ task_local_data.bpf.h:163
50: (7b) *(u64 *)(r10 -16) = r2 ; frame1: R2_w=map_value(map=.rodata.str1.1,ks=4,vs=30) R10=fp0 fp-16_w=map_value(map=.rodata.str1.1,ks=4,vs=30)
51: (b4) w7 = 0 ; frame1: R7_w=0
; if (!tld_obj->data_map || !tld_obj->data_map->metadata) @ task_local_data.bpf.h:169
52: (79) r1 = *(u64 *)(r1 +0) ; frame1: R1=map_value(map=tld_data_map,ks=4,vs=16) fp-16=map_value(map=.rodata.str1.1,ks=4,vs=30)
53: (15) if r1 == 0x0 goto pc+36 ; frame1: R1=map_value(map=tld_data_map,ks=4,vs=16)
54: (79) r6 = *(u64 *)(r1 +8) ; frame1: R1=map_value(map=tld_data_map,ks=4,vs=16) R6_w=scalar()
55: (15) if r6 == 0x0 goto pc+34 ; frame1: R6_w=scalar(umin=1)
; cnt = tld_obj->data_map->metadata->cnt; @ task_local_data.bpf.h:172
56: (71) r8 = *(u8 *)(r6 +0)
R6 invalid mem access 'scalar'
processed 29 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 1
-- END PROG LOAD LOG --
libbpf: prog 'task_init': failed to load: -EACCES
libbpf: failed to load object 'test_task_local_data'
libbpf: failed to load BPF skeleton 'test_task_local_data': -EACCES
test_task_local_data_basic:FAIL:skel_open_and_load unexpected error: -13
#409/1 task_local_data/task_local_data_basic:FAIL
I'm unsure if this verifier error is related to the dummy pointers, but
it does seem there's a pointer issue...
Further thoughts or suggestions (from anyone) would be most welcome.
Thanks,
Tony
> +
> +struct tld_metadata {
> + char name[TLD_NAME_LEN];
> + __u16 size;
> +};
> +
> +struct u_tld_metadata {
> + __u8 cnt;
> + __u8 padding[63];
> + struct tld_metadata metadata[TLD_DATA_CNT];
> +};
> +
> +struct u_tld_data {
> + char data[TLD_DATA_SIZE];
> +};
> +
> +struct tld_map_value {
> + struct u_tld_data __uptr *data;
> + struct u_tld_metadata __uptr *metadata;
> +};
> +
> +struct tld_object {
> + struct tld_map_value *data_map;
> + struct tld_keys *key_map;
> +};
> +
[...]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-22 8:36 ` Tony Ambardar
@ 2025-05-22 16:49 ` Amery Hung
2025-05-23 10:09 ` Tony Ambardar
0 siblings, 1 reply; 17+ messages in thread
From: Amery Hung @ 2025-05-22 16:49 UTC (permalink / raw)
To: Tony Ambardar
Cc: bpf, netdev, alexei.starovoitov, andrii, daniel, tj, memxor,
martin.lau, kernel-team
On Thu, May 22, 2025 at 1:36 AM Tony Ambardar <tony.ambardar@gmail.com> wrote:
>
> Hi Amery,
>
> I'm trying out your series in an arm32 JIT testing env I'm working on.
>
>
> On Thu, May 15, 2025 at 02:16:00PM -0700, Amery Hung wrote:
>
> [...]
>
> > diff --git a/tools/testing/selftests/bpf/progs/task_local_data.bpf.h b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> > new file mode 100644
> > index 000000000000..5f48e408a5e5
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> > @@ -0,0 +1,220 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef __TASK_LOCAL_DATA_BPF_H
> > +#define __TASK_LOCAL_DATA_BPF_H
> > +
> > +/*
> > + * Task local data is a library that facilitates sharing per-task data
> > + * between user space and bpf programs.
> > + *
> > + *
> > + * PREREQUISITE
> > + *
> > + * A TLD, an entry of data in task local data, first needs to be created by the
> > + * user space. This is done by calling user space API, tld_create_key(), with
> > + * the name of the TLD and the size.
> > + *
> > + * tld_key_t prio, in_cs;
> > + *
> > + * prio = tld_create_key("priority", sizeof(int));
> > + * in_cs = tld_create_key("in_critical_section", sizeof(bool));
> > + *
> > + * A key associated with the TLD, which has an opaque type tld_key_t, will be
> > + * returned. It can be used to get a pointer to the TLD in the user space by
> > + * calling tld_get_data().
> > + *
> > + *
> > + * USAGE
> > + *
> > + * Similar to user space, bpf programs locate a TLD using the same key.
> > + * tld_fetch_key() allows bpf programs to retrieve a key created in the user
> > + * space by name, which is specified in the second argument of tld_create_key().
> > + * tld_fetch_key() additionally will cache the key in a task local storage map,
> > + * tld_key_map, to avoid performing costly string comparisons every time when
> > + * accessing a TLD. This requires the developer to define the map value type of
> > + * tld_key_map, struct tld_keys. It only needs to contain keys used by bpf
> > + * programs in the compilation unit.
> > + *
> > + * struct tld_keys {
> > + * tld_key_t prio;
> > + * tld_key_t in_cs;
> > + * };
> > + *
> > + * Then, for every new task, a bpf program will fetch and cache keys once and
> > + * for all. This should be done ideally in a non-critical path (e.g., in
> > + * sched_ext_ops::init_task).
> > + *
> > + * struct tld_object tld_obj;
> > + *
> > + * err = tld_object_init(task, &tld);
> > + * if (err)
> > + * return 0;
> > + *
> > + * tld_fetch_key(&tld_obj, "priority", prio);
> > + * tld_fetch_key(&tld_obj, "in_critical_section", in_cs);
> > + *
> > + * Note that, the first argument of tld_fetch_key() is a pointer to tld_object.
> > + * It should be declared as a stack variable and initialized via tld_object_init().
> > + *
> > + * Finally, just like user space programs, bpf programs can get a pointer to a
> > + * TLD by calling tld_get_data(), with cached keys.
> > + *
> > + * int *p;
> > + *
> > + * p = tld_get_data(&tld_obj, prio, sizeof(int));
> > + * if (p)
> > + * // do something depending on *p
> > + */
> > +#include <errno.h>
> > +#include <bpf/bpf_helpers.h>
> > +
> > +#define TLD_DATA_SIZE __PAGE_SIZE
> > +#define TLD_DATA_CNT 63
> > +#define TLD_NAME_LEN 62
> > +
> > +typedef struct {
> > + __s16 off;
> > +} tld_key_t;
> > +
> > +struct u_tld_data *dummy_data;
> > +struct u_tld_metadata *dummy_metadata;
>
> I suspect I've overlooked something, but what are these 2 "dummy" globals
> used for? The code builds OK without them, although I do see test errors
> as noted below.
>
Hi, sorry for the confusion. The forward declaration is to prevent
dummy_data/metadata tld_map_value to be fwd_kind. I will explain this
in the comment.
The BTF should look like this:
[9] STRUCT 'tld_map_value' size=16 vlen=2
'data' type_id=10 bits_offset=0
'metadata' type_id=11 bits_offset=64
[10] PTR '(anon)' type_id=74
[11] PTR '(anon)' type_id=73
[57] STRUCT 'u_tld_data' size=4096 vlen=1
'data' type_id=58 bits_offset=0
[61] STRUCT 'u_tld_metadata' size=4096 vlen=3
'cnt' type_id=62 bits_offset=0
'padding' type_id=64 bits_offset=8
'metadata' type_id=67 bits_offset=512
[73] TYPE_TAG 'uptr' type_id=61
[74] TYPE_TAG 'uptr' type_id=57
Without the forward declaration, the BTF will look like this:
[9] STRUCT 'tld_map_value' size=16 vlen=2
'data' type_id=10 bits_offset=0
'metadata' type_id=11 bits_offset=64
[10] PTR '(anon)' type_id=63
[11] PTR '(anon)' type_id=61
[60] FWD 'u_tld_metadata' fwd_kind=struct
[61] TYPE_TAG 'uptr' type_id=60
[62] FWD 'u_tld_data' fwd_kind=struct
[63] TYPE_TAG 'uptr' type_id=62
> I'll also mention the only reason I noticed these is that "bpftool gen
> skeleton" automatically maps these to user space, but results in an
> ASSERT() failure during build on 32-bit targets due to lack of support,
> so dropping them avoids that.
Can you provide more details of the error?
>
>
> 24: (85) call pc+25
> caller:
> R6_w=map_value(map=tld_key_map,ks=4,vs=6) R7=1 R10=fp0 fp-8_w=map_value(map=tld_key_map,ks=4,vs=6) fp-16=map_value(map=tld_data_map,ks=4,vs=16)
> callee:
> frame1: R1_w=fp[0]-16 R2_w=map_value(map=.rodata.str1.1,ks=4,vs=30) R10=fp0
> 50: frame1: R1_w=fp[0]-16 R2_w=map_value(map=.rodata.str1.1,ks=4,vs=30) R10=fp0
> ; static u16 __tld_fetch_key(struct tld_object *tld_obj, const char *name) @ task_local_data.bpf.h:163
> 50: (7b) *(u64 *)(r10 -16) = r2 ; frame1: R2_w=map_value(map=.rodata.str1.1,ks=4,vs=30) R10=fp0 fp-16_w=map_value(map=.rodata.str1.1,ks=4,vs=30)
> 51: (b4) w7 = 0 ; frame1: R7_w=0
> ; if (!tld_obj->data_map || !tld_obj->data_map->metadata) @ task_local_data.bpf.h:169
> 52: (79) r1 = *(u64 *)(r1 +0) ; frame1: R1=map_value(map=tld_data_map,ks=4,vs=16) fp-16=map_value(map=.rodata.str1.1,ks=4,vs=30)
> 53: (15) if r1 == 0x0 goto pc+36 ; frame1: R1=map_value(map=tld_data_map,ks=4,vs=16)
> 54: (79) r6 = *(u64 *)(r1 +8) ; frame1: R1=map_value(map=tld_data_map,ks=4,vs=16) R6_w=scalar()
> 55: (15) if r6 == 0x0 goto pc+34 ; frame1: R6_w=scalar(umin=1)
> ; cnt = tld_obj->data_map->metadata->cnt; @ task_local_data.bpf.h:172
> 56: (71) r8 = *(u8 *)(r6 +0)
> R6 invalid mem access 'scalar'
> processed 29 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 1
> -- END PROG LOAD LOG --
> libbpf: prog 'task_init': failed to load: -EACCES
> libbpf: failed to load object 'test_task_local_data'
> libbpf: failed to load BPF skeleton 'test_task_local_data': -EACCES
> test_task_local_data_basic:FAIL:skel_open_and_load unexpected error: -13
> #409/1 task_local_data/task_local_data_basic:FAIL
>
>
> I'm unsure if this verifier error is related to the dummy pointers, but
> it does seem there's a pointer issue...
>
The error is exactly caused by removing the dummy_xxx.
> Further thoughts or suggestions (from anyone) would be most welcome.
>
> Thanks,
> Tony
>
> > +
> > +struct tld_metadata {
> > + char name[TLD_NAME_LEN];
> > + __u16 size;
> > +};
> > +
> > +struct u_tld_metadata {
> > + __u8 cnt;
> > + __u8 padding[63];
> > + struct tld_metadata metadata[TLD_DATA_CNT];
> > +};
> > +
> > +struct u_tld_data {
> > + char data[TLD_DATA_SIZE];
> > +};
> > +
> > +struct tld_map_value {
> > + struct u_tld_data __uptr *data;
> > + struct u_tld_metadata __uptr *metadata;
> > +};
> > +
> > +struct tld_object {
> > + struct tld_map_value *data_map;
> > + struct tld_keys *key_map;
> > +};
> > +
>
> [...]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-22 16:49 ` Amery Hung
@ 2025-05-23 10:09 ` Tony Ambardar
0 siblings, 0 replies; 17+ messages in thread
From: Tony Ambardar @ 2025-05-23 10:09 UTC (permalink / raw)
To: Amery Hung
Cc: bpf, netdev, alexei.starovoitov, andrii, daniel, tj, memxor,
martin.lau, kernel-team
On Thu, May 22, 2025 at 09:49:23AM -0700, Amery Hung wrote:
> On Thu, May 22, 2025 at 1:36 AM Tony Ambardar <tony.ambardar@gmail.com> wrote:
> >
> > Hi Amery,
> >
> > I'm trying out your series in an arm32 JIT testing env I'm working on.
> >
> >
> > On Thu, May 15, 2025 at 02:16:00PM -0700, Amery Hung wrote:
> >
> > > +
[...]
> > > +struct u_tld_data *dummy_data;
> > > +struct u_tld_metadata *dummy_metadata;
> >
> > I suspect I've overlooked something, but what are these 2 "dummy" globals
> > used for? The code builds OK without them, although I do see test errors
> > as noted below.
> >
>
> Hi, sorry for the confusion. The forward declaration is to prevent
> dummy_data/metadata tld_map_value to be fwd_kind. I will explain this
> in the comment.
>
> The BTF should look like this:
>
> [9] STRUCT 'tld_map_value' size=16 vlen=2
> 'data' type_id=10 bits_offset=0
> 'metadata' type_id=11 bits_offset=64
> [10] PTR '(anon)' type_id=74
> [11] PTR '(anon)' type_id=73
> [57] STRUCT 'u_tld_data' size=4096 vlen=1
> 'data' type_id=58 bits_offset=0
> [61] STRUCT 'u_tld_metadata' size=4096 vlen=3
> 'cnt' type_id=62 bits_offset=0
> 'padding' type_id=64 bits_offset=8
> 'metadata' type_id=67 bits_offset=512
> [73] TYPE_TAG 'uptr' type_id=61
> [74] TYPE_TAG 'uptr' type_id=57
>
> Without the forward declaration, the BTF will look like this:
>
> [9] STRUCT 'tld_map_value' size=16 vlen=2
> 'data' type_id=10 bits_offset=0
> 'metadata' type_id=11 bits_offset=64
> [10] PTR '(anon)' type_id=63
> [11] PTR '(anon)' type_id=61
> [60] FWD 'u_tld_metadata' fwd_kind=struct
> [61] TYPE_TAG 'uptr' type_id=60
> [62] FWD 'u_tld_data' fwd_kind=struct
> [63] TYPE_TAG 'uptr' type_id=62
>
Oh, I see. Yes, having some commentary/links would be helpful since this
makes for some tricky debugging. Was there ever any discussion of
alternatives to using the forward decl?
I'm afraid I missed the 'uptr' tag introduction and need to understand
the background.
> > I'll also mention the only reason I noticed these is that "bpftool gen
> > skeleton" automatically maps these to user space, but results in an
> > ASSERT() failure during build on 32-bit targets due to lack of support,
> > so dropping them avoids that.
>
> Can you provide more details of the error?
>
bpftool skeleton includes checks when mapping global vars to user space.
These fail on 32-bit archs for types whose sizes differ from the BPF VM
(i.e. longs, pointers):
In file included from .../tools/testing/selftests/bpf/prog_tests/test_task_local_data.c:13:
./test_task_local_data.skel.h: In function ‘test_task_local_data__assert’:
./test_task_local_data.skel.h:531:9: error: static assertion failed: "unexpected size of \'dummy_data\'"
531 | _Static_assert(sizeof(s->bss->dummy_data) == 8, "unexpected size of 'dummy_data'");
| ^~~~~~~~~~~~~~
./test_task_local_data.skel.h:532:9: error: static assertion failed: "unexpected size of \'dummy_metadata\'"
532 | _Static_assert(sizeof(s->bss->dummy_metadata) == 8, "unexpected size of 'dummy_metadata'");
| ^~~~~~~~~~~~~~
On a whim, I tried making the dummy_xxxxxxx vars static but this still
leaves the fwd_kind and leads to verifier rejection.
> >
> >
> > 24: (85) call pc+25
> > caller:
> > R6_w=map_value(map=tld_key_map,ks=4,vs=6) R7=1 R10=fp0 fp-8_w=map_value(map=tld_key_map,ks=4,vs=6) fp-16=map_value(map=tld_data_map,ks=4,vs=16)
> > callee:
> > frame1: R1_w=fp[0]-16 R2_w=map_value(map=.rodata.str1.1,ks=4,vs=30) R10=fp0
> > 50: frame1: R1_w=fp[0]-16 R2_w=map_value(map=.rodata.str1.1,ks=4,vs=30) R10=fp0
> > ; static u16 __tld_fetch_key(struct tld_object *tld_obj, const char *name) @ task_local_data.bpf.h:163
> > 50: (7b) *(u64 *)(r10 -16) = r2 ; frame1: R2_w=map_value(map=.rodata.str1.1,ks=4,vs=30) R10=fp0 fp-16_w=map_value(map=.rodata.str1.1,ks=4,vs=30)
> > 51: (b4) w7 = 0 ; frame1: R7_w=0
> > ; if (!tld_obj->data_map || !tld_obj->data_map->metadata) @ task_local_data.bpf.h:169
> > 52: (79) r1 = *(u64 *)(r1 +0) ; frame1: R1=map_value(map=tld_data_map,ks=4,vs=16) fp-16=map_value(map=.rodata.str1.1,ks=4,vs=30)
> > 53: (15) if r1 == 0x0 goto pc+36 ; frame1: R1=map_value(map=tld_data_map,ks=4,vs=16)
> > 54: (79) r6 = *(u64 *)(r1 +8) ; frame1: R1=map_value(map=tld_data_map,ks=4,vs=16) R6_w=scalar()
> > 55: (15) if r6 == 0x0 goto pc+34 ; frame1: R6_w=scalar(umin=1)
> > ; cnt = tld_obj->data_map->metadata->cnt; @ task_local_data.bpf.h:172
> > 56: (71) r8 = *(u8 *)(r6 +0)
> > R6 invalid mem access 'scalar'
> > processed 29 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 1
> > -- END PROG LOAD LOG --
> > libbpf: prog 'task_init': failed to load: -EACCES
> > libbpf: failed to load object 'test_task_local_data'
> > libbpf: failed to load BPF skeleton 'test_task_local_data': -EACCES
> > test_task_local_data_basic:FAIL:skel_open_and_load unexpected error: -13
> > #409/1 task_local_data/task_local_data_basic:FAIL
> >
> >
> > I'm unsure if this verifier error is related to the dummy pointers, but
> > it does seem there's a pointer issue...
> >
>
> The error is exactly caused by removing the dummy_xxx.
>
> > Further thoughts or suggestions (from anyone) would be most welcome.
> >
> > Thanks,
> > Tony
> >
> > > +
> > > +struct tld_metadata {
> > > + char name[TLD_NAME_LEN];
> > > + __u16 size;
> > > +};
> > > +
> > > +struct u_tld_metadata {
> > > + __u8 cnt;
> > > + __u8 padding[63];
> > > + struct tld_metadata metadata[TLD_DATA_CNT];
> > > +};
> > > +
> > > +struct u_tld_data {
> > > + char data[TLD_DATA_SIZE];
> > > +};
> > > +
> > > +struct tld_map_value {
> > > + struct u_tld_data __uptr *data;
> > > + struct u_tld_metadata __uptr *metadata;
> > > +};
> > > +
> > > +struct tld_object {
> > > + struct tld_map_value *data_map;
> > > + struct tld_keys *key_map;
> > > +};
> > > +
> >
> > [...]
^ permalink raw reply [flat|nested] 17+ messages in thread