* [PATCH bpf-next v4 0/3] Task local data
@ 2025-05-15 21:15 Amery Hung
2025-05-15 21:16 ` [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task " Amery Hung
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Amery Hung @ 2025-05-15 21:15 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, tj, memxor,
martin.lau, ameryhung, kernel-team
* Overview *
Task local data defines an abstract storage type for storing data specific
to each task and provides user space and bpf libraries to access it. The
result is a fast and easy way to share per-task data between user space
and bpf programs. The intended use case is sched_ext, where user space
programs will pass hints to sched_ext bpf programs to affect task
scheduling.
Task local data is built on top of task local storage map and UPTR[0]
to achieve fast per-task data sharing. UPTR is a type of special field
supported in task local storage map value. A user page assigned to a UPTR
will be pinned by the kernel when the map is updated. Therefore, user
space programs can update data seen by bpf programs without syscalls.
Additionally, unlike most bpf maps, task local data does not require a
static map value definition. This design is driven by sched_ext, which
would like to allow multiple developers to share a storage without the
need to explicitly agree on the layout of it. While a centralized layout
definition would have worked, the friction of synchronizing it across
different repos is not desirable. This simplify code base management and
makes experimenting easier.
In the rest of the cover letter, "task local data" is used to refer to
the abstract storage and TLD is used to denote a single data entry in
the storage.
* Design *
Task local data library provides simple APIs for user space and bpf
through two header files, task_local_data.h and task_loca_data.bpf.h,
respectively. The usage is illustrated in the following diagram.
An entry of data in the task local data, TLD, first needs to be created
in the user space by calling tld_create_key() with the size of the data
and a name associated with the data. The function returns an opaque key
object of tld_key_t type, which can be used to locate a TLD. The same
key may be passed to tld_get_data() in different threads, and a pointer
to data specific to the calling thread will be returned. The pointer will
remain valid until the process terminates, so there is not need to call
tld_get_data() in subsequent accesses.
On the bpf side, programs will also use keys to locate TLDs. For every
new task, a bpf program must first fetch the keys and save them for later
uses. This is done by calling tld_fetch_key() with names specified in
tld_create_key(). The key will be saved in a task local storage map,
tld_key_map. The map value type, struct tld_keys, __must__ be defined by
developers. It should contain keys used in the compilation unit. Finally,
bpf programs can call tld_get_data() to get a pointer to a TLD that is
shared with user space.
┌─ Application ───────────────────────────────────────────────────────┐
│ tld_key_t kx = tld_create_key(fd, "X", sizeof(int)); │
│ ... ┌─ library A ────────────────────────┐│
│ int *x = tld_get_data(fd, kx);│ ky = tld_create_key(fd, "Y", ││
│ if (x) *x = 123; │ sizeof(bool)); ││
│ │ bool *y = tld_get_data(ky); ││
│ ┌─────┤ if (y) *y = true; ││
│ │ └────────────────────────────────────┘│
└───────┬─────────────────│───────────────────────────────────────────┘
V V
+ ─ Task local data ─ ─ ─ ─ ─ + ┌─ sched_ext_ops::init_task ────────┐
| ┌─ tld_data_map ──────────┐ | │ tld_init_object(task, &tld_obj); │
| │ BPF Task local storage │ | │ tld_fetch_key(&tld_obj, "X", kx); │
| │ │ |<─┤ tld_fetch_key(&tld_obj, "Y", ky); │
| │ data_page __uptr *data │ | └───────────────────────────────────┘
| │ metadata_page __uptr *metadata
| └─────────────────────────┘ | ┌─ Other sched_ext_ops op ──────────┐
| ┌─ tld_key_map ───────────┐ | │ tld_init_object(task, &tld_obj); │
| │ BPF Task local storage │ | │ bool *y = tld_get_data(&tld_obj, ├┐
| │ │ |<─┤ ky, 1); ││
| │ tld_key_t kx; │ | │ if (y) ││
| │ tld_key_t ky; │ | │ /* do something */ ││
| └─────────────────────────┘ | └┬──────────────────────────────────┘│
+ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ + └───────────────────────────────────┘
* Implementation *
Task local data defines the storage to be a task local storage map with
two UPTRs, data and metadata. Data points to a blob of memory for storing
TLDs individual to every task. Metadata, individual to each process and
shared by its threads, records the number of TLDs declared and the
metadata of each TLD. Metadata for a TLD contains the key name and the
size of the TLD in data.
struct data_page {
char data[PAGE_SIZE];
};
struct metadata_page {
u8 cnt;
struct metadata data[TLD_DATA_CNT];
};
Both user space and bpf API follow the same protocol when accessing
task local data. A pointer to a TLD is located by a key. tld_key_t
effectively is the offset of a TLD in data. To add a TLD, user space
API, tld_create_key(), loops through metadata->data until an empty slot
is found and update it. It also adds sizes of prior TLDs along the way
to derive the offset. To fetch a key, bpf API, tld_fetch_key(), also
loops through metadata->data until the key name is found. The offset is
also derived by adding sizes. The detail of task local data operations
can be found in patch 1.
[0] https://lore.kernel.org/bpf/20241023234759.860539-1-martin.lau@linux.dev/
v3 -> v4
- API improvements
- Simplify API
- Drop string obfuscation
- Use opaque type for key
- Better documentation
- Implementation
- Switch to dynamic allocation for per-task data
- Now offer as header-only libraries
- No TLS map pinning; leave it to users
- Drop pthread dependency
- Add more invalid tld_create_key() test
- Add a race test for tld_create_key()
v3: https://lore.kernel.org/bpf/20250425214039.2919818-1-ameryhung@gmail.com/
Amery Hung (3):
selftests/bpf: Introduce task local data
selftests/bpf: Test basic task local data operations
selftests/bpf: Test concurrent task local data key creation
.../bpf/prog_tests/task_local_data.h | 263 ++++++++++++++++++
.../bpf/prog_tests/test_task_local_data.c | 254 +++++++++++++++++
.../selftests/bpf/progs/task_local_data.bpf.h | 220 +++++++++++++++
.../bpf/progs/test_task_local_data.c | 81 ++++++
4 files changed, 818 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/task_local_data.h
create mode 100644 tools/testing/selftests/bpf/prog_tests/test_task_local_data.c
create mode 100644 tools/testing/selftests/bpf/progs/task_local_data.bpf.h
create mode 100644 tools/testing/selftests/bpf/progs/test_task_local_data.c
--
2.47.1
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-15 21:15 [PATCH bpf-next v4 0/3] Task local data Amery Hung
@ 2025-05-15 21:16 ` Amery Hung
2025-05-16 18:57 ` Alexei Starovoitov
` (3 more replies)
2025-05-15 21:16 ` [PATCH bpf-next v4 2/3] selftests/bpf: Test basic task local data operations Amery Hung
2025-05-15 21:16 ` [PATCH bpf-next v4 3/3] selftests/bpf: Test concurrent task local data key creation Amery Hung
2 siblings, 4 replies; 17+ messages in thread
From: Amery Hung @ 2025-05-15 21:16 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, tj, memxor,
martin.lau, ameryhung, kernel-team
Task local data defines an abstract storage type for storing task-
specific data (TLD). This patch provides user space and bpf
implementation as header-only libraries for accessing task local data.
Task local data is a bpf task local storage map with two UPTRs:
1) u_tld_metadata, shared by all tasks of the same process, consists of
the total count of TLDs and an array of metadata of TLDs. A metadata of
a TLD comprises the size and the name. The name is used to identify a
specific TLD in bpf 2) data is memory for storing TLDs specific to the
task.
The following are the basic task local data API:
User space BPF
Create key tld_create_key() -
Fetch key - tld_fetch_key()
Get data tld_get_data() tld_get_data()
A TLD is first created by the user space with tld_create_key(). First,
it goes through the metadata array to check if the TLD can be added.
The total TLD size needs to fit into a page (limited by UPTR), and no
two TLDs can have the same name. It also calculates the offset, the next
available space in u_tld_data, by summing sizes of TLDs. If the TLD
can be added, it increases the count using cmpxchg as there may be other
concurrent tld_create_key(). After a successful cmpxchg, the last
metadata slot now belongs to the calling thread and will be updated.
tld_create_key() returns the offset encapsulated as a opaque object key
to prevent user misuse.
Then user space can pass the key to tld_get_data() to get a pointer
to the TLD. The pointer will remain valid for the lifetime of the
thread.
BPF programs also locate TLDs with the keys. This is done by calling
tld_fetch_key() with the name of the TLD. Similar to tld_create_key(),
it scans through metadata array, compare the name of TLDs and compute
the offset. Once found, the offset is also returned as a key, which
can be passed to the bpf version of tld_get_data() to retrieve a
pointer to the TLD.
User space task local data library uses a light way approach to ensure
thread safety (i.e., atomic operation + compiler and memory barriers).
While a metadata is being updated, other threads may also try to read it.
To prevent them from seeing incomplete data, metadata::size is used to
signal the completion of the update, where 0 means the update is still
ongoing. Threads will wait until seeing a non-zero size to read a
metadata. Acquire/release order is required for metadata::size to
prevent hardware reordering. For example, moving store to metadata::name
after store to metadata::size or moving load from metadata::name before
load from metadata::size.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
.../bpf/prog_tests/task_local_data.h | 263 ++++++++++++++++++
.../selftests/bpf/progs/task_local_data.bpf.h | 220 +++++++++++++++
2 files changed, 483 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/task_local_data.h
create mode 100644 tools/testing/selftests/bpf/progs/task_local_data.bpf.h
diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_data.h b/tools/testing/selftests/bpf/prog_tests/task_local_data.h
new file mode 100644
index 000000000000..ec43ea59267c
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/task_local_data.h
@@ -0,0 +1,263 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __TASK_LOCAL_DATA_H
+#define __TASK_LOCAL_DATA_H
+
+#include <fcntl.h>
+#include <errno.h>
+#include <sched.h>
+#include <stddef.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+
+#include <bpf/bpf.h>
+
+#ifndef PIDFD_THREAD
+#define PIDFD_THREAD O_EXCL
+#endif
+
+#define PAGE_SIZE 4096
+
+#ifndef __round_mask
+#define __round_mask(x, y) ((__typeof__(x))((y)-1))
+#endif
+#ifndef round_up
+#define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)
+#endif
+
+#ifndef READ_ONCE
+#define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
+#endif
+
+#ifndef WRITE_ONCE
+#define WRITE_ONCE(x, val) ((*(volatile typeof(x) *)&(x)) = (val))
+#endif
+
+#define TLD_DATA_SIZE PAGE_SIZE
+#define TLD_DATA_CNT 63
+#define TLD_NAME_LEN 62
+
+typedef struct {
+ __s16 off;
+} tld_key_t;
+
+struct tld_metadata {
+ char name[TLD_NAME_LEN];
+ __u16 size;
+};
+
+struct u_tld_metadata {
+ __u8 cnt;
+ __u8 padding[63];
+ struct tld_metadata metadata[TLD_DATA_CNT];
+};
+
+struct u_tld_data {
+ char data[TLD_DATA_SIZE];
+};
+
+struct tld_map_value {
+ struct u_tld_data *data;
+ struct u_tld_metadata *metadata;
+};
+
+struct u_tld_metadata *tld_metadata_p __attribute__((weak));
+__thread struct u_tld_data *tld_data_p __attribute__((weak));
+
+static int __tld_init_metadata(int map_fd)
+{
+ struct u_tld_metadata *new_metadata;
+ struct tld_map_value map_val;
+ int task_fd = 0, err;
+
+ task_fd = syscall(SYS_pidfd_open, getpid(), 0);
+ if (task_fd < 0) {
+ err = -errno;
+ goto out;
+ }
+
+ new_metadata = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
+ if (!new_metadata) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ memset(new_metadata, 0, PAGE_SIZE);
+
+ map_val.data = NULL;
+ map_val.metadata = new_metadata;
+
+ /*
+ * bpf_map_update_elem(.., pid_fd,..., BPF_NOEXIST) guarantees that
+ * only one tld_create_key() can update tld_metadata_p.
+ */
+ err = bpf_map_update_elem(map_fd, &task_fd, &map_val, BPF_NOEXIST);
+ if (err) {
+ free(new_metadata);
+ if (err == -EEXIST || err == -EAGAIN)
+ err = 0;
+ goto out;
+ }
+
+ WRITE_ONCE(tld_metadata_p, new_metadata);
+out:
+ if (task_fd > 0)
+ close(task_fd);
+ return err;
+}
+
+static int __tld_init_data(int map_fd)
+{
+ struct u_tld_data *new_data = NULL;
+ struct tld_map_value map_val;
+ int err, task_fd = 0;
+
+ task_fd = syscall(SYS_pidfd_open, gettid(), PIDFD_THREAD);
+ if (task_fd < 0) {
+ err = -errno;
+ goto out;
+ }
+
+ new_data = aligned_alloc(PAGE_SIZE, TLD_DATA_SIZE);
+ if (!new_data) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ map_val.data = new_data;
+ map_val.metadata = READ_ONCE(tld_metadata_p);
+
+ err = bpf_map_update_elem(map_fd, &task_fd, &map_val, 0);
+ if (err) {
+ free(new_data);
+ goto out;
+ }
+
+ tld_data_p = new_data;
+out:
+ if (task_fd > 0)
+ close(task_fd);
+ return err;
+}
+
+/**
+ * tld_create_key() - Create a key associated with a TLD.
+ *
+ * @map_fd: A file descriptor of the underlying task local storage map,
+ * tld_data_map
+ * @name: The name the TLD will be associated with
+ * @size: Size of the TLD
+ *
+ * Returns an opaque object key. Use tld_key_is_err() or tld_key_err_or_zero() to
+ * check if the key creation succeed. Pass to tld_get_data() to get a pointer to
+ * the TLD. bpf programs can also fetch the same key by name.
+ */
+__attribute__((unused))
+static tld_key_t tld_create_key(int map_fd, const char *name, size_t size)
+{
+ int err, i, cnt, sz, off = 0;
+
+ if (!READ_ONCE(tld_metadata_p)) {
+ err = __tld_init_metadata(map_fd);
+ if (err)
+ return (tld_key_t) {.off = err};
+ }
+
+ if (!tld_data_p) {
+ err = __tld_init_data(map_fd);
+ if (err)
+ return (tld_key_t) {.off = err};
+ }
+
+ size = round_up(size, 8);
+
+ for (i = 0; i < TLD_DATA_CNT; i++) {
+retry:
+ cnt = __atomic_load_n(&tld_metadata_p->cnt, __ATOMIC_RELAXED);
+ if (i < cnt) {
+ /*
+ * Pending tld_create_key() uses size to signal if the metadata has
+ * been fully updated.
+ */
+ while (!(sz = __atomic_load_n(&tld_metadata_p->metadata[i].size,
+ __ATOMIC_ACQUIRE)))
+ sched_yield();
+
+ if (!strncmp(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN))
+ return (tld_key_t) {.off = -EEXIST};
+
+ off += sz;
+ continue;
+ }
+
+ if (off + size > TLD_DATA_SIZE)
+ return (tld_key_t) {.off = -E2BIG};
+
+ /*
+ * Only one tld_create_key() can increase the current cnt by one and
+ * takes the latest available slot. Other threads will check again if a new
+ * TLD can still be added, and then compete for the new slot after the
+ * succeeding thread update the size.
+ */
+ if (!__atomic_compare_exchange_n(&tld_metadata_p->cnt, &cnt, cnt + 1, true,
+ __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+ goto retry;
+
+ strncpy(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN);
+ __atomic_store_n(&tld_metadata_p->metadata[i].size, size, __ATOMIC_RELEASE);
+ return (tld_key_t) {.off = off};
+ }
+
+ return (tld_key_t) {.off = -ENOSPC};
+}
+
+__attribute__((unused))
+static inline bool tld_key_is_err(tld_key_t key)
+{
+ return key.off < 0;
+}
+
+__attribute__((unused))
+static inline int tld_key_err_or_zero(tld_key_t key)
+{
+ return tld_key_is_err(key) ? key.off : 0;
+}
+
+/**
+ * tld_get_data() - Gets a pointer to the TLD associated with the key.
+ *
+ * @map_fd: A file descriptor of the underlying task local storage map,
+ * tld_data_map
+ * @key: A key object returned by tld_create_key().
+ *
+ * Returns a pointer to the TLD if the key is valid; NULL if no key has been
+ * added, not enough memory for TLD for this thread, or the key is invalid.
+ *
+ * Threads that call tld_get_data() must call tld_free() on exit to prevent
+ * memory leak.
+ */
+__attribute__((unused))
+static void *tld_get_data(int map_fd, tld_key_t key)
+{
+ if (!READ_ONCE(tld_metadata_p))
+ return NULL;
+
+ if (!tld_data_p && __tld_init_data(map_fd))
+ return NULL;
+
+ return tld_data_p->data + key.off;
+}
+
+/**
+ * tld_free() - Frees task local data memory of the calling thread
+ */
+__attribute__((unused))
+static void tld_free(void)
+{
+ if (tld_data_p)
+ free(tld_data_p);
+}
+
+#endif /* __TASK_LOCAL_DATA_H */
diff --git a/tools/testing/selftests/bpf/progs/task_local_data.bpf.h b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
new file mode 100644
index 000000000000..5f48e408a5e5
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
@@ -0,0 +1,220 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __TASK_LOCAL_DATA_BPF_H
+#define __TASK_LOCAL_DATA_BPF_H
+
+/*
+ * Task local data is a library that facilitates sharing per-task data
+ * between user space and bpf programs.
+ *
+ *
+ * PREREQUISITE
+ *
+ * A TLD, an entry of data in task local data, first needs to be created by the
+ * user space. This is done by calling user space API, tld_create_key(), with
+ * the name of the TLD and the size.
+ *
+ * tld_key_t prio, in_cs;
+ *
+ * prio = tld_create_key("priority", sizeof(int));
+ * in_cs = tld_create_key("in_critical_section", sizeof(bool));
+ *
+ * A key associated with the TLD, which has an opaque type tld_key_t, will be
+ * returned. It can be used to get a pointer to the TLD in the user space by
+ * calling tld_get_data().
+ *
+ *
+ * USAGE
+ *
+ * Similar to user space, bpf programs locate a TLD using the same key.
+ * tld_fetch_key() allows bpf programs to retrieve a key created in the user
+ * space by name, which is specified in the second argument of tld_create_key().
+ * tld_fetch_key() additionally will cache the key in a task local storage map,
+ * tld_key_map, to avoid performing costly string comparisons every time when
+ * accessing a TLD. This requires the developer to define the map value type of
+ * tld_key_map, struct tld_keys. It only needs to contain keys used by bpf
+ * programs in the compilation unit.
+ *
+ * struct tld_keys {
+ * tld_key_t prio;
+ * tld_key_t in_cs;
+ * };
+ *
+ * Then, for every new task, a bpf program will fetch and cache keys once and
+ * for all. This should be done ideally in a non-critical path (e.g., in
+ * sched_ext_ops::init_task).
+ *
+ * struct tld_object tld_obj;
+ *
+ * err = tld_object_init(task, &tld);
+ * if (err)
+ * return 0;
+ *
+ * tld_fetch_key(&tld_obj, "priority", prio);
+ * tld_fetch_key(&tld_obj, "in_critical_section", in_cs);
+ *
+ * Note that, the first argument of tld_fetch_key() is a pointer to tld_object.
+ * It should be declared as a stack variable and initialized via tld_object_init().
+ *
+ * Finally, just like user space programs, bpf programs can get a pointer to a
+ * TLD by calling tld_get_data(), with cached keys.
+ *
+ * int *p;
+ *
+ * p = tld_get_data(&tld_obj, prio, sizeof(int));
+ * if (p)
+ * // do something depending on *p
+ */
+#include <errno.h>
+#include <bpf/bpf_helpers.h>
+
+#define TLD_DATA_SIZE __PAGE_SIZE
+#define TLD_DATA_CNT 63
+#define TLD_NAME_LEN 62
+
+typedef struct {
+ __s16 off;
+} tld_key_t;
+
+struct u_tld_data *dummy_data;
+struct u_tld_metadata *dummy_metadata;
+
+struct tld_metadata {
+ char name[TLD_NAME_LEN];
+ __u16 size;
+};
+
+struct u_tld_metadata {
+ __u8 cnt;
+ __u8 padding[63];
+ struct tld_metadata metadata[TLD_DATA_CNT];
+};
+
+struct u_tld_data {
+ char data[TLD_DATA_SIZE];
+};
+
+struct tld_map_value {
+ struct u_tld_data __uptr *data;
+ struct u_tld_metadata __uptr *metadata;
+};
+
+struct tld_object {
+ struct tld_map_value *data_map;
+ struct tld_keys *key_map;
+};
+
+/*
+ * Map value of tld_key_map for caching keys. Must be defined by the developer.
+ * Members should be tld_key_t and passed to the 3rd argument of tld_fetch_key().
+ */
+struct tld_keys;
+
+struct {
+ __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
+ __uint(map_flags, BPF_F_NO_PREALLOC);
+ __type(key, int);
+ __type(value, struct tld_map_value);
+} tld_data_map SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
+ __uint(map_flags, BPF_F_NO_PREALLOC);
+ __type(key, int);
+ __type(value, struct tld_keys);
+} tld_key_map SEC(".maps");
+
+/**
+ * tld_object_init() - Initializes a tld_object.
+ *
+ * @task: The task_struct of the target task
+ * @tld_obj: A pointer to a tld_object to be initialized
+ *
+ * Returns 0 on success; -ENODATA if the task has no TLD; -ENOMEM if the creation
+ * of tld_key_map fails
+ */
+__attribute__((unused))
+static int tld_object_init(struct task_struct *task, struct tld_object *tld_obj)
+{
+ tld_obj->data_map = bpf_task_storage_get(&tld_data_map, task, 0, 0);
+ if (!tld_obj->data_map)
+ return -ENODATA;
+
+ tld_obj->key_map = bpf_task_storage_get(&tld_key_map, task, 0,
+ BPF_LOCAL_STORAGE_GET_F_CREATE);
+ if (!tld_obj->key_map)
+ return -ENOMEM;
+
+ return 0;
+}
+
+/**
+ * tld_fetch_key() - Fetches the key to a TLD by name. The key has to be created
+ * by user space first with tld_create_key().
+ *
+ * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
+ * @name: The name of the key associated with a TLD
+ * @key: The key in struct tld_keys to be saved to
+ *
+ * Returns a positive integer on success; 0 otherwise
+ */
+#define tld_fetch_key(tld_obj, name, key) \
+ ({ \
+ (tld_obj)->key_map->key.off = __tld_fetch_key(tld_obj, name); \
+ })
+
+__attribute__((unused))
+static u16 __tld_fetch_key(struct tld_object *tld_obj, const char *name)
+{
+ int i, meta_off, cnt;
+ void *metadata, *nm, *sz;
+ u16 off = 0;
+
+ if (!tld_obj->data_map || !tld_obj->data_map->metadata)
+ return 0;
+
+ cnt = tld_obj->data_map->metadata->cnt;
+ metadata = tld_obj->data_map->metadata->metadata;
+
+ bpf_for(i, 0, cnt) {
+ meta_off = i * sizeof(struct tld_metadata);
+ if (meta_off > TLD_DATA_SIZE - offsetof(struct u_tld_metadata, metadata)
+ - sizeof(struct tld_metadata))
+ break;
+
+ nm = metadata + meta_off + offsetof(struct tld_metadata, name);
+ sz = metadata + meta_off + offsetof(struct tld_metadata, size);
+
+ /*
+ * Reserve 0 for uninitialized keys. Increase the offset of a valid key
+ * by one and subtract it later in tld_get_data().
+ */
+ if (!bpf_strncmp(nm, TLD_NAME_LEN, name))
+ return off + 1;
+
+ off += *(u16 *)sz;
+ }
+
+ return 0;
+}
+
+/**
+ * tld_get_data() - Retrieves a pointer to the TLD associated with the key.
+ *
+ * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
+ * @key: The key of a TLD saved in tld_maps
+ * @size: The size of the TLD. Must be a known constant value
+ *
+ * Returns a pointer to the TLD data associated with the key; NULL if the key
+ * is not valid or the size is too big
+ */
+#define tld_get_data(tld_obj, key, size) \
+ __tld_get_data(tld_obj, (tld_obj)->key_map->key.off - 1, size)
+
+__attribute__((unused))
+__always_inline void *__tld_get_data(struct tld_object *tld_obj, u32 off, u32 size)
+{
+ return (tld_obj->data_map->data && off >= 0 && off < TLD_DATA_SIZE - size) ?
+ (void *)tld_obj->data_map->data + off : NULL;
+}
+
+#endif
--
2.47.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH bpf-next v4 2/3] selftests/bpf: Test basic task local data operations
2025-05-15 21:15 [PATCH bpf-next v4 0/3] Task local data Amery Hung
2025-05-15 21:16 ` [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task " Amery Hung
@ 2025-05-15 21:16 ` Amery Hung
2025-05-23 10:18 ` Tony Ambardar
2025-05-15 21:16 ` [PATCH bpf-next v4 3/3] selftests/bpf: Test concurrent task local data key creation Amery Hung
2 siblings, 1 reply; 17+ messages in thread
From: Amery Hung @ 2025-05-15 21:16 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, tj, memxor,
martin.lau, ameryhung, kernel-team
Test basic operations of task local data with valid and invalid
tld_create_key(). For invalid calls, make sure they return the right
error code, and verifiy that no TLDs are inserted by trying fetching
keys in the bpf program. For valid calls, first make sure the TLDs
are created using tld_fetch_key(). Then, verify that they are task-
specific with multiple user threads. This done by writing values unique
to each thread to TLDs, reading them from both user space and bpf, and
checking if the value read back are the same as the value written.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
.../bpf/prog_tests/test_task_local_data.c | 163 ++++++++++++++++++
.../bpf/progs/test_task_local_data.c | 81 +++++++++
2 files changed, 244 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/test_task_local_data.c
create mode 100644 tools/testing/selftests/bpf/progs/test_task_local_data.c
diff --git a/tools/testing/selftests/bpf/prog_tests/test_task_local_data.c b/tools/testing/selftests/bpf/prog_tests/test_task_local_data.c
new file mode 100644
index 000000000000..738fc1c9d8a4
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/test_task_local_data.c
@@ -0,0 +1,163 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <pthread.h>
+#include <bpf/btf.h>
+#include <test_progs.h>
+
+struct test_struct {
+ __u64 a;
+ __u64 b;
+ __u64 c;
+ __u64 d;
+};
+
+#include "test_task_local_data.skel.h"
+#include "task_local_data.h"
+
+/*
+ * Reset task local data between subtests by clearing metadata. This is only safe
+ * in selftests as subtests run sequentially. Users of task local data libraries
+ * should not do this.
+ */
+static void reset_tld(void)
+{
+ if (tld_metadata_p)
+ memset(tld_metadata_p, 0, PAGE_SIZE);
+}
+
+/* Serialize access to bpf program's global variables */
+static pthread_mutex_t global_mutex;
+
+#define TEST_BASIC_THREAD_NUM 63
+static tld_key_t tld_keys[TEST_BASIC_THREAD_NUM];
+
+void *test_task_local_data_basic_thread(void *arg)
+{
+ LIBBPF_OPTS(bpf_test_run_opts, opts);
+ struct test_task_local_data *skel = (struct test_task_local_data *)arg;
+ struct test_struct *value2;
+ int fd, err, tid, *value1;
+
+ fd = bpf_map__fd(skel->maps.tld_data_map);
+
+ value1 = tld_get_data(fd, tld_keys[0]);
+ if (!ASSERT_OK_PTR(value1, "tld_get_data"))
+ goto out;
+
+ value2 = tld_get_data(fd, tld_keys[1]);
+ if (!ASSERT_OK_PTR(value1, "tld_get_data"))
+ goto out;
+
+ tid = gettid();
+
+ *value1 = tid + 0;
+ value2->a = tid + 1;
+ value2->b = tid + 2;
+ value2->c = tid + 3;
+ value2->d = tid + 4;
+
+ pthread_mutex_lock(&global_mutex);
+ /*
+ * Run task_init which simulates an initialization bpf prog that runs once
+ * for every new task. The program saves keys for subsequent bpf programs.
+ */
+ err = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.task_init), &opts);
+ ASSERT_OK(err, "run task_init");
+ ASSERT_OK(opts.retval, "task_init retval");
+ /* Run task_main that read task local data and save to global variables */
+ err = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.task_main), &opts);
+ ASSERT_OK(err, "run task_main");
+ ASSERT_OK(opts.retval, "task_main retval");
+
+ ASSERT_EQ(skel->bss->test_value1, tid + 0, "tld_get_data value1");
+ ASSERT_EQ(skel->bss->test_value2.a, tid + 1, "tld_get_data value2.a");
+ ASSERT_EQ(skel->bss->test_value2.b, tid + 2, "tld_get_data value2.b");
+ ASSERT_EQ(skel->bss->test_value2.c, tid + 3, "tld_get_data value2.c");
+ ASSERT_EQ(skel->bss->test_value2.d, tid + 4, "tld_get_data value2.d");
+ pthread_mutex_unlock(&global_mutex);
+
+ /* Make sure valueX are indeed local to threads */
+ ASSERT_EQ(*value1, tid + 0, "value1");
+ ASSERT_EQ(value2->a, tid + 1, "value2.a");
+ ASSERT_EQ(value2->b, tid + 2, "value2.b");
+ ASSERT_EQ(value2->c, tid + 3, "value2.c");
+ ASSERT_EQ(value2->d, tid + 4, "value2.d");
+
+ *value1 = tid + 4;
+ value2->a = tid + 3;
+ value2->b = tid + 2;
+ value2->c = tid + 1;
+ value2->d = tid + 0;
+
+ /* Run task_main again */
+ pthread_mutex_lock(&global_mutex);
+ err = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.task_main), &opts);
+ ASSERT_OK(err, "run task_main");
+ ASSERT_OK(opts.retval, "task_main retval");
+
+ ASSERT_EQ(skel->bss->test_value1, tid + 4, "tld_get_data value1");
+ ASSERT_EQ(skel->bss->test_value2.a, tid + 3, "tld_get_data value2.a");
+ ASSERT_EQ(skel->bss->test_value2.b, tid + 2, "tld_get_data value2.b");
+ ASSERT_EQ(skel->bss->test_value2.c, tid + 1, "tld_get_data value2.c");
+ ASSERT_EQ(skel->bss->test_value2.d, tid + 0, "tld_get_data value2.d");
+ pthread_mutex_unlock(&global_mutex);
+
+ tld_free();
+out:
+ pthread_exit(NULL);
+}
+
+static void test_task_local_data_basic(void)
+{
+ struct test_task_local_data *skel;
+ pthread_t thread[TEST_BASIC_THREAD_NUM];
+ char dummy_key_name[TLD_NAME_LEN];
+ tld_key_t key;
+ int i, fd, err;
+
+ reset_tld();
+
+ ASSERT_OK(pthread_mutex_init(&global_mutex, NULL), "pthread_mutex_init");
+
+ skel = test_task_local_data__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
+ return;
+
+ fd = bpf_map__fd(skel->maps.tld_data_map);
+
+ tld_keys[0] = tld_create_key(fd, "value1", sizeof(int));
+ ASSERT_FALSE(tld_key_is_err(tld_keys[0]), "tld_create_key");
+ tld_keys[1] = tld_create_key(fd, "value2", sizeof(struct test_struct));
+ ASSERT_FALSE(tld_key_is_err(tld_keys[1]), "tld_create_key");
+
+ key = tld_create_key(fd, "value_not_exist",
+ PAGE_SIZE - sizeof(int) - sizeof(struct test_struct) + 1);
+ ASSERT_EQ(tld_key_err_or_zero(key), -E2BIG, "tld_create_key");
+
+ key = tld_create_key(fd, "value2", sizeof(struct test_struct));
+ ASSERT_EQ(tld_key_err_or_zero(key), -EEXIST, "tld_create_key");
+
+ for (i = 2; i < TLD_DATA_CNT; i++) {
+ snprintf(dummy_key_name, TLD_NAME_LEN, "dummy_value%d", i);
+ tld_keys[i] = tld_create_key(fd, dummy_key_name, sizeof(int));
+ ASSERT_FALSE(tld_key_is_err(tld_keys[i]), "tld_create_key");
+ }
+
+ key = tld_create_key(fd, "value_not_exist", sizeof(struct test_struct));
+ ASSERT_EQ(tld_key_err_or_zero(key), -ENOSPC, "tld_create_key");
+
+ for (i = 0; i < TEST_BASIC_THREAD_NUM; i++) {
+ err = pthread_create(&thread[i], NULL, test_task_local_data_basic_thread, skel);
+ if (!ASSERT_OK(err, "pthread_create"))
+ goto out;
+ }
+
+out:
+ for (i = 0; i < TEST_BASIC_THREAD_NUM; i++)
+ pthread_join(thread[i], NULL);
+}
+
+void test_task_local_data(void)
+{
+ if (test__start_subtest("task_local_data_basic"))
+ test_task_local_data_basic();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_task_local_data.c b/tools/testing/selftests/bpf/progs/test_task_local_data.c
new file mode 100644
index 000000000000..4cf0630b19bd
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_task_local_data.c
@@ -0,0 +1,81 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <errno.h>
+#include <bpf/bpf_helpers.h>
+
+#include "task_local_data.bpf.h"
+
+struct tld_keys {
+ tld_key_t value1;
+ tld_key_t value2;
+ tld_key_t value_not_exist;
+};
+
+struct test_struct {
+ unsigned long a;
+ unsigned long b;
+ unsigned long c;
+ unsigned long d;
+};
+
+int test_value1;
+struct test_struct test_value2;
+
+SEC("syscall")
+int task_init(void *ctx)
+{
+ struct tld_object tld_obj;
+ struct task_struct *task;
+ int err;
+
+ task = bpf_get_current_task_btf();
+ err = tld_object_init(task, &tld_obj);
+ if (err)
+ return 1;
+
+ if (!tld_fetch_key(&tld_obj, "value1", value1))
+ return 2;
+
+ if (!tld_fetch_key(&tld_obj, "value2", value2))
+ return 3;
+
+ if (tld_fetch_key(&tld_obj, "value_not_exist", value_not_exist))
+ return 6;
+
+ return 0;
+}
+
+SEC("syscall")
+int task_main(void *ctx)
+{
+ struct tld_object tld_obj;
+ struct test_struct *struct_p;
+ struct task_struct *task;
+ int err, *int_p;
+
+ task = bpf_get_current_task_btf();
+ err = tld_object_init(task, &tld_obj);
+ if (err)
+ return 1;
+
+ int_p = tld_get_data(&tld_obj, value1, sizeof(int));
+ if (int_p)
+ test_value1 = *int_p;
+ else
+ return 2;
+
+ struct_p = tld_get_data(&tld_obj, value2, sizeof(struct test_struct));
+ if (struct_p)
+ test_value2 = *struct_p;
+ else
+ return 3;
+
+ int_p = tld_get_data(&tld_obj, value_not_exist, sizeof(int));
+ if (int_p)
+ return 4;
+
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+
--
2.47.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH bpf-next v4 3/3] selftests/bpf: Test concurrent task local data key creation
2025-05-15 21:15 [PATCH bpf-next v4 0/3] Task local data Amery Hung
2025-05-15 21:16 ` [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task " Amery Hung
2025-05-15 21:16 ` [PATCH bpf-next v4 2/3] selftests/bpf: Test basic task local data operations Amery Hung
@ 2025-05-15 21:16 ` Amery Hung
2 siblings, 0 replies; 17+ messages in thread
From: Amery Hung @ 2025-05-15 21:16 UTC (permalink / raw)
To: bpf
Cc: netdev, alexei.starovoitov, andrii, daniel, tj, memxor,
martin.lau, ameryhung, kernel-team
Test thread-safety of tld_create_key(). Since tld_create_key() does
not rely on locks but memory barriers and atomic operations to protect
the shared metadata, the thread-safety of the function is non-trivial.
Make sure concurrent tld_key_create(), both valid and invalid, can not
race and corrupt metatada, which may leads to TLDs not being thread-
specific or duplicate TLDs with the same name.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
.../bpf/prog_tests/test_task_local_data.c | 91 +++++++++++++++++++
1 file changed, 91 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/test_task_local_data.c b/tools/testing/selftests/bpf/prog_tests/test_task_local_data.c
index 738fc1c9d8a4..5743b753a4a1 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_task_local_data.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_task_local_data.c
@@ -156,8 +156,99 @@ static void test_task_local_data_basic(void)
pthread_join(thread[i], NULL);
}
+#define TEST_RACE_THREAD_NUM 61
+
+void *test_task_local_data_race_thread(void *arg)
+{
+ char key_name[32];
+ tld_key_t key;
+ int id, fd;
+
+ id = (intptr_t)arg & 0x0000ffff;
+ fd = ((intptr_t)arg & 0xffff0000) >> 16;
+
+ key = tld_create_key(fd, "value_not_exist", PAGE_SIZE + 1);
+ ASSERT_EQ(tld_key_err_or_zero(key), -E2BIG, "tld_create_key");
+
+ /*
+ * If more than one thread succeed in creating value1 or value2,
+ * some threads will fail to create thread_<id> later.
+ */
+ key = tld_create_key(fd, "value1", sizeof(int));
+ if (!tld_key_is_err(key))
+ tld_keys[TEST_RACE_THREAD_NUM] = key;
+ key = tld_create_key(fd, "value2", sizeof(struct test_struct));
+ if (!tld_key_is_err(key))
+ tld_keys[TEST_RACE_THREAD_NUM + 1] = key;
+
+ snprintf(key_name, 32, "thread_%d", id);
+ tld_keys[id] = tld_create_key(fd, key_name, sizeof(int));
+ ASSERT_FALSE(tld_key_is_err(tld_keys[id]), "tld_create_key");
+
+ pthread_exit(NULL);
+}
+
+static void test_task_local_data_race(void)
+{
+ LIBBPF_OPTS(bpf_test_run_opts, opts);
+ pthread_t thread[TEST_RACE_THREAD_NUM];
+ struct test_task_local_data *skel;
+ int fd, i, j, err, *data, arg;
+
+ skel = test_task_local_data__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
+ return;
+
+ fd = bpf_map__fd(skel->maps.tld_data_map);
+
+ for (j = 0; j < 100; j++) {
+ reset_tld();
+
+ for (i = 0; i < TEST_RACE_THREAD_NUM; i++) {
+ /*
+ * Try to make tld_create_key() race with each other. Call
+ * tld_create_key(), both valid and invalid, from different threads.
+ */
+ arg = i | fd << 16;
+ err = pthread_create(&thread[i], NULL, test_task_local_data_race_thread,
+ (void *)(intptr_t)arg);
+ if (!ASSERT_OK(err, "pthread_create"))
+ goto out;
+ }
+
+ /* Wait for all tld_create_key() to return */
+ for (i = 0; i < TEST_RACE_THREAD_NUM; i++)
+ pthread_join(thread[i], NULL);
+
+ /* Run task_init to make sure no invalid TLDs are added */
+ err = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.task_init), &opts);
+ ASSERT_OK(err, "run task_init");
+ ASSERT_OK(opts.retval, "task_init retval");
+
+ /* Write a unique number in the range of [0, TEST_RACE_THREAD_NUM) to each TLD */
+ for (i = 0; i < TEST_RACE_THREAD_NUM; i++) {
+ data = tld_get_data(fd, tld_keys[i]);
+ if (!ASSERT_OK_PTR(data, "tld_get_data"))
+ goto out;
+ *data = i;
+ }
+
+ /* Read TLDs and check the value to see if any address collides with another */
+ for (i = 0; i < TEST_RACE_THREAD_NUM; i++) {
+ data = tld_get_data(fd, tld_keys[i]);
+ if (!ASSERT_OK_PTR(data, "tld_get_data"))
+ goto out;
+ ASSERT_EQ(*data, i, "check TLD");
+ }
+ }
+out:
+ tld_free();
+}
+
void test_task_local_data(void)
{
if (test__start_subtest("task_local_data_basic"))
test_task_local_data_basic();
+ if (test__start_subtest("task_local_data_race"))
+ test_task_local_data_race();
}
--
2.47.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-15 21:16 ` [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task " Amery Hung
@ 2025-05-16 18:57 ` Alexei Starovoitov
2025-05-16 20:41 ` Amery Hung
2025-05-19 20:04 ` Tejun Heo
` (2 subsequent siblings)
3 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2025-05-16 18:57 UTC (permalink / raw)
To: Amery Hung
Cc: bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
Tejun Heo, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Kernel Team
On Thu, May 15, 2025 at 2:16 PM Amery Hung <ameryhung@gmail.com> wrote:
>
> Task local data defines an abstract storage type for storing task-
> specific data (TLD). This patch provides user space and bpf
> implementation as header-only libraries for accessing task local data.
>
> Task local data is a bpf task local storage map with two UPTRs:
> 1) u_tld_metadata, shared by all tasks of the same process, consists of
> the total count of TLDs and an array of metadata of TLDs. A metadata of
> a TLD comprises the size and the name. The name is used to identify a
> specific TLD in bpf 2) data is memory for storing TLDs specific to the
> task.
>
> The following are the basic task local data API:
>
> User space BPF
> Create key tld_create_key() -
> Fetch key - tld_fetch_key()
> Get data tld_get_data() tld_get_data()
>
> A TLD is first created by the user space with tld_create_key(). First,
> it goes through the metadata array to check if the TLD can be added.
> The total TLD size needs to fit into a page (limited by UPTR), and no
> two TLDs can have the same name. It also calculates the offset, the next
> available space in u_tld_data, by summing sizes of TLDs. If the TLD
> can be added, it increases the count using cmpxchg as there may be other
> concurrent tld_create_key(). After a successful cmpxchg, the last
> metadata slot now belongs to the calling thread and will be updated.
> tld_create_key() returns the offset encapsulated as a opaque object key
> to prevent user misuse.
>
> Then user space can pass the key to tld_get_data() to get a pointer
> to the TLD. The pointer will remain valid for the lifetime of the
> thread.
>
> BPF programs also locate TLDs with the keys. This is done by calling
> tld_fetch_key() with the name of the TLD. Similar to tld_create_key(),
> it scans through metadata array, compare the name of TLDs and compute
> the offset. Once found, the offset is also returned as a key, which
> can be passed to the bpf version of tld_get_data() to retrieve a
> pointer to the TLD.
>
> User space task local data library uses a light way approach to ensure
> thread safety (i.e., atomic operation + compiler and memory barriers).
> While a metadata is being updated, other threads may also try to read it.
> To prevent them from seeing incomplete data, metadata::size is used to
> signal the completion of the update, where 0 means the update is still
> ongoing. Threads will wait until seeing a non-zero size to read a
> metadata. Acquire/release order is required for metadata::size to
> prevent hardware reordering. For example, moving store to metadata::name
> after store to metadata::size or moving load from metadata::name before
> load from metadata::size.
>
> Signed-off-by: Amery Hung <ameryhung@gmail.com>
> ---
> .../bpf/prog_tests/task_local_data.h | 263 ++++++++++++++++++
> .../selftests/bpf/progs/task_local_data.bpf.h | 220 +++++++++++++++
> 2 files changed, 483 insertions(+)
> create mode 100644 tools/testing/selftests/bpf/prog_tests/task_local_data.h
> create mode 100644 tools/testing/selftests/bpf/progs/task_local_data.bpf.h
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_data.h b/tools/testing/selftests/bpf/prog_tests/task_local_data.h
> new file mode 100644
> index 000000000000..ec43ea59267c
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/task_local_data.h
> @@ -0,0 +1,263 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __TASK_LOCAL_DATA_H
> +#define __TASK_LOCAL_DATA_H
> +
> +#include <fcntl.h>
> +#include <errno.h>
> +#include <sched.h>
> +#include <stddef.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <sys/syscall.h>
> +#include <sys/types.h>
> +
> +#include <bpf/bpf.h>
> +
> +#ifndef PIDFD_THREAD
> +#define PIDFD_THREAD O_EXCL
> +#endif
> +
> +#define PAGE_SIZE 4096
> +
> +#ifndef __round_mask
> +#define __round_mask(x, y) ((__typeof__(x))((y)-1))
> +#endif
> +#ifndef round_up
> +#define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)
> +#endif
> +
> +#ifndef READ_ONCE
> +#define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
> +#endif
> +
> +#ifndef WRITE_ONCE
> +#define WRITE_ONCE(x, val) ((*(volatile typeof(x) *)&(x)) = (val))
> +#endif
> +
> +#define TLD_DATA_SIZE PAGE_SIZE
wrap it with #ifndef ?
So that application can #define something smaller.
> +#define TLD_DATA_CNT 63
> +#define TLD_NAME_LEN 62
> +
> +typedef struct {
> + __s16 off;
> +} tld_key_t;
> +
> +struct tld_metadata {
> + char name[TLD_NAME_LEN];
> + __u16 size;
> +};
> +
> +struct u_tld_metadata {
> + __u8 cnt;
> + __u8 padding[63];
> + struct tld_metadata metadata[TLD_DATA_CNT];
> +};
> +
> +struct u_tld_data {
> + char data[TLD_DATA_SIZE];
> +};
> +
> +struct tld_map_value {
> + struct u_tld_data *data;
> + struct u_tld_metadata *metadata;
> +};
> +
> +struct u_tld_metadata *tld_metadata_p __attribute__((weak));
> +__thread struct u_tld_data *tld_data_p __attribute__((weak));
> +
> +static int __tld_init_metadata(int map_fd)
> +{
> + struct u_tld_metadata *new_metadata;
> + struct tld_map_value map_val;
> + int task_fd = 0, err;
> +
> + task_fd = syscall(SYS_pidfd_open, getpid(), 0);
this task_fd and task_fd in __tld_init_data() are two different things.
Please name them differently.
> + if (task_fd < 0) {
> + err = -errno;
> + goto out;
> + }
> +
> + new_metadata = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
> + if (!new_metadata) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + memset(new_metadata, 0, PAGE_SIZE);
> +
> + map_val.data = NULL;
> + map_val.metadata = new_metadata;
> +
> + /*
> + * bpf_map_update_elem(.., pid_fd,..., BPF_NOEXIST) guarantees that
> + * only one tld_create_key() can update tld_metadata_p.
> + */
> + err = bpf_map_update_elem(map_fd, &task_fd, &map_val, BPF_NOEXIST);
> + if (err) {
> + free(new_metadata);
> + if (err == -EEXIST || err == -EAGAIN)
> + err = 0;
> + goto out;
> + }
> +
> + WRITE_ONCE(tld_metadata_p, new_metadata);
> +out:
> + if (task_fd > 0)
>=
> + close(task_fd);
> + return err;
> +}
> +
> +static int __tld_init_data(int map_fd)
> +{
> + struct u_tld_data *new_data = NULL;
> + struct tld_map_value map_val;
> + int err, task_fd = 0;
> +
> + task_fd = syscall(SYS_pidfd_open, gettid(), PIDFD_THREAD);
> + if (task_fd < 0) {
> + err = -errno;
> + goto out;
> + }
> +
> + new_data = aligned_alloc(PAGE_SIZE, TLD_DATA_SIZE);
probably roundup_power_of_two(TLD_DATA_SIZE) ?
Don't know about libc, but musl implementation of aligned_alloc()
is naive. It allocs align+size.
So it's a memory waste.
> + if (!new_data) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + map_val.data = new_data;
> + map_val.metadata = READ_ONCE(tld_metadata_p);
> +
> + err = bpf_map_update_elem(map_fd, &task_fd, &map_val, 0);
> + if (err) {
> + free(new_data);
> + goto out;
> + }
> +
> + tld_data_p = new_data;
> +out:
> + if (task_fd > 0)
>=
> + close(task_fd);
> + return err;
> +}
> +
> +/**
> + * tld_create_key() - Create a key associated with a TLD.
> + *
> + * @map_fd: A file descriptor of the underlying task local storage map,
> + * tld_data_map
> + * @name: The name the TLD will be associated with
> + * @size: Size of the TLD
> + *
> + * Returns an opaque object key. Use tld_key_is_err() or tld_key_err_or_zero() to
> + * check if the key creation succeed. Pass to tld_get_data() to get a pointer to
> + * the TLD. bpf programs can also fetch the same key by name.
> + */
> +__attribute__((unused))
> +static tld_key_t tld_create_key(int map_fd, const char *name, size_t size)
> +{
> + int err, i, cnt, sz, off = 0;
> +
> + if (!READ_ONCE(tld_metadata_p)) {
> + err = __tld_init_metadata(map_fd);
> + if (err)
> + return (tld_key_t) {.off = err};
> + }
> +
> + if (!tld_data_p) {
> + err = __tld_init_data(map_fd);
> + if (err)
> + return (tld_key_t) {.off = err};
> + }
> +
> + size = round_up(size, 8);
why roundup ? and to 8 in particular?
If user space wants byte size keys, why not let it?
> +
> + for (i = 0; i < TLD_DATA_CNT; i++) {
> +retry:
> + cnt = __atomic_load_n(&tld_metadata_p->cnt, __ATOMIC_RELAXED);
Instead of explicit __atomic builtins use _Atomic __u8 cnt;
?
> + if (i < cnt) {
> + /*
> + * Pending tld_create_key() uses size to signal if the metadata has
> + * been fully updated.
> + */
> + while (!(sz = __atomic_load_n(&tld_metadata_p->metadata[i].size,
> + __ATOMIC_ACQUIRE)))
> + sched_yield();
> +
> + if (!strncmp(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN))
> + return (tld_key_t) {.off = -EEXIST};
> +
> + off += sz;
> + continue;
> + }
> +
> + if (off + size > TLD_DATA_SIZE)
> + return (tld_key_t) {.off = -E2BIG};
> +
> + /*
> + * Only one tld_create_key() can increase the current cnt by one and
> + * takes the latest available slot. Other threads will check again if a new
> + * TLD can still be added, and then compete for the new slot after the
> + * succeeding thread update the size.
> + */
> + if (!__atomic_compare_exchange_n(&tld_metadata_p->cnt, &cnt, cnt + 1, true,
> + __ATOMIC_RELAXED, __ATOMIC_RELAXED))
weak and relaxed/relaxed ?
That's unusual.
I don't know what it is supposed to do.
I think weak=false and __ATOMIC_ACQUIRE, __ATOMIC_RELAXED
would look as expected.
> + goto retry;
> +
> + strncpy(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN);
> + __atomic_store_n(&tld_metadata_p->metadata[i].size, size, __ATOMIC_RELEASE);
> + return (tld_key_t) {.off = off};
> + }
> +
> + return (tld_key_t) {.off = -ENOSPC};
> +}
> +
> +__attribute__((unused))
> +static inline bool tld_key_is_err(tld_key_t key)
> +{
> + return key.off < 0;
> +}
> +
> +__attribute__((unused))
> +static inline int tld_key_err_or_zero(tld_key_t key)
> +{
> + return tld_key_is_err(key) ? key.off : 0;
> +}
> +
> +/**
> + * tld_get_data() - Gets a pointer to the TLD associated with the key.
> + *
> + * @map_fd: A file descriptor of the underlying task local storage map,
> + * tld_data_map
> + * @key: A key object returned by tld_create_key().
> + *
> + * Returns a pointer to the TLD if the key is valid; NULL if no key has been
> + * added, not enough memory for TLD for this thread, or the key is invalid.
> + *
> + * Threads that call tld_get_data() must call tld_free() on exit to prevent
> + * memory leak.
> + */
> +__attribute__((unused))
> +static void *tld_get_data(int map_fd, tld_key_t key)
> +{
> + if (!READ_ONCE(tld_metadata_p))
> + return NULL;
> +
> + if (!tld_data_p && __tld_init_data(map_fd))
> + return NULL;
Why call it again?
tld_create_key() should have done it, no?
> +
> + return tld_data_p->data + key.off;
> +}
> +
> +/**
> + * tld_free() - Frees task local data memory of the calling thread
> + */
> +__attribute__((unused))
> +static void tld_free(void)
> +{
> + if (tld_data_p)
> + free(tld_data_p);
> +}
Since this .h allocates tld_metadata_p, it probably needs
a helper to free it too.
> +
> +#endif /* __TASK_LOCAL_DATA_H */
> diff --git a/tools/testing/selftests/bpf/progs/task_local_data.bpf.h b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> new file mode 100644
> index 000000000000..5f48e408a5e5
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> @@ -0,0 +1,220 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __TASK_LOCAL_DATA_BPF_H
> +#define __TASK_LOCAL_DATA_BPF_H
> +
> +/*
> + * Task local data is a library that facilitates sharing per-task data
> + * between user space and bpf programs.
> + *
> + *
> + * PREREQUISITE
> + *
> + * A TLD, an entry of data in task local data, first needs to be created by the
> + * user space. This is done by calling user space API, tld_create_key(), with
> + * the name of the TLD and the size.
> + *
> + * tld_key_t prio, in_cs;
> + *
> + * prio = tld_create_key("priority", sizeof(int));
> + * in_cs = tld_create_key("in_critical_section", sizeof(bool));
> + *
> + * A key associated with the TLD, which has an opaque type tld_key_t, will be
> + * returned. It can be used to get a pointer to the TLD in the user space by
> + * calling tld_get_data().
> + *
> + *
> + * USAGE
> + *
> + * Similar to user space, bpf programs locate a TLD using the same key.
> + * tld_fetch_key() allows bpf programs to retrieve a key created in the user
> + * space by name, which is specified in the second argument of tld_create_key().
> + * tld_fetch_key() additionally will cache the key in a task local storage map,
> + * tld_key_map, to avoid performing costly string comparisons every time when
> + * accessing a TLD. This requires the developer to define the map value type of
> + * tld_key_map, struct tld_keys. It only needs to contain keys used by bpf
> + * programs in the compilation unit.
> + *
> + * struct tld_keys {
> + * tld_key_t prio;
> + * tld_key_t in_cs;
> + * };
> + *
> + * Then, for every new task, a bpf program will fetch and cache keys once and
> + * for all. This should be done ideally in a non-critical path (e.g., in
> + * sched_ext_ops::init_task).
> + *
> + * struct tld_object tld_obj;
> + *
> + * err = tld_object_init(task, &tld);
> + * if (err)
> + * return 0;
> + *
> + * tld_fetch_key(&tld_obj, "priority", prio);
> + * tld_fetch_key(&tld_obj, "in_critical_section", in_cs);
> + *
> + * Note that, the first argument of tld_fetch_key() is a pointer to tld_object.
> + * It should be declared as a stack variable and initialized via tld_object_init().
> + *
> + * Finally, just like user space programs, bpf programs can get a pointer to a
> + * TLD by calling tld_get_data(), with cached keys.
> + *
> + * int *p;
> + *
> + * p = tld_get_data(&tld_obj, prio, sizeof(int));
> + * if (p)
> + * // do something depending on *p
> + */
> +#include <errno.h>
> +#include <bpf/bpf_helpers.h>
> +
> +#define TLD_DATA_SIZE __PAGE_SIZE
> +#define TLD_DATA_CNT 63
> +#define TLD_NAME_LEN 62
> +
> +typedef struct {
> + __s16 off;
> +} tld_key_t;
> +
> +struct u_tld_data *dummy_data;
> +struct u_tld_metadata *dummy_metadata;
> +
> +struct tld_metadata {
> + char name[TLD_NAME_LEN];
> + __u16 size;
> +};
> +
> +struct u_tld_metadata {
> + __u8 cnt;
> + __u8 padding[63];
> + struct tld_metadata metadata[TLD_DATA_CNT];
> +};
> +
> +struct u_tld_data {
> + char data[TLD_DATA_SIZE];
> +};
> +
> +struct tld_map_value {
> + struct u_tld_data __uptr *data;
> + struct u_tld_metadata __uptr *metadata;
> +};
> +
> +struct tld_object {
> + struct tld_map_value *data_map;
> + struct tld_keys *key_map;
> +};
> +
> +/*
> + * Map value of tld_key_map for caching keys. Must be defined by the developer.
> + * Members should be tld_key_t and passed to the 3rd argument of tld_fetch_key().
> + */
> +struct tld_keys;
> +
> +struct {
> + __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
> + __uint(map_flags, BPF_F_NO_PREALLOC);
> + __type(key, int);
> + __type(value, struct tld_map_value);
> +} tld_data_map SEC(".maps");
> +
> +struct {
> + __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
> + __uint(map_flags, BPF_F_NO_PREALLOC);
> + __type(key, int);
> + __type(value, struct tld_keys);
> +} tld_key_map SEC(".maps");
> +
> +/**
> + * tld_object_init() - Initializes a tld_object.
> + *
> + * @task: The task_struct of the target task
> + * @tld_obj: A pointer to a tld_object to be initialized
> + *
> + * Returns 0 on success; -ENODATA if the task has no TLD; -ENOMEM if the creation
> + * of tld_key_map fails
> + */
> +__attribute__((unused))
> +static int tld_object_init(struct task_struct *task, struct tld_object *tld_obj)
> +{
> + tld_obj->data_map = bpf_task_storage_get(&tld_data_map, task, 0, 0);
> + if (!tld_obj->data_map)
> + return -ENODATA;
> +
> + tld_obj->key_map = bpf_task_storage_get(&tld_key_map, task, 0,
> + BPF_LOCAL_STORAGE_GET_F_CREATE);
> + if (!tld_obj->key_map)
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +/**
> + * tld_fetch_key() - Fetches the key to a TLD by name. The key has to be created
> + * by user space first with tld_create_key().
> + *
> + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
> + * @name: The name of the key associated with a TLD
> + * @key: The key in struct tld_keys to be saved to
> + *
> + * Returns a positive integer on success; 0 otherwise
> + */
> +#define tld_fetch_key(tld_obj, name, key) \
> + ({ \
> + (tld_obj)->key_map->key.off = __tld_fetch_key(tld_obj, name); \
> + })
> +
> +__attribute__((unused))
> +static u16 __tld_fetch_key(struct tld_object *tld_obj, const char *name)
> +{
> + int i, meta_off, cnt;
> + void *metadata, *nm, *sz;
> + u16 off = 0;
> +
> + if (!tld_obj->data_map || !tld_obj->data_map->metadata)
> + return 0;
> +
> + cnt = tld_obj->data_map->metadata->cnt;
> + metadata = tld_obj->data_map->metadata->metadata;
> +
> + bpf_for(i, 0, cnt) {
> + meta_off = i * sizeof(struct tld_metadata);
> + if (meta_off > TLD_DATA_SIZE - offsetof(struct u_tld_metadata, metadata)
> + - sizeof(struct tld_metadata))
> + break;
> +
> + nm = metadata + meta_off + offsetof(struct tld_metadata, name);
> + sz = metadata + meta_off + offsetof(struct tld_metadata, size);
> +
> + /*
> + * Reserve 0 for uninitialized keys. Increase the offset of a valid key
> + * by one and subtract it later in tld_get_data().
> + */
> + if (!bpf_strncmp(nm, TLD_NAME_LEN, name))
> + return off + 1;
I think all this +1, -1 dance is doing is helping to catch an
error when tld_get_data() is called without tld_fetch_key().
I feel this is too defensive.
Let tld_fetch_key() return proper negative error code.
> +
> + off += *(u16 *)sz;
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * tld_get_data() - Retrieves a pointer to the TLD associated with the key.
> + *
> + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
> + * @key: The key of a TLD saved in tld_maps
> + * @size: The size of the TLD. Must be a known constant value
> + *
> + * Returns a pointer to the TLD data associated with the key; NULL if the key
> + * is not valid or the size is too big
> + */
> +#define tld_get_data(tld_obj, key, size) \
> + __tld_get_data(tld_obj, (tld_obj)->key_map->key.off - 1, size)
> +
> +__attribute__((unused))
> +__always_inline void *__tld_get_data(struct tld_object *tld_obj, u32 off, u32 size)
> +{
> + return (tld_obj->data_map->data && off >= 0 && off < TLD_DATA_SIZE - size) ?
> + (void *)tld_obj->data_map->data + off : NULL;
If we require users to call tld_fetch_key() first and check for errors
tld_get_data() can be faster:
#define tld_get_data(tld_obj, key)
(void *)tld_obj->data_map->data + tld_obj->key_map->key.off
I wouldn't bother with extra checks, especially for size.
Bigger question.. can we cache the whole pointer for each key ?
and then
#define tld_get_data(tld_obj, key) ld_obj->key_map->key
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-16 18:57 ` Alexei Starovoitov
@ 2025-05-16 20:41 ` Amery Hung
2025-05-16 22:22 ` Alexei Starovoitov
0 siblings, 1 reply; 17+ messages in thread
From: Amery Hung @ 2025-05-16 20:41 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
Tejun Heo, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Kernel Team
On Fri, May 16, 2025 at 2:57 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, May 15, 2025 at 2:16 PM Amery Hung <ameryhung@gmail.com> wrote:
> >
> > Task local data defines an abstract storage type for storing task-
> > specific data (TLD). This patch provides user space and bpf
> > implementation as header-only libraries for accessing task local data.
> >
> > Task local data is a bpf task local storage map with two UPTRs:
> > 1) u_tld_metadata, shared by all tasks of the same process, consists of
> > the total count of TLDs and an array of metadata of TLDs. A metadata of
> > a TLD comprises the size and the name. The name is used to identify a
> > specific TLD in bpf 2) data is memory for storing TLDs specific to the
> > task.
> >
> > The following are the basic task local data API:
> >
> > User space BPF
> > Create key tld_create_key() -
> > Fetch key - tld_fetch_key()
> > Get data tld_get_data() tld_get_data()
> >
> > A TLD is first created by the user space with tld_create_key(). First,
> > it goes through the metadata array to check if the TLD can be added.
> > The total TLD size needs to fit into a page (limited by UPTR), and no
> > two TLDs can have the same name. It also calculates the offset, the next
> > available space in u_tld_data, by summing sizes of TLDs. If the TLD
> > can be added, it increases the count using cmpxchg as there may be other
> > concurrent tld_create_key(). After a successful cmpxchg, the last
> > metadata slot now belongs to the calling thread and will be updated.
> > tld_create_key() returns the offset encapsulated as a opaque object key
> > to prevent user misuse.
> >
> > Then user space can pass the key to tld_get_data() to get a pointer
> > to the TLD. The pointer will remain valid for the lifetime of the
> > thread.
> >
> > BPF programs also locate TLDs with the keys. This is done by calling
> > tld_fetch_key() with the name of the TLD. Similar to tld_create_key(),
> > it scans through metadata array, compare the name of TLDs and compute
> > the offset. Once found, the offset is also returned as a key, which
> > can be passed to the bpf version of tld_get_data() to retrieve a
> > pointer to the TLD.
> >
> > User space task local data library uses a light way approach to ensure
> > thread safety (i.e., atomic operation + compiler and memory barriers).
> > While a metadata is being updated, other threads may also try to read it.
> > To prevent them from seeing incomplete data, metadata::size is used to
> > signal the completion of the update, where 0 means the update is still
> > ongoing. Threads will wait until seeing a non-zero size to read a
> > metadata. Acquire/release order is required for metadata::size to
> > prevent hardware reordering. For example, moving store to metadata::name
> > after store to metadata::size or moving load from metadata::name before
> > load from metadata::size.
> >
> > Signed-off-by: Amery Hung <ameryhung@gmail.com>
> > ---
> > .../bpf/prog_tests/task_local_data.h | 263 ++++++++++++++++++
> > .../selftests/bpf/progs/task_local_data.bpf.h | 220 +++++++++++++++
> > 2 files changed, 483 insertions(+)
> > create mode 100644 tools/testing/selftests/bpf/prog_tests/task_local_data.h
> > create mode 100644 tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_data.h b/tools/testing/selftests/bpf/prog_tests/task_local_data.h
> > new file mode 100644
> > index 000000000000..ec43ea59267c
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/task_local_data.h
> > @@ -0,0 +1,263 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef __TASK_LOCAL_DATA_H
> > +#define __TASK_LOCAL_DATA_H
> > +
> > +#include <fcntl.h>
> > +#include <errno.h>
> > +#include <sched.h>
> > +#include <stddef.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <unistd.h>
> > +#include <sys/syscall.h>
> > +#include <sys/types.h>
> > +
> > +#include <bpf/bpf.h>
> > +
> > +#ifndef PIDFD_THREAD
> > +#define PIDFD_THREAD O_EXCL
> > +#endif
> > +
> > +#define PAGE_SIZE 4096
> > +
> > +#ifndef __round_mask
> > +#define __round_mask(x, y) ((__typeof__(x))((y)-1))
> > +#endif
> > +#ifndef round_up
> > +#define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)
> > +#endif
> > +
> > +#ifndef READ_ONCE
> > +#define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
> > +#endif
> > +
> > +#ifndef WRITE_ONCE
> > +#define WRITE_ONCE(x, val) ((*(volatile typeof(x) *)&(x)) = (val))
> > +#endif
> > +
> > +#define TLD_DATA_SIZE PAGE_SIZE
>
> wrap it with #ifndef ?
>
> So that application can #define something smaller.
>
> > +#define TLD_DATA_CNT 63
> > +#define TLD_NAME_LEN 62
> > +
> > +typedef struct {
> > + __s16 off;
> > +} tld_key_t;
> > +
> > +struct tld_metadata {
> > + char name[TLD_NAME_LEN];
> > + __u16 size;
> > +};
> > +
> > +struct u_tld_metadata {
> > + __u8 cnt;
> > + __u8 padding[63];
> > + struct tld_metadata metadata[TLD_DATA_CNT];
> > +};
> > +
> > +struct u_tld_data {
> > + char data[TLD_DATA_SIZE];
> > +};
> > +
> > +struct tld_map_value {
> > + struct u_tld_data *data;
> > + struct u_tld_metadata *metadata;
> > +};
> > +
> > +struct u_tld_metadata *tld_metadata_p __attribute__((weak));
> > +__thread struct u_tld_data *tld_data_p __attribute__((weak));
> > +
> > +static int __tld_init_metadata(int map_fd)
> > +{
> > + struct u_tld_metadata *new_metadata;
> > + struct tld_map_value map_val;
> > + int task_fd = 0, err;
> > +
> > + task_fd = syscall(SYS_pidfd_open, getpid(), 0);
>
> this task_fd and task_fd in __tld_init_data() are two different things.
> Please name them differently.
I will rename them to tid_fd and pid_fd.
>
> > + if (task_fd < 0) {
> > + err = -errno;
> > + goto out;
> > + }
> > +
> > + new_metadata = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
> > + if (!new_metadata) {
> > + err = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + memset(new_metadata, 0, PAGE_SIZE);
> > +
> > + map_val.data = NULL;
> > + map_val.metadata = new_metadata;
> > +
> > + /*
> > + * bpf_map_update_elem(.., pid_fd,..., BPF_NOEXIST) guarantees that
> > + * only one tld_create_key() can update tld_metadata_p.
> > + */
> > + err = bpf_map_update_elem(map_fd, &task_fd, &map_val, BPF_NOEXIST);
> > + if (err) {
> > + free(new_metadata);
> > + if (err == -EEXIST || err == -EAGAIN)
> > + err = 0;
> > + goto out;
> > + }
> > +
> > + WRITE_ONCE(tld_metadata_p, new_metadata);
> > +out:
> > + if (task_fd > 0)
>
> >=
>
> > + close(task_fd);
> > + return err;
> > +}
> > +
> > +static int __tld_init_data(int map_fd)
> > +{
> > + struct u_tld_data *new_data = NULL;
> > + struct tld_map_value map_val;
> > + int err, task_fd = 0;
> > +
> > + task_fd = syscall(SYS_pidfd_open, gettid(), PIDFD_THREAD);
> > + if (task_fd < 0) {
> > + err = -errno;
> > + goto out;
> > + }
> > +
> > + new_data = aligned_alloc(PAGE_SIZE, TLD_DATA_SIZE);
>
> probably roundup_power_of_two(TLD_DATA_SIZE) ?
> Don't know about libc, but musl implementation of aligned_alloc()
> is naive. It allocs align+size.
> So it's a memory waste.
>
I will do roundup_power_of_two for the size in the next respin, and
also check libc implementation.
> > + if (!new_data) {
> > + err = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + map_val.data = new_data;
> > + map_val.metadata = READ_ONCE(tld_metadata_p);
> > +
> > + err = bpf_map_update_elem(map_fd, &task_fd, &map_val, 0);
> > + if (err) {
> > + free(new_data);
> > + goto out;
> > + }
> > +
> > + tld_data_p = new_data;
> > +out:
> > + if (task_fd > 0)
>
> >=
>
> > + close(task_fd);
> > + return err;
> > +}
> > +
> > +/**
> > + * tld_create_key() - Create a key associated with a TLD.
> > + *
> > + * @map_fd: A file descriptor of the underlying task local storage map,
> > + * tld_data_map
> > + * @name: The name the TLD will be associated with
> > + * @size: Size of the TLD
> > + *
> > + * Returns an opaque object key. Use tld_key_is_err() or tld_key_err_or_zero() to
> > + * check if the key creation succeed. Pass to tld_get_data() to get a pointer to
> > + * the TLD. bpf programs can also fetch the same key by name.
> > + */
> > +__attribute__((unused))
> > +static tld_key_t tld_create_key(int map_fd, const char *name, size_t size)
> > +{
> > + int err, i, cnt, sz, off = 0;
> > +
> > + if (!READ_ONCE(tld_metadata_p)) {
> > + err = __tld_init_metadata(map_fd);
> > + if (err)
> > + return (tld_key_t) {.off = err};
> > + }
> > +
> > + if (!tld_data_p) {
> > + err = __tld_init_data(map_fd);
> > + if (err)
> > + return (tld_key_t) {.off = err};
> > + }
> > +
> > + size = round_up(size, 8);
>
> why roundup ? and to 8 in particular?
> If user space wants byte size keys, why not let it?
I will remove it. This was to prevent breaking using TLD in atomic
operations, but it should be very unlikely as they are thread
specific.
>
> > +
> > + for (i = 0; i < TLD_DATA_CNT; i++) {
> > +retry:
> > + cnt = __atomic_load_n(&tld_metadata_p->cnt, __ATOMIC_RELAXED);
>
> Instead of explicit __atomic builtins use _Atomic __u8 cnt;
> ?
>
Got it. I will use builtins in stdatomic.h.
> > + if (i < cnt) {
> > + /*
> > + * Pending tld_create_key() uses size to signal if the metadata has
> > + * been fully updated.
> > + */
> > + while (!(sz = __atomic_load_n(&tld_metadata_p->metadata[i].size,
> > + __ATOMIC_ACQUIRE)))
> > + sched_yield();
> > +
> > + if (!strncmp(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN))
> > + return (tld_key_t) {.off = -EEXIST};
> > +
> > + off += sz;
> > + continue;
> > + }
> > +
> > + if (off + size > TLD_DATA_SIZE)
> > + return (tld_key_t) {.off = -E2BIG};
> > +
> > + /*
> > + * Only one tld_create_key() can increase the current cnt by one and
> > + * takes the latest available slot. Other threads will check again if a new
> > + * TLD can still be added, and then compete for the new slot after the
> > + * succeeding thread update the size.
> > + */
> > + if (!__atomic_compare_exchange_n(&tld_metadata_p->cnt, &cnt, cnt + 1, true,
> > + __ATOMIC_RELAXED, __ATOMIC_RELAXED))
>
> weak and relaxed/relaxed ?
I can't see reordering issue with cnt so I choose to use relax. I can
change to strong acq/rel just to be safe.
> That's unusual.
> I don't know what it is supposed to do.
> I think weak=false and __ATOMIC_ACQUIRE, __ATOMIC_RELAXED
> would look as expected.
>
Do you mean weak=false and __ATOMIC_RELAXED, __ATOMIC_ACQUIRE?
> > + goto retry;
> > +
> > + strncpy(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN);
> > + __atomic_store_n(&tld_metadata_p->metadata[i].size, size, __ATOMIC_RELEASE);
> > + return (tld_key_t) {.off = off};
> > + }
> > +
> > + return (tld_key_t) {.off = -ENOSPC};
> > +}
> > +
> > +__attribute__((unused))
> > +static inline bool tld_key_is_err(tld_key_t key)
> > +{
> > + return key.off < 0;
> > +}
> > +
> > +__attribute__((unused))
> > +static inline int tld_key_err_or_zero(tld_key_t key)
> > +{
> > + return tld_key_is_err(key) ? key.off : 0;
> > +}
> > +
> > +/**
> > + * tld_get_data() - Gets a pointer to the TLD associated with the key.
> > + *
> > + * @map_fd: A file descriptor of the underlying task local storage map,
> > + * tld_data_map
> > + * @key: A key object returned by tld_create_key().
> > + *
> > + * Returns a pointer to the TLD if the key is valid; NULL if no key has been
> > + * added, not enough memory for TLD for this thread, or the key is invalid.
> > + *
> > + * Threads that call tld_get_data() must call tld_free() on exit to prevent
> > + * memory leak.
> > + */
> > +__attribute__((unused))
> > +static void *tld_get_data(int map_fd, tld_key_t key)
> > +{
> > + if (!READ_ONCE(tld_metadata_p))
> > + return NULL;
> > +
> > + if (!tld_data_p && __tld_init_data(map_fd))
> > + return NULL;
>
> Why call it again?
> tld_create_key() should have done it, no?
>
A TLD is created by calling tld_create_key() once. Then, threads may
call tld_get_data() to get their thread-specific TLD. So it is
possible for a thread to call tld_get_data() with tld_data_p==NULL.
> > +
> > + return tld_data_p->data + key.off;
> > +}
> > +
> > +/**
> > + * tld_free() - Frees task local data memory of the calling thread
> > + */
> > +__attribute__((unused))
> > +static void tld_free(void)
> > +{
> > + if (tld_data_p)
> > + free(tld_data_p);
> > +}
>
> Since this .h allocates tld_metadata_p, it probably needs
> a helper to free it too.
>
> > +
> > +#endif /* __TASK_LOCAL_DATA_H */
> > diff --git a/tools/testing/selftests/bpf/progs/task_local_data.bpf.h b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> > new file mode 100644
> > index 000000000000..5f48e408a5e5
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> > @@ -0,0 +1,220 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef __TASK_LOCAL_DATA_BPF_H
> > +#define __TASK_LOCAL_DATA_BPF_H
> > +
> > +/*
> > + * Task local data is a library that facilitates sharing per-task data
> > + * between user space and bpf programs.
> > + *
> > + *
> > + * PREREQUISITE
> > + *
> > + * A TLD, an entry of data in task local data, first needs to be created by the
> > + * user space. This is done by calling user space API, tld_create_key(), with
> > + * the name of the TLD and the size.
> > + *
> > + * tld_key_t prio, in_cs;
> > + *
> > + * prio = tld_create_key("priority", sizeof(int));
> > + * in_cs = tld_create_key("in_critical_section", sizeof(bool));
> > + *
> > + * A key associated with the TLD, which has an opaque type tld_key_t, will be
> > + * returned. It can be used to get a pointer to the TLD in the user space by
> > + * calling tld_get_data().
> > + *
> > + *
> > + * USAGE
> > + *
> > + * Similar to user space, bpf programs locate a TLD using the same key.
> > + * tld_fetch_key() allows bpf programs to retrieve a key created in the user
> > + * space by name, which is specified in the second argument of tld_create_key().
> > + * tld_fetch_key() additionally will cache the key in a task local storage map,
> > + * tld_key_map, to avoid performing costly string comparisons every time when
> > + * accessing a TLD. This requires the developer to define the map value type of
> > + * tld_key_map, struct tld_keys. It only needs to contain keys used by bpf
> > + * programs in the compilation unit.
> > + *
> > + * struct tld_keys {
> > + * tld_key_t prio;
> > + * tld_key_t in_cs;
> > + * };
> > + *
> > + * Then, for every new task, a bpf program will fetch and cache keys once and
> > + * for all. This should be done ideally in a non-critical path (e.g., in
> > + * sched_ext_ops::init_task).
> > + *
> > + * struct tld_object tld_obj;
> > + *
> > + * err = tld_object_init(task, &tld);
> > + * if (err)
> > + * return 0;
> > + *
> > + * tld_fetch_key(&tld_obj, "priority", prio);
> > + * tld_fetch_key(&tld_obj, "in_critical_section", in_cs);
> > + *
> > + * Note that, the first argument of tld_fetch_key() is a pointer to tld_object.
> > + * It should be declared as a stack variable and initialized via tld_object_init().
> > + *
> > + * Finally, just like user space programs, bpf programs can get a pointer to a
> > + * TLD by calling tld_get_data(), with cached keys.
> > + *
> > + * int *p;
> > + *
> > + * p = tld_get_data(&tld_obj, prio, sizeof(int));
> > + * if (p)
> > + * // do something depending on *p
> > + */
> > +#include <errno.h>
> > +#include <bpf/bpf_helpers.h>
> > +
> > +#define TLD_DATA_SIZE __PAGE_SIZE
> > +#define TLD_DATA_CNT 63
> > +#define TLD_NAME_LEN 62
> > +
> > +typedef struct {
> > + __s16 off;
> > +} tld_key_t;
> > +
> > +struct u_tld_data *dummy_data;
> > +struct u_tld_metadata *dummy_metadata;
> > +
> > +struct tld_metadata {
> > + char name[TLD_NAME_LEN];
> > + __u16 size;
> > +};
> > +
> > +struct u_tld_metadata {
> > + __u8 cnt;
> > + __u8 padding[63];
> > + struct tld_metadata metadata[TLD_DATA_CNT];
> > +};
> > +
> > +struct u_tld_data {
> > + char data[TLD_DATA_SIZE];
> > +};
> > +
> > +struct tld_map_value {
> > + struct u_tld_data __uptr *data;
> > + struct u_tld_metadata __uptr *metadata;
> > +};
> > +
> > +struct tld_object {
> > + struct tld_map_value *data_map;
> > + struct tld_keys *key_map;
> > +};
> > +
> > +/*
> > + * Map value of tld_key_map for caching keys. Must be defined by the developer.
> > + * Members should be tld_key_t and passed to the 3rd argument of tld_fetch_key().
> > + */
> > +struct tld_keys;
> > +
> > +struct {
> > + __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
> > + __uint(map_flags, BPF_F_NO_PREALLOC);
> > + __type(key, int);
> > + __type(value, struct tld_map_value);
> > +} tld_data_map SEC(".maps");
> > +
> > +struct {
> > + __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
> > + __uint(map_flags, BPF_F_NO_PREALLOC);
> > + __type(key, int);
> > + __type(value, struct tld_keys);
> > +} tld_key_map SEC(".maps");
> > +
> > +/**
> > + * tld_object_init() - Initializes a tld_object.
> > + *
> > + * @task: The task_struct of the target task
> > + * @tld_obj: A pointer to a tld_object to be initialized
> > + *
> > + * Returns 0 on success; -ENODATA if the task has no TLD; -ENOMEM if the creation
> > + * of tld_key_map fails
> > + */
> > +__attribute__((unused))
> > +static int tld_object_init(struct task_struct *task, struct tld_object *tld_obj)
> > +{
> > + tld_obj->data_map = bpf_task_storage_get(&tld_data_map, task, 0, 0);
> > + if (!tld_obj->data_map)
> > + return -ENODATA;
> > +
> > + tld_obj->key_map = bpf_task_storage_get(&tld_key_map, task, 0,
> > + BPF_LOCAL_STORAGE_GET_F_CREATE);
> > + if (!tld_obj->key_map)
> > + return -ENOMEM;
> > +
> > + return 0;
> > +}
> > +
> > +/**
> > + * tld_fetch_key() - Fetches the key to a TLD by name. The key has to be created
> > + * by user space first with tld_create_key().
> > + *
> > + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
> > + * @name: The name of the key associated with a TLD
> > + * @key: The key in struct tld_keys to be saved to
> > + *
> > + * Returns a positive integer on success; 0 otherwise
> > + */
> > +#define tld_fetch_key(tld_obj, name, key) \
> > + ({ \
> > + (tld_obj)->key_map->key.off = __tld_fetch_key(tld_obj, name); \
> > + })
> > +
> > +__attribute__((unused))
> > +static u16 __tld_fetch_key(struct tld_object *tld_obj, const char *name)
> > +{
> > + int i, meta_off, cnt;
> > + void *metadata, *nm, *sz;
> > + u16 off = 0;
> > +
> > + if (!tld_obj->data_map || !tld_obj->data_map->metadata)
> > + return 0;
> > +
> > + cnt = tld_obj->data_map->metadata->cnt;
> > + metadata = tld_obj->data_map->metadata->metadata;
> > +
> > + bpf_for(i, 0, cnt) {
> > + meta_off = i * sizeof(struct tld_metadata);
> > + if (meta_off > TLD_DATA_SIZE - offsetof(struct u_tld_metadata, metadata)
> > + - sizeof(struct tld_metadata))
> > + break;
> > +
> > + nm = metadata + meta_off + offsetof(struct tld_metadata, name);
> > + sz = metadata + meta_off + offsetof(struct tld_metadata, size);
> > +
> > + /*
> > + * Reserve 0 for uninitialized keys. Increase the offset of a valid key
> > + * by one and subtract it later in tld_get_data().
> > + */
> > + if (!bpf_strncmp(nm, TLD_NAME_LEN, name))
> > + return off + 1;
>
> I think all this +1, -1 dance is doing is helping to catch an
> error when tld_get_data() is called without tld_fetch_key().
> I feel this is too defensive.
>
> Let tld_fetch_key() return proper negative error code.
>
I can certainly return negative error code.
But for the +1, -1 logic, I think is a simpler way to differentiate an
uninitialized key in tld_key_map from the first TLD (both key.off ==
0). After all, bpf programs can call tld_get_data() without
tld_fetch_key().
> > +
> > + off += *(u16 *)sz;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +/**
> > + * tld_get_data() - Retrieves a pointer to the TLD associated with the key.
> > + *
> > + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
> > + * @key: The key of a TLD saved in tld_maps
> > + * @size: The size of the TLD. Must be a known constant value
> > + *
> > + * Returns a pointer to the TLD data associated with the key; NULL if the key
> > + * is not valid or the size is too big
> > + */
> > +#define tld_get_data(tld_obj, key, size) \
> > + __tld_get_data(tld_obj, (tld_obj)->key_map->key.off - 1, size)
> > +
> > +__attribute__((unused))
> > +__always_inline void *__tld_get_data(struct tld_object *tld_obj, u32 off, u32 size)
> > +{
> > + return (tld_obj->data_map->data && off >= 0 && off < TLD_DATA_SIZE - size) ?
> > + (void *)tld_obj->data_map->data + off : NULL;
>
> If we require users to call tld_fetch_key() first and check for errors
> tld_get_data() can be faster:
> #define tld_get_data(tld_obj, key)
> (void *)tld_obj->data_map->data + tld_obj->key_map->key.off
>
tld_get_data() can be called in a bpf program without tld_fetch_key(),
so checking tld_obj->data_map->data is needed as the first load from
tld_obj->data_map->data yields a "mem_or_null".
I did try to save this uptr "mem" after null check to stack (e.g., in
a tld_object) so that we can save subsequent checks. However, the
compiler sometime will do a fresh load from map_ptr when reading
tld_obj->data_map->data. Might need some work or trick to make it
happen.
> I wouldn't bother with extra checks, especially for size.
>
It needs to be bound-checked. If tld_get_data() doesn't do it, bpf
programs have to do it before acceess. Otherwise:
; return (tld_obj->data_map->data && off >= 0) ? @ task_local_data.bpf.h:218
25: (bf) r3 = r1 ; R1_w=mem(sz=4096) R3_w=mem(sz=4096)
26: (0f) r3 += r2 ;
R2_w=scalar(smin=0,smax=umax=0xffffffff,smin32=0xffff7fff,smax32=32766,var_off=(0x0;
0xffffffff)) R3_w=mem(sz=4096,smin=0,smax=umax=0xffffffff,var_off=(0x0;
0xfffffff)
; test_value1 = *int_p; @ test_task_local_data.c:63
27: (61) r2 = *(u32 *)(r3 +0)
R3 unbounded memory access, make sure to bounds check any such access
> Bigger question.. can we cache the whole pointer for each key ?
> and then
> #define tld_get_data(tld_obj, key) ld_obj->key_map->key
Maybe define the member type of tld_key_map as __uptr and allow bpf
programs to update a uptr field with a valid uptr?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-16 20:41 ` Amery Hung
@ 2025-05-16 22:22 ` Alexei Starovoitov
2025-05-20 7:36 ` Amery Hung
0 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2025-05-16 22:22 UTC (permalink / raw)
To: Amery Hung
Cc: bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
Tejun Heo, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Kernel Team
On Fri, May 16, 2025 at 1:41 PM Amery Hung <ameryhung@gmail.com> wrote:
>
> > > + size = round_up(size, 8);
> >
> > why roundup ? and to 8 in particular?
> > If user space wants byte size keys, why not let it?
>
> I will remove it. This was to prevent breaking using TLD in atomic
> operations, but it should be very unlikely as they are thread
> specific.
You mean for a case where one part of the app (like a shared library)
is using u32, but the other is using u64 and doing atomic ops on it?
Make sense to align the off set by tld_create_key(),
but it can be done without rounding up all previous keys to 8.
63 bytes in the header are wasted. So use 2 as an offset.
A single cmpxchg 4 byte can update cnt+offset.
Actually why store size in each tld_metadata ?
Won't the logic will be simpler if it's an offset ?
bpf side tld_fetch_key() wouldn't need to count.
> > > + if (i < cnt) {
> > > + /*
> > > + * Pending tld_create_key() uses size to signal if the metadata has
> > > + * been fully updated.
> > > + */
> > > + while (!(sz = __atomic_load_n(&tld_metadata_p->metadata[i].size,
> > > + __ATOMIC_ACQUIRE)))
> > > + sched_yield();
> > > +
> > > + if (!strncmp(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN))
> > > + return (tld_key_t) {.off = -EEXIST};
> > > +
> > > + off += sz;
> > > + continue;
> > > + }
> > > +
> > > + if (off + size > TLD_DATA_SIZE)
> > > + return (tld_key_t) {.off = -E2BIG};
> > > +
> > > + /*
> > > + * Only one tld_create_key() can increase the current cnt by one and
> > > + * takes the latest available slot. Other threads will check again if a new
> > > + * TLD can still be added, and then compete for the new slot after the
> > > + * succeeding thread update the size.
> > > + */
> > > + if (!__atomic_compare_exchange_n(&tld_metadata_p->cnt, &cnt, cnt + 1, true,
> > > + __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> >
> > weak and relaxed/relaxed ?
>
> I can't see reordering issue with cnt so I choose to use relax. I can
> change to strong acq/rel just to be safe.
>
> > That's unusual.
> > I don't know what it is supposed to do.
> > I think weak=false and __ATOMIC_ACQUIRE, __ATOMIC_RELAXED
> > would look as expected.
> >
>
> Do you mean weak=false and __ATOMIC_RELAXED, __ATOMIC_ACQUIRE?
no idea. I just grepped the kernel and saw:
TEST_KERNEL_LOCKED(atomic_builtin_with_memorder,
__atomic_compare_exchange_n(flag, &v, 1, 0,
__ATOMIC_ACQUIRE, __ATOMIC_RELAXED),
__atomic_store_n(flag, 0, __ATOMIC_RELEASE));
TEST_KERNEL_LOCKED(atomic_builtin_wrong_memorder,
__atomic_compare_exchange_n(flag, &v, 1, 0,
__ATOMIC_RELAXED, __ATOMIC_RELAXED),
__atomic_store_n(flag, 0, __ATOMIC_RELAXED));
I'd just use __ATOMIC_SEQ_CST everywhere.
Speed is not important here.
>
> > > + goto retry;
> > > +
> > > + strncpy(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN);
> > > + __atomic_store_n(&tld_metadata_p->metadata[i].size, size, __ATOMIC_RELEASE);
> > > + return (tld_key_t) {.off = off};
> > > + }
> > > +
> > > + return (tld_key_t) {.off = -ENOSPC};
> > > +}
> > > +
> > > +__attribute__((unused))
> > > +static inline bool tld_key_is_err(tld_key_t key)
> > > +{
> > > + return key.off < 0;
> > > +}
> > > +
> > > +__attribute__((unused))
> > > +static inline int tld_key_err_or_zero(tld_key_t key)
> > > +{
> > > + return tld_key_is_err(key) ? key.off : 0;
> > > +}
> > > +
> > > +/**
> > > + * tld_get_data() - Gets a pointer to the TLD associated with the key.
> > > + *
> > > + * @map_fd: A file descriptor of the underlying task local storage map,
> > > + * tld_data_map
> > > + * @key: A key object returned by tld_create_key().
> > > + *
> > > + * Returns a pointer to the TLD if the key is valid; NULL if no key has been
> > > + * added, not enough memory for TLD for this thread, or the key is invalid.
> > > + *
> > > + * Threads that call tld_get_data() must call tld_free() on exit to prevent
> > > + * memory leak.
> > > + */
> > > +__attribute__((unused))
> > > +static void *tld_get_data(int map_fd, tld_key_t key)
> > > +{
> > > + if (!READ_ONCE(tld_metadata_p))
> > > + return NULL;
> > > +
> > > + if (!tld_data_p && __tld_init_data(map_fd))
> > > + return NULL;
> >
> > Why call it again?
> > tld_create_key() should have done it, no?
> >
>
> A TLD is created by calling tld_create_key() once. Then, threads may
> call tld_get_data() to get their thread-specific TLD. So it is
> possible for a thread to call tld_get_data() with tld_data_p==NULL.
I see. Please add a comment.
> > > +
> > > + return tld_data_p->data + key.off;
> > > +}
> > > +
> > > +/**
> > > + * tld_free() - Frees task local data memory of the calling thread
> > > + */
> > > +__attribute__((unused))
> > > +static void tld_free(void)
> > > +{
> > > + if (tld_data_p)
> > > + free(tld_data_p);
> > > +}
> >
> > Since this .h allocates tld_metadata_p, it probably needs
> > a helper to free it too.
> >
> > > +
> > > +#endif /* __TASK_LOCAL_DATA_H */
> > > diff --git a/tools/testing/selftests/bpf/progs/task_local_data.bpf.h b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> > > new file mode 100644
> > > index 000000000000..5f48e408a5e5
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> > > @@ -0,0 +1,220 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +#ifndef __TASK_LOCAL_DATA_BPF_H
> > > +#define __TASK_LOCAL_DATA_BPF_H
> > > +
> > > +/*
> > > + * Task local data is a library that facilitates sharing per-task data
> > > + * between user space and bpf programs.
> > > + *
> > > + *
> > > + * PREREQUISITE
> > > + *
> > > + * A TLD, an entry of data in task local data, first needs to be created by the
> > > + * user space. This is done by calling user space API, tld_create_key(), with
> > > + * the name of the TLD and the size.
> > > + *
> > > + * tld_key_t prio, in_cs;
> > > + *
> > > + * prio = tld_create_key("priority", sizeof(int));
> > > + * in_cs = tld_create_key("in_critical_section", sizeof(bool));
> > > + *
> > > + * A key associated with the TLD, which has an opaque type tld_key_t, will be
> > > + * returned. It can be used to get a pointer to the TLD in the user space by
> > > + * calling tld_get_data().
> > > + *
> > > + *
> > > + * USAGE
> > > + *
> > > + * Similar to user space, bpf programs locate a TLD using the same key.
> > > + * tld_fetch_key() allows bpf programs to retrieve a key created in the user
> > > + * space by name, which is specified in the second argument of tld_create_key().
> > > + * tld_fetch_key() additionally will cache the key in a task local storage map,
> > > + * tld_key_map, to avoid performing costly string comparisons every time when
> > > + * accessing a TLD. This requires the developer to define the map value type of
> > > + * tld_key_map, struct tld_keys. It only needs to contain keys used by bpf
> > > + * programs in the compilation unit.
> > > + *
> > > + * struct tld_keys {
> > > + * tld_key_t prio;
> > > + * tld_key_t in_cs;
> > > + * };
> > > + *
> > > + * Then, for every new task, a bpf program will fetch and cache keys once and
> > > + * for all. This should be done ideally in a non-critical path (e.g., in
> > > + * sched_ext_ops::init_task).
> > > + *
> > > + * struct tld_object tld_obj;
> > > + *
> > > + * err = tld_object_init(task, &tld);
> > > + * if (err)
> > > + * return 0;
> > > + *
> > > + * tld_fetch_key(&tld_obj, "priority", prio);
> > > + * tld_fetch_key(&tld_obj, "in_critical_section", in_cs);
> > > + *
> > > + * Note that, the first argument of tld_fetch_key() is a pointer to tld_object.
> > > + * It should be declared as a stack variable and initialized via tld_object_init().
> > > + *
> > > + * Finally, just like user space programs, bpf programs can get a pointer to a
> > > + * TLD by calling tld_get_data(), with cached keys.
> > > + *
> > > + * int *p;
> > > + *
> > > + * p = tld_get_data(&tld_obj, prio, sizeof(int));
> > > + * if (p)
> > > + * // do something depending on *p
> > > + */
> > > +#include <errno.h>
> > > +#include <bpf/bpf_helpers.h>
> > > +
> > > +#define TLD_DATA_SIZE __PAGE_SIZE
> > > +#define TLD_DATA_CNT 63
> > > +#define TLD_NAME_LEN 62
> > > +
> > > +typedef struct {
> > > + __s16 off;
> > > +} tld_key_t;
> > > +
> > > +struct u_tld_data *dummy_data;
> > > +struct u_tld_metadata *dummy_metadata;
> > > +
> > > +struct tld_metadata {
> > > + char name[TLD_NAME_LEN];
> > > + __u16 size;
> > > +};
> > > +
> > > +struct u_tld_metadata {
> > > + __u8 cnt;
> > > + __u8 padding[63];
> > > + struct tld_metadata metadata[TLD_DATA_CNT];
> > > +};
> > > +
> > > +struct u_tld_data {
> > > + char data[TLD_DATA_SIZE];
> > > +};
> > > +
> > > +struct tld_map_value {
> > > + struct u_tld_data __uptr *data;
> > > + struct u_tld_metadata __uptr *metadata;
> > > +};
> > > +
> > > +struct tld_object {
> > > + struct tld_map_value *data_map;
> > > + struct tld_keys *key_map;
> > > +};
> > > +
> > > +/*
> > > + * Map value of tld_key_map for caching keys. Must be defined by the developer.
> > > + * Members should be tld_key_t and passed to the 3rd argument of tld_fetch_key().
> > > + */
> > > +struct tld_keys;
> > > +
> > > +struct {
> > > + __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
> > > + __uint(map_flags, BPF_F_NO_PREALLOC);
> > > + __type(key, int);
> > > + __type(value, struct tld_map_value);
> > > +} tld_data_map SEC(".maps");
> > > +
> > > +struct {
> > > + __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
> > > + __uint(map_flags, BPF_F_NO_PREALLOC);
> > > + __type(key, int);
> > > + __type(value, struct tld_keys);
> > > +} tld_key_map SEC(".maps");
> > > +
> > > +/**
> > > + * tld_object_init() - Initializes a tld_object.
> > > + *
> > > + * @task: The task_struct of the target task
> > > + * @tld_obj: A pointer to a tld_object to be initialized
> > > + *
> > > + * Returns 0 on success; -ENODATA if the task has no TLD; -ENOMEM if the creation
> > > + * of tld_key_map fails
> > > + */
> > > +__attribute__((unused))
> > > +static int tld_object_init(struct task_struct *task, struct tld_object *tld_obj)
> > > +{
> > > + tld_obj->data_map = bpf_task_storage_get(&tld_data_map, task, 0, 0);
> > > + if (!tld_obj->data_map)
> > > + return -ENODATA;
> > > +
> > > + tld_obj->key_map = bpf_task_storage_get(&tld_key_map, task, 0,
> > > + BPF_LOCAL_STORAGE_GET_F_CREATE);
> > > + if (!tld_obj->key_map)
> > > + return -ENOMEM;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +/**
> > > + * tld_fetch_key() - Fetches the key to a TLD by name. The key has to be created
> > > + * by user space first with tld_create_key().
> > > + *
> > > + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
> > > + * @name: The name of the key associated with a TLD
> > > + * @key: The key in struct tld_keys to be saved to
> > > + *
> > > + * Returns a positive integer on success; 0 otherwise
> > > + */
> > > +#define tld_fetch_key(tld_obj, name, key) \
> > > + ({ \
> > > + (tld_obj)->key_map->key.off = __tld_fetch_key(tld_obj, name); \
> > > + })
> > > +
> > > +__attribute__((unused))
> > > +static u16 __tld_fetch_key(struct tld_object *tld_obj, const char *name)
> > > +{
> > > + int i, meta_off, cnt;
> > > + void *metadata, *nm, *sz;
> > > + u16 off = 0;
> > > +
> > > + if (!tld_obj->data_map || !tld_obj->data_map->metadata)
> > > + return 0;
> > > +
> > > + cnt = tld_obj->data_map->metadata->cnt;
> > > + metadata = tld_obj->data_map->metadata->metadata;
> > > +
> > > + bpf_for(i, 0, cnt) {
> > > + meta_off = i * sizeof(struct tld_metadata);
> > > + if (meta_off > TLD_DATA_SIZE - offsetof(struct u_tld_metadata, metadata)
> > > + - sizeof(struct tld_metadata))
> > > + break;
> > > +
> > > + nm = metadata + meta_off + offsetof(struct tld_metadata, name);
> > > + sz = metadata + meta_off + offsetof(struct tld_metadata, size);
> > > +
> > > + /*
> > > + * Reserve 0 for uninitialized keys. Increase the offset of a valid key
> > > + * by one and subtract it later in tld_get_data().
> > > + */
> > > + if (!bpf_strncmp(nm, TLD_NAME_LEN, name))
> > > + return off + 1;
> >
> > I think all this +1, -1 dance is doing is helping to catch an
> > error when tld_get_data() is called without tld_fetch_key().
> > I feel this is too defensive.
> >
> > Let tld_fetch_key() return proper negative error code.
> >
>
> I can certainly return negative error code.
>
> But for the +1, -1 logic, I think is a simpler way to differentiate an
> uninitialized key in tld_key_map from the first TLD (both key.off ==
> 0). After all, bpf programs can call tld_get_data() without
> tld_fetch_key().
I'm saying we don't need to catch this case.
progs should not call tld_get_data() without tld_fetch_key().
If they do, it's a bug.
>
> > > +
> > > + off += *(u16 *)sz;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +/**
> > > + * tld_get_data() - Retrieves a pointer to the TLD associated with the key.
> > > + *
> > > + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
> > > + * @key: The key of a TLD saved in tld_maps
> > > + * @size: The size of the TLD. Must be a known constant value
> > > + *
> > > + * Returns a pointer to the TLD data associated with the key; NULL if the key
> > > + * is not valid or the size is too big
> > > + */
> > > +#define tld_get_data(tld_obj, key, size) \
> > > + __tld_get_data(tld_obj, (tld_obj)->key_map->key.off - 1, size)
> > > +
> > > +__attribute__((unused))
> > > +__always_inline void *__tld_get_data(struct tld_object *tld_obj, u32 off, u32 size)
> > > +{
> > > + return (tld_obj->data_map->data && off >= 0 && off < TLD_DATA_SIZE - size) ?
> > > + (void *)tld_obj->data_map->data + off : NULL;
> >
> > If we require users to call tld_fetch_key() first and check for errors
> > tld_get_data() can be faster:
> > #define tld_get_data(tld_obj, key)
> > (void *)tld_obj->data_map->data + tld_obj->key_map->key.off
> >
>
> tld_get_data() can be called in a bpf program without tld_fetch_key(),
> so checking tld_obj->data_map->data is needed as the first load from
> tld_obj->data_map->data yields a "mem_or_null".
>
> I did try to save this uptr "mem" after null check to stack (e.g., in
> a tld_object) so that we can save subsequent checks. However, the
> compiler sometime will do a fresh load from map_ptr when reading
> tld_obj->data_map->data. Might need some work or trick to make it
> happen.
likely because you do tld_obj->data_map->data twice.
> > I wouldn't bother with extra checks, especially for size.
> >
>
> It needs to be bound-checked. If tld_get_data() doesn't do it, bpf
> programs have to do it before acceess. Otherwise:
>
> ; return (tld_obj->data_map->data && off >= 0) ? @ task_local_data.bpf.h:218
> 25: (bf) r3 = r1 ; R1_w=mem(sz=4096) R3_w=mem(sz=4096)
> 26: (0f) r3 += r2 ;
> R2_w=scalar(smin=0,smax=umax=0xffffffff,smin32=0xffff7fff,smax32=32766,var_off=(0x0;
> 0xffffffff)) R3_w=mem(sz=4096,smin=0,smax=umax=0xffffffff,var_off=(0x0;
> 0xfffffff)
> ; test_value1 = *int_p; @ test_task_local_data.c:63
> 27: (61) r2 = *(u32 *)(r3 +0)
> R3 unbounded memory access, make sure to bounds check any such access
That's easy to fix.
Then something like:
#define tld_get_data(tld_obj, key) \
({
void * data = tld_obj->data_map->data;
if (data)
data += tld_obj->key_map->key.off & (PAGE_SIZE - 1);
data;
})
size is really not needed. The verifier sees it as one page.
Bad bpf prog can write into the wrong key and the verifier cannot stop it.
> > Bigger question.. can we cache the whole pointer for each key ?
> > and then
> > #define tld_get_data(tld_obj, key) ld_obj->key_map->key
>
> Maybe define the member type of tld_key_map as __uptr and allow bpf
> programs to update a uptr field with a valid uptr?
yeah. That indeed gets complicated. Maybe it's possible with some
verifier changes, but let's not go there yet.
The tld_get_data() proposed above is speedy enough.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-15 21:16 ` [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task " Amery Hung
2025-05-16 18:57 ` Alexei Starovoitov
@ 2025-05-19 20:04 ` Tejun Heo
2025-05-20 6:44 ` Amery Hung
2025-05-20 22:57 ` Andrii Nakryiko
2025-05-22 8:36 ` Tony Ambardar
3 siblings, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2025-05-19 20:04 UTC (permalink / raw)
To: Amery Hung
Cc: bpf, netdev, alexei.starovoitov, andrii, daniel, memxor,
martin.lau, kernel-team
Hello,
On Thu, May 15, 2025 at 02:16:00PM -0700, Amery Hung wrote:
...
> +#define PAGE_SIZE 4096
This might conflict with other definitions. Looks like non-4k page sizes are
a lot more popular on arm. Would this be a problem?
> +static int __tld_init_metadata(int map_fd)
> +{
> + struct u_tld_metadata *new_metadata;
> + struct tld_map_value map_val;
> + int task_fd = 0, err;
> +
> + task_fd = syscall(SYS_pidfd_open, getpid(), 0);
> + if (task_fd < 0) {
> + err = -errno;
> + goto out;
> + }
> +
> + new_metadata = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
Is 4k size limit from UPTR? Is it still 4k on machines with >4k pages? If
this isn't a hard limit from UPTR, would it make sense to encode the size in
the header part of the metadata?
> +static int __tld_init_data(int map_fd)
> +{
> + struct u_tld_data *new_data = NULL;
> + struct tld_map_value map_val;
> + int err, task_fd = 0;
> +
> + task_fd = syscall(SYS_pidfd_open, gettid(), PIDFD_THREAD);
> + if (task_fd < 0) {
> + err = -errno;
> + goto out;
> + }
> +
> + new_data = aligned_alloc(PAGE_SIZE, TLD_DATA_SIZE);
Ditto.
Noob question. Does this means that each thread will map a 4k page no matter
how much data it actually uses?
> +__attribute__((unused))
> +static tld_key_t tld_create_key(int map_fd, const char *name, size_t size)
> +{
> + int err, i, cnt, sz, off = 0;
> +
> + if (!READ_ONCE(tld_metadata_p)) {
> + err = __tld_init_metadata(map_fd);
> + if (err)
> + return (tld_key_t) {.off = err};
> + }
> +
> + if (!tld_data_p) {
> + err = __tld_init_data(map_fd);
> + if (err)
> + return (tld_key_t) {.off = err};
> + }
> +
> + size = round_up(size, 8);
> +
> + for (i = 0; i < TLD_DATA_CNT; i++) {
> +retry:
> + cnt = __atomic_load_n(&tld_metadata_p->cnt, __ATOMIC_RELAXED);
> + if (i < cnt) {
> + /*
> + * Pending tld_create_key() uses size to signal if the metadata has
> + * been fully updated.
> + */
> + while (!(sz = __atomic_load_n(&tld_metadata_p->metadata[i].size,
> + __ATOMIC_ACQUIRE)))
> + sched_yield();
> +
> + if (!strncmp(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN))
> + return (tld_key_t) {.off = -EEXIST};
> +
> + off += sz;
> + continue;
> + }
> +
> + if (off + size > TLD_DATA_SIZE)
> + return (tld_key_t) {.off = -E2BIG};
> +
> + /*
> + * Only one tld_create_key() can increase the current cnt by one and
> + * takes the latest available slot. Other threads will check again if a new
> + * TLD can still be added, and then compete for the new slot after the
> + * succeeding thread update the size.
> + */
> + if (!__atomic_compare_exchange_n(&tld_metadata_p->cnt, &cnt, cnt + 1, true,
> + __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> + goto retry;
> +
> + strncpy(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN);
> + __atomic_store_n(&tld_metadata_p->metadata[i].size, size, __ATOMIC_RELEASE);
> + return (tld_key_t) {.off = off};
> + }
> +
> + return (tld_key_t) {.off = -ENOSPC};
> +}
This looks fine to me but I wonder whether run-length encoding the key
strings would be more efficient and less restrictive in terms of key length.
e.g.:
struct key {
u32 data_len;
u16 key_off;
u16 key_len;
};
struct metadata {
struct key keys[MAX_KEYS];
char key_strs[SOME_SIZE];
};
The logic can be mostly the same. The only difference would be that key
string is not inline. Determine winner in the creation path by compxchg'ing
on data_len, but set key_off and key_len only after key string is updated.
Losing on cmpxhcg or seeing an entry where key_len is zero means that that
one lost and should relax and retry. It can still use the same 4k metadata
page but will likely be able to allow more keys while also relaxing
restrictions on key length.
Hmm... maybe making the key string variably sized makes things difficult for
the BPF code. If so (or for any other reasons), please feel free to ignore
the above.
> +#endif /* __TASK_LOCAL_DATA_H */
> diff --git a/tools/testing/selftests/bpf/progs/task_local_data.bpf.h b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> new file mode 100644
> index 000000000000..5f48e408a5e5
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
...
> +/**
> + * tld_get_data() - Retrieves a pointer to the TLD associated with the key.
> + *
> + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
> + * @key: The key of a TLD saved in tld_maps
> + * @size: The size of the TLD. Must be a known constant value
> + *
> + * Returns a pointer to the TLD data associated with the key; NULL if the key
> + * is not valid or the size is too big
> + */
> +#define tld_get_data(tld_obj, key, size) \
> + __tld_get_data(tld_obj, (tld_obj)->key_map->key.off - 1, size)
> +
> +__attribute__((unused))
> +__always_inline void *__tld_get_data(struct tld_object *tld_obj, u32 off, u32 size)
> +{
> + return (tld_obj->data_map->data && off >= 0 && off < TLD_DATA_SIZE - size) ?
> + (void *)tld_obj->data_map->data + off : NULL;
> +}
Neat.
Generally looks great to me. The only thing I wonder is whether the data
area sizing can be determined at init time rather than fixed to 4k.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-19 20:04 ` Tejun Heo
@ 2025-05-20 6:44 ` Amery Hung
0 siblings, 0 replies; 17+ messages in thread
From: Amery Hung @ 2025-05-20 6:44 UTC (permalink / raw)
To: Tejun Heo
Cc: bpf, netdev, alexei.starovoitov, andrii, daniel, memxor,
martin.lau, kernel-team
On Mon, May 19, 2025 at 1:04 PM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Thu, May 15, 2025 at 02:16:00PM -0700, Amery Hung wrote:
> ...
> > +#define PAGE_SIZE 4096
>
> This might conflict with other definitions. Looks like non-4k page sizes are
> a lot more popular on arm. Would this be a problem?
>
> > +static int __tld_init_metadata(int map_fd)
> > +{
> > + struct u_tld_metadata *new_metadata;
> > + struct tld_map_value map_val;
> > + int task_fd = 0, err;
> > +
> > + task_fd = syscall(SYS_pidfd_open, getpid(), 0);
> > + if (task_fd < 0) {
> > + err = -errno;
> > + goto out;
> > + }
> > +
> > + new_metadata = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
>
> Is 4k size limit from UPTR? Is it still 4k on machines with >4k pages? If
> this isn't a hard limit from UPTR, would it make sense to encode the size in
> the header part of the metadata?
>
UPTR size limit is a page. I will make PAGE_SIZE arch dependent. For
metadata, since all threads of a process share one metadata page, I
think it is okay to make it a fixed size. For data, I think it makes
sense to encode it in the header of metadata.
> > +static int __tld_init_data(int map_fd)
> > +{
> > + struct u_tld_data *new_data = NULL;
> > + struct tld_map_value map_val;
> > + int err, task_fd = 0;
> > +
> > + task_fd = syscall(SYS_pidfd_open, gettid(), PIDFD_THREAD);
> > + if (task_fd < 0) {
> > + err = -errno;
> > + goto out;
> > + }
> > +
> > + new_data = aligned_alloc(PAGE_SIZE, TLD_DATA_SIZE);
>
> Ditto.
>
> Noob question. Does this means that each thread will map a 4k page no matter
> how much data it actually uses?
Unfortunately this is the case currently, but hey maybe we can make
data size dynamic
>
> > +__attribute__((unused))
> > +static tld_key_t tld_create_key(int map_fd, const char *name, size_t size)
> > +{
> > + int err, i, cnt, sz, off = 0;
> > +
> > + if (!READ_ONCE(tld_metadata_p)) {
> > + err = __tld_init_metadata(map_fd);
> > + if (err)
> > + return (tld_key_t) {.off = err};
> > + }
> > +
> > + if (!tld_data_p) {
> > + err = __tld_init_data(map_fd);
> > + if (err)
> > + return (tld_key_t) {.off = err};
> > + }
> > +
> > + size = round_up(size, 8);
> > +
> > + for (i = 0; i < TLD_DATA_CNT; i++) {
> > +retry:
> > + cnt = __atomic_load_n(&tld_metadata_p->cnt, __ATOMIC_RELAXED);
> > + if (i < cnt) {
> > + /*
> > + * Pending tld_create_key() uses size to signal if the metadata has
> > + * been fully updated.
> > + */
> > + while (!(sz = __atomic_load_n(&tld_metadata_p->metadata[i].size,
> > + __ATOMIC_ACQUIRE)))
> > + sched_yield();
> > +
> > + if (!strncmp(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN))
> > + return (tld_key_t) {.off = -EEXIST};
> > +
> > + off += sz;
> > + continue;
> > + }
> > +
> > + if (off + size > TLD_DATA_SIZE)
> > + return (tld_key_t) {.off = -E2BIG};
> > +
> > + /*
> > + * Only one tld_create_key() can increase the current cnt by one and
> > + * takes the latest available slot. Other threads will check again if a new
> > + * TLD can still be added, and then compete for the new slot after the
> > + * succeeding thread update the size.
> > + */
> > + if (!__atomic_compare_exchange_n(&tld_metadata_p->cnt, &cnt, cnt + 1, true,
> > + __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> > + goto retry;
> > +
> > + strncpy(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN);
> > + __atomic_store_n(&tld_metadata_p->metadata[i].size, size, __ATOMIC_RELEASE);
> > + return (tld_key_t) {.off = off};
> > + }
> > +
> > + return (tld_key_t) {.off = -ENOSPC};
> > +}
>
> This looks fine to me but I wonder whether run-length encoding the key
> strings would be more efficient and less restrictive in terms of key length.
> e.g.:
>
> struct key {
> u32 data_len;
> u16 key_off;
> u16 key_len;
> };
>
> struct metadata {
> struct key keys[MAX_KEYS];
> char key_strs[SOME_SIZE];
> };
>
> The logic can be mostly the same. The only difference would be that key
> string is not inline. Determine winner in the creation path by compxchg'ing
> on data_len, but set key_off and key_len only after key string is updated.
> Losing on cmpxhcg or seeing an entry where key_len is zero means that that
> one lost and should relax and retry. It can still use the same 4k metadata
> page but will likely be able to allow more keys while also relaxing
> restrictions on key length.
>
> Hmm... maybe making the key string variably sized makes things difficult for
> the BPF code. If so (or for any other reasons), please feel free to ignore
> the above.
I think this is a great suggestion. The current implementation may
waste spaces in metadata if a key does not use all 62 bytes. I don't
see an obvious obstacle in bpf. I will try to incorporate this in the
next respin.
>
> > +#endif /* __TASK_LOCAL_DATA_H */
> > diff --git a/tools/testing/selftests/bpf/progs/task_local_data.bpf.h b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> > new file mode 100644
> > index 000000000000..5f48e408a5e5
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> ...
> > +/**
> > + * tld_get_data() - Retrieves a pointer to the TLD associated with the key.
> > + *
> > + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
> > + * @key: The key of a TLD saved in tld_maps
> > + * @size: The size of the TLD. Must be a known constant value
> > + *
> > + * Returns a pointer to the TLD data associated with the key; NULL if the key
> > + * is not valid or the size is too big
> > + */
> > +#define tld_get_data(tld_obj, key, size) \
> > + __tld_get_data(tld_obj, (tld_obj)->key_map->key.off - 1, size)
> > +
> > +__attribute__((unused))
> > +__always_inline void *__tld_get_data(struct tld_object *tld_obj, u32 off, u32 size)
> > +{
> > + return (tld_obj->data_map->data && off >= 0 && off < TLD_DATA_SIZE - size) ?
> > + (void *)tld_obj->data_map->data + off : NULL;
> > +}
>
> Neat.
>
> Generally looks great to me. The only thing I wonder is whether the data
> area sizing can be determined at init time rather than fixed to 4k.
>
I think we can achieve it by first limiting tld_create_key() to the
init phase (i.e., only calling them in C/C++ constructor). Then,
tld_create_key() will not allocate memory for data. Instead, on the
first call to tld_get_data(), we freeze the size of the data area and
allocate the memory just enough or round up to the power of two.
For C, we can define a new macro API (e.g., tld_define_key()) that
generates a __attribute__((constructor)) function that in turn calls
tld_create_key().
> Thanks.
>
> --
> tejun
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-16 22:22 ` Alexei Starovoitov
@ 2025-05-20 7:36 ` Amery Hung
2025-05-20 20:03 ` Alexei Starovoitov
0 siblings, 1 reply; 17+ messages in thread
From: Amery Hung @ 2025-05-20 7:36 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
Tejun Heo, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Kernel Team
On Fri, May 16, 2025 at 3:22 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, May 16, 2025 at 1:41 PM Amery Hung <ameryhung@gmail.com> wrote:
> >
> > > > + size = round_up(size, 8);
> > >
> > > why roundup ? and to 8 in particular?
> > > If user space wants byte size keys, why not let it?
> >
> > I will remove it. This was to prevent breaking using TLD in atomic
> > operations, but it should be very unlikely as they are thread
> > specific.
>
> You mean for a case where one part of the app (like a shared library)
> is using u32, but the other is using u64 and doing atomic ops on it?
>
> Make sense to align the off set by tld_create_key(),
> but it can be done without rounding up all previous keys to 8.
> 63 bytes in the header are wasted. So use 2 as an offset.
> A single cmpxchg 4 byte can update cnt+offset.
> Actually why store size in each tld_metadata ?
> Won't the logic will be simpler if it's an offset ?
> bpf side tld_fetch_key() wouldn't need to count.
>
I changed to size since metadata is initialized to 0 and size == 0 can
be used to signal pending metadata update, while 0 is a valid offset.
I will save offset in metadata in the next respin. Tejun suggested a
run-length key encoding, and there are other fields in the metadata
that can be used for the signaling.
> > > > + if (i < cnt) {
> > > > + /*
> > > > + * Pending tld_create_key() uses size to signal if the metadata has
> > > > + * been fully updated.
> > > > + */
> > > > + while (!(sz = __atomic_load_n(&tld_metadata_p->metadata[i].size,
> > > > + __ATOMIC_ACQUIRE)))
> > > > + sched_yield();
> > > > +
> > > > + if (!strncmp(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN))
> > > > + return (tld_key_t) {.off = -EEXIST};
> > > > +
> > > > + off += sz;
> > > > + continue;
> > > > + }
> > > > +
> > > > + if (off + size > TLD_DATA_SIZE)
> > > > + return (tld_key_t) {.off = -E2BIG};
> > > > +
> > > > + /*
> > > > + * Only one tld_create_key() can increase the current cnt by one and
> > > > + * takes the latest available slot. Other threads will check again if a new
> > > > + * TLD can still be added, and then compete for the new slot after the
> > > > + * succeeding thread update the size.
> > > > + */
> > > > + if (!__atomic_compare_exchange_n(&tld_metadata_p->cnt, &cnt, cnt + 1, true,
> > > > + __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> > >
> > > weak and relaxed/relaxed ?
> >
> > I can't see reordering issue with cnt so I choose to use relax. I can
> > change to strong acq/rel just to be safe.
> >
> > > That's unusual.
> > > I don't know what it is supposed to do.
> > > I think weak=false and __ATOMIC_ACQUIRE, __ATOMIC_RELAXED
> > > would look as expected.
> > >
> >
> > Do you mean weak=false and __ATOMIC_RELAXED, __ATOMIC_ACQUIRE?
>
> no idea. I just grepped the kernel and saw:
> TEST_KERNEL_LOCKED(atomic_builtin_with_memorder,
> __atomic_compare_exchange_n(flag, &v, 1, 0,
> __ATOMIC_ACQUIRE, __ATOMIC_RELAXED),
> __atomic_store_n(flag, 0, __ATOMIC_RELEASE));
> TEST_KERNEL_LOCKED(atomic_builtin_wrong_memorder,
> __atomic_compare_exchange_n(flag, &v, 1, 0,
> __ATOMIC_RELAXED, __ATOMIC_RELAXED),
> __atomic_store_n(flag, 0, __ATOMIC_RELAXED));
>
> I'd just use __ATOMIC_SEQ_CST everywhere.
> Speed is not important here.
Make sense. I will use __ATOMIC_SEQ_CST. Thanks for the suggestion.
>
> >
> > > > + goto retry;
> > > > +
> > > > + strncpy(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN);
> > > > + __atomic_store_n(&tld_metadata_p->metadata[i].size, size, __ATOMIC_RELEASE);
> > > > + return (tld_key_t) {.off = off};
> > > > + }
> > > > +
> > > > + return (tld_key_t) {.off = -ENOSPC};
> > > > +}
> > > > +
> > > > +__attribute__((unused))
> > > > +static inline bool tld_key_is_err(tld_key_t key)
> > > > +{
> > > > + return key.off < 0;
> > > > +}
> > > > +
> > > > +__attribute__((unused))
> > > > +static inline int tld_key_err_or_zero(tld_key_t key)
> > > > +{
> > > > + return tld_key_is_err(key) ? key.off : 0;
> > > > +}
> > > > +
> > > > +/**
> > > > + * tld_get_data() - Gets a pointer to the TLD associated with the key.
> > > > + *
> > > > + * @map_fd: A file descriptor of the underlying task local storage map,
> > > > + * tld_data_map
> > > > + * @key: A key object returned by tld_create_key().
> > > > + *
> > > > + * Returns a pointer to the TLD if the key is valid; NULL if no key has been
> > > > + * added, not enough memory for TLD for this thread, or the key is invalid.
> > > > + *
> > > > + * Threads that call tld_get_data() must call tld_free() on exit to prevent
> > > > + * memory leak.
> > > > + */
> > > > +__attribute__((unused))
> > > > +static void *tld_get_data(int map_fd, tld_key_t key)
> > > > +{
> > > > + if (!READ_ONCE(tld_metadata_p))
> > > > + return NULL;
> > > > +
> > > > + if (!tld_data_p && __tld_init_data(map_fd))
> > > > + return NULL;
> > >
> > > Why call it again?
> > > tld_create_key() should have done it, no?
> > >
> >
> > A TLD is created by calling tld_create_key() once. Then, threads may
> > call tld_get_data() to get their thread-specific TLD. So it is
> > possible for a thread to call tld_get_data() with tld_data_p==NULL.
>
> I see. Please add a comment.
I will explain it in the comment.
>
> > > > +
> > > > + return tld_data_p->data + key.off;
> > > > +}
> > > > +
> > > > +/**
> > > > + * tld_free() - Frees task local data memory of the calling thread
> > > > + */
> > > > +__attribute__((unused))
> > > > +static void tld_free(void)
> > > > +{
> > > > + if (tld_data_p)
> > > > + free(tld_data_p);
> > > > +}
> > >
> > > Since this .h allocates tld_metadata_p, it probably needs
> > > a helper to free it too.
> > >
> > > > +
> > > > +#endif /* __TASK_LOCAL_DATA_H */
> > > > diff --git a/tools/testing/selftests/bpf/progs/task_local_data.bpf.h b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> > > > new file mode 100644
> > > > index 000000000000..5f48e408a5e5
> > > > --- /dev/null
> > > > +++ b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> > > > @@ -0,0 +1,220 @@
> > > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > > +#ifndef __TASK_LOCAL_DATA_BPF_H
> > > > +#define __TASK_LOCAL_DATA_BPF_H
> > > > +
> > > > +/*
> > > > + * Task local data is a library that facilitates sharing per-task data
> > > > + * between user space and bpf programs.
> > > > + *
> > > > + *
> > > > + * PREREQUISITE
> > > > + *
> > > > + * A TLD, an entry of data in task local data, first needs to be created by the
> > > > + * user space. This is done by calling user space API, tld_create_key(), with
> > > > + * the name of the TLD and the size.
> > > > + *
> > > > + * tld_key_t prio, in_cs;
> > > > + *
> > > > + * prio = tld_create_key("priority", sizeof(int));
> > > > + * in_cs = tld_create_key("in_critical_section", sizeof(bool));
> > > > + *
> > > > + * A key associated with the TLD, which has an opaque type tld_key_t, will be
> > > > + * returned. It can be used to get a pointer to the TLD in the user space by
> > > > + * calling tld_get_data().
> > > > + *
> > > > + *
> > > > + * USAGE
> > > > + *
> > > > + * Similar to user space, bpf programs locate a TLD using the same key.
> > > > + * tld_fetch_key() allows bpf programs to retrieve a key created in the user
> > > > + * space by name, which is specified in the second argument of tld_create_key().
> > > > + * tld_fetch_key() additionally will cache the key in a task local storage map,
> > > > + * tld_key_map, to avoid performing costly string comparisons every time when
> > > > + * accessing a TLD. This requires the developer to define the map value type of
> > > > + * tld_key_map, struct tld_keys. It only needs to contain keys used by bpf
> > > > + * programs in the compilation unit.
> > > > + *
> > > > + * struct tld_keys {
> > > > + * tld_key_t prio;
> > > > + * tld_key_t in_cs;
> > > > + * };
> > > > + *
> > > > + * Then, for every new task, a bpf program will fetch and cache keys once and
> > > > + * for all. This should be done ideally in a non-critical path (e.g., in
> > > > + * sched_ext_ops::init_task).
> > > > + *
> > > > + * struct tld_object tld_obj;
> > > > + *
> > > > + * err = tld_object_init(task, &tld);
> > > > + * if (err)
> > > > + * return 0;
> > > > + *
> > > > + * tld_fetch_key(&tld_obj, "priority", prio);
> > > > + * tld_fetch_key(&tld_obj, "in_critical_section", in_cs);
> > > > + *
> > > > + * Note that, the first argument of tld_fetch_key() is a pointer to tld_object.
> > > > + * It should be declared as a stack variable and initialized via tld_object_init().
> > > > + *
> > > > + * Finally, just like user space programs, bpf programs can get a pointer to a
> > > > + * TLD by calling tld_get_data(), with cached keys.
> > > > + *
> > > > + * int *p;
> > > > + *
> > > > + * p = tld_get_data(&tld_obj, prio, sizeof(int));
> > > > + * if (p)
> > > > + * // do something depending on *p
> > > > + */
> > > > +#include <errno.h>
> > > > +#include <bpf/bpf_helpers.h>
> > > > +
> > > > +#define TLD_DATA_SIZE __PAGE_SIZE
> > > > +#define TLD_DATA_CNT 63
> > > > +#define TLD_NAME_LEN 62
> > > > +
> > > > +typedef struct {
> > > > + __s16 off;
> > > > +} tld_key_t;
> > > > +
> > > > +struct u_tld_data *dummy_data;
> > > > +struct u_tld_metadata *dummy_metadata;
> > > > +
> > > > +struct tld_metadata {
> > > > + char name[TLD_NAME_LEN];
> > > > + __u16 size;
> > > > +};
> > > > +
> > > > +struct u_tld_metadata {
> > > > + __u8 cnt;
> > > > + __u8 padding[63];
> > > > + struct tld_metadata metadata[TLD_DATA_CNT];
> > > > +};
> > > > +
> > > > +struct u_tld_data {
> > > > + char data[TLD_DATA_SIZE];
> > > > +};
> > > > +
> > > > +struct tld_map_value {
> > > > + struct u_tld_data __uptr *data;
> > > > + struct u_tld_metadata __uptr *metadata;
> > > > +};
> > > > +
> > > > +struct tld_object {
> > > > + struct tld_map_value *data_map;
> > > > + struct tld_keys *key_map;
> > > > +};
> > > > +
> > > > +/*
> > > > + * Map value of tld_key_map for caching keys. Must be defined by the developer.
> > > > + * Members should be tld_key_t and passed to the 3rd argument of tld_fetch_key().
> > > > + */
> > > > +struct tld_keys;
> > > > +
> > > > +struct {
> > > > + __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
> > > > + __uint(map_flags, BPF_F_NO_PREALLOC);
> > > > + __type(key, int);
> > > > + __type(value, struct tld_map_value);
> > > > +} tld_data_map SEC(".maps");
> > > > +
> > > > +struct {
> > > > + __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
> > > > + __uint(map_flags, BPF_F_NO_PREALLOC);
> > > > + __type(key, int);
> > > > + __type(value, struct tld_keys);
> > > > +} tld_key_map SEC(".maps");
> > > > +
> > > > +/**
> > > > + * tld_object_init() - Initializes a tld_object.
> > > > + *
> > > > + * @task: The task_struct of the target task
> > > > + * @tld_obj: A pointer to a tld_object to be initialized
> > > > + *
> > > > + * Returns 0 on success; -ENODATA if the task has no TLD; -ENOMEM if the creation
> > > > + * of tld_key_map fails
> > > > + */
> > > > +__attribute__((unused))
> > > > +static int tld_object_init(struct task_struct *task, struct tld_object *tld_obj)
> > > > +{
> > > > + tld_obj->data_map = bpf_task_storage_get(&tld_data_map, task, 0, 0);
> > > > + if (!tld_obj->data_map)
> > > > + return -ENODATA;
> > > > +
> > > > + tld_obj->key_map = bpf_task_storage_get(&tld_key_map, task, 0,
> > > > + BPF_LOCAL_STORAGE_GET_F_CREATE);
> > > > + if (!tld_obj->key_map)
> > > > + return -ENOMEM;
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +/**
> > > > + * tld_fetch_key() - Fetches the key to a TLD by name. The key has to be created
> > > > + * by user space first with tld_create_key().
> > > > + *
> > > > + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
> > > > + * @name: The name of the key associated with a TLD
> > > > + * @key: The key in struct tld_keys to be saved to
> > > > + *
> > > > + * Returns a positive integer on success; 0 otherwise
> > > > + */
> > > > +#define tld_fetch_key(tld_obj, name, key) \
> > > > + ({ \
> > > > + (tld_obj)->key_map->key.off = __tld_fetch_key(tld_obj, name); \
> > > > + })
> > > > +
> > > > +__attribute__((unused))
> > > > +static u16 __tld_fetch_key(struct tld_object *tld_obj, const char *name)
> > > > +{
> > > > + int i, meta_off, cnt;
> > > > + void *metadata, *nm, *sz;
> > > > + u16 off = 0;
> > > > +
> > > > + if (!tld_obj->data_map || !tld_obj->data_map->metadata)
> > > > + return 0;
> > > > +
> > > > + cnt = tld_obj->data_map->metadata->cnt;
> > > > + metadata = tld_obj->data_map->metadata->metadata;
> > > > +
> > > > + bpf_for(i, 0, cnt) {
> > > > + meta_off = i * sizeof(struct tld_metadata);
> > > > + if (meta_off > TLD_DATA_SIZE - offsetof(struct u_tld_metadata, metadata)
> > > > + - sizeof(struct tld_metadata))
> > > > + break;
> > > > +
> > > > + nm = metadata + meta_off + offsetof(struct tld_metadata, name);
> > > > + sz = metadata + meta_off + offsetof(struct tld_metadata, size);
> > > > +
> > > > + /*
> > > > + * Reserve 0 for uninitialized keys. Increase the offset of a valid key
> > > > + * by one and subtract it later in tld_get_data().
> > > > + */
> > > > + if (!bpf_strncmp(nm, TLD_NAME_LEN, name))
> > > > + return off + 1;
> > >
> > > I think all this +1, -1 dance is doing is helping to catch an
> > > error when tld_get_data() is called without tld_fetch_key().
> > > I feel this is too defensive.
> > >
> > > Let tld_fetch_key() return proper negative error code.
> > >
> >
> > I can certainly return negative error code.
> >
> > But for the +1, -1 logic, I think is a simpler way to differentiate an
> > uninitialized key in tld_key_map from the first TLD (both key.off ==
> > 0). After all, bpf programs can call tld_get_data() without
> > tld_fetch_key().
>
> I'm saying we don't need to catch this case.
> progs should not call tld_get_data() without tld_fetch_key().
> If they do, it's a bug.
>
Got it. I will document this in the comment of tld_get_data().
> >
> > > > +
> > > > + off += *(u16 *)sz;
> > > > + }
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +/**
> > > > + * tld_get_data() - Retrieves a pointer to the TLD associated with the key.
> > > > + *
> > > > + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
> > > > + * @key: The key of a TLD saved in tld_maps
> > > > + * @size: The size of the TLD. Must be a known constant value
> > > > + *
> > > > + * Returns a pointer to the TLD data associated with the key; NULL if the key
> > > > + * is not valid or the size is too big
> > > > + */
> > > > +#define tld_get_data(tld_obj, key, size) \
> > > > + __tld_get_data(tld_obj, (tld_obj)->key_map->key.off - 1, size)
> > > > +
> > > > +__attribute__((unused))
> > > > +__always_inline void *__tld_get_data(struct tld_object *tld_obj, u32 off, u32 size)
> > > > +{
> > > > + return (tld_obj->data_map->data && off >= 0 && off < TLD_DATA_SIZE - size) ?
> > > > + (void *)tld_obj->data_map->data + off : NULL;
> > >
> > > If we require users to call tld_fetch_key() first and check for errors
> > > tld_get_data() can be faster:
> > > #define tld_get_data(tld_obj, key)
> > > (void *)tld_obj->data_map->data + tld_obj->key_map->key.off
> > >
> >
> > tld_get_data() can be called in a bpf program without tld_fetch_key(),
> > so checking tld_obj->data_map->data is needed as the first load from
> > tld_obj->data_map->data yields a "mem_or_null".
> >
> > I did try to save this uptr "mem" after null check to stack (e.g., in
> > a tld_object) so that we can save subsequent checks. However, the
> > compiler sometime will do a fresh load from map_ptr when reading
> > tld_obj->data_map->data. Might need some work or trick to make it
> > happen.
>
> likely because you do tld_obj->data_map->data twice.
>
> > > I wouldn't bother with extra checks, especially for size.
> > >
> >
> > It needs to be bound-checked. If tld_get_data() doesn't do it, bpf
> > programs have to do it before acceess. Otherwise:
> >
> > ; return (tld_obj->data_map->data && off >= 0) ? @ task_local_data.bpf.h:218
> > 25: (bf) r3 = r1 ; R1_w=mem(sz=4096) R3_w=mem(sz=4096)
> > 26: (0f) r3 += r2 ;
> > R2_w=scalar(smin=0,smax=umax=0xffffffff,smin32=0xffff7fff,smax32=32766,var_off=(0x0;
> > 0xffffffff)) R3_w=mem(sz=4096,smin=0,smax=umax=0xffffffff,var_off=(0x0;
> > 0xfffffff)
> > ; test_value1 = *int_p; @ test_task_local_data.c:63
> > 27: (61) r2 = *(u32 *)(r3 +0)
> > R3 unbounded memory access, make sure to bounds check any such access
>
> That's easy to fix.
> Then something like:
> #define tld_get_data(tld_obj, key) \
> ({
> void * data = tld_obj->data_map->data;
> if (data)
> data += tld_obj->key_map->key.off & (PAGE_SIZE - 1);
> data;
> })
>
> size is really not needed. The verifier sees it as one page.
> Bad bpf prog can write into the wrong key and the verifier cannot stop it.
>
key.off is a variable offset, so the verifier may assume key.off ==
PAGE_SIZE - 1. If a bpf program tries to dereference a pointer
returned by the proposed tld_get_data() as an int * without bound
check, the verifier will still consider this a potential out-of-bound
access:
invalid access to memory, mem_size=4096 off=4095 size=4
I think if there needs to be a bound check anyways, hiding it
tld_get_data() makes the user written part less complex.
> > > Bigger question.. can we cache the whole pointer for each key ?
> > > and then
> > > #define tld_get_data(tld_obj, key) ld_obj->key_map->key
> >
> > Maybe define the member type of tld_key_map as __uptr and allow bpf
> > programs to update a uptr field with a valid uptr?
>
> yeah. That indeed gets complicated. Maybe it's possible with some
> verifier changes, but let's not go there yet.
> The tld_get_data() proposed above is speedy enough.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-20 7:36 ` Amery Hung
@ 2025-05-20 20:03 ` Alexei Starovoitov
0 siblings, 0 replies; 17+ messages in thread
From: Alexei Starovoitov @ 2025-05-20 20:03 UTC (permalink / raw)
To: Amery Hung
Cc: bpf, Network Development, Andrii Nakryiko, Daniel Borkmann,
Tejun Heo, Kumar Kartikeya Dwivedi, Martin KaFai Lau, Kernel Team
On Tue, May 20, 2025 at 12:37 AM Amery Hung <ameryhung@gmail.com> wrote:
>
> > Then something like:
> > #define tld_get_data(tld_obj, key) \
> > ({
> > void * data = tld_obj->data_map->data;
> > if (data)
> > data += tld_obj->key_map->key.off & (PAGE_SIZE - 1);
> > data;
> > })
> >
> > size is really not needed. The verifier sees it as one page.
> > Bad bpf prog can write into the wrong key and the verifier cannot stop it.
> >
>
> key.off is a variable offset, so the verifier may assume key.off ==
> PAGE_SIZE - 1. If a bpf program tries to dereference a pointer
> returned by the proposed tld_get_data() as an int * without bound
> check, the verifier will still consider this a potential out-of-bound
> access:
>
> invalid access to memory, mem_size=4096 off=4095 size=4
>
> I think if there needs to be a bound check anyways, hiding it
> tld_get_data() makes the user written part less complex.
I see. Yeah off < TLD_DATA_SIZE - size check cannot be removed.
I was hoping to save an extra branch. oh well.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-15 21:16 ` [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task " Amery Hung
2025-05-16 18:57 ` Alexei Starovoitov
2025-05-19 20:04 ` Tejun Heo
@ 2025-05-20 22:57 ` Andrii Nakryiko
2025-05-22 17:27 ` Amery Hung
2025-05-22 8:36 ` Tony Ambardar
3 siblings, 1 reply; 17+ messages in thread
From: Andrii Nakryiko @ 2025-05-20 22:57 UTC (permalink / raw)
To: Amery Hung
Cc: bpf, netdev, alexei.starovoitov, andrii, daniel, tj, memxor,
martin.lau, kernel-team
On Thu, May 15, 2025 at 2:16 PM Amery Hung <ameryhung@gmail.com> wrote:
>
> Task local data defines an abstract storage type for storing task-
> specific data (TLD). This patch provides user space and bpf
> implementation as header-only libraries for accessing task local data.
>
> Task local data is a bpf task local storage map with two UPTRs:
> 1) u_tld_metadata, shared by all tasks of the same process, consists of
> the total count of TLDs and an array of metadata of TLDs. A metadata of
> a TLD comprises the size and the name. The name is used to identify a
> specific TLD in bpf 2) data is memory for storing TLDs specific to the
> task.
>
> The following are the basic task local data API:
>
> User space BPF
> Create key tld_create_key() -
> Fetch key - tld_fetch_key()
> Get data tld_get_data() tld_get_data()
>
> A TLD is first created by the user space with tld_create_key(). First,
> it goes through the metadata array to check if the TLD can be added.
> The total TLD size needs to fit into a page (limited by UPTR), and no
> two TLDs can have the same name. It also calculates the offset, the next
> available space in u_tld_data, by summing sizes of TLDs. If the TLD
> can be added, it increases the count using cmpxchg as there may be other
> concurrent tld_create_key(). After a successful cmpxchg, the last
> metadata slot now belongs to the calling thread and will be updated.
> tld_create_key() returns the offset encapsulated as a opaque object key
> to prevent user misuse.
>
> Then user space can pass the key to tld_get_data() to get a pointer
> to the TLD. The pointer will remain valid for the lifetime of the
> thread.
>
> BPF programs also locate TLDs with the keys. This is done by calling
> tld_fetch_key() with the name of the TLD. Similar to tld_create_key(),
> it scans through metadata array, compare the name of TLDs and compute
> the offset. Once found, the offset is also returned as a key, which
> can be passed to the bpf version of tld_get_data() to retrieve a
> pointer to the TLD.
>
> User space task local data library uses a light way approach to ensure
> thread safety (i.e., atomic operation + compiler and memory barriers).
> While a metadata is being updated, other threads may also try to read it.
> To prevent them from seeing incomplete data, metadata::size is used to
> signal the completion of the update, where 0 means the update is still
> ongoing. Threads will wait until seeing a non-zero size to read a
> metadata. Acquire/release order is required for metadata::size to
> prevent hardware reordering. For example, moving store to metadata::name
> after store to metadata::size or moving load from metadata::name before
> load from metadata::size.
>
> Signed-off-by: Amery Hung <ameryhung@gmail.com>
> ---
> .../bpf/prog_tests/task_local_data.h | 263 ++++++++++++++++++
> .../selftests/bpf/progs/task_local_data.bpf.h | 220 +++++++++++++++
> 2 files changed, 483 insertions(+)
> create mode 100644 tools/testing/selftests/bpf/prog_tests/task_local_data.h
> create mode 100644 tools/testing/selftests/bpf/progs/task_local_data.bpf.h
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_data.h b/tools/testing/selftests/bpf/prog_tests/task_local_data.h
> new file mode 100644
> index 000000000000..ec43ea59267c
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/task_local_data.h
> @@ -0,0 +1,263 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __TASK_LOCAL_DATA_H
> +#define __TASK_LOCAL_DATA_H
> +
> +#include <fcntl.h>
> +#include <errno.h>
> +#include <sched.h>
> +#include <stddef.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <sys/syscall.h>
> +#include <sys/types.h>
> +
> +#include <bpf/bpf.h>
> +
> +#ifndef PIDFD_THREAD
> +#define PIDFD_THREAD O_EXCL
> +#endif
> +
> +#define PAGE_SIZE 4096
> +
> +#ifndef __round_mask
> +#define __round_mask(x, y) ((__typeof__(x))((y)-1))
> +#endif
> +#ifndef round_up
> +#define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)
> +#endif
> +
> +#ifndef READ_ONCE
> +#define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
> +#endif
> +
> +#ifndef WRITE_ONCE
> +#define WRITE_ONCE(x, val) ((*(volatile typeof(x) *)&(x)) = (val))
> +#endif
this is a lot of pollution of user applications with generic names...
consider TLD_ prefixing all of them?
> +
> +#define TLD_DATA_SIZE PAGE_SIZE
> +#define TLD_DATA_CNT 63
> +#define TLD_NAME_LEN 62
> +
> +typedef struct {
> + __s16 off;
> +} tld_key_t;
> +
> +struct tld_metadata {
> + char name[TLD_NAME_LEN];
> + __u16 size;
> +};
> +
> +struct u_tld_metadata {
> + __u8 cnt;
> + __u8 padding[63];
> + struct tld_metadata metadata[TLD_DATA_CNT];
> +};
> +
> +struct u_tld_data {
> + char data[TLD_DATA_SIZE];
> +};
> +
> +struct tld_map_value {
> + struct u_tld_data *data;
> + struct u_tld_metadata *metadata;
> +};
> +
> +struct u_tld_metadata *tld_metadata_p __attribute__((weak));
> +__thread struct u_tld_data *tld_data_p __attribute__((weak));
> +
> +static int __tld_init_metadata(int map_fd)
> +{
> + struct u_tld_metadata *new_metadata;
> + struct tld_map_value map_val;
> + int task_fd = 0, err;
> +
[...]
> +
> + map_val.data = new_data;
> + map_val.metadata = READ_ONCE(tld_metadata_p);
> +
> + err = bpf_map_update_elem(map_fd, &task_fd, &map_val, 0);
> + if (err) {
> + free(new_data);
> + goto out;
> + }
> +
> + tld_data_p = new_data;
> +out:
> + if (task_fd > 0)
task_fd can be zero, so >= 0 and init to -1; same in init_metadata
> + close(task_fd);
> + return err;
> +}
> +
> +/**
> + * tld_create_key() - Create a key associated with a TLD.
> + *
> + * @map_fd: A file descriptor of the underlying task local storage map,
> + * tld_data_map
> + * @name: The name the TLD will be associated with
> + * @size: Size of the TLD
> + *
> + * Returns an opaque object key. Use tld_key_is_err() or tld_key_err_or_zero() to
> + * check if the key creation succeed. Pass to tld_get_data() to get a pointer to
typo: succeeded
> + * the TLD. bpf programs can also fetch the same key by name.
> + */
> +__attribute__((unused))
> +static tld_key_t tld_create_key(int map_fd, const char *name, size_t size)
> +{
> + int err, i, cnt, sz, off = 0;
> +
> + if (!READ_ONCE(tld_metadata_p)) {
> + err = __tld_init_metadata(map_fd);
> + if (err)
> + return (tld_key_t) {.off = err};
> + }
> +
> + if (!tld_data_p) {
> + err = __tld_init_data(map_fd);
> + if (err)
> + return (tld_key_t) {.off = err};
> + }
> +
> + size = round_up(size, 8);
I'd record actual requested size, but internally round up to 8 where
necessary (see below)
> +
> + for (i = 0; i < TLD_DATA_CNT; i++) {
> +retry:
> + cnt = __atomic_load_n(&tld_metadata_p->cnt, __ATOMIC_RELAXED);
> + if (i < cnt) {
> + /*
> + * Pending tld_create_key() uses size to signal if the metadata has
> + * been fully updated.
> + */
> + while (!(sz = __atomic_load_n(&tld_metadata_p->metadata[i].size,
> + __ATOMIC_ACQUIRE)))
> + sched_yield();
> +
> + if (!strncmp(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN))
> + return (tld_key_t) {.off = -EEXIST};
do you check mismatched size for the same key? if not, should it be checked?
but if name and size matches, shouldn't this be a success instead of
-EEXIST error?...
> +
> + off += sz;
you should probably specify alignment guarantees explicitly and round
that up somewhere here, so that if you allocate bool and then u64, u64
is properly 8 byte aligned and internally you know that the size was 1
and 8? With BPF ringbuf we guarantee 8 byte alignment, and so far it
worked out great, so I'd just document 8 and go with that.
> + continue;
> + }
> +
> + if (off + size > TLD_DATA_SIZE)
> + return (tld_key_t) {.off = -E2BIG};
> +
> + /*
> + * Only one tld_create_key() can increase the current cnt by one and
> + * takes the latest available slot. Other threads will check again if a new
> + * TLD can still be added, and then compete for the new slot after the
> + * succeeding thread update the size.
> + */
> + if (!__atomic_compare_exchange_n(&tld_metadata_p->cnt, &cnt, cnt + 1, true,
> + __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> + goto retry;
> +
> + strncpy(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN);
> + __atomic_store_n(&tld_metadata_p->metadata[i].size, size, __ATOMIC_RELEASE);
> + return (tld_key_t) {.off = off};
> + }
> +
> + return (tld_key_t) {.off = -ENOSPC};
[...]
> + * USAGE
> + *
> + * Similar to user space, bpf programs locate a TLD using the same key.
> + * tld_fetch_key() allows bpf programs to retrieve a key created in the user
> + * space by name, which is specified in the second argument of tld_create_key().
> + * tld_fetch_key() additionally will cache the key in a task local storage map,
> + * tld_key_map, to avoid performing costly string comparisons every time when
> + * accessing a TLD. This requires the developer to define the map value type of
> + * tld_key_map, struct tld_keys. It only needs to contain keys used by bpf
> + * programs in the compilation unit.
> + *
> + * struct tld_keys {
> + * tld_key_t prio;
> + * tld_key_t in_cs;
> + * };
> + *
> + * Then, for every new task, a bpf program will fetch and cache keys once and
> + * for all. This should be done ideally in a non-critical path (e.g., in
> + * sched_ext_ops::init_task).
> + *
> + * struct tld_object tld_obj;
> + *
> + * err = tld_object_init(task, &tld);
tld_obj?
> + * if (err)
> + * return 0;
> + *
> + * tld_fetch_key(&tld_obj, "priority", prio);
> + * tld_fetch_key(&tld_obj, "in_critical_section", in_cs);
> + *
[...]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-15 21:16 ` [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task " Amery Hung
` (2 preceding siblings ...)
2025-05-20 22:57 ` Andrii Nakryiko
@ 2025-05-22 8:36 ` Tony Ambardar
2025-05-22 16:49 ` Amery Hung
3 siblings, 1 reply; 17+ messages in thread
From: Tony Ambardar @ 2025-05-22 8:36 UTC (permalink / raw)
To: Amery Hung
Cc: bpf, netdev, alexei.starovoitov, andrii, daniel, tj, memxor,
martin.lau, kernel-team
Hi Amery,
I'm trying out your series in an arm32 JIT testing env I'm working on.
On Thu, May 15, 2025 at 02:16:00PM -0700, Amery Hung wrote:
[...]
> diff --git a/tools/testing/selftests/bpf/progs/task_local_data.bpf.h b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> new file mode 100644
> index 000000000000..5f48e408a5e5
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> @@ -0,0 +1,220 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __TASK_LOCAL_DATA_BPF_H
> +#define __TASK_LOCAL_DATA_BPF_H
> +
> +/*
> + * Task local data is a library that facilitates sharing per-task data
> + * between user space and bpf programs.
> + *
> + *
> + * PREREQUISITE
> + *
> + * A TLD, an entry of data in task local data, first needs to be created by the
> + * user space. This is done by calling user space API, tld_create_key(), with
> + * the name of the TLD and the size.
> + *
> + * tld_key_t prio, in_cs;
> + *
> + * prio = tld_create_key("priority", sizeof(int));
> + * in_cs = tld_create_key("in_critical_section", sizeof(bool));
> + *
> + * A key associated with the TLD, which has an opaque type tld_key_t, will be
> + * returned. It can be used to get a pointer to the TLD in the user space by
> + * calling tld_get_data().
> + *
> + *
> + * USAGE
> + *
> + * Similar to user space, bpf programs locate a TLD using the same key.
> + * tld_fetch_key() allows bpf programs to retrieve a key created in the user
> + * space by name, which is specified in the second argument of tld_create_key().
> + * tld_fetch_key() additionally will cache the key in a task local storage map,
> + * tld_key_map, to avoid performing costly string comparisons every time when
> + * accessing a TLD. This requires the developer to define the map value type of
> + * tld_key_map, struct tld_keys. It only needs to contain keys used by bpf
> + * programs in the compilation unit.
> + *
> + * struct tld_keys {
> + * tld_key_t prio;
> + * tld_key_t in_cs;
> + * };
> + *
> + * Then, for every new task, a bpf program will fetch and cache keys once and
> + * for all. This should be done ideally in a non-critical path (e.g., in
> + * sched_ext_ops::init_task).
> + *
> + * struct tld_object tld_obj;
> + *
> + * err = tld_object_init(task, &tld);
> + * if (err)
> + * return 0;
> + *
> + * tld_fetch_key(&tld_obj, "priority", prio);
> + * tld_fetch_key(&tld_obj, "in_critical_section", in_cs);
> + *
> + * Note that, the first argument of tld_fetch_key() is a pointer to tld_object.
> + * It should be declared as a stack variable and initialized via tld_object_init().
> + *
> + * Finally, just like user space programs, bpf programs can get a pointer to a
> + * TLD by calling tld_get_data(), with cached keys.
> + *
> + * int *p;
> + *
> + * p = tld_get_data(&tld_obj, prio, sizeof(int));
> + * if (p)
> + * // do something depending on *p
> + */
> +#include <errno.h>
> +#include <bpf/bpf_helpers.h>
> +
> +#define TLD_DATA_SIZE __PAGE_SIZE
> +#define TLD_DATA_CNT 63
> +#define TLD_NAME_LEN 62
> +
> +typedef struct {
> + __s16 off;
> +} tld_key_t;
> +
> +struct u_tld_data *dummy_data;
> +struct u_tld_metadata *dummy_metadata;
I suspect I've overlooked something, but what are these 2 "dummy" globals
used for? The code builds OK without them, although I do see test errors
as noted below.
I'll also mention the only reason I noticed these is that "bpftool gen
skeleton" automatically maps these to user space, but results in an
ASSERT() failure during build on 32-bit targets due to lack of support,
so dropping them avoids that.
root@qemu-armhf:/usr/libexec/kselftests-bpf# ./test_progs -w 0 -a task_local_data
test_task_local_data_basic:PASS:pthread_mutex_init 0 nsec
libbpf: prog 'task_init': BPF program load failed: -EACCES
libbpf: prog 'task_init': -- BEGIN PROG LOAD LOG --
arg#0 reference type('UNKNOWN ') size cannot be determined: -22
0: R1=ctx() R10=fp0
; task = bpf_get_current_task_btf(); @ test_task_local_data.c:31
0: (85) call bpf_get_current_task_btf#158 ; R0_w=trusted_ptr_task_struct()
1: (bf) r6 = r0 ; R0_w=trusted_ptr_task_struct() R6_w=trusted_ptr_task_struct()
; tld_obj->data_map = bpf_task_storage_get(&tld_data_map, task, 0, 0); @ task_local_data.bpf.h:135
2: (18) r1 = 0xc25ddc00 ; R1_w=map_ptr(map=tld_data_map,ks=4,vs=16)
4: (bf) r2 = r6 ; R2_w=trusted_ptr_task_struct() R6_w=trusted_ptr_task_struct()
5: (b7) r3 = 0 ; R3_w=0
6: (b7) r4 = 0 ; R4_w=0
7: (85) call bpf_task_storage_get#156 ; R0=map_value_or_null(id=1,map=tld_data_map,ks=4,vs=16)
8: (b4) w7 = 1 ; R7_w=1
9: (7b) *(u64 *)(r10 -16) = r0 ; R0=map_value_or_null(id=1,map=tld_data_map,ks=4,vs=16) R10=fp0 fp-16_w=map_value_or_null(id=1,map=tld_data_map,ks=4,vs=16)
; if (!tld_obj->data_map) @ task_local_data.bpf.h:136
10: (15) if r0 == 0x0 goto pc+37 ; R0=map_value(map=tld_data_map,ks=4,vs=16)
; tld_obj->key_map = bpf_task_storage_get(&tld_key_map, task, 0, @ task_local_data.bpf.h:139
11: (18) r1 = 0xc3ade000 ; R1_w=map_ptr(map=tld_key_map,ks=4,vs=6)
13: (bf) r2 = r6 ; R2_w=trusted_ptr_task_struct() R6=trusted_ptr_task_struct()
14: (b7) r3 = 0 ; R3_w=0
15: (b7) r4 = 1 ; R4_w=1
16: (85) call bpf_task_storage_get#156 ; R0=map_value_or_null(id=2,map=tld_key_map,ks=4,vs=6)
17: (bf) r6 = r0 ; R0=map_value_or_null(id=2,map=tld_key_map,ks=4,vs=6) R6_w=map_value_or_null(id=2,map=tld_key_map,ks=4,vs=6)
18: (7b) *(u64 *)(r10 -8) = r6 ; R6_w=map_value_or_null(id=2,map=tld_key_map,ks=4,vs=6) R10=fp0 fp-8_w=map_value_or_null(id=2,map=tld_key_map,ks=4,vs=6)
; @ task_local_data.bpf.h:0
19: (15) if r6 == 0x0 goto pc+28 ; R6_w=map_value(map=tld_key_map,ks=4,vs=6)
20: (bf) r1 = r10 ; R1_w=fp0 R10=fp0
; @ test_task_local_data.c:0
21: (07) r1 += -16 ; R1_w=fp-16
; if (!tld_fetch_key(&tld_obj, "value1", value1)) @ test_task_local_data.c:36
22: (18) r2 = 0xc2ddddd0 ; R2_w=map_value(map=.rodata.str1.1,ks=4,vs=30)
24: (85) call pc+25
caller:
R6_w=map_value(map=tld_key_map,ks=4,vs=6) R7=1 R10=fp0 fp-8_w=map_value(map=tld_key_map,ks=4,vs=6) fp-16=map_value(map=tld_data_map,ks=4,vs=16)
callee:
frame1: R1_w=fp[0]-16 R2_w=map_value(map=.rodata.str1.1,ks=4,vs=30) R10=fp0
50: frame1: R1_w=fp[0]-16 R2_w=map_value(map=.rodata.str1.1,ks=4,vs=30) R10=fp0
; static u16 __tld_fetch_key(struct tld_object *tld_obj, const char *name) @ task_local_data.bpf.h:163
50: (7b) *(u64 *)(r10 -16) = r2 ; frame1: R2_w=map_value(map=.rodata.str1.1,ks=4,vs=30) R10=fp0 fp-16_w=map_value(map=.rodata.str1.1,ks=4,vs=30)
51: (b4) w7 = 0 ; frame1: R7_w=0
; if (!tld_obj->data_map || !tld_obj->data_map->metadata) @ task_local_data.bpf.h:169
52: (79) r1 = *(u64 *)(r1 +0) ; frame1: R1=map_value(map=tld_data_map,ks=4,vs=16) fp-16=map_value(map=.rodata.str1.1,ks=4,vs=30)
53: (15) if r1 == 0x0 goto pc+36 ; frame1: R1=map_value(map=tld_data_map,ks=4,vs=16)
54: (79) r6 = *(u64 *)(r1 +8) ; frame1: R1=map_value(map=tld_data_map,ks=4,vs=16) R6_w=scalar()
55: (15) if r6 == 0x0 goto pc+34 ; frame1: R6_w=scalar(umin=1)
; cnt = tld_obj->data_map->metadata->cnt; @ task_local_data.bpf.h:172
56: (71) r8 = *(u8 *)(r6 +0)
R6 invalid mem access 'scalar'
processed 29 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 1
-- END PROG LOAD LOG --
libbpf: prog 'task_init': failed to load: -EACCES
libbpf: failed to load object 'test_task_local_data'
libbpf: failed to load BPF skeleton 'test_task_local_data': -EACCES
test_task_local_data_basic:FAIL:skel_open_and_load unexpected error: -13
#409/1 task_local_data/task_local_data_basic:FAIL
I'm unsure if this verifier error is related to the dummy pointers, but
it does seem there's a pointer issue...
Further thoughts or suggestions (from anyone) would be most welcome.
Thanks,
Tony
> +
> +struct tld_metadata {
> + char name[TLD_NAME_LEN];
> + __u16 size;
> +};
> +
> +struct u_tld_metadata {
> + __u8 cnt;
> + __u8 padding[63];
> + struct tld_metadata metadata[TLD_DATA_CNT];
> +};
> +
> +struct u_tld_data {
> + char data[TLD_DATA_SIZE];
> +};
> +
> +struct tld_map_value {
> + struct u_tld_data __uptr *data;
> + struct u_tld_metadata __uptr *metadata;
> +};
> +
> +struct tld_object {
> + struct tld_map_value *data_map;
> + struct tld_keys *key_map;
> +};
> +
[...]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-22 8:36 ` Tony Ambardar
@ 2025-05-22 16:49 ` Amery Hung
2025-05-23 10:09 ` Tony Ambardar
0 siblings, 1 reply; 17+ messages in thread
From: Amery Hung @ 2025-05-22 16:49 UTC (permalink / raw)
To: Tony Ambardar
Cc: bpf, netdev, alexei.starovoitov, andrii, daniel, tj, memxor,
martin.lau, kernel-team
On Thu, May 22, 2025 at 1:36 AM Tony Ambardar <tony.ambardar@gmail.com> wrote:
>
> Hi Amery,
>
> I'm trying out your series in an arm32 JIT testing env I'm working on.
>
>
> On Thu, May 15, 2025 at 02:16:00PM -0700, Amery Hung wrote:
>
> [...]
>
> > diff --git a/tools/testing/selftests/bpf/progs/task_local_data.bpf.h b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> > new file mode 100644
> > index 000000000000..5f48e408a5e5
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> > @@ -0,0 +1,220 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef __TASK_LOCAL_DATA_BPF_H
> > +#define __TASK_LOCAL_DATA_BPF_H
> > +
> > +/*
> > + * Task local data is a library that facilitates sharing per-task data
> > + * between user space and bpf programs.
> > + *
> > + *
> > + * PREREQUISITE
> > + *
> > + * A TLD, an entry of data in task local data, first needs to be created by the
> > + * user space. This is done by calling user space API, tld_create_key(), with
> > + * the name of the TLD and the size.
> > + *
> > + * tld_key_t prio, in_cs;
> > + *
> > + * prio = tld_create_key("priority", sizeof(int));
> > + * in_cs = tld_create_key("in_critical_section", sizeof(bool));
> > + *
> > + * A key associated with the TLD, which has an opaque type tld_key_t, will be
> > + * returned. It can be used to get a pointer to the TLD in the user space by
> > + * calling tld_get_data().
> > + *
> > + *
> > + * USAGE
> > + *
> > + * Similar to user space, bpf programs locate a TLD using the same key.
> > + * tld_fetch_key() allows bpf programs to retrieve a key created in the user
> > + * space by name, which is specified in the second argument of tld_create_key().
> > + * tld_fetch_key() additionally will cache the key in a task local storage map,
> > + * tld_key_map, to avoid performing costly string comparisons every time when
> > + * accessing a TLD. This requires the developer to define the map value type of
> > + * tld_key_map, struct tld_keys. It only needs to contain keys used by bpf
> > + * programs in the compilation unit.
> > + *
> > + * struct tld_keys {
> > + * tld_key_t prio;
> > + * tld_key_t in_cs;
> > + * };
> > + *
> > + * Then, for every new task, a bpf program will fetch and cache keys once and
> > + * for all. This should be done ideally in a non-critical path (e.g., in
> > + * sched_ext_ops::init_task).
> > + *
> > + * struct tld_object tld_obj;
> > + *
> > + * err = tld_object_init(task, &tld);
> > + * if (err)
> > + * return 0;
> > + *
> > + * tld_fetch_key(&tld_obj, "priority", prio);
> > + * tld_fetch_key(&tld_obj, "in_critical_section", in_cs);
> > + *
> > + * Note that, the first argument of tld_fetch_key() is a pointer to tld_object.
> > + * It should be declared as a stack variable and initialized via tld_object_init().
> > + *
> > + * Finally, just like user space programs, bpf programs can get a pointer to a
> > + * TLD by calling tld_get_data(), with cached keys.
> > + *
> > + * int *p;
> > + *
> > + * p = tld_get_data(&tld_obj, prio, sizeof(int));
> > + * if (p)
> > + * // do something depending on *p
> > + */
> > +#include <errno.h>
> > +#include <bpf/bpf_helpers.h>
> > +
> > +#define TLD_DATA_SIZE __PAGE_SIZE
> > +#define TLD_DATA_CNT 63
> > +#define TLD_NAME_LEN 62
> > +
> > +typedef struct {
> > + __s16 off;
> > +} tld_key_t;
> > +
> > +struct u_tld_data *dummy_data;
> > +struct u_tld_metadata *dummy_metadata;
>
> I suspect I've overlooked something, but what are these 2 "dummy" globals
> used for? The code builds OK without them, although I do see test errors
> as noted below.
>
Hi, sorry for the confusion. The forward declaration is to prevent
dummy_data/metadata tld_map_value to be fwd_kind. I will explain this
in the comment.
The BTF should look like this:
[9] STRUCT 'tld_map_value' size=16 vlen=2
'data' type_id=10 bits_offset=0
'metadata' type_id=11 bits_offset=64
[10] PTR '(anon)' type_id=74
[11] PTR '(anon)' type_id=73
[57] STRUCT 'u_tld_data' size=4096 vlen=1
'data' type_id=58 bits_offset=0
[61] STRUCT 'u_tld_metadata' size=4096 vlen=3
'cnt' type_id=62 bits_offset=0
'padding' type_id=64 bits_offset=8
'metadata' type_id=67 bits_offset=512
[73] TYPE_TAG 'uptr' type_id=61
[74] TYPE_TAG 'uptr' type_id=57
Without the forward declaration, the BTF will look like this:
[9] STRUCT 'tld_map_value' size=16 vlen=2
'data' type_id=10 bits_offset=0
'metadata' type_id=11 bits_offset=64
[10] PTR '(anon)' type_id=63
[11] PTR '(anon)' type_id=61
[60] FWD 'u_tld_metadata' fwd_kind=struct
[61] TYPE_TAG 'uptr' type_id=60
[62] FWD 'u_tld_data' fwd_kind=struct
[63] TYPE_TAG 'uptr' type_id=62
> I'll also mention the only reason I noticed these is that "bpftool gen
> skeleton" automatically maps these to user space, but results in an
> ASSERT() failure during build on 32-bit targets due to lack of support,
> so dropping them avoids that.
Can you provide more details of the error?
>
>
> 24: (85) call pc+25
> caller:
> R6_w=map_value(map=tld_key_map,ks=4,vs=6) R7=1 R10=fp0 fp-8_w=map_value(map=tld_key_map,ks=4,vs=6) fp-16=map_value(map=tld_data_map,ks=4,vs=16)
> callee:
> frame1: R1_w=fp[0]-16 R2_w=map_value(map=.rodata.str1.1,ks=4,vs=30) R10=fp0
> 50: frame1: R1_w=fp[0]-16 R2_w=map_value(map=.rodata.str1.1,ks=4,vs=30) R10=fp0
> ; static u16 __tld_fetch_key(struct tld_object *tld_obj, const char *name) @ task_local_data.bpf.h:163
> 50: (7b) *(u64 *)(r10 -16) = r2 ; frame1: R2_w=map_value(map=.rodata.str1.1,ks=4,vs=30) R10=fp0 fp-16_w=map_value(map=.rodata.str1.1,ks=4,vs=30)
> 51: (b4) w7 = 0 ; frame1: R7_w=0
> ; if (!tld_obj->data_map || !tld_obj->data_map->metadata) @ task_local_data.bpf.h:169
> 52: (79) r1 = *(u64 *)(r1 +0) ; frame1: R1=map_value(map=tld_data_map,ks=4,vs=16) fp-16=map_value(map=.rodata.str1.1,ks=4,vs=30)
> 53: (15) if r1 == 0x0 goto pc+36 ; frame1: R1=map_value(map=tld_data_map,ks=4,vs=16)
> 54: (79) r6 = *(u64 *)(r1 +8) ; frame1: R1=map_value(map=tld_data_map,ks=4,vs=16) R6_w=scalar()
> 55: (15) if r6 == 0x0 goto pc+34 ; frame1: R6_w=scalar(umin=1)
> ; cnt = tld_obj->data_map->metadata->cnt; @ task_local_data.bpf.h:172
> 56: (71) r8 = *(u8 *)(r6 +0)
> R6 invalid mem access 'scalar'
> processed 29 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 1
> -- END PROG LOAD LOG --
> libbpf: prog 'task_init': failed to load: -EACCES
> libbpf: failed to load object 'test_task_local_data'
> libbpf: failed to load BPF skeleton 'test_task_local_data': -EACCES
> test_task_local_data_basic:FAIL:skel_open_and_load unexpected error: -13
> #409/1 task_local_data/task_local_data_basic:FAIL
>
>
> I'm unsure if this verifier error is related to the dummy pointers, but
> it does seem there's a pointer issue...
>
The error is exactly caused by removing the dummy_xxx.
> Further thoughts or suggestions (from anyone) would be most welcome.
>
> Thanks,
> Tony
>
> > +
> > +struct tld_metadata {
> > + char name[TLD_NAME_LEN];
> > + __u16 size;
> > +};
> > +
> > +struct u_tld_metadata {
> > + __u8 cnt;
> > + __u8 padding[63];
> > + struct tld_metadata metadata[TLD_DATA_CNT];
> > +};
> > +
> > +struct u_tld_data {
> > + char data[TLD_DATA_SIZE];
> > +};
> > +
> > +struct tld_map_value {
> > + struct u_tld_data __uptr *data;
> > + struct u_tld_metadata __uptr *metadata;
> > +};
> > +
> > +struct tld_object {
> > + struct tld_map_value *data_map;
> > + struct tld_keys *key_map;
> > +};
> > +
>
> [...]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-20 22:57 ` Andrii Nakryiko
@ 2025-05-22 17:27 ` Amery Hung
0 siblings, 0 replies; 17+ messages in thread
From: Amery Hung @ 2025-05-22 17:27 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: bpf, netdev, alexei.starovoitov, andrii, daniel, tj, memxor,
martin.lau, kernel-team
On Tue, May 20, 2025 at 3:58 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, May 15, 2025 at 2:16 PM Amery Hung <ameryhung@gmail.com> wrote:
> >
> > Task local data defines an abstract storage type for storing task-
> > specific data (TLD). This patch provides user space and bpf
> > implementation as header-only libraries for accessing task local data.
> >
> > Task local data is a bpf task local storage map with two UPTRs:
> > 1) u_tld_metadata, shared by all tasks of the same process, consists of
> > the total count of TLDs and an array of metadata of TLDs. A metadata of
> > a TLD comprises the size and the name. The name is used to identify a
> > specific TLD in bpf 2) data is memory for storing TLDs specific to the
> > task.
> >
> > The following are the basic task local data API:
> >
> > User space BPF
> > Create key tld_create_key() -
> > Fetch key - tld_fetch_key()
> > Get data tld_get_data() tld_get_data()
> >
> > A TLD is first created by the user space with tld_create_key(). First,
> > it goes through the metadata array to check if the TLD can be added.
> > The total TLD size needs to fit into a page (limited by UPTR), and no
> > two TLDs can have the same name. It also calculates the offset, the next
> > available space in u_tld_data, by summing sizes of TLDs. If the TLD
> > can be added, it increases the count using cmpxchg as there may be other
> > concurrent tld_create_key(). After a successful cmpxchg, the last
> > metadata slot now belongs to the calling thread and will be updated.
> > tld_create_key() returns the offset encapsulated as a opaque object key
> > to prevent user misuse.
> >
> > Then user space can pass the key to tld_get_data() to get a pointer
> > to the TLD. The pointer will remain valid for the lifetime of the
> > thread.
> >
> > BPF programs also locate TLDs with the keys. This is done by calling
> > tld_fetch_key() with the name of the TLD. Similar to tld_create_key(),
> > it scans through metadata array, compare the name of TLDs and compute
> > the offset. Once found, the offset is also returned as a key, which
> > can be passed to the bpf version of tld_get_data() to retrieve a
> > pointer to the TLD.
> >
> > User space task local data library uses a light way approach to ensure
> > thread safety (i.e., atomic operation + compiler and memory barriers).
> > While a metadata is being updated, other threads may also try to read it.
> > To prevent them from seeing incomplete data, metadata::size is used to
> > signal the completion of the update, where 0 means the update is still
> > ongoing. Threads will wait until seeing a non-zero size to read a
> > metadata. Acquire/release order is required for metadata::size to
> > prevent hardware reordering. For example, moving store to metadata::name
> > after store to metadata::size or moving load from metadata::name before
> > load from metadata::size.
> >
> > Signed-off-by: Amery Hung <ameryhung@gmail.com>
> > ---
> > .../bpf/prog_tests/task_local_data.h | 263 ++++++++++++++++++
> > .../selftests/bpf/progs/task_local_data.bpf.h | 220 +++++++++++++++
> > 2 files changed, 483 insertions(+)
> > create mode 100644 tools/testing/selftests/bpf/prog_tests/task_local_data.h
> > create mode 100644 tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_data.h b/tools/testing/selftests/bpf/prog_tests/task_local_data.h
> > new file mode 100644
> > index 000000000000..ec43ea59267c
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/task_local_data.h
> > @@ -0,0 +1,263 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef __TASK_LOCAL_DATA_H
> > +#define __TASK_LOCAL_DATA_H
> > +
> > +#include <fcntl.h>
> > +#include <errno.h>
> > +#include <sched.h>
> > +#include <stddef.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <unistd.h>
> > +#include <sys/syscall.h>
> > +#include <sys/types.h>
> > +
> > +#include <bpf/bpf.h>
> > +
> > +#ifndef PIDFD_THREAD
> > +#define PIDFD_THREAD O_EXCL
> > +#endif
> > +
> > +#define PAGE_SIZE 4096
> > +
> > +#ifndef __round_mask
> > +#define __round_mask(x, y) ((__typeof__(x))((y)-1))
> > +#endif
> > +#ifndef round_up
> > +#define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)
> > +#endif
> > +
> > +#ifndef READ_ONCE
> > +#define READ_ONCE(x) (*(volatile typeof(x) *)&(x))
> > +#endif
> > +
> > +#ifndef WRITE_ONCE
> > +#define WRITE_ONCE(x, val) ((*(volatile typeof(x) *)&(x)) = (val))
> > +#endif
>
> this is a lot of pollution of user applications with generic names...
> consider TLD_ prefixing all of them?
>
I will make the name more specific by adding TLD_ prefix.
> > +
> > +#define TLD_DATA_SIZE PAGE_SIZE
> > +#define TLD_DATA_CNT 63
> > +#define TLD_NAME_LEN 62
> > +
> > +typedef struct {
> > + __s16 off;
> > +} tld_key_t;
> > +
> > +struct tld_metadata {
> > + char name[TLD_NAME_LEN];
> > + __u16 size;
> > +};
> > +
> > +struct u_tld_metadata {
> > + __u8 cnt;
> > + __u8 padding[63];
> > + struct tld_metadata metadata[TLD_DATA_CNT];
> > +};
> > +
> > +struct u_tld_data {
> > + char data[TLD_DATA_SIZE];
> > +};
> > +
> > +struct tld_map_value {
> > + struct u_tld_data *data;
> > + struct u_tld_metadata *metadata;
> > +};
> > +
> > +struct u_tld_metadata *tld_metadata_p __attribute__((weak));
> > +__thread struct u_tld_data *tld_data_p __attribute__((weak));
> > +
> > +static int __tld_init_metadata(int map_fd)
> > +{
> > + struct u_tld_metadata *new_metadata;
> > + struct tld_map_value map_val;
> > + int task_fd = 0, err;
> > +
>
> [...]
>
> > +
> > + map_val.data = new_data;
> > + map_val.metadata = READ_ONCE(tld_metadata_p);
> > +
> > + err = bpf_map_update_elem(map_fd, &task_fd, &map_val, 0);
> > + if (err) {
> > + free(new_data);
> > + goto out;
> > + }
> > +
> > + tld_data_p = new_data;
> > +out:
> > + if (task_fd > 0)
>
> task_fd can be zero, so >= 0 and init to -1; same in init_metadata
Yeah. I will fix this in the next respin.
>
> > + close(task_fd);
> > + return err;
> > +}
> > +
> > +/**
> > + * tld_create_key() - Create a key associated with a TLD.
> > + *
> > + * @map_fd: A file descriptor of the underlying task local storage map,
> > + * tld_data_map
> > + * @name: The name the TLD will be associated with
> > + * @size: Size of the TLD
> > + *
> > + * Returns an opaque object key. Use tld_key_is_err() or tld_key_err_or_zero() to
> > + * check if the key creation succeed. Pass to tld_get_data() to get a pointer to
>
> typo: succeeded
Will fix the typo. Thanks
>
> > + * the TLD. bpf programs can also fetch the same key by name.
> > + */
> > +__attribute__((unused))
> > +static tld_key_t tld_create_key(int map_fd, const char *name, size_t size)
> > +{
> > + int err, i, cnt, sz, off = 0;
> > +
> > + if (!READ_ONCE(tld_metadata_p)) {
> > + err = __tld_init_metadata(map_fd);
> > + if (err)
> > + return (tld_key_t) {.off = err};
> > + }
> > +
> > + if (!tld_data_p) {
> > + err = __tld_init_data(map_fd);
> > + if (err)
> > + return (tld_key_t) {.off = err};
> > + }
> > +
> > + size = round_up(size, 8);
>
> I'd record actual requested size, but internally round up to 8 where
> necessary (see below)
>
> > +
> > + for (i = 0; i < TLD_DATA_CNT; i++) {
> > +retry:
> > + cnt = __atomic_load_n(&tld_metadata_p->cnt, __ATOMIC_RELAXED);
> > + if (i < cnt) {
> > + /*
> > + * Pending tld_create_key() uses size to signal if the metadata has
> > + * been fully updated.
> > + */
> > + while (!(sz = __atomic_load_n(&tld_metadata_p->metadata[i].size,
> > + __ATOMIC_ACQUIRE)))
> > + sched_yield();
> > +
> > + if (!strncmp(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN))
> > + return (tld_key_t) {.off = -EEXIST};
>
> do you check mismatched size for the same key? if not, should it be checked?
>
> but if name and size matches, shouldn't this be a success instead of
> -EEXIST error?...
>
I think users should only call tld_create_key() once for a TLD.
Returning an -EEXIST gives us a way to detect conflict. If users knows
that they will be calling tld_create_key() for a TLD in multiple
places, they could still do it by treating -EEXIST as success.
Therefore, no checking mismatching size here.
>
> > +
> > + off += sz;
>
> you should probably specify alignment guarantees explicitly and round
> that up somewhere here, so that if you allocate bool and then u64, u64
> is properly 8 byte aligned and internally you know that the size was 1
> and 8? With BPF ringbuf we guarantee 8 byte alignment, and so far it
> worked out great, so I'd just document 8 and go with that.
>
Thanks for the suggestion. I will document the alignment guarantee.
> > + continue;
> > + }
> > +
> > + if (off + size > TLD_DATA_SIZE)
> > + return (tld_key_t) {.off = -E2BIG};
> > +
> > + /*
> > + * Only one tld_create_key() can increase the current cnt by one and
> > + * takes the latest available slot. Other threads will check again if a new
> > + * TLD can still be added, and then compete for the new slot after the
> > + * succeeding thread update the size.
> > + */
> > + if (!__atomic_compare_exchange_n(&tld_metadata_p->cnt, &cnt, cnt + 1, true,
> > + __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> > + goto retry;
> > +
> > + strncpy(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN);
> > + __atomic_store_n(&tld_metadata_p->metadata[i].size, size, __ATOMIC_RELEASE);
> > + return (tld_key_t) {.off = off};
> > + }
> > +
> > + return (tld_key_t) {.off = -ENOSPC};
>
> [...]
>
> > + * USAGE
> > + *
> > + * Similar to user space, bpf programs locate a TLD using the same key.
> > + * tld_fetch_key() allows bpf programs to retrieve a key created in the user
> > + * space by name, which is specified in the second argument of tld_create_key().
> > + * tld_fetch_key() additionally will cache the key in a task local storage map,
> > + * tld_key_map, to avoid performing costly string comparisons every time when
> > + * accessing a TLD. This requires the developer to define the map value type of
> > + * tld_key_map, struct tld_keys. It only needs to contain keys used by bpf
> > + * programs in the compilation unit.
> > + *
> > + * struct tld_keys {
> > + * tld_key_t prio;
> > + * tld_key_t in_cs;
> > + * };
> > + *
> > + * Then, for every new task, a bpf program will fetch and cache keys once and
> > + * for all. This should be done ideally in a non-critical path (e.g., in
> > + * sched_ext_ops::init_task).
> > + *
> > + * struct tld_object tld_obj;
> > + *
> > + * err = tld_object_init(task, &tld);
>
> tld_obj?
Will fix the typo
>
> > + * if (err)
> > + * return 0;
> > + *
> > + * tld_fetch_key(&tld_obj, "priority", prio);
> > + * tld_fetch_key(&tld_obj, "in_critical_section", in_cs);
> > + *
>
> [...]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task local data
2025-05-22 16:49 ` Amery Hung
@ 2025-05-23 10:09 ` Tony Ambardar
0 siblings, 0 replies; 17+ messages in thread
From: Tony Ambardar @ 2025-05-23 10:09 UTC (permalink / raw)
To: Amery Hung
Cc: bpf, netdev, alexei.starovoitov, andrii, daniel, tj, memxor,
martin.lau, kernel-team
On Thu, May 22, 2025 at 09:49:23AM -0700, Amery Hung wrote:
> On Thu, May 22, 2025 at 1:36 AM Tony Ambardar <tony.ambardar@gmail.com> wrote:
> >
> > Hi Amery,
> >
> > I'm trying out your series in an arm32 JIT testing env I'm working on.
> >
> >
> > On Thu, May 15, 2025 at 02:16:00PM -0700, Amery Hung wrote:
> >
> > > +
[...]
> > > +struct u_tld_data *dummy_data;
> > > +struct u_tld_metadata *dummy_metadata;
> >
> > I suspect I've overlooked something, but what are these 2 "dummy" globals
> > used for? The code builds OK without them, although I do see test errors
> > as noted below.
> >
>
> Hi, sorry for the confusion. The forward declaration is to prevent
> dummy_data/metadata tld_map_value to be fwd_kind. I will explain this
> in the comment.
>
> The BTF should look like this:
>
> [9] STRUCT 'tld_map_value' size=16 vlen=2
> 'data' type_id=10 bits_offset=0
> 'metadata' type_id=11 bits_offset=64
> [10] PTR '(anon)' type_id=74
> [11] PTR '(anon)' type_id=73
> [57] STRUCT 'u_tld_data' size=4096 vlen=1
> 'data' type_id=58 bits_offset=0
> [61] STRUCT 'u_tld_metadata' size=4096 vlen=3
> 'cnt' type_id=62 bits_offset=0
> 'padding' type_id=64 bits_offset=8
> 'metadata' type_id=67 bits_offset=512
> [73] TYPE_TAG 'uptr' type_id=61
> [74] TYPE_TAG 'uptr' type_id=57
>
> Without the forward declaration, the BTF will look like this:
>
> [9] STRUCT 'tld_map_value' size=16 vlen=2
> 'data' type_id=10 bits_offset=0
> 'metadata' type_id=11 bits_offset=64
> [10] PTR '(anon)' type_id=63
> [11] PTR '(anon)' type_id=61
> [60] FWD 'u_tld_metadata' fwd_kind=struct
> [61] TYPE_TAG 'uptr' type_id=60
> [62] FWD 'u_tld_data' fwd_kind=struct
> [63] TYPE_TAG 'uptr' type_id=62
>
Oh, I see. Yes, having some commentary/links would be helpful since this
makes for some tricky debugging. Was there ever any discussion of
alternatives to using the forward decl?
I'm afraid I missed the 'uptr' tag introduction and need to understand
the background.
> > I'll also mention the only reason I noticed these is that "bpftool gen
> > skeleton" automatically maps these to user space, but results in an
> > ASSERT() failure during build on 32-bit targets due to lack of support,
> > so dropping them avoids that.
>
> Can you provide more details of the error?
>
bpftool skeleton includes checks when mapping global vars to user space.
These fail on 32-bit archs for types whose sizes differ from the BPF VM
(i.e. longs, pointers):
In file included from .../tools/testing/selftests/bpf/prog_tests/test_task_local_data.c:13:
./test_task_local_data.skel.h: In function ‘test_task_local_data__assert’:
./test_task_local_data.skel.h:531:9: error: static assertion failed: "unexpected size of \'dummy_data\'"
531 | _Static_assert(sizeof(s->bss->dummy_data) == 8, "unexpected size of 'dummy_data'");
| ^~~~~~~~~~~~~~
./test_task_local_data.skel.h:532:9: error: static assertion failed: "unexpected size of \'dummy_metadata\'"
532 | _Static_assert(sizeof(s->bss->dummy_metadata) == 8, "unexpected size of 'dummy_metadata'");
| ^~~~~~~~~~~~~~
On a whim, I tried making the dummy_xxxxxxx vars static but this still
leaves the fwd_kind and leads to verifier rejection.
> >
> >
> > 24: (85) call pc+25
> > caller:
> > R6_w=map_value(map=tld_key_map,ks=4,vs=6) R7=1 R10=fp0 fp-8_w=map_value(map=tld_key_map,ks=4,vs=6) fp-16=map_value(map=tld_data_map,ks=4,vs=16)
> > callee:
> > frame1: R1_w=fp[0]-16 R2_w=map_value(map=.rodata.str1.1,ks=4,vs=30) R10=fp0
> > 50: frame1: R1_w=fp[0]-16 R2_w=map_value(map=.rodata.str1.1,ks=4,vs=30) R10=fp0
> > ; static u16 __tld_fetch_key(struct tld_object *tld_obj, const char *name) @ task_local_data.bpf.h:163
> > 50: (7b) *(u64 *)(r10 -16) = r2 ; frame1: R2_w=map_value(map=.rodata.str1.1,ks=4,vs=30) R10=fp0 fp-16_w=map_value(map=.rodata.str1.1,ks=4,vs=30)
> > 51: (b4) w7 = 0 ; frame1: R7_w=0
> > ; if (!tld_obj->data_map || !tld_obj->data_map->metadata) @ task_local_data.bpf.h:169
> > 52: (79) r1 = *(u64 *)(r1 +0) ; frame1: R1=map_value(map=tld_data_map,ks=4,vs=16) fp-16=map_value(map=.rodata.str1.1,ks=4,vs=30)
> > 53: (15) if r1 == 0x0 goto pc+36 ; frame1: R1=map_value(map=tld_data_map,ks=4,vs=16)
> > 54: (79) r6 = *(u64 *)(r1 +8) ; frame1: R1=map_value(map=tld_data_map,ks=4,vs=16) R6_w=scalar()
> > 55: (15) if r6 == 0x0 goto pc+34 ; frame1: R6_w=scalar(umin=1)
> > ; cnt = tld_obj->data_map->metadata->cnt; @ task_local_data.bpf.h:172
> > 56: (71) r8 = *(u8 *)(r6 +0)
> > R6 invalid mem access 'scalar'
> > processed 29 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 1
> > -- END PROG LOAD LOG --
> > libbpf: prog 'task_init': failed to load: -EACCES
> > libbpf: failed to load object 'test_task_local_data'
> > libbpf: failed to load BPF skeleton 'test_task_local_data': -EACCES
> > test_task_local_data_basic:FAIL:skel_open_and_load unexpected error: -13
> > #409/1 task_local_data/task_local_data_basic:FAIL
> >
> >
> > I'm unsure if this verifier error is related to the dummy pointers, but
> > it does seem there's a pointer issue...
> >
>
> The error is exactly caused by removing the dummy_xxx.
>
> > Further thoughts or suggestions (from anyone) would be most welcome.
> >
> > Thanks,
> > Tony
> >
> > > +
> > > +struct tld_metadata {
> > > + char name[TLD_NAME_LEN];
> > > + __u16 size;
> > > +};
> > > +
> > > +struct u_tld_metadata {
> > > + __u8 cnt;
> > > + __u8 padding[63];
> > > + struct tld_metadata metadata[TLD_DATA_CNT];
> > > +};
> > > +
> > > +struct u_tld_data {
> > > + char data[TLD_DATA_SIZE];
> > > +};
> > > +
> > > +struct tld_map_value {
> > > + struct u_tld_data __uptr *data;
> > > + struct u_tld_metadata __uptr *metadata;
> > > +};
> > > +
> > > +struct tld_object {
> > > + struct tld_map_value *data_map;
> > > + struct tld_keys *key_map;
> > > +};
> > > +
> >
> > [...]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH bpf-next v4 2/3] selftests/bpf: Test basic task local data operations
2025-05-15 21:16 ` [PATCH bpf-next v4 2/3] selftests/bpf: Test basic task local data operations Amery Hung
@ 2025-05-23 10:18 ` Tony Ambardar
0 siblings, 0 replies; 17+ messages in thread
From: Tony Ambardar @ 2025-05-23 10:18 UTC (permalink / raw)
To: Amery Hung
Cc: bpf, netdev, alexei.starovoitov, andrii, daniel, tj, memxor,
martin.lau, kernel-team
On Thu, May 15, 2025 at 02:16:01PM -0700, Amery Hung wrote:
> Test basic operations of task local data with valid and invalid
> tld_create_key(). For invalid calls, make sure they return the right
> error code, and verifiy that no TLDs are inserted by trying fetching
> keys in the bpf program. For valid calls, first make sure the TLDs
> are created using tld_fetch_key(). Then, verify that they are task-
> specific with multiple user threads. This done by writing values unique
> to each thread to TLDs, reading them from both user space and bpf, and
> checking if the value read back are the same as the value written.
>
> Signed-off-by: Amery Hung <ameryhung@gmail.com>
> ---
> .../bpf/prog_tests/test_task_local_data.c | 163 ++++++++++++++++++
> .../bpf/progs/test_task_local_data.c | 81 +++++++++
> 2 files changed, 244 insertions(+)
> create mode 100644 tools/testing/selftests/bpf/prog_tests/test_task_local_data.c
> create mode 100644 tools/testing/selftests/bpf/progs/test_task_local_data.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/test_task_local_data.c b/tools/testing/selftests/bpf/prog_tests/test_task_local_data.c
> new file mode 100644
> index 000000000000..738fc1c9d8a4
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/test_task_local_data.c
> @@ -0,0 +1,163 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <pthread.h>
> +#include <bpf/btf.h>
> +#include <test_progs.h>
> +
> +struct test_struct {
> + __u64 a;
> + __u64 b;
> + __u64 c;
> + __u64 d;
> +};
> +
> +#include "test_task_local_data.skel.h"
> +#include "task_local_data.h"
> +
> +/*
> + * Reset task local data between subtests by clearing metadata. This is only safe
> + * in selftests as subtests run sequentially. Users of task local data libraries
> + * should not do this.
> + */
> +static void reset_tld(void)
> +{
> + if (tld_metadata_p)
> + memset(tld_metadata_p, 0, PAGE_SIZE);
> +}
> +
> +/* Serialize access to bpf program's global variables */
> +static pthread_mutex_t global_mutex;
> +
> +#define TEST_BASIC_THREAD_NUM 63
> +static tld_key_t tld_keys[TEST_BASIC_THREAD_NUM];
> +
> +void *test_task_local_data_basic_thread(void *arg)
> +{
> + LIBBPF_OPTS(bpf_test_run_opts, opts);
> + struct test_task_local_data *skel = (struct test_task_local_data *)arg;
> + struct test_struct *value2;
> + int fd, err, tid, *value1;
> +
> + fd = bpf_map__fd(skel->maps.tld_data_map);
> +
> + value1 = tld_get_data(fd, tld_keys[0]);
> + if (!ASSERT_OK_PTR(value1, "tld_get_data"))
> + goto out;
> +
> + value2 = tld_get_data(fd, tld_keys[1]);
> + if (!ASSERT_OK_PTR(value1, "tld_get_data"))
Should this be 'value2'?
> + goto out;
> +
> + tid = gettid();
> +
> + *value1 = tid + 0;
> + value2->a = tid + 1;
> + value2->b = tid + 2;
> + value2->c = tid + 3;
> + value2->d = tid + 4;
> +
> + pthread_mutex_lock(&global_mutex);
> + /*
> + * Run task_init which simulates an initialization bpf prog that runs once
> + * for every new task. The program saves keys for subsequent bpf programs.
> + */
> + err = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.task_init), &opts);
> + ASSERT_OK(err, "run task_init");
> + ASSERT_OK(opts.retval, "task_init retval");
> + /* Run task_main that read task local data and save to global variables */
> + err = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.task_main), &opts);
> + ASSERT_OK(err, "run task_main");
> + ASSERT_OK(opts.retval, "task_main retval");
> +
> + ASSERT_EQ(skel->bss->test_value1, tid + 0, "tld_get_data value1");
> + ASSERT_EQ(skel->bss->test_value2.a, tid + 1, "tld_get_data value2.a");
> + ASSERT_EQ(skel->bss->test_value2.b, tid + 2, "tld_get_data value2.b");
> + ASSERT_EQ(skel->bss->test_value2.c, tid + 3, "tld_get_data value2.c");
> + ASSERT_EQ(skel->bss->test_value2.d, tid + 4, "tld_get_data value2.d");
> + pthread_mutex_unlock(&global_mutex);
> +
> + /* Make sure valueX are indeed local to threads */
> + ASSERT_EQ(*value1, tid + 0, "value1");
> + ASSERT_EQ(value2->a, tid + 1, "value2.a");
> + ASSERT_EQ(value2->b, tid + 2, "value2.b");
> + ASSERT_EQ(value2->c, tid + 3, "value2.c");
> + ASSERT_EQ(value2->d, tid + 4, "value2.d");
> +
> + *value1 = tid + 4;
> + value2->a = tid + 3;
> + value2->b = tid + 2;
> + value2->c = tid + 1;
> + value2->d = tid + 0;
> +
> + /* Run task_main again */
> + pthread_mutex_lock(&global_mutex);
> + err = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.task_main), &opts);
> + ASSERT_OK(err, "run task_main");
> + ASSERT_OK(opts.retval, "task_main retval");
> +
> + ASSERT_EQ(skel->bss->test_value1, tid + 4, "tld_get_data value1");
> + ASSERT_EQ(skel->bss->test_value2.a, tid + 3, "tld_get_data value2.a");
> + ASSERT_EQ(skel->bss->test_value2.b, tid + 2, "tld_get_data value2.b");
> + ASSERT_EQ(skel->bss->test_value2.c, tid + 1, "tld_get_data value2.c");
> + ASSERT_EQ(skel->bss->test_value2.d, tid + 0, "tld_get_data value2.d");
> + pthread_mutex_unlock(&global_mutex);
> +
> + tld_free();
> +out:
> + pthread_exit(NULL);
> +}
> +
> +static void test_task_local_data_basic(void)
> +{
> + struct test_task_local_data *skel;
> + pthread_t thread[TEST_BASIC_THREAD_NUM];
> + char dummy_key_name[TLD_NAME_LEN];
> + tld_key_t key;
> + int i, fd, err;
> +
> + reset_tld();
> +
> + ASSERT_OK(pthread_mutex_init(&global_mutex, NULL), "pthread_mutex_init");
> +
> + skel = test_task_local_data__open_and_load();
> + if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
> + return;
> +
> + fd = bpf_map__fd(skel->maps.tld_data_map);
> +
> + tld_keys[0] = tld_create_key(fd, "value1", sizeof(int));
> + ASSERT_FALSE(tld_key_is_err(tld_keys[0]), "tld_create_key");
> + tld_keys[1] = tld_create_key(fd, "value2", sizeof(struct test_struct));
> + ASSERT_FALSE(tld_key_is_err(tld_keys[1]), "tld_create_key");
> +
> + key = tld_create_key(fd, "value_not_exist",
> + PAGE_SIZE - sizeof(int) - sizeof(struct test_struct) + 1);
> + ASSERT_EQ(tld_key_err_or_zero(key), -E2BIG, "tld_create_key");
> +
> + key = tld_create_key(fd, "value2", sizeof(struct test_struct));
> + ASSERT_EQ(tld_key_err_or_zero(key), -EEXIST, "tld_create_key");
> +
> + for (i = 2; i < TLD_DATA_CNT; i++) {
> + snprintf(dummy_key_name, TLD_NAME_LEN, "dummy_value%d", i);
> + tld_keys[i] = tld_create_key(fd, dummy_key_name, sizeof(int));
> + ASSERT_FALSE(tld_key_is_err(tld_keys[i]), "tld_create_key");
> + }
> +
> + key = tld_create_key(fd, "value_not_exist", sizeof(struct test_struct));
> + ASSERT_EQ(tld_key_err_or_zero(key), -ENOSPC, "tld_create_key");
> +
> + for (i = 0; i < TEST_BASIC_THREAD_NUM; i++) {
> + err = pthread_create(&thread[i], NULL, test_task_local_data_basic_thread, skel);
> + if (!ASSERT_OK(err, "pthread_create"))
> + goto out;
> + }
> +
> +out:
> + for (i = 0; i < TEST_BASIC_THREAD_NUM; i++)
> + pthread_join(thread[i], NULL);
> +}
> +
> +void test_task_local_data(void)
> +{
> + if (test__start_subtest("task_local_data_basic"))
> + test_task_local_data_basic();
> +}
> diff --git a/tools/testing/selftests/bpf/progs/test_task_local_data.c b/tools/testing/selftests/bpf/progs/test_task_local_data.c
> new file mode 100644
> index 000000000000..4cf0630b19bd
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_task_local_data.c
> @@ -0,0 +1,81 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <vmlinux.h>
> +#include <errno.h>
> +#include <bpf/bpf_helpers.h>
> +
> +#include "task_local_data.bpf.h"
> +
> +struct tld_keys {
> + tld_key_t value1;
> + tld_key_t value2;
> + tld_key_t value_not_exist;
> +};
> +
> +struct test_struct {
> + unsigned long a;
> + unsigned long b;
> + unsigned long c;
> + unsigned long d;
> +};
> +
> +int test_value1;
> +struct test_struct test_value2;
> +
> +SEC("syscall")
> +int task_init(void *ctx)
> +{
> + struct tld_object tld_obj;
> + struct task_struct *task;
> + int err;
> +
> + task = bpf_get_current_task_btf();
> + err = tld_object_init(task, &tld_obj);
> + if (err)
> + return 1;
> +
> + if (!tld_fetch_key(&tld_obj, "value1", value1))
> + return 2;
> +
> + if (!tld_fetch_key(&tld_obj, "value2", value2))
> + return 3;
> +
> + if (tld_fetch_key(&tld_obj, "value_not_exist", value_not_exist))
> + return 6;
> +
> + return 0;
> +}
> +
> +SEC("syscall")
> +int task_main(void *ctx)
> +{
> + struct tld_object tld_obj;
> + struct test_struct *struct_p;
> + struct task_struct *task;
> + int err, *int_p;
> +
> + task = bpf_get_current_task_btf();
> + err = tld_object_init(task, &tld_obj);
> + if (err)
> + return 1;
> +
> + int_p = tld_get_data(&tld_obj, value1, sizeof(int));
> + if (int_p)
> + test_value1 = *int_p;
> + else
> + return 2;
> +
> + struct_p = tld_get_data(&tld_obj, value2, sizeof(struct test_struct));
> + if (struct_p)
> + test_value2 = *struct_p;
> + else
> + return 3;
> +
> + int_p = tld_get_data(&tld_obj, value_not_exist, sizeof(int));
> + if (int_p)
> + return 4;
> +
> + return 0;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> +
> --
> 2.47.1
>
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2025-05-23 10:18 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-15 21:15 [PATCH bpf-next v4 0/3] Task local data Amery Hung
2025-05-15 21:16 ` [PATCH bpf-next v4 1/3] selftests/bpf: Introduce task " Amery Hung
2025-05-16 18:57 ` Alexei Starovoitov
2025-05-16 20:41 ` Amery Hung
2025-05-16 22:22 ` Alexei Starovoitov
2025-05-20 7:36 ` Amery Hung
2025-05-20 20:03 ` Alexei Starovoitov
2025-05-19 20:04 ` Tejun Heo
2025-05-20 6:44 ` Amery Hung
2025-05-20 22:57 ` Andrii Nakryiko
2025-05-22 17:27 ` Amery Hung
2025-05-22 8:36 ` Tony Ambardar
2025-05-22 16:49 ` Amery Hung
2025-05-23 10:09 ` Tony Ambardar
2025-05-15 21:16 ` [PATCH bpf-next v4 2/3] selftests/bpf: Test basic task local data operations Amery Hung
2025-05-23 10:18 ` Tony Ambardar
2025-05-15 21:16 ` [PATCH bpf-next v4 3/3] selftests/bpf: Test concurrent task local data key creation Amery Hung
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).