BPF List
* [RFC PATCH 0/4] uptr KV store
@ 2025-03-20 21:40 Amery Hung
  2025-03-20 21:40 ` [RFC PATCH 1/4] bpf: Allow creating dynptr from uptr Amery Hung
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Amery Hung @ 2025-03-20 21:40 UTC (permalink / raw)
  To: bpf; +Cc: daniel, andrii, alexei.starovoitov, martin.lau, ameryhung,
	kernel-team

Hi all,

I'd like to discuss the uptr KV store at LSFMMBPF'25. This is just an RFC
and the code definitely needs more work, but I hope it conveys the
high-level idea.

* Overview *

The uptr KV store implements a key-value store based on existing bpf
features, with one small change to the bpf verifier. A motivation for
this work is to make it easier to roll out a new bpf program that changes
the map value layout. Currently, there is no simple way to do this. One
may try to create a new map with the new value layout, copy the old
values into it, and then start the new program. However, this process is
not trivial to automate. The uptr KV store provides an alternative: by
replacing a structure in a map value with multiple key-value pairs in the
KV store, changing the layout becomes just adding/deleting key-value
pairs. In addition, by maintaining a manifest of key-value pairs, the
rollout process can be easily automated.

* Design & implementation *

- User space and bpf API

  The uptr KV store provides basic user space and bpf APIs (get/put/
  delete). In addition, space-management APIs are provided to user space
  only. It is assumed that all keys are known before deploying a bpf
  program. To use it, the user space program first needs to initialize
  the KV store by calling kv_store_init() and then initialize all
  key-value pairs using kv_store_put(). After that, both the bpf program
  and the user space program can start using the KV store.
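  A minimal sketch of this flow, using the user space API from this
  series (uptr_kv_store.h); the `kvs_prog` skeleton name and the key
  numbers are hypothetical:

```c
#include <unistd.h>
#include <bpf/libbpf.h>

#include "uptr_kv_store.h"   /* user space API from this series */
#include "kvs_prog.skel.h"   /* hypothetical skeleton */

static int deploy(void)
{
	struct kv_store *kvs;
	struct kvs_prog *skel;
	int v0 = 1, v1 = 2;

	skel = kvs_prog__open_and_load();
	if (!skel)
		return -1;

	/* One-time setup: allocate uptr regions in this task's local
	 * storage and pin the data map for reuse by later versions.
	 */
	kvs = kv_store_init(getpid(), skel->maps.data_map,
			    "/sys/fs/bpf/kv_store_data_map");
	if (!kvs)
		return -1;

	/* All keys must be initialized before the bpf program runs. */
	if (kv_store_put(kvs, 0, &v0, sizeof(v0)) ||
	    kv_store_put(kvs, 1, &v1, sizeof(v1)))
		return -1;

	return kvs_prog__attach(skel);
}
```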

- Single global map lookup in KV store bpf API

  Making the KV store performant enough for use in bpf programs on hot
  paths is one of the design goals. To achieve this, the key is to keep
  map lookups to a minimum. The current implementation requires only one
  task local storage lookup per program invocation. After that, get/put/
  delete only involve memory accesses to the uptr regions in the local
  storage.

  The uptr KV store mainly consists of two uptr regions: metadata and
  data. The metadata region is an array of per-key entries, indexed by
  key, where each entry contains the page index, the page offset, and
  the size of the value. The data region consists of pages allocated
  on demand for storing values; one page is allocated initially.

  If string keys are desired, the metadata can be moved to a bpf hashmap
  indexed by string keys. However, this adds one hashmap lookup to every
  basic KV store operation.

- 1K max int keys; 1B - 4KB value

  The KV store is indexed by integer keys, and the maximum number of keys
  supported is 1K. This is limited by the number of metadata entries,
  shown below, that can be stored in the 4KB uptr metadata array. The
  maximum size of a value is also bounded to 4KB for the same reason.

  struct kv_store_meta {
        __u32 page_idx:3;
        __u32 page_off:12;
        __u32 size:12;
        __u32 init:1;
  };

- Growable storage backed by uptr

  To accommodate storage space for future key-value pairs, or to
  temporarily store old and new key-value pairs at the same time during a
  transition, the KV store is growable. This is possible because the KV
  store leverages uptr, which can be allocated in user space on demand.
  The current implementation supports 8 pages, but this may be changed if
  it proves too big or too small.

- Variable-size data copy with dynptr

  get/put involve copying variable-size data between the uptr region and
  the stack. Since llvm does not support emitting bytecode for a memcpy
  of variable size, a byte-by-byte copy in a for loop would otherwise be
  needed. This can be improved by using dynptr. Please refer to patch 1
  for details.
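  A sketch of the bpf-side copy (assuming patch 1, which lets
  bpf_dynptr_from_mem() accept the PTR_TO_MEM uptr region; the function
  name is illustrative):

```c
/* p->data is a 4KB uptr page; val_size is not a compile-time constant. */
static int kvs_read_value(struct kv_store_page *p, __u32 page_off,
			  void *val, __u32 val_size)
{
	struct bpf_dynptr ptr;
	int err;

	/* Wrap the whole page in a dynptr once... */
	err = bpf_dynptr_from_mem(p->data, KVS_MAX_VAL_SIZE, 0, &ptr);
	if (err)
		return err;

	/* ...then a single helper call copies val_size bytes, replacing
	 * a byte-by-byte for loop that llvm would otherwise require.
	 */
	return bpf_dynptr_read(val, val_size, &ptr, page_off, 0);
}
```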

* Todo *

- Allocate smaller chunks of memory and grow on demand 


Amery Hung (4):
  bpf: Allow creating dynptr from uptr
  selftests/bpf: Implement basic uptr KV store
  selftests/bpf: Test basic uptr KV store operations from user space and
    bpf
  selftests/bpf: Test changing KV store value layout

 include/uapi/linux/bpf.h                      |   4 +-
 kernel/bpf/verifier.c                         |   3 +-
 .../bpf/prog_tests/test_uptr_kv_store.c       | 154 ++++++++++
 .../selftests/bpf/prog_tests/uptr_kv_store.c  | 282 ++++++++++++++++++
 .../selftests/bpf/prog_tests/uptr_kv_store.h  |  22 ++
 .../selftests/bpf/progs/test_uptr_kv_store.c  |  46 +++
 .../bpf/progs/test_uptr_kv_store_v1.c         |  46 +++
 .../selftests/bpf/progs/uptr_kv_store.h       | 120 ++++++++
 .../selftests/bpf/test_uptr_kv_store_common.h |  22 ++
 .../selftests/bpf/uptr_kv_store_common.h      |  47 +++
 10 files changed, 744 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_uptr_kv_store.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/uptr_kv_store.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/uptr_kv_store.h
 create mode 100644 tools/testing/selftests/bpf/progs/test_uptr_kv_store.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_uptr_kv_store_v1.c
 create mode 100644 tools/testing/selftests/bpf/progs/uptr_kv_store.h
 create mode 100644 tools/testing/selftests/bpf/test_uptr_kv_store_common.h
 create mode 100644 tools/testing/selftests/bpf/uptr_kv_store_common.h

-- 
2.47.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [RFC PATCH 1/4] bpf: Allow creating dynptr from uptr
  2025-03-20 21:40 [RFC PATCH 0/4] uptr KV store Amery Hung
@ 2025-03-20 21:40 ` Amery Hung
  2025-03-20 22:45   ` Andrii Nakryiko
  2025-03-20 21:40 ` [RFC PATCH 2/4] selftests/bpf: Implement basic uptr KV store Amery Hung
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Amery Hung @ 2025-03-20 21:40 UTC (permalink / raw)
  To: bpf; +Cc: daniel, andrii, alexei.starovoitov, martin.lau, ameryhung,
	kernel-team

Currently, bpf_dynptr_from_mem() only allows creating a dynptr from local
memory of reg type PTR_TO_MAP_VALUE. This patch further supports
PTR_TO_MEM as a valid source of data.

A reg can have type PTR_TO_MEM in the verifier in the following cases:
 - read map value with special field BPF_UPTR
 - ld_imm64 kfunc (MEM_RDONLY)
 - ld_imm64 other non-struct ksyms (MEM_RDONLY)
 - return from helper with RET_PTR_TO_MEM: ringbuf_reserve (MEM_RINGBUF)
   and dynptr_from_data
 - return from helper with RET_PTR_TO_MEM_OR_BTF_ID: this_cpu_ptr,
   per_cpu_ptr and the return type is not struct (both MEM_RDONLY)
 - return from special kfunc: dynptr_slice (MEM_RDONLY), dynptr_slice_rdwr
 - return from non-special kfunc that returns non-struct pointer:
   hid_bpf_get_data

Since this patch only allows PTR_TO_MEM without any flags, only uptr,
global subprog arguments, non-special kfuncs that return non-struct
pointers, and the returns of bpf_dynptr_slice_rdwr() and
bpf_dynptr_data() will additionally be allowed.

The last two would allow creating a dynptr from dynptr data. Will that
create any problems?

Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
 include/uapi/linux/bpf.h | 4 +++-
 kernel/bpf/verifier.c    | 3 ++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index beac5cdf2d2c..2b1335fa1173 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5562,7 +5562,9 @@ union bpf_attr {
  *	Description
  *		Get a dynptr to local memory *data*.
  *
- *		*data* must be a ptr to a map value.
+ *		*data* must be a ptr to valid local memory such as a map value, a uptr,
+ *		a null-checked non-void pointer passed to a global subprogram, or allocated
+ *		memory returned by a kfunc such as hid_bpf_get_data().
  *		The maximum *size* supported is DYNPTR_MAX_SIZE.
  *		*flags* is currently unused.
  *	Return
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 22c4edc8695c..d22310d1642c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11307,7 +11307,8 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		}
 		break;
 	case BPF_FUNC_dynptr_from_mem:
-		if (regs[BPF_REG_1].type != PTR_TO_MAP_VALUE) {
+		if (regs[BPF_REG_1].type != PTR_TO_MAP_VALUE &&
+		    regs[BPF_REG_1].type != PTR_TO_MEM) {
 			verbose(env, "Unsupported reg type %s for bpf_dynptr_from_mem data\n",
 				reg_type_str(env, regs[BPF_REG_1].type));
 			return -EACCES;
-- 
2.47.1



* [RFC PATCH 2/4] selftests/bpf: Implement basic uptr KV store
  2025-03-20 21:40 [RFC PATCH 0/4] uptr KV store Amery Hung
  2025-03-20 21:40 ` [RFC PATCH 1/4] bpf: Allow creating dynptr from uptr Amery Hung
@ 2025-03-20 21:40 ` Amery Hung
  2025-03-20 21:40 ` [RFC PATCH 3/4] selftests/bpf: Test basic uptr KV store operations from user space and bpf Amery Hung
  2025-03-20 21:40 ` [RFC PATCH 4/4] selftests/bpf: Test changing KV store value layout Amery Hung
  3 siblings, 0 replies; 8+ messages in thread
From: Amery Hung @ 2025-03-20 21:40 UTC (permalink / raw)
  To: bpf; +Cc: daniel, andrii, alexei.starovoitov, martin.lau, ameryhung,
	kernel-team

The uptr KV store is a dynamically resizable key-value store that aims to
make rolling out bpf programs with map value layout changes easier by
hiding the layout from the bpf program. It is built on top of existing
bpf features, with both user space and bpf APIs. To support usage in bpf
programs on hot paths, only simple APIs such as get/put/delete are
provided in bpf; space-management APIs are available only in user space.

To use the uptr KV store, the user space program first needs to call
kv_store_init() to allocate memory and set up uptrs in the
task_local_storage of a given process. It returns a pointer to
"struct kv_store" on success, which is used as a token to access the
KV store in other APIs. Second, it needs to initialize all key-value
pairs with kv_store_put(). Then, both the bpf and user space programs
can start their normal operation.

In the bpf program, the API is designed to minimize map lookups.
The bpf program needs to look up the task_local_storage once. After
that, all bpf APIs take the map value as the first argument and do not
incur additional map lookups.

A simple way of using the KV store to allow easy bpf program rollout is
to use multiple key-value pairs whose values are primitive data types
instead of a structure. This way, adding/deleting fields is just
adding/deleting keys, without moving data. The following is an example
of how this would work.

  user space: kv_store_init()
  user space: kv_store_put({key1, key2, key3})

  prog_v1: kv_store_get/put({key1, key2, key3})

  user space: kv_store_delete(key1)
  user space: kv_store_add(key4)
  user space: kv_store_set_map_reuse(prog_v2.data_map)

  prog_v2: kv_store_get/put({key2, key3, key4})
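In user-space C, the v1 -> v2 transition above might look roughly like
this (a sketch; the `test_prog_v2` skeleton and the KEY* constants are
hypothetical, and the kv_store_add step above corresponds to initializing
the new key with kv_store_put()):

```c
#include "uptr_kv_store.h"      /* user space API from this series */
#include "test_prog_v2.skel.h"  /* hypothetical v2 skeleton */

#define KEY1 1  /* hypothetical key numbers */
#define KEY4 4

/* kvs was returned by kv_store_init() when prog_v1 was deployed. */
static int rollout_v2(struct kv_store *kvs, struct test_prog_v2 *skel_v2)
{
	int key4_init = 0;

	/* A layout change is just key changes; kept keys move no data. */
	kv_store_delete(kvs, KEY1);
	if (kv_store_put(kvs, KEY4, &key4_init, sizeof(key4_init)))
		return -1;

	/* Let prog_v2 reuse the pinned data map so it sees the same store. */
	if (kv_store_data_map_set_reuse(kvs, skel_v2->maps.data_map))
		return -1;

	return test_prog_v2__load(skel_v2);
}
```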

At the core of the KV store are the metadata and data regions. To access
a value stored in the data region, the metadata is first queried using
an integer key. The metadata region is an array of entries containing
the offset and size of each value. Both metadata and data are stored in
uptr regions in the task_local_storage.

Note that it is also possible to support string keys by replacing the
backing storage of the metadata with a hashmap. However, the additional
map lookup per operation would add performance overhead.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
 .../selftests/bpf/prog_tests/uptr_kv_store.c  | 282 ++++++++++++++++++
 .../selftests/bpf/prog_tests/uptr_kv_store.h  |  22 ++
 .../selftests/bpf/progs/uptr_kv_store.h       | 120 ++++++++
 .../selftests/bpf/uptr_kv_store_common.h      |  47 +++
 4 files changed, 471 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/uptr_kv_store.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/uptr_kv_store.h
 create mode 100644 tools/testing/selftests/bpf/progs/uptr_kv_store.h
 create mode 100644 tools/testing/selftests/bpf/uptr_kv_store_common.h

diff --git a/tools/testing/selftests/bpf/prog_tests/uptr_kv_store.c b/tools/testing/selftests/bpf/prog_tests/uptr_kv_store.c
new file mode 100644
index 000000000000..18328b1d5a9a
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/uptr_kv_store.c
@@ -0,0 +1,282 @@
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <sys/mman.h>
+#include <linux/err.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include "task_local_storage_helpers.h"
+#include "uptr_kv_store.h"
+
+struct kv_store {
+	int data_map_fd;
+	int task_fd;
+	int page_cnt;
+	char *data_map_pin_path;
+	struct kv_store_data_map_value data;
+};
+
+static struct kv_store_page *__kv_store_add_page(struct kv_store *kvs)
+{
+	struct kv_store_page *p;
+
+	if (kvs->page_cnt >= KVS_MAX_PAGE_ENTRIES)
+		return ERR_PTR(-ENOSPC);
+
+	p = mmap(NULL, sizeof(struct kv_store_page), PROT_READ | PROT_WRITE,
+		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+
+	if (p == MAP_FAILED)
+		return ERR_PTR(-ENOMEM);
+
+	kvs->data.pages[kvs->page_cnt].page = p;
+	kvs->page_cnt++;
+
+	return p;
+}
+
+static void __kv_store_del_page(struct kv_store *kvs)
+{
+	struct kv_store_page *p;
+
+	p = kvs->data.pages[kvs->page_cnt - 1].page;
+	kvs->data.pages[kvs->page_cnt - 1].page = NULL;
+	kvs->page_cnt--;
+	munmap(p, sizeof(*p));
+}
+
+static struct kv_store_meta *kvs_store_get_meta(struct kv_store *kvs, int key)
+{
+	return key < KVS_MAX_VAL_ENTRIES ? &kvs->data.metas->meta[key] : NULL;
+}
+
+void kv_store_close(struct kv_store *kvs)
+{
+	int i;
+
+	munmap(kvs->data.metas, sizeof(struct kv_store_metas));
+
+	for (i = 0; i < kvs->page_cnt; i++)
+		__kv_store_del_page(kvs);
+
+	if (kvs->data_map_pin_path)
+		unlink(kvs->data_map_pin_path);
+
+	free(kvs);
+}
+
+struct kv_store *kv_store_init(int pid, struct bpf_map *data_map, const char *pin_path)
+{
+	struct kv_store_page *p;
+	struct kv_store *kvs;
+	int err;
+
+	kvs = calloc(1, sizeof(*kvs));
+	if (!kvs) {
+		errno = ENOMEM;
+		return NULL;
+	}
+
+	kvs->data.metas = mmap(NULL, sizeof(struct kv_store_page),
+			       PROT_READ | PROT_WRITE,
+			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+
+	if (kvs->data.metas == MAP_FAILED) {
+		errno = ENOMEM;
+		return NULL;
+	}
+
+	p = __kv_store_add_page(kvs);
+	if (IS_ERR(p)) {
+		errno = -PTR_ERR(p);
+		goto err;
+	}
+
+	kvs->data_map_fd = bpf_map__fd(data_map);
+	if (kvs->data_map_fd < 0) {
+		errno = ENOENT;
+		goto err;
+	}
+
+	kvs->task_fd = sys_pidfd_open(pid, 0);
+	if (kvs->task_fd < 0) {
+		errno = ESRCH;
+		goto err;
+	}
+
+	err = bpf_map_update_elem(kvs->data_map_fd, &kvs->task_fd, &kvs->data, 0);
+	if (err) {
+		errno = -err;
+		goto err;
+	}
+
+	kvs->data_map_pin_path = strdup(pin_path);
+	if (!kvs->data_map_pin_path)
+		goto err;
+
+	err = bpf_map__pin(data_map, kvs->data_map_pin_path);
+	if (err) {
+		errno = -err;
+		goto err;
+	}
+
+	return kvs;
+err:
+	kv_store_close(kvs);
+	return NULL;
+}
+
+int kv_store_data_map_set_reuse(struct kv_store *kvs, struct bpf_map *data_map)
+{
+	return bpf_map__reuse_fd(data_map, kvs->data_map_fd);
+}
+
+void *kv_store_get(struct kv_store *kvs, int key)
+{
+	struct kv_store_meta *meta;
+	struct kv_store_page *p;
+
+	meta = kvs_store_get_meta(kvs, key);
+	if (!meta || !meta->init)
+		return NULL;
+
+	p = kvs->data.pages[meta->page_idx].page;
+
+	return p->data + meta->page_off;
+}
+
+static int linear_off(const struct kv_store_meta *meta)
+{
+	if (!meta->init)
+		return KVS_MAX_PAGE_ENTRIES * KVS_MAX_VAL_SIZE;
+
+	return meta->page_idx * KVS_MAX_VAL_SIZE + meta->page_off;
+}
+
+static int comp_meta(const void *m1, const void *m2)
+{
+	struct kv_store_meta *meta1 = (struct kv_store_meta *)m1;
+	struct kv_store_meta *meta2 = (struct kv_store_meta *)m2;
+	int off1, off2;
+
+	off1 = linear_off(meta1);
+	off2 = linear_off(meta2);
+
+	if (off1 > off2)
+		return 1;
+	else if (off1 < off2)
+		return -1;
+	else
+		return 0;
+}
+
+static int kv_store_find_next_slot(struct kv_store *kvs, int size, struct kv_store_meta *meta)
+{
+	struct kv_store_meta metas[KVS_MAX_VAL_ENTRIES];
+	int i, err, off, next_off = 0;
+	struct kv_store_page *p;
+
+	memcpy(metas, kvs->data.metas, sizeof(struct kv_store_meta) * KVS_MAX_VAL_ENTRIES);
+
+	qsort(metas, KVS_MAX_VAL_ENTRIES, sizeof(struct kv_store_meta), comp_meta);
+
+	for (i = 0; i < KVS_MAX_VAL_ENTRIES; i++) {
+		off = linear_off(&metas[i]);
+		if (off - next_off >= size &&
+		    next_off / PAGE_SIZE == (next_off + size - 1) / PAGE_SIZE) {
+			break;
+		}
+		next_off = off + metas[i].size;
+	}
+
+	meta->page_idx = next_off / PAGE_SIZE;
+	meta->page_off = next_off % PAGE_SIZE;
+	meta->size = size;
+
+	if (meta->page_idx >= kvs->page_cnt) {
+		p = __kv_store_add_page(kvs);
+		if (IS_ERR(p))
+			return PTR_ERR(p);
+
+		err = bpf_map_update_elem(kvs->data_map_fd, &kvs->task_fd, &kvs->data, 0);
+		if (err) {
+			__kv_store_del_page(kvs);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+int kv_store_put(struct kv_store *kvs, int key, void *val, unsigned int val_size)
+{
+	struct kv_store_meta *meta;
+	struct kv_store_page *p;
+	int err;
+
+	meta = kvs_store_get_meta(kvs, key);
+	if (!meta)
+		return -ENOENT;
+
+	if (!meta->init) {
+		if (val_size > KVS_MAX_VAL_SIZE)
+			return -E2BIG;
+
+		err = kv_store_find_next_slot(kvs, val_size, meta);
+		if (err)
+			return err;
+	}
+
+	p = kvs->data.pages[meta->page_idx].page;
+	val_size = val_size < meta->size ? val_size : meta->size;
+	memcpy((char *)p->data + meta->page_off, val, val_size);
+	meta->init = 1;
+	return 0;
+}
+
+void kv_store_delete(struct kv_store *kvs, int key)
+{
+	struct kv_store_meta *meta;
+	struct kv_store_page *p;
+
+	meta = kvs_store_get_meta(kvs, key);
+	if (!meta)
+		return;
+
+	p = kvs->data.pages[meta->page_idx].page;
+	memset(p->data + meta->page_off, 0, meta->size);
+	memset(meta, 0, sizeof(*meta));
+}
+
+int kv_store_update_value_size(struct kv_store *kvs, int key, unsigned int val_size)
+{
+	struct kv_store_meta *meta, new_meta;
+	struct kv_store_page *old_p, *new_p;
+	int err;
+
+	if (val_size > KVS_MAX_VAL_SIZE)
+		return -E2BIG;
+
+	meta = kvs_store_get_meta(kvs, key);
+	if (!meta || !meta->init)
+		return -ENOENT;
+
+	if (val_size <= meta->size) {
+		meta->size = val_size;
+		return 0;
+	}
+
+	err = kv_store_find_next_slot(kvs, val_size, &new_meta);
+	if (err)
+		return -ENOSPC;
+
+	old_p = kvs->data.pages[meta->page_idx].page;
+	new_p = kvs->data.pages[new_meta.page_idx].page;
+
+	memcpy(new_p->data + new_meta.page_off,
+	       old_p->data + meta->page_off, meta->size);
+	*meta = new_meta;
+
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/prog_tests/uptr_kv_store.h b/tools/testing/selftests/bpf/prog_tests/uptr_kv_store.h
new file mode 100644
index 000000000000..a1da3e6e2de3
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/uptr_kv_store.h
@@ -0,0 +1,22 @@
+#ifndef _UPTR_KV_STORE_H
+#define _UPTR_KV_STORE_H
+
+#include "uptr_kv_store_common.h"
+
+struct kv_store;
+
+void kv_store_close(struct kv_store *kvs);
+
+struct kv_store *kv_store_init(int pid, struct bpf_map *data_map, const char *pin_path);
+
+int kv_store_data_map_set_reuse(struct kv_store *kvs, struct bpf_map *data_map);
+
+void *kv_store_get(struct kv_store *kvs, int key);
+
+int kv_store_put(struct kv_store *kvs, int key, void *val, unsigned int val_size);
+
+void kv_store_delete(struct kv_store *kvs, int key);
+
+int kv_store_update_value_size(struct kv_store *kvs, int key, unsigned int val_size);
+
+#endif
diff --git a/tools/testing/selftests/bpf/progs/uptr_kv_store.h b/tools/testing/selftests/bpf/progs/uptr_kv_store.h
new file mode 100644
index 000000000000..9109073a4933
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/uptr_kv_store.h
@@ -0,0 +1,120 @@
+#ifndef _UPTR_KV_STORE_H
+#define _UPTR_KV_STORE_H
+
+#include <errno.h>
+#include <string.h>
+#include <bpf/bpf_helpers.h>
+
+#include "uptr_kv_store_common.h"
+
+struct {
+	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, struct kv_store_data_map_value);
+} data_map SEC(".maps");
+
+static int bpf_dynptr_from_kv_store(struct kv_store_data_map_value *data, int key,
+				    unsigned int val_size, struct bpf_dynptr *ptr,
+				    struct kv_store_meta **meta)
+{
+	struct kv_store_page *p = NULL;
+	u16 _key = 0;
+
+	if (!data || !data->metas)
+		return -ENOENT;
+
+	/* workaround. llvm generates memory access with unbound key with the following code:
+	 * if (key >= KVS_MAX_VAL_ENTRIES)
+	 *         return -ENOENT;
+	 *
+	 * ; *meta = &data->metas->meta[key]; @ uptr_kv_store.h:37
+	 * 62: (bc) w2 = w2                      ; frame1: R2_w=scalar(id=3,smin=0,smax=umax=0xffffffff,smax32=1023,var_off=(0x0; 0xffffffff))
+	 * 63: (67) r2 <<= 32                    ; frame1: R2_w=scalar(smax=0x3ff00000000,umax=0xffffffff00000000,smin32=0,smax32=umax32=0,var_off=(0x0; 0xffffffff00000000))
+	 * 64: (c7) r2 s>>= 32                   ; frame1: R2_w=scalar(smin=0xffffffff80000000,smax=smax32=1023)
+	 * 65: (67) r2 <<= 2                     ; frame1: R2_w=scalar(smax=0x7ffffffffffffffc,umax=0xfffffffffffffffc,smax32=0x7ffffffc,umax32=0xfffffffc,var_off=(0x0; 0xfffffffffffffffc))
+	 * 66: (0f) r6 += r2
+	 * math between mem pointer and register with unbounded min value is not allowed
+	 */
+	_key += key;
+	if (_key >= KVS_MAX_VAL_ENTRIES)
+		return -ENOENT;
+
+	*meta = &data->metas->meta[_key];
+	if (!(*meta)->init)
+		return -ENOENT;
+
+	/* workaround for variable offset uptr access:
+	 * p = data->pages[meta->page_idx].page;
+	 */
+	switch((*meta)->page_idx) {
+	case 0: p = data->pages[0].page; break;
+	case 1: p = data->pages[1].page; break;
+	case 2: p = data->pages[2].page; break;
+	case 3: p = data->pages[3].page; break;
+	case 4: p = data->pages[4].page; break;
+	case 5: p = data->pages[5].page; break;
+	case 6: p = data->pages[6].page; break;
+	case 7: p = data->pages[7].page; break;
+	}
+
+	if (!p)
+		return -ENOENT;
+
+	val_size = val_size < (*meta)->size ? val_size : (*meta)->size;
+
+	if ((*meta)->page_off >= KVS_MAX_VAL_SIZE)
+		return -EINVAL;
+
+	return bpf_dynptr_from_mem(p->data, KVS_MAX_VAL_SIZE, 0, ptr);
+}
+
+__attribute__((unused))
+static int kv_store_put(struct kv_store_data_map_value *data, int key,
+			void *val, unsigned int val_size)
+{
+	struct kv_store_meta *meta;
+	struct bpf_dynptr ptr;
+	int err;
+
+	err = bpf_dynptr_from_kv_store(data, key, val_size, &ptr, &meta);
+	if (err)
+		return err;
+
+	return bpf_dynptr_write(&ptr, meta->page_off, val, val_size, 0);
+}
+
+__attribute__((unused))
+static int kv_store_get(struct kv_store_data_map_value *data, int key,
+			void *val, unsigned int val_size)
+{
+	struct kv_store_meta *meta;
+	struct bpf_dynptr ptr;
+	int err;
+
+	err = bpf_dynptr_from_kv_store(data, key, val_size, &ptr, &meta);
+	if (err)
+		return err;
+
+	return bpf_dynptr_read(val, val_size, &ptr, meta->page_off, 0);
+}
+
+__attribute__((unused))
+static int kv_store_delete(struct kv_store_data_map_value *data, int key)
+{
+	struct kv_store_meta *meta;
+	u16 _key = 0;
+
+	if (!data || !data->metas)
+		return -ENOENT;
+
+	_key += key;
+	if (_key >= KVS_MAX_VAL_ENTRIES)
+		return -ENOENT;
+
+	meta = &data->metas->meta[_key];
+	meta->init = 0;
+	return 0;
+}
+
+#endif
diff --git a/tools/testing/selftests/bpf/uptr_kv_store_common.h b/tools/testing/selftests/bpf/uptr_kv_store_common.h
new file mode 100644
index 000000000000..af69cd0b32da
--- /dev/null
+++ b/tools/testing/selftests/bpf/uptr_kv_store_common.h
@@ -0,0 +1,47 @@
+#ifndef _UPTR_KV_STORE_COMMON_H
+#define _UPTR_KV_STORE_COMMON_H
+
+#define PAGE_SIZE		4096
+#define KVS_MAX_KEY_SIZE 	32
+#define KVS_MAX_VAL_SIZE 	PAGE_SIZE
+#define KVS_MAX_VAL_ENTRIES 	1024
+
+#define KVS_VALUE_INFO_PAGE_IDX_BIT	3
+#define KVS_VALUE_INFO_PAGE_OFF_BIT	12
+#define KVS_VALUE_INFO_VAL_SIZE_BIT	12
+
+#define KVS_MAX_PAGE_ENTRIES	(1 << KVS_VALUE_INFO_PAGE_IDX_BIT)
+
+#ifdef __BPF__
+struct kv_store_page *dummy_page;
+struct kv_store_metas *dummy_metas;
+#else
+#define __uptr
+#define __kptr
+#endif
+
+struct kv_store_meta {
+	__u32 page_idx:KVS_VALUE_INFO_PAGE_IDX_BIT;
+	__u32 page_off:KVS_VALUE_INFO_PAGE_OFF_BIT;
+	__u32 size:KVS_VALUE_INFO_VAL_SIZE_BIT;
+	__u32 init:1;
+};
+
+struct kv_store_metas {
+	struct kv_store_meta meta[KVS_MAX_VAL_ENTRIES];
+};
+
+struct kv_store_page_entry {
+	struct kv_store_page __uptr *page;
+};
+
+struct kv_store_data_map_value {
+	struct kv_store_metas __uptr *metas;
+	struct kv_store_page_entry pages[KVS_MAX_PAGE_ENTRIES];
+};
+
+struct kv_store_page {
+	char data[KVS_MAX_VAL_SIZE];
+};
+
+#endif
-- 
2.47.1



* [RFC PATCH 3/4] selftests/bpf: Test basic uptr KV store operations from user space and bpf
  2025-03-20 21:40 [RFC PATCH 0/4] uptr KV store Amery Hung
  2025-03-20 21:40 ` [RFC PATCH 1/4] bpf: Allow creating dynptr from uptr Amery Hung
  2025-03-20 21:40 ` [RFC PATCH 2/4] selftests/bpf: Implement basic uptr KV store Amery Hung
@ 2025-03-20 21:40 ` Amery Hung
  2025-03-20 21:40 ` [RFC PATCH 4/4] selftests/bpf: Test changing KV store value layout Amery Hung
  3 siblings, 0 replies; 8+ messages in thread
From: Amery Hung @ 2025-03-20 21:40 UTC (permalink / raw)
  To: bpf; +Cc: daniel, andrii, alexei.starovoitov, martin.lau, ameryhung,
	kernel-team

Make sure both user space and bpf programs can correctly manipulate the
KV store. In the first loop, for each key, we first try to get the
value; this should fail since it is not yet initialized. Then, we store
the key itself as the value. In the second loop, for each key, a bpf
program is triggered to get the value and then put a new value. In the
final loop, we get the values again in user space and make sure they
hold the new values.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
 .../bpf/prog_tests/test_uptr_kv_store.c       | 77 +++++++++++++++++++
 .../selftests/bpf/progs/test_uptr_kv_store.c  | 37 +++++++++
 .../selftests/bpf/test_uptr_kv_store_common.h |  9 +++
 3 files changed, 123 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_uptr_kv_store.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_uptr_kv_store.c
 create mode 100644 tools/testing/selftests/bpf/test_uptr_kv_store_common.h

diff --git a/tools/testing/selftests/bpf/prog_tests/test_uptr_kv_store.c b/tools/testing/selftests/bpf/prog_tests/test_uptr_kv_store.c
new file mode 100644
index 000000000000..2075b8e47972
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/test_uptr_kv_store.c
@@ -0,0 +1,77 @@
+#include <test_progs.h>
+
+#include "uptr_kv_store.h"
+#include "test_uptr_kv_store_common.h"
+#include "test_uptr_kv_store.skel.h"
+
+static void test_uptr_kv_store_basic(void)
+{
+	int err, i, pid, int_val, *int_val_p, max_int_entries;
+	struct test_uptr_kv_store *skel;
+	struct kv_store *kvs = NULL;
+
+	skel = test_uptr_kv_store__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
+		return;
+
+	skel->bss->target_pid = -1;
+	err = test_uptr_kv_store__attach(skel);
+	if (!ASSERT_OK(err, "skel_attach"))
+		return;
+
+	kvs = kv_store_init(getpid(), skel->maps.data_map, "/sys/fs/bpf/kv_store_data_map");
+	if (!ASSERT_OK_PTR(kvs, "kv_store_init"))
+		return;
+
+	max_int_entries = KVS_MAX_VAL_ENTRIES;
+
+	err = kv_store_update_value_size(kvs, 0, KVS_MAX_VAL_SIZE);
+	ASSERT_ERR(err, "kv_store_update_value_size");
+
+	err = kv_store_put(kvs, 0, &int_val, KVS_MAX_VAL_SIZE + 1);
+	ASSERT_ERR(err, "kv_store_put");
+
+	for (i = 0; i < max_int_entries; i++) {
+		int_val_p = kv_store_get(kvs, i);
+		if (!ASSERT_ERR_PTR(int_val_p, "kv_store_get int_val"))
+			goto out;
+
+		err = kv_store_put(kvs, i, &i, sizeof(i));
+		if (!ASSERT_OK(err, "kv_store_put int_val"))
+			goto out;
+	}
+
+	pid = sys_gettid();
+	skel->bss->target_pid = pid;
+	for (i = 0; i < max_int_entries; i++) {
+		skel->bss->test_key = i;
+		skel->bss->test_op = KVS_INT_GET;
+		sys_gettid();
+		ASSERT_EQ(skel->bss->test_int_val, i, "bpf: check int_val[i] = i");
+
+		skel->bss->test_int_val += 1;
+		skel->bss->test_op = KVS_INT_PUT;
+		sys_gettid();
+	}
+	skel->bss->target_pid = -1;
+
+	for (i = 0; i < max_int_entries; i++) {
+		int_val_p = kv_store_get(kvs, i);
+		if (!ASSERT_OK_PTR(int_val_p, "kv_store_get int_val"))
+			goto out;
+
+		ASSERT_EQ(*int_val_p, i + 1, "user space: check int_val[i] == i + 1");
+	}
+
+	err = kv_store_put(kvs, max_int_entries, &int_val, sizeof(int));
+	ASSERT_EQ(err, -ENOENT, "kv_store_put int_val");
+
+out:
+	kv_store_close(kvs);
+}
+
+void test_uptr_kv_store(void)
+{
+	if (test__start_subtest("uptr_kv_store_basic"))
+		test_uptr_kv_store_basic();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_uptr_kv_store.c b/tools/testing/selftests/bpf/progs/test_uptr_kv_store.c
new file mode 100644
index 000000000000..b358cb7fb616
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_uptr_kv_store.c
@@ -0,0 +1,37 @@
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+
+#include "uptr_kv_store.h"
+#include "test_uptr_kv_store_common.h"
+
+pid_t target_pid = 0;
+int test_op;
+int test_key;
+int test_int_val;
+
+SEC("tp_btf/sys_enter")
+int on_enter(__u64 *ctx)
+{
+	struct kv_store_data_map_value *data;
+	struct task_struct *task;
+
+	task = bpf_get_current_task_btf();
+	if (task->pid != target_pid)
+		return 0;
+
+	data = bpf_task_storage_get(&data_map, task, 0, 0);
+
+	switch (test_op) {
+	case KVS_INT_PUT:
+		kv_store_put(data, test_key, &test_int_val, 4);
+		break;
+	case KVS_INT_GET:
+		kv_store_get(data, test_key, &test_int_val, 4);
+		break;
+	}
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+
diff --git a/tools/testing/selftests/bpf/test_uptr_kv_store_common.h b/tools/testing/selftests/bpf/test_uptr_kv_store_common.h
new file mode 100644
index 000000000000..ff7d010ed08f
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_uptr_kv_store_common.h
@@ -0,0 +1,9 @@
+#ifndef _TEST_UPTR_KV_STORE_COMMON_H
+#define _TEST_UPTR_KV_STORE_COMMON_H
+
+enum test_kvs_op {
+	KVS_INT_GET,
+	KVS_INT_PUT,
+};
+
+#endif
-- 
2.47.1



* [RFC PATCH 4/4] selftests/bpf: Test changing KV store value layout
  2025-03-20 21:40 [RFC PATCH 0/4] uptr KV store Amery Hung
                   ` (2 preceding siblings ...)
  2025-03-20 21:40 ` [RFC PATCH 3/4] selftests/bpf: Test basic uptr KV store operations from user space and bpf Amery Hung
@ 2025-03-20 21:40 ` Amery Hung
  3 siblings, 0 replies; 8+ messages in thread
From: Amery Hung @ 2025-03-20 21:40 UTC (permalink / raw)
  To: bpf; +Cc: daniel, andrii, alexei.starovoitov, martin.lau, ameryhung,
	kernel-team

While this is not the most ideal way I imagine the KV store being used,
the test tries to show upgrading a bpf program with a change in the
definition of a structure. When using a structure for the value, adding
a new member is easy, but it is less clear how removing a member or
changing the member layout would work.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
---
 .../bpf/prog_tests/test_uptr_kv_store.c       | 77 +++++++++++++++++++
 .../selftests/bpf/progs/test_uptr_kv_store.c  |  9 +++
 .../bpf/progs/test_uptr_kv_store_v1.c         | 46 +++++++++++
 .../selftests/bpf/test_uptr_kv_store_common.h | 13 ++++
 4 files changed, 145 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/test_uptr_kv_store_v1.c

diff --git a/tools/testing/selftests/bpf/prog_tests/test_uptr_kv_store.c b/tools/testing/selftests/bpf/prog_tests/test_uptr_kv_store.c
index 2075b8e47972..2e86bb08b26e 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_uptr_kv_store.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_uptr_kv_store.c
@@ -3,6 +3,7 @@
 #include "uptr_kv_store.h"
 #include "test_uptr_kv_store_common.h"
 #include "test_uptr_kv_store.skel.h"
+#include "test_uptr_kv_store_v1.skel.h"
 
 static void test_uptr_kv_store_basic(void)
 {
@@ -70,8 +71,84 @@ static void test_uptr_kv_store_basic(void)
 	kv_store_close(kvs);
 }
 
+static void test_uptr_kv_store_change_value(void)
+{
+	int err, pid;
+	struct test_uptr_kv_store_v1 *skel_v1;
+	struct test_uptr_kv_store *skel;
+	struct test_struct_v1 val_v1;
+	struct test_struct val, *val_p;
+	struct kv_store *kvs = NULL;
+
+	skel = test_uptr_kv_store__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
+		return;
+
+	skel->bss->target_pid = -1;
+	err = test_uptr_kv_store__attach(skel);
+	if (!ASSERT_OK(err, "skel_attach"))
+		return;
+
+	pid = sys_gettid();
+	kvs = kv_store_init(pid, skel->maps.data_map, "/sys/fs/bpf/kv_store_data_map");
+
+	/* update key 0 to test_struct in user space */
+	val.a = 1;
+	val.b = 2;
+	err = kv_store_put(kvs, 0, &val, sizeof(val));
+	ASSERT_OK(err, "kv_store_put struct val");
+	val_p = kv_store_get(kvs, 0);
+	ASSERT_OK_PTR(val_p, "kv_store_get struct val");
+	ASSERT_EQ(val_p->a, val.a, "user space: check get val.a == put val.a");
+	ASSERT_EQ(val_p->b, val.b, "user space: check get val.b == put val.b");
+
+	/* lookup test_struct at key 0 in test_uptr_kv_store */
+	skel->bss->test_key = 0;
+	skel->bss->test_op = KVS_STRUCT_GET;
+	skel->bss->target_pid = pid;
+	sys_gettid();
+	skel->bss->target_pid = -1;
+	ASSERT_EQ(skel->bss->test_struct_val.a, val.a, "bpf: check get val.a == put val.a");
+	ASSERT_EQ(skel->bss->test_struct_val.b, val.b, "bpf: check get val.b == put val.b");
+
+	/* grow the value at key 0 to fit test_struct_v1 (new field c) */
+	err = kv_store_update_value_size(kvs, 0, sizeof(val_v1));
+	ASSERT_OK(err, "kv_store_update_value_size");
+
+	/* rollout a new version */
+	skel_v1 = test_uptr_kv_store_v1__open();
+	if (!ASSERT_OK_PTR(skel_v1, "skel_open v1"))
+		goto out;
+
+	kv_store_data_map_set_reuse(kvs, skel_v1->maps.data_map);
+
+	err = test_uptr_kv_store_v1__load(skel_v1);
+	if (!ASSERT_OK(err, "skel_load v1"))
+		goto out;
+
+	skel_v1->bss->target_pid = -1;
+	err = test_uptr_kv_store_v1__attach(skel_v1);
+	if (!ASSERT_OK(err, "skel_attach v1"))
+		goto out;
+
+	/* lookup test_struct_v1 at key 0 in test_uptr_kv_store_v1 */
+	skel_v1->bss->test_key = 0;
+	skel_v1->bss->test_op = KVS_STRUCT_GET;
+	skel_v1->bss->target_pid = pid;
+	sys_gettid();
+	skel_v1->bss->target_pid = -1;
+
+	ASSERT_EQ(skel_v1->bss->test_struct_val.a, val.a, "bpf: check get val_v1.a == put val.a");
+	ASSERT_EQ(skel_v1->bss->test_struct_val.b, val.b, "bpf: check get val_v1.b == put val.b");
+
+out:
+	kv_store_close(kvs);
+}
+
 void test_uptr_kv_store(void)
 {
 	if (test__start_subtest("uptr_kv_store_basic"))
 		test_uptr_kv_store_basic();
+	if (test__start_subtest("uptr_kv_store_change_value"))
+		test_uptr_kv_store_change_value();
 }
diff --git a/tools/testing/selftests/bpf/progs/test_uptr_kv_store.c b/tools/testing/selftests/bpf/progs/test_uptr_kv_store.c
index b358cb7fb616..2ed993ab4b01 100644
--- a/tools/testing/selftests/bpf/progs/test_uptr_kv_store.c
+++ b/tools/testing/selftests/bpf/progs/test_uptr_kv_store.c
@@ -8,6 +8,7 @@ pid_t target_pid = 0;
 int test_op;
 int test_key;
 int test_int_val;
+struct test_struct test_struct_val;
 
 SEC("tp_btf/sys_enter")
 int on_enter(__u64 *ctx)
@@ -28,6 +29,14 @@ int on_enter(__u64 *ctx)
 	case KVS_INT_GET:
 		kv_store_get(data, test_key, &test_int_val, 4);
 		break;
+	case KVS_STRUCT_PUT:
+		kv_store_put(data, test_key, &test_struct_val,
+			     sizeof(test_struct_val));
+		break;
+	case KVS_STRUCT_GET:
+		kv_store_get(data, test_key, &test_struct_val,
+			     sizeof(test_struct_val));
+		break;
 	}
 
 	return 0;
diff --git a/tools/testing/selftests/bpf/progs/test_uptr_kv_store_v1.c b/tools/testing/selftests/bpf/progs/test_uptr_kv_store_v1.c
new file mode 100644
index 000000000000..e3dd11e7d11b
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_uptr_kv_store_v1.c
@@ -0,0 +1,46 @@
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+
+#include "uptr_kv_store.h"
+#include "test_uptr_kv_store_common.h"
+
+pid_t target_pid = 0;
+int test_op;
+int test_key;
+int test_int_val;
+struct test_struct_v1 test_struct_val;
+
+SEC("tp_btf/sys_enter")
+int on_enter(__u64 *ctx)
+{
+	struct kv_store_data_map_value *data;
+	struct task_struct *task;
+
+	task = bpf_get_current_task_btf();
+	if (task->pid != target_pid)
+		return 0;
+
+	data = bpf_task_storage_get(&data_map, task, 0, 0);
+
+	switch (test_op) {
+	case KVS_INT_PUT:
+		kv_store_put(data, test_key, &test_int_val, 4);
+		break;
+	case KVS_INT_GET:
+		kv_store_get(data, test_key, &test_int_val, 4);
+		break;
+	case KVS_STRUCT_PUT:
+		kv_store_put(data, test_key, &test_struct_val,
+			     sizeof(test_struct_val));
+		break;
+	case KVS_STRUCT_GET:
+		kv_store_get(data, test_key, &test_struct_val,
+			     sizeof(test_struct_val));
+		break;
+	}
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+
diff --git a/tools/testing/selftests/bpf/test_uptr_kv_store_common.h b/tools/testing/selftests/bpf/test_uptr_kv_store_common.h
index ff7d010ed08f..db91e862789f 100644
--- a/tools/testing/selftests/bpf/test_uptr_kv_store_common.h
+++ b/tools/testing/selftests/bpf/test_uptr_kv_store_common.h
@@ -4,6 +4,19 @@
 enum test_kvs_op {
 	KVS_INT_GET,
 	KVS_INT_PUT,
+	KVS_STRUCT_GET,
+	KVS_STRUCT_PUT,
+};
+
+struct test_struct {
+	int a;
+	int b;
+};
+
+struct test_struct_v1 {
+	int a;
+	int b;
+	int c;
 };
 
 #endif
-- 
2.47.1



* Re: [RFC PATCH 1/4] bpf: Allow creating dynptr from uptr
  2025-03-20 21:40 ` [RFC PATCH 1/4] bpf: Allow creating dynptr from uptr Amery Hung
@ 2025-03-20 22:45   ` Andrii Nakryiko
  2025-03-20 23:20     ` Amery Hung
  0 siblings, 1 reply; 8+ messages in thread
From: Andrii Nakryiko @ 2025-03-20 22:45 UTC (permalink / raw)
  To: Amery Hung
  Cc: bpf, daniel, andrii, alexei.starovoitov, martin.lau, kernel-team

On Thu, Mar 20, 2025 at 2:41 PM Amery Hung <ameryhung@gmail.com> wrote:
>
> Currently, bpf_dynptr_from_mem() only allows creating dynptr from local
> memory of reg type PTR_TO_MAP_VALUE, specifically ringbuf. This patch
> further supports PTR_TO_MEM as a valid source of data.
>
> For a reg to be PTR_TO_MEM in the verifier:
>  - read map value with special field BPF_UPTR
>  - ld_imm64 kfunc (MEM_RDONLY)
>  - ld_imm64 other non-struct ksyms (MEM_RDONLY)
>  - return from helper with RET_PTR_TO_MEM: ringbuf_reserve (MEM_RINGBUF)
>    and dynptr_from_data
>  - return from helper with RET_PTR_TO_MEM_OR_BTF_ID: this_cpu_ptr,
>    per_cpu_ptr and the return type is not struct (both MEM_RDONLY)
>  - return from special kfunc: dynptr_slice (MEM_RDONLY), dynptr_slice_rdwr
>  - return from non-special kfunc that returns non-struct pointer:
>    hid_bpf_get_data
>
> Since this patch only allows PTR_TO_MEM without any flags, only uptr,
> global subprog argument, non-special kfunc that returns non-struct ptr,
> return of bpf_dynptr_slice_rdwr() and bpf_dynptr_data() will be allowed
> additionally.
>
> The last two will allow creating dynptr from dynptr data. Will they create
> any problem?

Yes, I think so. You need to make sure that dynptr you created from
that PTR_TO_MEM is invalidated if that memory "goes away". E.g., for
ringbuf case:

void *r = bpf_ringbuf_reserve(..., 100);

struct dynptr d;
bpf_dynptr_from_mem(r, 100, 0, &d);

void *p = bpf_dynptr_data(&d, 0, 100);
if (!p) return 0; /* can't happen */

bpf_ringbuf_submit(r, 0);


*(char *)p = '\0'; /* bad things happen */


Do you handle that situation? With PTR_TO_MAP_VALUE "bad things" can't
happen even if value is actually deleted/reused (besides overwriting
some other element's value, which we can do without dynptrs anyways),
because that memory won't go away due to RCU and it doesn't contain
any information important for correctness (ringbuf data area does have
it).


>
> Signed-off-by: Amery Hung <ameryhung@gmail.com>
> ---
>  include/uapi/linux/bpf.h | 4 +++-
>  kernel/bpf/verifier.c    | 3 ++-
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index beac5cdf2d2c..2b1335fa1173 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5562,7 +5562,9 @@ union bpf_attr {
>   *     Description
>   *             Get a dynptr to local memory *data*.
>   *
> - *             *data* must be a ptr to a map value.
> + *             *data* must be a ptr to valid local memory such as a map value, a uptr,
> + *             a null-checked non-void pointer passed to a global subprogram, or allocated
> + *             memory returned by a kfunc such as hid_bpf_get_data().
>   *             The maximum *size* supported is DYNPTR_MAX_SIZE.
>   *             *flags* is currently unused.
>   *     Return
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 22c4edc8695c..d22310d1642c 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -11307,7 +11307,8 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>                 }
>                 break;
>         case BPF_FUNC_dynptr_from_mem:
> -               if (regs[BPF_REG_1].type != PTR_TO_MAP_VALUE) {
> +               if (regs[BPF_REG_1].type != PTR_TO_MAP_VALUE &&
> +                   regs[BPF_REG_1].type != PTR_TO_MEM) {
>                         verbose(env, "Unsupported reg type %s for bpf_dynptr_from_mem data\n",
>                                 reg_type_str(env, regs[BPF_REG_1].type));
>                         return -EACCES;
> --
> 2.47.1
>


* Re: [RFC PATCH 1/4] bpf: Allow creating dynptr from uptr
  2025-03-20 22:45   ` Andrii Nakryiko
@ 2025-03-20 23:20     ` Amery Hung
  2025-03-28 18:59       ` Andrii Nakryiko
  0 siblings, 1 reply; 8+ messages in thread
From: Amery Hung @ 2025-03-20 23:20 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, daniel, andrii, alexei.starovoitov, martin.lau, kernel-team

On Thu, Mar 20, 2025 at 3:45 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Mar 20, 2025 at 2:41 PM Amery Hung <ameryhung@gmail.com> wrote:
> >
> > Currently, bpf_dynptr_from_mem() only allows creating dynptr from local
> > memory of reg type PTR_TO_MAP_VALUE, specifically ringbuf. This patch
> > further supports PTR_TO_MEM as a valid source of data.
> >
> > For a reg to be PTR_TO_MEM in the verifier:
> >  - read map value with special field BPF_UPTR
> >  - ld_imm64 kfunc (MEM_RDONLY)
> >  - ld_imm64 other non-struct ksyms (MEM_RDONLY)
> >  - return from helper with RET_PTR_TO_MEM: ringbuf_reserve (MEM_RINGBUF)
> >    and dynptr_from_data
> >  - return from helper with RET_PTR_TO_MEM_OR_BTF_ID: this_cpu_ptr,
> >    per_cpu_ptr and the return type is not struct (both MEM_RDONLY)
> >  - return from special kfunc: dynptr_slice (MEM_RDONLY), dynptr_slice_rdwr
> >  - return from non-special kfunc that returns non-struct pointer:
> >    hid_bpf_get_data
> >
> > Since this patch only allows PTR_TO_MEM without any flags, only uptr,
> > global subprog argument, non-special kfunc that returns non-struct ptr,
> > return of bpf_dynptr_slice_rdwr() and bpf_dynptr_data() will be allowed
> > additionally.
> >
> > The last two will allow creating dynptr from dynptr data. Will they create
> > any problem?
>
> Yes, I think so. You need to make sure that dynptr you created from
> that PTR_TO_MEM is invalidated if that memory "goes away". E.g., for
> ringbuf case:
>
> void *r = bpf_ringbuf_reserve(..., 100);
>
> struct dynptr d;
> bpf_dynptr_from_mem(r, 100, 0, &d);
>

^ This will fail during verification because "r" will be PTR_TO_MEM |
MEM_RINGBUF.

Only five of the listed PTR_TO_MEM cases will be allowed additionally
with this patch: uptr, global subprog argument, hid_bpf_get_data,
bpf_dynptr_data and bpf_dynptr_slice_rdwr. For the former three, the
memory seems to be valid all the time. For the last two, IIUC, the
return of bpf_dynptr_data() or bpf_dynptr_slice_rdwr() should be valid
if the null checks pass. I am just not so sure about the nested
situation (i.e., creating another dynptr from data behind a dynptr).

Thanks,
Amery

> void *p = bpf_dynptr_data(&d, 0, 100);
> if (!p) return 0; /* can't happen */
>
> bpf_ringbuf_submit(r, 0);
>
>
> *(char *)p = '\0'; /* bad things happen */
>
>
> Do you handle that situation? With PTR_TO_MAP_VALUE "bad things" can't
> happen even if value is actually deleted/reused (besides overwriting
> some other element's value, which we can do without dynptrs anyways),
> because that memory won't go away due to RCU and it doesn't contain
> any information important for correctness (ringbuf data area does have
> it).
>
>
> >
> > Signed-off-by: Amery Hung <ameryhung@gmail.com>
> > ---
> >  include/uapi/linux/bpf.h | 4 +++-
> >  kernel/bpf/verifier.c    | 3 ++-
> >  2 files changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index beac5cdf2d2c..2b1335fa1173 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -5562,7 +5562,9 @@ union bpf_attr {
> >   *     Description
> >   *             Get a dynptr to local memory *data*.
> >   *
> > - *             *data* must be a ptr to a map value.
> > + *             *data* must be a ptr to valid local memory such as a map value, a uptr,
> > + *             a null-checked non-void pointer passed to a global subprogram, or allocated
> > + *             memory returned by a kfunc such as hid_bpf_get_data().
> >   *             The maximum *size* supported is DYNPTR_MAX_SIZE.
> >   *             *flags* is currently unused.
> >   *     Return
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 22c4edc8695c..d22310d1642c 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -11307,7 +11307,8 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> >                 }
> >                 break;
> >         case BPF_FUNC_dynptr_from_mem:
> > -               if (regs[BPF_REG_1].type != PTR_TO_MAP_VALUE) {
> > +               if (regs[BPF_REG_1].type != PTR_TO_MAP_VALUE &&
> > +                   regs[BPF_REG_1].type != PTR_TO_MEM) {
> >                         verbose(env, "Unsupported reg type %s for bpf_dynptr_from_mem data\n",
> >                                 reg_type_str(env, regs[BPF_REG_1].type));
> >                         return -EACCES;
> > --
> > 2.47.1
> >


* Re: [RFC PATCH 1/4] bpf: Allow creating dynptr from uptr
  2025-03-20 23:20     ` Amery Hung
@ 2025-03-28 18:59       ` Andrii Nakryiko
  0 siblings, 0 replies; 8+ messages in thread
From: Andrii Nakryiko @ 2025-03-28 18:59 UTC (permalink / raw)
  To: Amery Hung
  Cc: bpf, daniel, andrii, alexei.starovoitov, martin.lau, kernel-team

On Thu, Mar 20, 2025 at 4:21 PM Amery Hung <ameryhung@gmail.com> wrote:
>
> On Thu, Mar 20, 2025 at 3:45 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Mar 20, 2025 at 2:41 PM Amery Hung <ameryhung@gmail.com> wrote:
> > >
> > > Currently, bpf_dynptr_from_mem() only allows creating dynptr from local
> > > memory of reg type PTR_TO_MAP_VALUE, specifically ringbuf. This patch
> > > further supports PTR_TO_MEM as a valid source of data.
> > >
> > > For a reg to be PTR_TO_MEM in the verifier:
> > >  - read map value with special field BPF_UPTR
> > >  - ld_imm64 kfunc (MEM_RDONLY)
> > >  - ld_imm64 other non-struct ksyms (MEM_RDONLY)
> > >  - return from helper with RET_PTR_TO_MEM: ringbuf_reserve (MEM_RINGBUF)
> > >    and dynptr_from_data
> > >  - return from helper with RET_PTR_TO_MEM_OR_BTF_ID: this_cpu_ptr,
> > >    per_cpu_ptr and the return type is not struct (both MEM_RDONLY)
> > >  - return from special kfunc: dynptr_slice (MEM_RDONLY), dynptr_slice_rdwr
> > >  - return from non-special kfunc that returns non-struct pointer:
> > >    hid_bpf_get_data
> > >
> > > Since this patch only allows PTR_TO_MEM without any flags, only uptr,
> > > global subprog argument, non-special kfunc that returns non-struct ptr,
> > > return of bpf_dynptr_slice_rdwr() and bpf_dynptr_data() will be allowed
> > > additionally.
> > >
> > > The last two will allow creating dynptr from dynptr data. Will they create
> > > any problem?
> >
> > Yes, I think so. You need to make sure that dynptr you created from
> > that PTR_TO_MEM is invalidated if that memory "goes away". E.g., for
> > ringbuf case:
> >
> > void *r = bpf_ringbuf_reserve(..., 100);
> >
> > struct dynptr d;
> > bpf_dynptr_from_mem(r, 100, 0, &d);
> >
>
> ^ This will fail during verification because "r" will be PTR_TO_MEM |
> MEM_RINGBUF.

We discussed all this at LSFMMBPF2025, but for those who follow and
didn't attend, we can just slightly modify the example by adding one
extra bpf_dynptr_slice() constructed from r:


void *r = bpf_ringbuf_reserve(..., 100);
void *m = bpf_dynptr_slice(&r, ...);
if (!m) return 0; /* shouldn't happen */

struct dynptr d;
bpf_dynptr_from_mem(m, 100, 0, &d);

void *p = bpf_dynptr_data(&d, 0, 100);
if (!p) return 0; /* can't happen */

bpf_ringbuf_submit(r, 0);

*(char *)p = '\0'; /* bad things happen */


And we can keep building this long chain of dependencies. So we'd need
to take all that into account and propagate invalidation across the
entire tree of dependencies between dynptrs and slices.

>
> Only five of the listed PTR_TO_MEM cases will be allowed with this
> patch additionally: uptr, global subprog argument, hid_bpf_get_data,
> bpf_dynptr_ptr_data and bpf_dynptr_slice_rdwr. For the former three,
> the memory seems to be valid all the time. For the last two, IIUC,
> bpf_dynptr_data or bpf_dynptr_slice_rdwr should be valid if null
> checks pass. I am just so not sure about the nested situation (i.e.,
> creating another dynptr from data behind a dynptr).
>
> Thanks,
> Amery
>
> > void *p = bpf_dynptr_data(&d, 0, 100);
> > if (!p) return 0; /* can't happen */
> >
> > bpf_ringbuf_submit(r, 0);
> >
> >
> > *(char *)p = '\0'; /* bad things happen */
> >
> >
> > Do you handle that situation? With PTR_TO_MAP_VALUE "bad things" can't
> > happen even if value is actually deleted/reused (besides overwriting
> > some other element's value, which we can do without dynptrs anyways),
> > because that memory won't go away due to RCU and it doesn't contain
> > any information important for correctness (ringbuf data area does have
> > it).
> >
> >
> > >
> > > Signed-off-by: Amery Hung <ameryhung@gmail.com>
> > > ---
> > >  include/uapi/linux/bpf.h | 4 +++-
> > >  kernel/bpf/verifier.c    | 3 ++-
> > >  2 files changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > index beac5cdf2d2c..2b1335fa1173 100644
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -5562,7 +5562,9 @@ union bpf_attr {
> > >   *     Description
> > >   *             Get a dynptr to local memory *data*.
> > >   *
> > > - *             *data* must be a ptr to a map value.
> > > + *             *data* must be a ptr to valid local memory such as a map value, a uptr,
> > > + *             a null-checked non-void pointer passed to a global subprogram, or allocated
> > > + *             memory returned by a kfunc such as hid_bpf_get_data().
> > >   *             The maximum *size* supported is DYNPTR_MAX_SIZE.
> > >   *             *flags* is currently unused.
> > >   *     Return
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index 22c4edc8695c..d22310d1642c 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -11307,7 +11307,8 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> > >                 }
> > >                 break;
> > >         case BPF_FUNC_dynptr_from_mem:
> > > -               if (regs[BPF_REG_1].type != PTR_TO_MAP_VALUE) {
> > > +               if (regs[BPF_REG_1].type != PTR_TO_MAP_VALUE &&
> > > +                   regs[BPF_REG_1].type != PTR_TO_MEM) {
> > >                         verbose(env, "Unsupported reg type %s for bpf_dynptr_from_mem data\n",
> > >                                 reg_type_str(env, regs[BPF_REG_1].type));
> > >                         return -EACCES;
> > > --
> > > 2.47.1
> > >


end of thread, other threads:[~2025-03-28 18:59 UTC | newest]

Thread overview: 8+ messages
-- links below jump to the message on this page --
2025-03-20 21:40 [RFC PATCH 0/4] uptr KV store Amery Hung
2025-03-20 21:40 ` [RFC PATCH 1/4] bpf: Allow creating dynptr from uptr Amery Hung
2025-03-20 22:45   ` Andrii Nakryiko
2025-03-20 23:20     ` Amery Hung
2025-03-28 18:59       ` Andrii Nakryiko
2025-03-20 21:40 ` [RFC PATCH 2/4] selftests/bpf: Implement basic uptr KV store Amery Hung
2025-03-20 21:40 ` [RFC PATCH 3/4] selftests/bpf: Test basic uptr KV store operations from user space and bpf Amery Hung
2025-03-20 21:40 ` [RFC PATCH 4/4] selftests/bpf: Test changing KV store value layout Amery Hung
