public inbox for io-uring@vger.kernel.org
* [PATCH v3 0/12] io_uring: add IORING_OP_BPF for extending io_uring
@ 2026-03-24 16:37 Ming Lei
  2026-03-24 16:37 ` [PATCH V3 01/12] io_uring: make io_import_fixed() global Ming Lei
                   ` (11 more replies)
  0 siblings, 12 replies; 15+ messages in thread
From: Ming Lei @ 2026-03-24 16:37 UTC (permalink / raw)
  To: Jens Axboe, io-uring
  Cc: Caleb Sander Mateos, Akilesh Kailash, bpf, Xiao Ni,
	Alexei Starovoitov, Ming Lei

Hello,

Add IORING_OP_BPF for extending io_uring operations; typical use cases follow:

- buffer registered zero copy [1]

There are also some RAID-like ublk servers which need to generate data
parity in the ublk zero-copy case.

- extend io_uring operations from application

With IORING_OP_BPF it becomes easy to add what amounts to a new syscall.

- extend 64 byte SQE

A bpf map can store IO data conveniently.

- communicate in IO chain

IORING_OP_BPF can be used to communicate among IOs seamlessly without
requiring extra syscalls.

- pretty handy for injecting errors for test purposes

Any comments & feedback are welcome!


[1] lpc2024: ublk based zero copy I/O - use case in Android

https://lpc.events/event/18/contributions/1710/attachments/1440/3070/LPC2024_ublk_zero_copy.pdf

V3:
	- add per-buffer iterator kfuncs to address Caleb's concern about
	  the complicated kfunc interface; the kernel-mapped buffer produced
	  by the iterator can be read/written from the bpf prog in zero-copy
	  style

	- rename to bpf_ext

	- change tests to copy between uring_buf and arena, which may match
	  the RAID use case

	- misc cleanup


V2:
	- per-ring struct ops (Stefan Metzmacher, Caleb Sander Mateos)
	- refactor io_import_fixed()/io_prep_reg_iovec()/io_import_reg_vec()
	  to allow handling multiple buffers for a single request
	- kernel selftests
	- all kinds of comments from Caleb Sander Mateos
	- support vectored and registered vector buffer



Ming Lei (12):
  io_uring: make io_import_fixed() global
  io_uring: refactor io_prep_reg_iovec() for BPF kfunc use
  io_uring: refactor io_import_reg_vec() for BPF kfunc use
  io_uring: prepare for extending io_uring with bpf
  io_uring: bpf: extend io_uring with bpf struct_ops
  io_uring: bpf: implement struct_ops registration
  io_uring: bpf: add BPF buffer descriptor for IORING_OP_BPF
  io_uring: bpf: add per-buffer iterator kfuncs
  bpf: add bpf_uring_buf_dynptr to special_kfunc_list
  selftests/io_uring: add io_uring_unregister_buffers()
  selftests/io_uring: add BPF struct_ops and kfunc tests
  selftests/io_uring: add buffer iterator selftest with BPF arena

 include/linux/io_uring_types.h                |  14 +-
 include/uapi/linux/io_uring.h                 |  40 +
 init/Kconfig                                  |   7 +
 io_uring/Makefile                             |   1 +
 io_uring/bpf-ops.c                            |   7 +-
 io_uring/bpf_ext.c                            | 755 ++++++++++++++++++
 io_uring/bpf_ext.h                            |  71 ++
 io_uring/io_uring.c                           |   9 +-
 io_uring/io_uring.h                           |   6 +-
 io_uring/opdef.c                              |  16 +
 io_uring/rsrc.c                               |  46 +-
 io_uring/rsrc.h                               |  68 +-
 kernel/bpf/verifier.c                         |  12 +
 tools/include/io_uring/mini_liburing.h        |  10 +
 tools/testing/selftests/Makefile              |   3 +-
 tools/testing/selftests/io_uring/.gitignore   |   2 +
 tools/testing/selftests/io_uring/Makefile     | 173 ++++
 .../selftests/io_uring/bpf_ext_basic.bpf.c    |  94 +++
 .../selftests/io_uring/bpf_ext_basic.c        | 215 +++++
 .../selftests/io_uring/bpf_ext_memcpy.bpf.c   | 305 +++++++
 .../selftests/io_uring/bpf_ext_memcpy.c       | 517 ++++++++++++
 .../io_uring/include/bpf_ext_memcpy_defs.h    |  18 +
 .../selftests/io_uring/include/iou_test.h     |  98 +++
 tools/testing/selftests/io_uring/runner.c     | 206 +++++
 24 files changed, 2646 insertions(+), 47 deletions(-)
 create mode 100644 io_uring/bpf_ext.c
 create mode 100644 io_uring/bpf_ext.h
 create mode 100644 tools/testing/selftests/io_uring/.gitignore
 create mode 100644 tools/testing/selftests/io_uring/Makefile
 create mode 100644 tools/testing/selftests/io_uring/bpf_ext_basic.bpf.c
 create mode 100644 tools/testing/selftests/io_uring/bpf_ext_basic.c
 create mode 100644 tools/testing/selftests/io_uring/bpf_ext_memcpy.bpf.c
 create mode 100644 tools/testing/selftests/io_uring/bpf_ext_memcpy.c
 create mode 100644 tools/testing/selftests/io_uring/include/bpf_ext_memcpy_defs.h
 create mode 100644 tools/testing/selftests/io_uring/include/iou_test.h
 create mode 100644 tools/testing/selftests/io_uring/runner.c

-- 
2.53.0



* [PATCH V3 01/12] io_uring: make io_import_fixed() global
  2026-03-24 16:37 [PATCH v3 0/12] io_uring: add IORING_OP_BPF for extending io_uring Ming Lei
@ 2026-03-24 16:37 ` Ming Lei
  2026-03-24 16:37 ` [PATCH V3 02/12] io_uring: refactor io_prep_reg_iovec() for BPF kfunc use Ming Lei
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2026-03-24 16:37 UTC (permalink / raw)
  To: Jens Axboe, io-uring
  Cc: Caleb Sander Mateos, Akilesh Kailash, bpf, Xiao Ni,
	Alexei Starovoitov, Ming Lei

Refactor buffer import functions:
- Make io_import_fixed() global so BPF kfuncs can use it directly
- Make io_import_reg_buf() static inline in rsrc.h

This allows BPF kfuncs to import buffers without associating them
with a request, which is useful when one request has multiple buffers.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 io_uring/rsrc.c | 17 +++--------------
 io_uring/rsrc.h | 18 +++++++++++++++---
 2 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 52554ed89b11..d65d3b4149f8 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1046,9 +1046,9 @@ static int io_import_kbuf(int ddir, struct iov_iter *iter,
 	return 0;
 }
 
-static int io_import_fixed(int ddir, struct iov_iter *iter,
-			   struct io_mapped_ubuf *imu,
-			   u64 buf_addr, size_t len)
+int io_import_fixed(int ddir, struct iov_iter *iter,
+		    struct io_mapped_ubuf *imu,
+		    u64 buf_addr, size_t len)
 {
 	const struct bio_vec *bvec;
 	size_t folio_mask;
@@ -1117,17 +1117,6 @@ inline struct io_rsrc_node *io_find_buf_node(struct io_kiocb *req,
 	return NULL;
 }
 
-int io_import_reg_buf(struct io_kiocb *req, struct iov_iter *iter,
-			u64 buf_addr, size_t len, int ddir,
-			unsigned issue_flags)
-{
-	struct io_rsrc_node *node;
-
-	node = io_find_buf_node(req, issue_flags);
-	if (!node)
-		return -EFAULT;
-	return io_import_fixed(ddir, iter, node->buf, buf_addr, len);
-}
 
 /* Lock two rings at once. The rings must be different! */
 static void lock_two_rings(struct io_ring_ctx *ctx1, struct io_ring_ctx *ctx2)
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index cff0f8834c35..cc68cba919ab 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -65,9 +65,21 @@ int io_rsrc_data_alloc(struct io_rsrc_data *data, unsigned nr);
 
 struct io_rsrc_node *io_find_buf_node(struct io_kiocb *req,
 				      unsigned issue_flags);
-int io_import_reg_buf(struct io_kiocb *req, struct iov_iter *iter,
-			u64 buf_addr, size_t len, int ddir,
-			unsigned issue_flags);
+int io_import_fixed(int ddir, struct iov_iter *iter,
+		    struct io_mapped_ubuf *imu,
+		    u64 buf_addr, size_t len);
+
+static inline int io_import_reg_buf(struct io_kiocb *req, struct iov_iter *iter,
+				    u64 buf_addr, size_t len, int ddir,
+				    unsigned issue_flags)
+{
+	struct io_rsrc_node *node;
+
+	node = io_find_buf_node(req, issue_flags);
+	if (!node)
+		return -EFAULT;
+	return io_import_fixed(ddir, iter, node->buf, buf_addr, len);
+}
 int io_import_reg_vec(int ddir, struct iov_iter *iter,
 			struct io_kiocb *req, struct iou_vec *vec,
 			unsigned nr_iovs, unsigned issue_flags);
-- 
2.53.0



* [PATCH V3 02/12] io_uring: refactor io_prep_reg_iovec() for BPF kfunc use
  2026-03-24 16:37 [PATCH v3 0/12] io_uring: add IORING_OP_BPF for extending io_uring Ming Lei
  2026-03-24 16:37 ` [PATCH V3 01/12] io_uring: make io_import_fixed() global Ming Lei
@ 2026-03-24 16:37 ` Ming Lei
  2026-03-24 16:37 ` [PATCH V3 03/12] io_uring: refactor io_import_reg_vec() " Ming Lei
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2026-03-24 16:37 UTC (permalink / raw)
  To: Jens Axboe, io-uring
  Cc: Caleb Sander Mateos, Akilesh Kailash, bpf, Xiao Ni,
	Alexei Starovoitov, Ming Lei

Split io_prep_reg_iovec() into:
- __io_prep_reg_iovec(): core logic without request association
- io_prep_reg_iovec(): inline wrapper handling request flags

The core function takes explicit 'compat' and 'need_clean' parameters
instead of accessing req directly. This allows BPF kfuncs to prepare
vectored buffers without request association, enabling support for
multiple buffers per request.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 io_uring/rsrc.c | 11 +++++------
 io_uring/rsrc.h | 21 +++++++++++++++++++--
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index d65d3b4149f8..0b3e4cb5e879 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1517,8 +1517,8 @@ int io_import_reg_vec(int ddir, struct iov_iter *iter,
 	return io_vec_fill_bvec(ddir, iter, imu, iov, nr_iovs, vec);
 }
 
-int io_prep_reg_iovec(struct io_kiocb *req, struct iou_vec *iv,
-		      const struct iovec __user *uvec, size_t uvec_segs)
+int __io_prep_reg_iovec(struct iou_vec *iv, const struct iovec __user *uvec,
+			size_t uvec_segs, bool compat, bool *need_clean)
 {
 	struct iovec *iov;
 	int iovec_off, ret;
@@ -1528,17 +1528,16 @@ int io_prep_reg_iovec(struct io_kiocb *req, struct iou_vec *iv,
 		ret = io_vec_realloc(iv, uvec_segs);
 		if (ret)
 			return ret;
-		req->flags |= REQ_F_NEED_CLEANUP;
+		if (need_clean)
+			*need_clean = true;
 	}
 
 	/* pad iovec to the right */
 	iovec_off = iv->nr - uvec_segs;
 	iov = iv->iovec + iovec_off;
-	res = iovec_from_user(uvec, uvec_segs, uvec_segs, iov,
-			      io_is_compat(req->ctx));
+	res = iovec_from_user(uvec, uvec_segs, uvec_segs, iov, compat);
 	if (IS_ERR(res))
 		return PTR_ERR(res);
 
-	req->flags |= REQ_F_IMPORT_BUFFER;
 	return 0;
 }
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index cc68cba919ab..c2c729a9f568 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -4,6 +4,7 @@
 
 #include <linux/io_uring_types.h>
 #include <linux/lockdep.h>
+#include "io_uring.h"
 
 #define IO_VEC_CACHE_SOFT_CAP		256
 
@@ -83,8 +84,24 @@ static inline int io_import_reg_buf(struct io_kiocb *req, struct iov_iter *iter,
 int io_import_reg_vec(int ddir, struct iov_iter *iter,
 			struct io_kiocb *req, struct iou_vec *vec,
 			unsigned nr_iovs, unsigned issue_flags);
-int io_prep_reg_iovec(struct io_kiocb *req, struct iou_vec *iv,
-			const struct iovec __user *uvec, size_t uvec_segs);
+int __io_prep_reg_iovec(struct iou_vec *iv, const struct iovec __user *uvec,
+			size_t uvec_segs, bool compat, bool *need_clean);
+
+static inline int io_prep_reg_iovec(struct io_kiocb *req, struct iou_vec *iv,
+				    const struct iovec __user *uvec,
+				    size_t uvec_segs)
+{
+	bool need_clean = false;
+	int ret;
+
+	ret = __io_prep_reg_iovec(iv, uvec, uvec_segs,
+				  io_is_compat(req->ctx), &need_clean);
+	if (need_clean)
+		req->flags |= REQ_F_NEED_CLEANUP;
+	if (ret >= 0)
+		req->flags |= REQ_F_IMPORT_BUFFER;
+	return ret;
+}
 
 int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg);
 int io_sqe_buffers_unregister(struct io_ring_ctx *ctx);
-- 
2.53.0



* [PATCH V3 03/12] io_uring: refactor io_import_reg_vec() for BPF kfunc use
  2026-03-24 16:37 [PATCH v3 0/12] io_uring: add IORING_OP_BPF for extending io_uring Ming Lei
  2026-03-24 16:37 ` [PATCH V3 01/12] io_uring: make io_import_fixed() global Ming Lei
  2026-03-24 16:37 ` [PATCH V3 02/12] io_uring: refactor io_prep_reg_iovec() for BPF kfunc use Ming Lei
@ 2026-03-24 16:37 ` Ming Lei
  2026-03-24 16:37 ` [PATCH V3 04/12] io_uring: prepare for extending io_uring with bpf Ming Lei
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2026-03-24 16:37 UTC (permalink / raw)
  To: Jens Axboe, io-uring
  Cc: Caleb Sander Mateos, Akilesh Kailash, bpf, Xiao Ni,
	Alexei Starovoitov, Ming Lei

Split io_import_reg_vec() into:
- __io_import_reg_vec(): core logic taking io_mapped_ubuf directly
- io_import_reg_vec(): inline wrapper handling buffer lookup and
  request flags

The core function takes 'imu' and 'need_clean' parameters instead of
accessing req directly. This allows BPF kfuncs to import vectored
buffers without request association, enabling support for multiple
buffers per request.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 io_uring/rsrc.c | 18 +++++-------------
 io_uring/rsrc.h | 29 ++++++++++++++++++++++++++---
 2 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 0b3e4cb5e879..49d9f0b05e7c 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1453,23 +1453,14 @@ static int io_kern_bvec_size(struct iovec *iov, unsigned nr_iovs,
 	return 0;
 }
 
-int io_import_reg_vec(int ddir, struct iov_iter *iter,
-			struct io_kiocb *req, struct iou_vec *vec,
-			unsigned nr_iovs, unsigned issue_flags)
+int __io_import_reg_vec(int ddir, struct iov_iter *iter,
+			struct io_mapped_ubuf *imu, struct iou_vec *vec,
+			unsigned nr_iovs, bool *need_clean)
 {
-	struct io_rsrc_node *node;
-	struct io_mapped_ubuf *imu;
 	unsigned iovec_off;
 	struct iovec *iov;
 	unsigned nr_segs;
 
-	node = io_find_buf_node(req, issue_flags);
-	if (!node)
-		return -EFAULT;
-	imu = node->buf;
-	if (!(imu->dir & (1 << ddir)))
-		return -EFAULT;
-
 	iovec_off = vec->nr - nr_iovs;
 	iov = vec->iovec + iovec_off;
 
@@ -1508,7 +1499,8 @@ int io_import_reg_vec(int ddir, struct iov_iter *iter,
 
 		*vec = tmp_vec;
 		iov = vec->iovec + iovec_off;
-		req->flags |= REQ_F_NEED_CLEANUP;
+		if (need_clean)
+			*need_clean = true;
 	}
 
 	if (imu->flags & IO_REGBUF_F_KBUF)
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index c2c729a9f568..87ee5a76cbfc 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -81,9 +81,32 @@ static inline int io_import_reg_buf(struct io_kiocb *req, struct iov_iter *iter,
 		return -EFAULT;
 	return io_import_fixed(ddir, iter, node->buf, buf_addr, len);
 }
-int io_import_reg_vec(int ddir, struct iov_iter *iter,
-			struct io_kiocb *req, struct iou_vec *vec,
-			unsigned nr_iovs, unsigned issue_flags);
+int __io_import_reg_vec(int ddir, struct iov_iter *iter,
+			struct io_mapped_ubuf *imu, struct iou_vec *vec,
+			unsigned nr_iovs, bool *need_clean);
+
+static inline int io_import_reg_vec(int ddir, struct iov_iter *iter,
+				    struct io_kiocb *req, struct iou_vec *vec,
+				    unsigned nr_iovs, unsigned issue_flags)
+{
+	struct io_rsrc_node *node;
+	struct io_mapped_ubuf *imu;
+	bool need_clean = false;
+	int ret;
+
+	node = io_find_buf_node(req, issue_flags);
+	if (!node)
+		return -EFAULT;
+	imu = node->buf;
+	if (!(imu->dir & (1 << ddir)))
+		return -EFAULT;
+
+	ret = __io_import_reg_vec(ddir, iter, imu, vec, nr_iovs, &need_clean);
+	if (need_clean)
+		req->flags |= REQ_F_NEED_CLEANUP;
+	return ret;
+}
+
 int __io_prep_reg_iovec(struct iou_vec *iv, const struct iovec __user *uvec,
 			size_t uvec_segs, bool compat, bool *need_clean);
 
-- 
2.53.0



* [PATCH V3 04/12] io_uring: prepare for extending io_uring with bpf
  2026-03-24 16:37 [PATCH v3 0/12] io_uring: add IORING_OP_BPF for extending io_uring Ming Lei
                   ` (2 preceding siblings ...)
  2026-03-24 16:37 ` [PATCH V3 03/12] io_uring: refactor io_import_reg_vec() " Ming Lei
@ 2026-03-24 16:37 ` Ming Lei
  2026-03-24 16:37 ` [PATCH V3 05/12] io_uring: bpf: extend io_uring with bpf struct_ops Ming Lei
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2026-03-24 16:37 UTC (permalink / raw)
  To: Jens Axboe, io-uring
  Cc: Caleb Sander Mateos, Akilesh Kailash, bpf, Xiao Ni,
	Alexei Starovoitov, Ming Lei

Add a bpf operation and the related framework, preparing for extending
io_uring with bpf.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 include/uapi/linux/io_uring.h |  1 +
 init/Kconfig                  |  7 +++++++
 io_uring/Makefile             |  1 +
 io_uring/bpf_ext.c            | 26 ++++++++++++++++++++++++++
 io_uring/bpf_ext.h            | 15 +++++++++++++++
 io_uring/opdef.c              | 16 ++++++++++++++++
 6 files changed, 66 insertions(+)
 create mode 100644 io_uring/bpf_ext.c
 create mode 100644 io_uring/bpf_ext.h

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index c10c4827f6b0..cb1e888761c3 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -318,6 +318,7 @@ enum io_uring_op {
 	IORING_OP_PIPE,
 	IORING_OP_NOP128,
 	IORING_OP_URING_CMD128,
+	IORING_OP_BPF,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
diff --git a/init/Kconfig b/init/Kconfig
index 444ce811ea67..c4587cf46765 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1884,6 +1884,13 @@ config IO_URING
 	  applications to submit and complete IO through submission and
 	  completion rings that are shared between the kernel and application.
 
+config IO_URING_BPF_EXT
+	bool "Enable IO uring bpf extension" if EXPERT
+	depends on IO_URING && BPF
+	help
+	  This option enables the bpf extension for the io_uring interface,
+	  so applications can define their own io_uring operations via bpf.
+
 config GCOV_PROFILE_URING
 	bool "Enable GCOV profiling on the io_uring subsystem"
 	depends on IO_URING && GCOV_KERNEL
diff --git a/io_uring/Makefile b/io_uring/Makefile
index c54e328d1410..f540ba5d777e 100644
--- a/io_uring/Makefile
+++ b/io_uring/Makefile
@@ -26,3 +26,4 @@ obj-$(CONFIG_PROC_FS) += fdinfo.o
 obj-$(CONFIG_IO_URING_MOCK_FILE) += mock_file.o
 obj-$(CONFIG_IO_URING_BPF) += bpf_filter.o
 obj-$(CONFIG_IO_URING_BPF_OPS) += bpf-ops.o
+obj-$(CONFIG_IO_URING_BPF_EXT)	+= bpf_ext.o
diff --git a/io_uring/bpf_ext.c b/io_uring/bpf_ext.c
new file mode 100644
index 000000000000..146f70054c0a
--- /dev/null
+++ b/io_uring/bpf_ext.c
@@ -0,0 +1,26 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 Red Hat */
+
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <uapi/linux/io_uring.h>
+#include "io_uring.h"
+#include "bpf_ext.h"
+
+int io_uring_bpf_issue(struct io_kiocb *req, unsigned int issue_flags)
+{
+	return -EOPNOTSUPP;
+}
+
+int io_uring_bpf_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+	return -EOPNOTSUPP;
+}
+
+void io_uring_bpf_fail(struct io_kiocb *req)
+{
+}
+
+void io_uring_bpf_cleanup(struct io_kiocb *req)
+{
+}
diff --git a/io_uring/bpf_ext.h b/io_uring/bpf_ext.h
new file mode 100644
index 000000000000..179530ce865b
--- /dev/null
+++ b/io_uring/bpf_ext.h
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef IOU_BPF_OP_H
+#define IOU_BPF_OP_H
+
+struct io_kiocb;
+struct io_uring_sqe;
+
+#ifdef CONFIG_IO_URING_BPF_EXT
+int io_uring_bpf_issue(struct io_kiocb *req, unsigned int issue_flags);
+int io_uring_bpf_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
+void io_uring_bpf_fail(struct io_kiocb *req);
+void io_uring_bpf_cleanup(struct io_kiocb *req);
+#endif
+
+#endif
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index c3ef52b70811..451500c61917 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -38,6 +38,7 @@
 #include "futex.h"
 #include "truncate.h"
 #include "zcrx.h"
+#include "bpf_ext.h"
 
 static int io_no_issue(struct io_kiocb *req, unsigned int issue_flags)
 {
@@ -589,6 +590,14 @@ const struct io_issue_def io_issue_defs[] = {
 		.prep			= io_uring_cmd_prep,
 		.issue			= io_uring_cmd,
 	},
+	[IORING_OP_BPF] = {
+#if defined(CONFIG_IO_URING_BPF_EXT)
+		.prep			= io_uring_bpf_prep,
+		.issue			= io_uring_bpf_issue,
+#else
+		.prep			= io_eopnotsupp_prep,
+#endif
+	},
 };
 
 const struct io_cold_def io_cold_defs[] = {
@@ -847,6 +856,13 @@ const struct io_cold_def io_cold_defs[] = {
 		.sqe_copy		= io_uring_cmd_sqe_copy,
 		.cleanup		= io_uring_cmd_cleanup,
 	},
+	[IORING_OP_BPF] = {
+		.name			= "BPF",
+#if defined(CONFIG_IO_URING_BPF_EXT)
+		.cleanup		= io_uring_bpf_cleanup,
+		.fail			= io_uring_bpf_fail,
+#endif
+	},
 };
 
 const char *io_uring_get_opcode(u8 opcode)
-- 
2.53.0



* [PATCH V3 05/12] io_uring: bpf: extend io_uring with bpf struct_ops
  2026-03-24 16:37 [PATCH v3 0/12] io_uring: add IORING_OP_BPF for extending io_uring Ming Lei
                   ` (3 preceding siblings ...)
  2026-03-24 16:37 ` [PATCH V3 04/12] io_uring: prepare for extending io_uring with bpf Ming Lei
@ 2026-03-24 16:37 ` Ming Lei
  2026-03-26  1:49   ` Jens Axboe
  2026-03-26  2:09   ` Jens Axboe
  2026-03-24 16:37 ` [PATCH V3 06/12] io_uring: bpf: implement struct_ops registration Ming Lei
                   ` (6 subsequent siblings)
  11 siblings, 2 replies; 15+ messages in thread
From: Ming Lei @ 2026-03-24 16:37 UTC (permalink / raw)
  To: Jens Axboe, io-uring
  Cc: Caleb Sander Mateos, Akilesh Kailash, bpf, Xiao Ni,
	Alexei Starovoitov, Ming Lei

io_uring can be extended with bpf struct_ops in the following ways:

1) add new io_uring operations from the application
- one typical use case is operating on a device zero-copy buffer, which
belongs to the kernel and is either not visible to userspace or too
expensive to export to it, such as copying data from this buffer to
userspace, decompressing data into the zero-copy buffer in the Android
case [1], or checksumming/decrypting.

[1] https://lpc.events/event/18/contributions/1710/attachments/1440/3070/LPC2024_ublk_zero_copy.pdf

2) extend the 64 byte SQE, since a bpf map can be used to store IO data
   conveniently

3) communicate in an IO chain: since a bpf map can be shared among IOs,
when one bpf IO completes, data can be written to an IO-chain-wide bpf
map, and the following bpf IO can then retrieve the data from this map.
This is more flexible than io_uring's built-in buffers.

4) pretty handy for injecting errors for test purposes

bpf struct_ops is a very handy way to attach a bpf prog to the kernel,
and this patch simply wires the existing io_uring operation callbacks to
the added uring bpf struct_ops, so an application can define its own
uring bpf operations.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 include/linux/io_uring_types.h |  12 +-
 include/uapi/linux/io_uring.h  |  12 ++
 io_uring/bpf-ops.c             |   7 +-
 io_uring/bpf_ext.c             | 234 ++++++++++++++++++++++++++++++++-
 io_uring/bpf_ext.h             |  41 ++++++
 io_uring/io_uring.c            |   9 +-
 io_uring/io_uring.h            |   6 +-
 7 files changed, 314 insertions(+), 7 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 328c3c1e2a31..3a558da86f83 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -10,6 +10,7 @@
 
 struct iou_loop_params;
 struct io_uring_bpf_ops;
+struct uring_bpf_ops_kern;
 
 enum {
 	/*
@@ -493,7 +494,16 @@ struct io_ring_ctx {
 	DECLARE_HASHTABLE(napi_ht, 4);
 #endif
 
-	struct io_uring_bpf_ops		*bpf_ops;
+	/*
+	 * bpf_ops and bpf_ext_ops are mutually exclusive: bpf_ops is used
+	 * for io_uring_bpf_ops struct_ops, while bpf_ext_ops provides
+	 * per-opcode BPF extension operations (IORING_SETUP_BPF_EXT).
+	 * The two cannot be active at the same time on the same ring.
+	 */
+	union {
+		struct io_uring_bpf_ops		*bpf_ops;
+		struct uring_bpf_ops_kern	*bpf_ext_ops;
+	};
 
 	/*
 	 * Protection for resize vs mmap races - both the mmap and resize
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index cb1e888761c3..3bf9be78a00a 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -76,6 +76,7 @@ struct io_uring_sqe {
 		__u32		install_fd_flags;
 		__u32		nop_flags;
 		__u32		pipe_flags;
+		__u32		bpf_op_flags;
 	};
 	__u64	user_data;	/* data to be passed back at completion time */
 	/* pack this to avoid bogus arm OABI complaints */
@@ -252,6 +253,9 @@ enum io_uring_sqe_flags_bit {
  */
 #define IORING_SETUP_SQ_REWIND		(1U << 20)
 
+/* Allow userspace to define io_uring operation by BPF prog */
+#define IORING_SETUP_BPF_EXT		(1U << 21)
+
 enum io_uring_op {
 	IORING_OP_NOP,
 	IORING_OP_READV,
@@ -442,6 +446,13 @@ enum io_uring_op {
 #define IORING_RECVSEND_BUNDLE		(1U << 4)
 #define IORING_SEND_VECTORIZED		(1U << 5)
 
+/*
+ * sqe->bpf_op_flags		top 8 bits store the bpf prog sub-op;
+ *				the other 24 bits are available to the bpf prog
+ */
+#define IORING_BPF_OP_BITS	8
+#define IORING_BPF_OP_SHIFT	24
+
 /*
  * cqe.res for IORING_CQE_F_NOTIF if
  * IORING_SEND_ZC_REPORT_USAGE was requested
@@ -646,6 +657,7 @@ struct io_uring_params {
 #define IORING_FEAT_MIN_TIMEOUT		(1U << 15)
 #define IORING_FEAT_RW_ATTR		(1U << 16)
 #define IORING_FEAT_NO_IOWAIT		(1U << 17)
+#define IORING_FEAT_BPF			(1U << 18)
 
 /*
  * io_uring_register(2) opcodes and arguments
diff --git a/io_uring/bpf-ops.c b/io_uring/bpf-ops.c
index e4b244337aa9..e91c6964405c 100644
--- a/io_uring/bpf-ops.c
+++ b/io_uring/bpf-ops.c
@@ -162,7 +162,6 @@ static int io_install_bpf(struct io_ring_ctx *ctx, struct io_uring_bpf_ops *ops)
 		return -EOPNOTSUPP;
 	if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN))
 		return -EOPNOTSUPP;
-
 	if (ctx->bpf_ops)
 		return -EBUSY;
 	if (WARN_ON_ONCE(!ops->loop_step))
@@ -186,6 +185,12 @@ static int bpf_io_reg(void *kdata, struct bpf_link *link)
 		return PTR_ERR(file);
 	ctx = file->private_data;
 
+	/* bpf_ops and bpf_ext_ops share storage and are mutually exclusive */
+	if (ctx->flags & IORING_SETUP_BPF_EXT) {
+		fput(file);
+		return -EINVAL;
+	}
+
 	scoped_guard(mutex, &io_bpf_ctrl_mutex) {
 		guard(mutex)(&ctx->uring_lock);
 		ret = io_install_bpf(ctx, ops);
diff --git a/io_uring/bpf_ext.c b/io_uring/bpf_ext.c
index 146f70054c0a..e2151cc7f9f5 100644
--- a/io_uring/bpf_ext.c
+++ b/io_uring/bpf_ext.c
@@ -3,24 +3,254 @@
 
 #include <linux/kernel.h>
 #include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/bpf_verifier.h>
+#include <linux/bpf.h>
+#include <linux/btf.h>
+#include <linux/btf_ids.h>
+#include <linux/filter.h>
 #include <uapi/linux/io_uring.h>
 #include "io_uring.h"
 #include "bpf_ext.h"
 
-int io_uring_bpf_issue(struct io_kiocb *req, unsigned int issue_flags)
+static inline unsigned char uring_bpf_get_op(u32 op_flags)
 {
-	return -EOPNOTSUPP;
+	return (unsigned char)(op_flags >> IORING_BPF_OP_SHIFT);
+}
+
+static inline unsigned int uring_bpf_get_flags(u32 op_flags)
+{
+	return op_flags & ((1U << IORING_BPF_OP_SHIFT) - 1);
 }
 
 int io_uring_bpf_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
+	struct uring_bpf_data *data = io_kiocb_to_cmd(req, struct uring_bpf_data);
+	u32 opf = READ_ONCE(sqe->bpf_op_flags);
+	unsigned char bpf_op = uring_bpf_get_op(opf);
+	const struct uring_bpf_ops *ops;
+
+	if (unlikely(!(req->ctx->flags & IORING_SETUP_BPF_EXT)))
+		goto fail;
+
+	if (bpf_op >= IO_RING_MAX_BPF_OPS)
+		return -EINVAL;
+
+	ops = req->ctx->bpf_ext_ops[bpf_op].ops;
+	data->opf = opf;
+	data->ops = ops;
+	if (ops && ops->prep_fn)
+		return ops->prep_fn(data, sqe);
+fail:
 	return -EOPNOTSUPP;
 }
 
+static int __io_uring_bpf_issue(struct io_kiocb *req)
+{
+	struct uring_bpf_data *data = io_kiocb_to_cmd(req, struct uring_bpf_data);
+	const struct uring_bpf_ops *ops = data->ops;
+	int ret = 0;
+
+	if (ops && ops->issue_fn) {
+		ret = ops->issue_fn(data);
+		if (ret == IOU_ISSUE_SKIP_COMPLETE)
+			return -EINVAL;
+	}
+	return ret;
+}
+
+int io_uring_bpf_issue(struct io_kiocb *req, unsigned int issue_flags)
+{
+	return __io_uring_bpf_issue(req);
+}
+
 void io_uring_bpf_fail(struct io_kiocb *req)
 {
+	struct uring_bpf_data *data = io_kiocb_to_cmd(req, struct uring_bpf_data);
+	const struct uring_bpf_ops *ops = data->ops;
+
+	if (ops && ops->fail_fn)
+		ops->fail_fn(data);
 }
 
 void io_uring_bpf_cleanup(struct io_kiocb *req)
 {
+	struct uring_bpf_data *data = io_kiocb_to_cmd(req, struct uring_bpf_data);
+	const struct uring_bpf_ops *ops = data->ops;
+
+	if (ops && ops->cleanup_fn)
+		ops->cleanup_fn(data);
+}
+
+static const struct btf_type *uring_bpf_data_type;
+
+static int uring_bpf_ops_btf_struct_access(struct bpf_verifier_log *log,
+					const struct bpf_reg_state *reg,
+					int off, int size)
+{
+	const struct btf_type *t;
+
+	t = btf_type_by_id(reg->btf, reg->btf_id);
+	if (t != uring_bpf_data_type) {
+		bpf_log(log, "only read is supported\n");
+		return -EACCES;
+	}
+
+	if (off < offsetof(struct uring_bpf_data, pdu) ||
+			off + size > sizeof(struct uring_bpf_data))
+		return -EACCES;
+
+	return NOT_INIT;
+}
+
+static const struct bpf_verifier_ops io_bpf_verifier_ops = {
+	.get_func_proto = bpf_base_func_proto,
+	.is_valid_access = bpf_tracing_btf_ctx_access,
+	.btf_struct_access = uring_bpf_ops_btf_struct_access,
+};
+
+static int uring_bpf_ops_init(struct btf *btf)
+{
+	s32 type_id;
+
+	type_id = btf_find_by_name_kind(btf, "uring_bpf_data", BTF_KIND_STRUCT);
+	if (type_id < 0)
+		return -EINVAL;
+	uring_bpf_data_type = btf_type_by_id(btf, type_id);
+	return 0;
+}
+
+static int uring_bpf_ops_check_member(const struct btf_type *t,
+				   const struct btf_member *member,
+				   const struct bpf_prog *prog)
+{
+	/*
+	 * All io_uring BPF ops callbacks are called in non-sleepable
+	 * context, so reject sleepable BPF programs.
+	 */
+	if (prog->sleepable)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int uring_bpf_ops_init_member(const struct btf_type *t,
+				 const struct btf_member *member,
+				 void *kdata, const void *udata)
+{
+	const struct uring_bpf_ops *uuring_bpf_ops;
+	struct uring_bpf_ops *kuring_bpf_ops;
+	u32 moff;
+
+	uuring_bpf_ops = udata;
+	kuring_bpf_ops = kdata;
+
+	moff = __btf_member_bit_offset(t, member) / 8;
+
+	switch (moff) {
+	case offsetof(struct uring_bpf_ops, id):
+		/* For id, this function has to copy it and return 1 to
+		 * indicate that the data has been handled by the struct_ops
+		 * type, or the verifier will reject the map if the value of
+		 * those fields is not zero.
+		 */
+		kuring_bpf_ops->id = uuring_bpf_ops->id;
+		return 1;
+	}
+	return 0;
+}
+
+static int io_bpf_prep_io(struct uring_bpf_data *data, const struct io_uring_sqe *sqe)
+{
+	return 0;
+}
+
+static int io_bpf_issue_io(struct uring_bpf_data *data)
+{
+	return 0;
+}
+
+static void io_bpf_fail_io(struct uring_bpf_data *data)
+{
+}
+
+static void io_bpf_cleanup_io(struct uring_bpf_data *data)
+{
+}
+
+static struct uring_bpf_ops __bpf_uring_bpf_ops = {
+	.prep_fn	= io_bpf_prep_io,
+	.issue_fn	= io_bpf_issue_io,
+	.fail_fn	= io_bpf_fail_io,
+	.cleanup_fn	= io_bpf_cleanup_io,
+};
+
+static struct bpf_struct_ops bpf_uring_bpf_ops = {
+	.verifier_ops = &io_bpf_verifier_ops,
+	.init = uring_bpf_ops_init,
+	.check_member = uring_bpf_ops_check_member,
+	.init_member = uring_bpf_ops_init_member,
+	.name = "uring_bpf_ops",
+	.cfi_stubs = &__bpf_uring_bpf_ops,
+	.owner = THIS_MODULE,
+};
+
+__bpf_kfunc_start_defs();
+__bpf_kfunc void uring_bpf_set_result(struct uring_bpf_data *data, int res)
+{
+	struct io_kiocb *req = cmd_to_io_kiocb(data);
+
+	if (res < 0)
+		req_set_fail(req);
+	io_req_set_res(req, res, 0);
+}
+__bpf_kfunc_end_defs();
+
+BTF_KFUNCS_START(uring_bpf_kfuncs)
+BTF_ID_FLAGS(func, uring_bpf_set_result)
+BTF_KFUNCS_END(uring_bpf_kfuncs)
+
+static const struct btf_kfunc_id_set uring_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set   = &uring_bpf_kfuncs,
+};
+
+int io_bpf_alloc(struct io_ring_ctx *ctx)
+{
+	if (!(ctx->flags & IORING_SETUP_BPF_EXT))
+		return 0;
+
+	ctx->bpf_ext_ops = kcalloc(IO_RING_MAX_BPF_OPS,
+			sizeof(struct uring_bpf_ops_kern), GFP_KERNEL);
+	if (!ctx->bpf_ext_ops)
+		return -ENOMEM;
+	return 0;
+}
+
+void io_bpf_free(struct io_ring_ctx *ctx)
+{
+	/* bpf_ops and bpf_ext_ops share storage; only free if bpf_ext is active */
+	if (!(ctx->flags & IORING_SETUP_BPF_EXT))
+		return;
+	kfree(ctx->bpf_ext_ops);
+	ctx->bpf_ext_ops = NULL;
+}
+
+static int __init io_bpf_init(void)
+{
+	int err;
+
+	err = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &uring_kfunc_set);
+	if (err) {
+		pr_warn("error while registering io_uring BPF kfuncs: %d\n", err);
+		return err;
+	}
+
+	err = register_bpf_struct_ops(&bpf_uring_bpf_ops, uring_bpf_ops);
+	if (err)
+		pr_warn("error while registering io_uring BPF struct ops: %d\n", err);
+
+	return err;
 }
+__initcall(io_bpf_init);
diff --git a/io_uring/bpf_ext.h b/io_uring/bpf_ext.h
index 179530ce865b..5a74f91bdcad 100644
--- a/io_uring/bpf_ext.h
+++ b/io_uring/bpf_ext.h
@@ -4,12 +4,53 @@
 
 struct io_kiocb;
 struct io_uring_sqe;
+struct uring_bpf_ops;
 
+/* Arbitrary limit, can be raised if need be */
+#define IO_RING_MAX_BPF_OPS 16
+
+struct uring_bpf_data {
+	void				*req_data;  /* not for bpf prog */
+	const struct uring_bpf_ops	*ops;
+	u32				opf;
+
+	/* writeable for bpf prog */
+	u8              pdu[64 - sizeof(void *) -
+		sizeof(struct uring_bpf_ops *) - sizeof(u32)];
+};
+
+typedef int (*uring_bpf_prep_t)(struct uring_bpf_data *data,
+				const struct io_uring_sqe *sqe);
+typedef int (*uring_bpf_issue_t)(struct uring_bpf_data *data);
+typedef void (*uring_bpf_fail_t)(struct uring_bpf_data *data);
+typedef void (*uring_bpf_cleanup_t)(struct uring_bpf_data *data);
+
+struct uring_bpf_ops {
+	unsigned short		id;
+	uring_bpf_prep_t	prep_fn;
+	uring_bpf_issue_t	issue_fn;
+	uring_bpf_fail_t	fail_fn;
+	uring_bpf_cleanup_t	cleanup_fn;
+};
+
+struct uring_bpf_ops_kern {
+	const struct uring_bpf_ops *ops;
+};
 #ifdef CONFIG_IO_URING_BPF_EXT
 int io_uring_bpf_issue(struct io_kiocb *req, unsigned int issue_flags);
 int io_uring_bpf_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 void io_uring_bpf_fail(struct io_kiocb *req);
 void io_uring_bpf_cleanup(struct io_kiocb *req);
+int io_bpf_alloc(struct io_ring_ctx *ctx);
+void io_bpf_free(struct io_ring_ctx *ctx);
+#else
+static inline int io_bpf_alloc(struct io_ring_ctx *ctx)
+{
+	return 0;
+}
+static inline void io_bpf_free(struct io_ring_ctx *ctx)
+{
+}
 #endif
 
 #endif
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 6eaa21e09469..15e9735af559 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -97,6 +97,7 @@
 #include "wait.h"
 #include "bpf_filter.h"
 #include "loop.h"
+#include "bpf_ext.h"
 
 #define SQE_COMMON_FLAGS (IOSQE_FIXED_FILE | IOSQE_IO_LINK | \
 			  IOSQE_IO_HARDLINK | IOSQE_ASYNC)
@@ -294,6 +295,9 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	io_napi_init(ctx);
 	mutex_init(&ctx->mmap_lock);
 
+	if (io_bpf_alloc(ctx))
+		goto free_ref;
+
 	return ctx;
 
 free_ref:
@@ -2150,7 +2154,9 @@ static __cold void io_req_caches_free(struct io_ring_ctx *ctx)
 
 static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 {
-	io_unregister_bpf_ops(ctx);
+	/* bpf_ops and bpf_ext_ops share storage; skip if bpf_ext_ops is active */
+	if (!(ctx->flags & IORING_SETUP_BPF_EXT))
+		io_unregister_bpf_ops(ctx);
 	io_sq_thread_finish(ctx);
 
 	mutex_lock(&ctx->uring_lock);
@@ -2196,6 +2202,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	if (ctx->hash_map)
 		io_wq_put_hash(ctx->hash_map);
 	io_napi_free(ctx);
+	io_bpf_free(ctx);
 	kvfree(ctx->cancel_table.hbs);
 	xa_destroy(&ctx->io_bl_xa);
 	kfree(ctx);
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 91cf67b5d85b..1af33a89ed2f 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -49,7 +49,8 @@ struct io_ctx_config {
 			IORING_FEAT_RECVSEND_BUNDLE |\
 			IORING_FEAT_MIN_TIMEOUT |\
 			IORING_FEAT_RW_ATTR |\
-			IORING_FEAT_NO_IOWAIT)
+			IORING_FEAT_NO_IOWAIT |\
+			IORING_FEAT_BPF)
 
 #define IORING_SETUP_FLAGS (IORING_SETUP_IOPOLL |\
 			IORING_SETUP_SQPOLL |\
@@ -71,7 +72,8 @@ struct io_ctx_config {
 			IORING_SETUP_HYBRID_IOPOLL |\
 			IORING_SETUP_CQE_MIXED |\
 			IORING_SETUP_SQE_MIXED |\
-			IORING_SETUP_SQ_REWIND)
+			IORING_SETUP_SQ_REWIND |\
+			IORING_SETUP_BPF_EXT)
 
 #define IORING_ENTER_FLAGS (IORING_ENTER_GETEVENTS |\
 			IORING_ENTER_SQ_WAKEUP |\
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH V3 06/12] io_uring: bpf: implement struct_ops registration
  2026-03-24 16:37 [PATCH v3 0/12] io_uring: add IORING_OP_BPF for extending io_uring Ming Lei
                   ` (4 preceding siblings ...)
  2026-03-24 16:37 ` [PATCH V3 05/12] io_uring: bpf: extend io_uring with bpf struct_ops Ming Lei
@ 2026-03-24 16:37 ` Ming Lei
  2026-03-24 16:37 ` [PATCH V3 07/12] io_uring: bpf: add BPF buffer descriptor for IORING_OP_BPF Ming Lei
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2026-03-24 16:37 UTC (permalink / raw)
  To: Jens Axboe, io-uring
  Cc: Caleb Sander Mateos, Akilesh Kailash, bpf, Xiao Ni,
	Alexei Starovoitov, Ming Lei

Complete the BPF struct_ops registration mechanism by implementing
refcount-based lifecycle management:

- Add refcount field to struct uring_bpf_ops_kern for tracking active
  requests
- Add wait_queue_head_t bpf_wq to struct io_ring_ctx for synchronizing
  unregistration with in-flight requests
- Implement io_bpf_reg_unreg() to handle registration (refcount=1) and
  unregistration (wait for in-flight requests to complete)
- Update io_uring_bpf_prep() to increment refcount on success and reject
  new requests when refcount is zero (unregistration in progress)
- Update io_uring_bpf_cleanup() to decrement refcount and wake up waiters
  when it reaches zero

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 include/linux/io_uring_types.h |   2 +
 io_uring/bpf_ext.c             | 104 ++++++++++++++++++++++++++++++++-
 io_uring/bpf_ext.h             |   3 +
 3 files changed, 106 insertions(+), 3 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 3a558da86f83..5a240c5705cb 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -516,6 +516,8 @@ struct io_ring_ctx {
 	struct io_mapped_region		ring_region;
 	/* used for optimised request parameter and wait argument passing  */
 	struct io_mapped_region		param_region;
+
+	wait_queue_head_t		bpf_wq;
 };
 
 /*
diff --git a/io_uring/bpf_ext.c b/io_uring/bpf_ext.c
index e2151cc7f9f5..96c77a6d6cc0 100644
--- a/io_uring/bpf_ext.c
+++ b/io_uring/bpf_ext.c
@@ -12,6 +12,7 @@
 #include <linux/filter.h>
 #include <uapi/linux/io_uring.h>
 #include "io_uring.h"
+#include "register.h"
 #include "bpf_ext.h"
 
 static inline unsigned char uring_bpf_get_op(u32 op_flags)
@@ -29,7 +30,9 @@ int io_uring_bpf_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	struct uring_bpf_data *data = io_kiocb_to_cmd(req, struct uring_bpf_data);
 	u32 opf = READ_ONCE(sqe->bpf_op_flags);
 	unsigned char bpf_op = uring_bpf_get_op(opf);
+	struct uring_bpf_ops_kern *ops_kern;
 	const struct uring_bpf_ops *ops;
+	int ret;
 
 	if (unlikely(!(req->ctx->flags & IORING_SETUP_BPF_EXT)))
 		goto fail;
@@ -37,11 +40,20 @@ int io_uring_bpf_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	if (bpf_op >= IO_RING_MAX_BPF_OPS)
 		return -EINVAL;
 
-	ops = req->ctx->bpf_ext_ops[bpf_op].ops;
+	ops_kern = &req->ctx->bpf_ext_ops[bpf_op];
+	ops = ops_kern->ops;
+	if (!ops || !ops->prep_fn || !ops_kern->refcount)
+		goto fail;
+
 	data->opf = opf;
 	data->ops = ops;
-	if (ops && ops->prep_fn)
-		return ops->prep_fn(data, sqe);
+	ret = ops->prep_fn(data, sqe);
+	if (!ret) {
+		/* Only increment refcount on success (uring_lock already held) */
+		req->flags |= REQ_F_NEED_CLEANUP;
+		ops_kern->refcount++;
+	}
+	return ret;
 fail:
 	return -EOPNOTSUPP;
 }
@@ -78,9 +90,18 @@ void io_uring_bpf_cleanup(struct io_kiocb *req)
 {
 	struct uring_bpf_data *data = io_kiocb_to_cmd(req, struct uring_bpf_data);
 	const struct uring_bpf_ops *ops = data->ops;
+	struct uring_bpf_ops_kern *ops_kern;
+	unsigned char bpf_op;
 
 	if (ops && ops->cleanup_fn)
 		ops->cleanup_fn(data);
+
+	bpf_op = uring_bpf_get_op(data->opf);
+	ops_kern = &req->ctx->bpf_ext_ops[bpf_op];
+
+	/* Decrement refcount after cleanup (uring_lock already held) */
+	if (--ops_kern->refcount == 0)
+		wake_up(&req->ctx->bpf_wq);
 }
 
 static const struct btf_type *uring_bpf_data_type;
@@ -157,10 +178,82 @@ static int uring_bpf_ops_init_member(const struct btf_type *t,
 		 */
 		kuring_bpf_ops->id = uuring_bpf_ops->id;
 		return 1;
+	case offsetof(struct uring_bpf_ops, ring_fd):
+		kuring_bpf_ops->ring_fd = uuring_bpf_ops->ring_fd;
+		return 1;
 	}
 	return 0;
 }
 
+static int io_bpf_reg_unreg(struct uring_bpf_ops *ops, bool reg)
+{
+	struct uring_bpf_ops_kern *ops_kern;
+	struct io_ring_ctx *ctx;
+	struct file *file;
+	int ret = -EINVAL;
+
+	if (ops->id >= IO_RING_MAX_BPF_OPS)
+		return -EINVAL;
+
+	file = io_uring_register_get_file(ops->ring_fd, false);
+	if (IS_ERR(file))
+		return PTR_ERR(file);
+
+	ctx = file->private_data;
+	if (!(ctx->flags & IORING_SETUP_BPF_EXT))
+		goto out;
+
+	ops_kern = &ctx->bpf_ext_ops[ops->id];
+
+	mutex_lock(&ctx->uring_lock);
+	if (reg) {
+		/* Registration: set refcount to 1 and store ops */
+		if (ops_kern->ops) {
+			ret = -EBUSY;
+		} else {
+			ops_kern->ops = ops;
+			ops_kern->refcount = 1;
+			ret = 0;
+		}
+	} else {
+		/* Unregistration */
+		if (!ops_kern->ops) {
+			ret = -EINVAL;
+		} else {
+			ops_kern->refcount--;
+retry:
+			if (ops_kern->refcount == 0) {
+				ops_kern->ops = NULL;
+				ret = 0;
+			} else {
+				mutex_unlock(&ctx->uring_lock);
+				wait_event(ctx->bpf_wq, ops_kern->refcount == 0);
+				mutex_lock(&ctx->uring_lock);
+				goto retry;
+			}
+		}
+	}
+	mutex_unlock(&ctx->uring_lock);
+
+out:
+	fput(file);
+	return ret;
+}
+
+static int io_bpf_reg(void *kdata, struct bpf_link *link)
+{
+	struct uring_bpf_ops *ops = kdata;
+
+	return io_bpf_reg_unreg(ops, true);
+}
+
+static void io_bpf_unreg(void *kdata, struct bpf_link *link)
+{
+	struct uring_bpf_ops *ops = kdata;
+
+	io_bpf_reg_unreg(ops, false);
+}
+
 static int io_bpf_prep_io(struct uring_bpf_data *data, const struct io_uring_sqe *sqe)
 {
 	return 0;
@@ -191,6 +284,8 @@ static struct bpf_struct_ops bpf_uring_bpf_ops = {
 	.init = uring_bpf_ops_init,
 	.check_member = uring_bpf_ops_check_member,
 	.init_member = uring_bpf_ops_init_member,
+	.reg = io_bpf_reg,
+	.unreg = io_bpf_unreg,
 	.name = "uring_bpf_ops",
 	.cfi_stubs = &__bpf_uring_bpf_ops,
 	.owner = THIS_MODULE,
@@ -218,6 +313,8 @@ static const struct btf_kfunc_id_set uring_kfunc_set = {
 
 int io_bpf_alloc(struct io_ring_ctx *ctx)
 {
+	init_waitqueue_head(&ctx->bpf_wq);
+
 	if (!(ctx->flags & IORING_SETUP_BPF_EXT))
 		return 0;
 
@@ -225,6 +322,7 @@ int io_bpf_alloc(struct io_ring_ctx *ctx)
 			sizeof(struct uring_bpf_ops_kern), GFP_KERNEL);
 	if (!ctx->bpf_ext_ops)
 		return -ENOMEM;
+
 	return 0;
 }
 
diff --git a/io_uring/bpf_ext.h b/io_uring/bpf_ext.h
index 5a74f91bdcad..a568ea31a51a 100644
--- a/io_uring/bpf_ext.h
+++ b/io_uring/bpf_ext.h
@@ -27,14 +27,17 @@ typedef void (*uring_bpf_cleanup_t)(struct uring_bpf_data *data);
 
 struct uring_bpf_ops {
 	unsigned short		id;
+	int			ring_fd;
 	uring_bpf_prep_t	prep_fn;
 	uring_bpf_issue_t	issue_fn;
 	uring_bpf_fail_t	fail_fn;
 	uring_bpf_cleanup_t	cleanup_fn;
 };
 
+/* TODO: manage it via `io_rsrc_node` */
 struct uring_bpf_ops_kern {
 	const struct uring_bpf_ops *ops;
+	int refcount;
 };
 #ifdef CONFIG_IO_URING_BPF_EXT
 int io_uring_bpf_issue(struct io_kiocb *req, unsigned int issue_flags);
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH V3 07/12] io_uring: bpf: add BPF buffer descriptor for IORING_OP_BPF
  2026-03-24 16:37 [PATCH v3 0/12] io_uring: add IORING_OP_BPF for extending io_uring Ming Lei
                   ` (5 preceding siblings ...)
  2026-03-24 16:37 ` [PATCH V3 06/12] io_uring: bpf: implement struct_ops registration Ming Lei
@ 2026-03-24 16:37 ` Ming Lei
  2026-03-24 16:37 ` [PATCH V3 08/12] io_uring: bpf: add per-buffer iterator kfuncs Ming Lei
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2026-03-24 16:37 UTC (permalink / raw)
  To: Jens Axboe, io-uring
  Cc: Caleb Sander Mateos, Akilesh Kailash, bpf, Xiao Ni,
	Alexei Starovoitov, Ming Lei

Add io_bpf_buf_desc struct and io_bpf_buf_type enum to describe
buffer parameters for IORING_OP_BPF kfuncs. Five buffer types are
supported: plain userspace, registered (absolute-address FIXED and
offset-based KFIXED), vectored, and registered vectored.

Registered buffers (FIXED, KFIXED, REG_VEC) refer to buffers
pre-registered with io_uring and can be either userspace or kernel
buffers depending on how they were registered.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 include/uapi/linux/io_uring.h | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 3bf9be78a00a..6a265661bc20 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -453,6 +453,33 @@ enum io_uring_op {
 #define IORING_BPF_OP_BITS	8
 #define IORING_BPF_OP_SHIFT	24
 
+/*
+ * BPF buffer descriptor types.
+ *
+ * Registered buffers (FIXED, KFIXED, REG_VEC) refer to buffers pre-registered
+ * with io_uring. These can be either userspace or kernel buffers depending on
+ * how they were registered.
+ *
+ * For KFIXED, addr is an offset from the registered buffer start.
+ * For REG_VEC with kernel buffers, each iov.iov_base is offset-based.
+ */
+enum io_bpf_buf_type {
+	IO_BPF_BUF_USER		= 0,	/* plain userspace buffer */
+	IO_BPF_BUF_FIXED	= 1,	/* registered buffer (absolute address) */
+	IO_BPF_BUF_VEC		= 2,	/* vectored buffer (iovec array) */
+	IO_BPF_BUF_KFIXED	= 3,	/* registered buffer (offset-based) */
+	IO_BPF_BUF_REG_VEC	= 4,	/* registered vectored buffer */
+};
+
+/* BPF buffer descriptor for IORING_OP_BPF */
+struct io_bpf_buf_desc {
+	__u8  type;		/* IO_BPF_BUF_* */
+	__u8  reserved;
+	__u16 buf_index;	/* registered buffer index (FIXED/KFIXED/REG_VEC) */
+	__u32 len;		/* length (non-vec) or nr_vecs (vec types) */
+	__u64 addr;		/* userspace address, iovec ptr, or offset (KFIXED) */
+};
+
 /*
  * cqe.res for IORING_CQE_F_NOTIF if
  * IORING_SEND_ZC_REPORT_USAGE was requested
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH V3 08/12] io_uring: bpf: add per-buffer iterator kfuncs
  2026-03-24 16:37 [PATCH v3 0/12] io_uring: add IORING_OP_BPF for extending io_uring Ming Lei
                   ` (6 preceding siblings ...)
  2026-03-24 16:37 ` [PATCH V3 07/12] io_uring: bpf: add BPF buffer descriptor for IORING_OP_BPF Ming Lei
@ 2026-03-24 16:37 ` Ming Lei
  2026-03-24 16:37 ` [PATCH V3 09/12] bpf: add bpf_uring_buf_dynptr to special_kfunc_list Ming Lei
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2026-03-24 16:37 UTC (permalink / raw)
  To: Jens Axboe, io-uring
  Cc: Caleb Sander Mateos, Akilesh Kailash, bpf, Xiao Ni,
	Alexei Starovoitov, Ming Lei

Add per-buffer KF_ITER kfuncs for page-level access from BPF programs.
Each buffer gets its own iterator on the BPF stack, and the BPF program
coordinates multiple iterators for multi-buffer operations. The verifier
enforces proper iterator lifecycle via KF_ITER_NEW/NEXT/DESTROY.

kfunc API:
- bpf_iter_uring_buf_new(iter, data, desc, direction): import one buffer,
  take submit lock (refcounted via data->lock_depth). Supports all 5
  buffer types (USER, FIXED, VEC, KFIXED, REG_VEC) and both directions
  (ITER_SOURCE for reading, ITER_DEST for writing).
- bpf_iter_uring_buf_next(iter): extract the next page, kmap it, return
  int * pointing to the avail byte count (non-NULL = more data, NULL =
  done). The actual page data is accessed via bpf_uring_buf_dynptr() or
  bpf_uring_buf_dynptr_rdwr().
- bpf_iter_uring_buf_destroy(iter): unmap page, free resources, release
  submit lock when lock_depth reaches zero.
- bpf_uring_buf_dynptr(it__iter, ptr__uninit): populate a read-only LOCAL
  dynptr bounded to avail bytes, preventing data leaks beyond valid data.
- bpf_uring_buf_dynptr_rdwr(it__iter, ptr__uninit): populate a writable
  LOCAL dynptr, requires direction == ITER_DEST (checked via
  iter.data_source).

The dynptr approach replaces the old uring_buf_page_t typedef which
exposed PAGE_SIZE bytes to BPF programs even when only avail bytes
contained valid data.

Note: bpf_dynptr_slice() requires a compile-time constant size, so BPF
programs typically process pages in fixed-size chunks (e.g. 512 bytes).
Buffer addresses and lengths should be at least 512-byte aligned for
efficient access.

Add helper functions for buffer import:
- io_bpf_import_fixed_buf(): handles FIXED/KFIXED types
- io_bpf_import_reg_vec(): handles REG_VEC type
- io_bpf_import_vec_buf(): handles VEC type
- io_bpf_import_buffer(): unified dispatcher for all buffer types

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 io_uring/bpf_ext.c | 413 ++++++++++++++++++++++++++++++++++++++++++++-
 io_uring/bpf_ext.h |  16 +-
 2 files changed, 421 insertions(+), 8 deletions(-)

diff --git a/io_uring/bpf_ext.c b/io_uring/bpf_ext.c
index 96c77a6d6cc0..c9787ee64b55 100644
--- a/io_uring/bpf_ext.c
+++ b/io_uring/bpf_ext.c
@@ -10,9 +10,12 @@
 #include <linux/btf.h>
 #include <linux/btf_ids.h>
 #include <linux/filter.h>
+#include <linux/uio.h>
+#include <linux/highmem.h>
 #include <uapi/linux/io_uring.h>
 #include "io_uring.h"
 #include "register.h"
+#include "rsrc.h"
 #include "bpf_ext.h"
 
 static inline unsigned char uring_bpf_get_op(u32 op_flags)
@@ -20,11 +23,6 @@ static inline unsigned char uring_bpf_get_op(u32 op_flags)
 	return (unsigned char)(op_flags >> IORING_BPF_OP_SHIFT);
 }
 
-static inline unsigned int uring_bpf_get_flags(u32 op_flags)
-{
-	return op_flags & ((1U << IORING_BPF_OP_SHIFT) - 1);
-}
-
 int io_uring_bpf_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct uring_bpf_data *data = io_kiocb_to_cmd(req, struct uring_bpf_data);
@@ -47,6 +45,8 @@ int io_uring_bpf_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 
 	data->opf = opf;
 	data->ops = ops;
+	data->issue_flags = 0;
+	data->lock_depth = 0;
 	ret = ops->prep_fn(data, sqe);
 	if (!ret) {
 		/* Only increment refcount on success (uring_lock already held) */
@@ -74,7 +74,13 @@ static int __io_uring_bpf_issue(struct io_kiocb *req)
 
 int io_uring_bpf_issue(struct io_kiocb *req, unsigned int issue_flags)
 {
-	return __io_uring_bpf_issue(req);
+	struct uring_bpf_data *data = io_kiocb_to_cmd(req, struct uring_bpf_data);
+	int ret;
+
+	data->issue_flags = issue_flags;
+	ret = __io_uring_bpf_issue(req);
+	data->issue_flags = 0;
+	return ret;
 }
 
 void io_uring_bpf_fail(struct io_kiocb *req)
@@ -291,6 +297,206 @@ static struct bpf_struct_ops bpf_uring_bpf_ops = {
 	.owner = THIS_MODULE,
 };
 
+/*
+ * Per-buffer iterator kernel state (heap-allocated, one per buffer).
+ */
+struct bpf_iter_uring_buf_kern {
+	struct uring_bpf_data	*data;
+	struct iov_iter		iter;
+	struct iou_vec		vec;
+	struct io_rsrc_node	*node;
+	struct page		*page;		/* current extracted page */
+	void			*kmap_base;	/* kmap_local_page() + offset */
+	int			avail;		/* valid bytes in current page */
+};
+
+static inline struct bpf_iter_uring_buf_kern *
+iter_kern(const struct bpf_iter_uring_buf *iter)
+{
+	return (struct bpf_iter_uring_buf_kern *)&iter->__opaque[0];
+}
+
+static void iter_unmap_page(struct bpf_iter_uring_buf_kern *kern)
+{
+	if (kern->kmap_base) {
+		kunmap_local(kern->kmap_base);
+		kern->kmap_base = NULL;
+	}
+	if (kern->page && iov_iter_extract_will_pin(&kern->iter)) {
+		unpin_user_page(kern->page);
+		kern->page = NULL;
+	}
+}
+
+/*
+ * Helper to import fixed buffer (FIXED or KFIXED).
+ * Must be called with submit lock held.
+ *
+ * FIXED: addr is absolute userspace address within buffer
+ * KFIXED: addr is offset from buffer start
+ *
+ * Returns node with incremented refcount on success, ERR_PTR on failure.
+ */
+static struct io_rsrc_node *io_bpf_import_fixed_buf(struct io_ring_ctx *ctx,
+						    struct iov_iter *iter,
+						    const struct io_bpf_buf_desc *desc,
+						    int ddir)
+{
+	struct io_rsrc_node *node;
+	struct io_mapped_ubuf *imu;
+	int ret;
+
+	node = io_rsrc_node_lookup(&ctx->buf_table, desc->buf_index);
+	if (!node)
+		return ERR_PTR(-EFAULT);
+
+	imu = node->buf;
+	if (!(imu->dir & (1 << ddir)))
+		return ERR_PTR(-EFAULT);
+
+	node->refs++;
+
+	ret = io_import_fixed(ddir, iter, imu, desc->addr, desc->len);
+	if (ret) {
+		node->refs--;
+		return ERR_PTR(ret);
+	}
+
+	return node;
+}
+
+/*
+ * Helper to import registered vectored buffer (REG_VEC).
+ * Must be called with submit lock held.
+ *
+ * addr: userspace iovec pointer
+ * len: number of iovecs
+ * buf_index: registered buffer index
+ *
+ * Returns node with incremented refcount on success, ERR_PTR on failure.
+ * Caller must call io_vec_free(vec) after use.
+ */
+static struct io_rsrc_node *io_bpf_import_reg_vec(struct io_ring_ctx *ctx,
+						   struct iov_iter *iter,
+						   const struct io_bpf_buf_desc *desc,
+						   int ddir, struct iou_vec *vec)
+{
+	struct io_rsrc_node *node;
+	struct io_mapped_ubuf *imu;
+	int ret;
+
+	node = io_rsrc_node_lookup(&ctx->buf_table, desc->buf_index);
+	if (!node)
+		return ERR_PTR(-EFAULT);
+
+	imu = node->buf;
+	if (!(imu->dir & (1 << ddir)))
+		return ERR_PTR(-EFAULT);
+
+	node->refs++;
+
+	/* Prepare iovec from userspace */
+	ret = __io_prep_reg_iovec(vec, u64_to_user_ptr(desc->addr),
+				  desc->len, io_is_compat(ctx), NULL);
+	if (ret)
+		goto err;
+
+	/* Import vectored buffer from registered buffer */
+	ret = __io_import_reg_vec(ddir, iter, imu, vec, desc->len, NULL);
+	if (ret)
+		goto err;
+
+	return node;
+err:
+	node->refs--;
+	return ERR_PTR(ret);
+}
+
+/*
+ * Helper to import a vectored user buffer (VEC) into iou_vec.
+ * Allocates space in vec and copies iovec from userspace.
+ *
+ * Returns 0 on success, negative error code on failure.
+ * Caller must call io_vec_free(vec) after use.
+ */
+static int io_bpf_import_vec_buf(struct io_ring_ctx *ctx,
+				 struct iov_iter *iter,
+				 const struct io_bpf_buf_desc *desc,
+				 int ddir, struct iou_vec *vec)
+{
+	unsigned nr_vecs = desc->len;
+	struct iovec *iov;
+	size_t total_len = 0;
+	void *res;
+	int ret, i;
+
+	if (nr_vecs > vec->nr) {
+		ret = io_vec_realloc(vec, nr_vecs);
+		if (ret)
+			return ret;
+	}
+
+	iov = vec->iovec;
+	res = iovec_from_user(u64_to_user_ptr(desc->addr), nr_vecs,
+			      nr_vecs, iov, io_is_compat(ctx));
+	if (IS_ERR(res))
+		return PTR_ERR(res);
+
+	for (i = 0; i < nr_vecs; i++)
+		total_len += iov[i].iov_len;
+
+	iov_iter_init(iter, ddir, iov, nr_vecs, total_len);
+	return 0;
+}
+
+/*
+ * Helper to import a buffer into an iov_iter based on io_bpf_buf_desc.
+ * Supports all 5 buffer types: USER, FIXED, VEC, KFIXED, REG_VEC.
+ * Must be called with submit lock held for FIXED/KFIXED/REG_VEC types.
+ *
+ * @ctx: ring context
+ * @iter: output iterator
+ * @desc: buffer descriptor
+ * @ddir: direction (ITER_SOURCE for source, ITER_DEST for destination)
+ * @vec: iou_vec for VEC/REG_VEC types (caller must call io_vec_free after use)
+ *
+ * Returns node pointer (may be NULL for USER/VEC), or ERR_PTR on failure.
+ * Caller must drop node reference when done if non-NULL.
+ */
+static struct io_rsrc_node *io_bpf_import_buffer(struct io_ring_ctx *ctx,
+						 struct iov_iter *iter,
+						 const struct io_bpf_buf_desc *desc,
+						 int ddir, struct iou_vec *vec)
+{
+	int ret;
+
+	switch (desc->type) {
+	case IO_BPF_BUF_USER:
+		/* Plain user buffer */
+		ret = import_ubuf(ddir, u64_to_user_ptr(desc->addr),
+				  desc->len, iter);
+		return ret ? ERR_PTR(ret) : NULL;
+
+	case IO_BPF_BUF_FIXED:
+	case IO_BPF_BUF_KFIXED:
+		/* FIXED: addr is absolute address within buffer */
+		/* KFIXED: addr is offset from buffer start */
+		return io_bpf_import_fixed_buf(ctx, iter, desc, ddir);
+
+	case IO_BPF_BUF_VEC:
+		/* Vectored user buffer - addr is iovec ptr, len is nr_vecs */
+		ret = io_bpf_import_vec_buf(ctx, iter, desc, ddir, vec);
+		return ret ? ERR_PTR(ret) : NULL;
+
+	case IO_BPF_BUF_REG_VEC:
+		/* Registered vectored buffer */
+		return io_bpf_import_reg_vec(ctx, iter, desc, ddir, vec);
+
+	default:
+		return ERR_PTR(-EINVAL);
+	}
+}
+
 __bpf_kfunc_start_defs();
 __bpf_kfunc void uring_bpf_set_result(struct uring_bpf_data *data, int res)
 {
@@ -300,10 +506,205 @@ __bpf_kfunc void uring_bpf_set_result(struct uring_bpf_data *data, int res)
 		req_set_fail(req);
 	io_req_set_res(req, res, 0);
 }
+
+/**
+ * bpf_iter_uring_buf_new - Initialize per-buffer iterator (KF_ITER_NEW)
+ * @iter: BPF-visible iterator state (on BPF stack)
+ * @data: BPF request data containing request context
+ * @desc: Single buffer descriptor
+ * @direction: ITER_SOURCE (read from buffer) or ITER_DEST (write to buffer)
+ *
+ * Takes the submit lock (refcounted via data->lock_depth, first caller
+ * acquires, last _destroy releases).
+ *
+ * Returns 0 on success, negative error code on failure.
+ */
+__bpf_kfunc int bpf_iter_uring_buf_new(struct bpf_iter_uring_buf *iter,
+					struct uring_bpf_data *data,
+					struct io_bpf_buf_desc *desc,
+					int direction)
+{
+	struct io_kiocb *req = cmd_to_io_kiocb(data);
+	struct io_ring_ctx *ctx = req->ctx;
+	struct bpf_iter_uring_buf_kern *kern = iter_kern(iter);
+	struct io_rsrc_node *node;
+
+	BUILD_BUG_ON(sizeof(struct bpf_iter_uring_buf_kern) >
+		     sizeof(struct bpf_iter_uring_buf));
+	BUILD_BUG_ON(__alignof__(struct bpf_iter_uring_buf_kern) !=
+		     __alignof__(struct bpf_iter_uring_buf));
+
+	memset(kern, 0, sizeof(*kern));
+
+	if (desc->type > IO_BPF_BUF_REG_VEC)
+		return -EINVAL;
+	if (direction != ITER_SOURCE && direction != ITER_DEST)
+		return -EINVAL;
+
+	kern->data = data;
+
+	if (data->lock_depth++ == 0)
+		io_ring_submit_lock(ctx, data->issue_flags);
+
+	node = io_bpf_import_buffer(ctx, &kern->iter, desc,
+				    direction, &kern->vec);
+	if (IS_ERR(node)) {
+		if (--data->lock_depth == 0)
+			io_ring_submit_unlock(ctx, data->issue_flags);
+		kern->data = NULL;
+		return PTR_ERR(node);
+	}
+
+	kern->node = node;
+	return 0;
+}
+
+/**
+ * bpf_iter_uring_buf_next - Get next page chunk (KF_ITER_NEXT)
+ * @iter: BPF-visible iterator state
+ *
+ * Unmaps the previous page and extracts the next one.
+ *
+ * Returns a non-NULL pointer when data is available, NULL when done.
+ * The returned pointer is to the avail count (int); the actual page data
+ * must be obtained via bpf_uring_buf_dynptr().
+ *
+ * Note: bpf_dynptr_slice() requires a compile-time constant size, so BPF
+ * programs typically process pages in fixed-size chunks (e.g. 512 bytes).
+ * If the page offset or extracted length is not aligned to the chunk size,
+ * the trailing bytes cannot be accessed via bpf_dynptr_slice() and are
+ * silently dropped.  For best efficiency, callers should ensure buffer
+ * addresses and lengths are at least 512-byte aligned.
+ */
+__bpf_kfunc int *
+bpf_iter_uring_buf_next(struct bpf_iter_uring_buf *iter)
+{
+	struct bpf_iter_uring_buf_kern *kern = iter_kern(iter);
+	struct page *pages[1];
+	struct page **pp = pages;
+	size_t offset;
+	ssize_t extracted;
+
+	if (!kern->data)
+		return NULL;
+
+	/* Unmap and release previous page */
+	iter_unmap_page(kern);
+	kern->avail = 0;
+
+	if (iov_iter_count(&kern->iter) == 0)
+		return NULL;
+
+	/* Extract next page */
+	extracted = iov_iter_extract_pages(&kern->iter, &pp,
+					   PAGE_SIZE, 1, 0, &offset);
+	if (extracted <= 0)
+		return NULL;
+
+	kern->page = pp[0];
+	kern->kmap_base = kmap_local_page(kern->page) + offset;
+	kern->avail = extracted;
+
+	return &kern->avail;
+}
+
+/**
+ * bpf_uring_buf_dynptr - Get dynptr for current page data
+ * @it__iter: Buffer iterator (must have a current page from _next())
+ * @ptr__uninit: Dynptr to initialize (LOCAL type, read-only)
+ *
+ * Initializes @ptr__uninit as a read-only LOCAL dynptr whose size equals
+ * the valid byte count in the current page chunk.  This prevents reads
+ * beyond the actual buffer data, unlike the old uring_buf_page_t approach
+ * which exposed a full PAGE_SIZE pointer.
+ *
+ * Returns 0 on success, -EINVAL if no current page is available.
+ */
+__bpf_kfunc int bpf_uring_buf_dynptr(struct bpf_iter_uring_buf *it__iter,
+				     struct bpf_dynptr *ptr__uninit)
+{
+	struct bpf_dynptr_kern *dynptr = (struct bpf_dynptr_kern *)ptr__uninit;
+	struct bpf_iter_uring_buf_kern *kern = iter_kern(it__iter);
+
+	if (!kern->kmap_base || kern->avail <= 0) {
+		bpf_dynptr_set_null(dynptr);
+		return -EINVAL;
+	}
+
+	bpf_dynptr_init(dynptr, kern->kmap_base,
+			BPF_DYNPTR_TYPE_LOCAL, 0, kern->avail);
+	bpf_dynptr_set_rdonly(dynptr);
+	return 0;
+}
+
+/**
+ * bpf_uring_buf_dynptr_rdwr - Get writable dynptr for current page data
+ * @it__iter: Buffer iterator (must have a current page from _next())
+ * @ptr__uninit: Dynptr to initialize (LOCAL type, read-write)
+ *
+ * Like bpf_uring_buf_dynptr() but returns a writable dynptr.  The iterator
+ * must have been created with direction == ITER_DEST; otherwise returns
+ * -EPERM.  This allows writing data into user buffers (e.g. copying from
+ * BPF arena to a user-provided destination buffer).
+ *
+ * Returns 0 on success, -EINVAL if no current page, -EPERM if not ITER_DEST.
+ */
+__bpf_kfunc int bpf_uring_buf_dynptr_rdwr(struct bpf_iter_uring_buf *it__iter,
+					   struct bpf_dynptr *ptr__uninit)
+{
+	struct bpf_dynptr_kern *dynptr = (struct bpf_dynptr_kern *)ptr__uninit;
+	struct bpf_iter_uring_buf_kern *kern = iter_kern(it__iter);
+
+	if (!kern->kmap_base || kern->avail <= 0) {
+		bpf_dynptr_set_null(dynptr);
+		return -EINVAL;
+	}
+
+	if (kern->iter.data_source) {  /* ITER_SOURCE: buffer is read-only */
+		bpf_dynptr_set_null(dynptr);
+		return -EPERM;
+	}
+
+	bpf_dynptr_init(dynptr, kern->kmap_base,
+			BPF_DYNPTR_TYPE_LOCAL, 0, kern->avail);
+	return 0;
+}
+
+/**
+ * bpf_iter_uring_buf_destroy - Destroy per-buffer iterator (KF_ITER_DESTROY)
+ * @iter: BPF-visible iterator state
+ *
+ * Unmaps page, frees resources, releases submit lock if this
+ * iterator owns it.
+ */
+__bpf_kfunc void bpf_iter_uring_buf_destroy(struct bpf_iter_uring_buf *iter)
+{
+	struct bpf_iter_uring_buf_kern *kern = iter_kern(iter);
+	struct io_ring_ctx *ctx;
+
+	if (!kern->data)
+		return;
+
+	ctx = cmd_to_io_kiocb(kern->data)->ctx;
+
+	iter_unmap_page(kern);
+	io_vec_free(&kern->vec);
+	if (kern->node)
+		io_put_rsrc_node(ctx, kern->node);
+	if (--kern->data->lock_depth == 0)
+		io_ring_submit_unlock(ctx, kern->data->issue_flags);
+	kern->data = NULL;
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(uring_bpf_kfuncs)
 BTF_ID_FLAGS(func, uring_bpf_set_result)
+BTF_ID_FLAGS(func, bpf_iter_uring_buf_new, KF_ITER_NEW)
+BTF_ID_FLAGS(func, bpf_iter_uring_buf_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_uring_buf_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_uring_buf_dynptr)
+BTF_ID_FLAGS(func, bpf_uring_buf_dynptr_rdwr)
 BTF_KFUNCS_END(uring_bpf_kfuncs)
 
 static const struct btf_kfunc_id_set uring_kfunc_set = {
diff --git a/io_uring/bpf_ext.h b/io_uring/bpf_ext.h
index a568ea31a51a..b0ead4b19293 100644
--- a/io_uring/bpf_ext.h
+++ b/io_uring/bpf_ext.h
@@ -13,10 +13,13 @@ struct uring_bpf_data {
 	void				*req_data;  /* not for bpf prog */
 	const struct uring_bpf_ops	*ops;
 	u32				opf;
+	u32				issue_flags; /* io_uring issue flags */
+	unsigned int			lock_depth; /* not for bpf prog */
 
 	/* writeable for bpf prog */
 	u8              pdu[64 - sizeof(void *) -
-		sizeof(struct uring_bpf_ops *) - sizeof(u32)];
+		sizeof(struct uring_bpf_ops *) - 2 * sizeof(u32) -
+		sizeof(unsigned int)];
 };
 
 typedef int (*uring_bpf_prep_t)(struct uring_bpf_data *data,
@@ -37,8 +40,17 @@ struct uring_bpf_ops {
 /* TODO: manage it via `io_rsrc_node` */
 struct uring_bpf_ops_kern {
 	const struct uring_bpf_ops *ops;
-	int refcount;
+	int refcount;	/* Protected by ctx->uring_lock */
 };
+
+/*
+ * Per-buffer BPF iterator state (lives on BPF stack).
+ * Uses bpf_iter_ prefix for KF_ITER verifier enforcement.
+ * Kernel-internal state is stored inline in the __opaque[] array.
+ */
+struct bpf_iter_uring_buf {
+	__u64 __opaque[12];
+} __aligned(8);
 #ifdef CONFIG_IO_URING_BPF_EXT
 int io_uring_bpf_issue(struct io_kiocb *req, unsigned int issue_flags);
 int io_uring_bpf_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH V3 09/12] bpf: add bpf_uring_buf_dynptr to special_kfunc_list
  2026-03-24 16:37 [PATCH v3 0/12] io_uring: add IORING_OP_BPF for extending io_uring Ming Lei
                   ` (7 preceding siblings ...)
  2026-03-24 16:37 ` [PATCH V3 08/12] io_uring: bpf: add per-buffer iterator kfuncs Ming Lei
@ 2026-03-24 16:37 ` Ming Lei
  2026-03-24 16:37 ` [PATCH V3 10/12] selftests/io_uring: add io_uring_unregister_buffers() Ming Lei
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2026-03-24 16:37 UTC (permalink / raw)
  To: Jens Axboe, io-uring
  Cc: Caleb Sander Mateos, Akilesh Kailash, bpf, Xiao Ni,
	Alexei Starovoitov, Ming Lei

Register bpf_uring_buf_dynptr() and bpf_uring_buf_dynptr_rdwr() in the
verifier's special_kfunc_list so they can produce LOCAL dynptrs.
Without these entries, the verifier cannot determine the dynptr type
for the __uninit output parameter.

We cannot reuse the existing bpf_dynptr_from_mem() helper because
kfunc PTR_TO_MEM returns are always marked MEM_RDONLY by the verifier
(meta.r0_rdonly = true), but bpf_dynptr_from_mem() requires writable
memory (ARG_PTR_TO_UNINIT_MEM).  A dedicated kfunc with a
special_kfunc_list entry is the only way to produce a dynptr from
read-only kernel-provided memory.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 kernel/bpf/verifier.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 159b25f8269d..42d0672336e1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12531,6 +12531,8 @@ enum special_kfunc_type {
 	KF_bpf_session_is_return,
 	KF_bpf_stream_vprintk,
 	KF_bpf_stream_print_stack,
+	KF_bpf_uring_buf_dynptr,
+	KF_bpf_uring_buf_dynptr_rdwr,
 };
 
 BTF_ID_LIST(special_kfunc_list)
@@ -12611,6 +12613,13 @@ BTF_ID(func, bpf_arena_reserve_pages)
 BTF_ID(func, bpf_session_is_return)
 BTF_ID(func, bpf_stream_vprintk)
 BTF_ID(func, bpf_stream_print_stack)
+#ifdef CONFIG_IO_URING_BPF_EXT
+BTF_ID(func, bpf_uring_buf_dynptr)
+BTF_ID(func, bpf_uring_buf_dynptr_rdwr)
+#else
+BTF_ID_UNUSED
+BTF_ID_UNUSED
+#endif
 
 static bool is_task_work_add_kfunc(u32 func_id)
 {
@@ -13590,6 +13599,9 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				dynptr_arg_type |= DYNPTR_TYPE_SKB_META;
 			} else if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_from_file]) {
 				dynptr_arg_type |= DYNPTR_TYPE_FILE;
+			} else if (meta->func_id == special_kfunc_list[KF_bpf_uring_buf_dynptr] ||
+				   meta->func_id == special_kfunc_list[KF_bpf_uring_buf_dynptr_rdwr]) {
+				dynptr_arg_type |= DYNPTR_TYPE_LOCAL;
 			} else if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_file_discard]) {
 				dynptr_arg_type |= DYNPTR_TYPE_FILE;
 				meta->release_regno = regno;
-- 
2.53.0



* [PATCH V3 10/12] selftests/io_uring: add io_uring_unregister_buffers()
  2026-03-24 16:37 [PATCH v3 0/12] io_uring: add IORING_OP_BPF for extending io_uring Ming Lei
                   ` (8 preceding siblings ...)
  2026-03-24 16:37 ` [PATCH V3 09/12] bpf: add bpf_uring_buf_dynptr to special_kfunc_list Ming Lei
@ 2026-03-24 16:37 ` Ming Lei
  2026-03-24 16:37 ` [PATCH V3 11/12] selftests/io_uring: add BPF struct_ops and kfunc tests Ming Lei
  2026-03-24 16:37 ` [PATCH V3 12/12] selftests/io_uring: add buffer iterator selftest with BPF arena Ming Lei
  11 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2026-03-24 16:37 UTC (permalink / raw)
  To: Jens Axboe, io-uring
  Cc: Caleb Sander Mateos, Akilesh Kailash, bpf, Xiao Ni,
	Alexei Starovoitov, Ming Lei

Add io_uring_unregister_buffers() so that the kernel selftests can call
it explicitly.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 tools/include/io_uring/mini_liburing.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tools/include/io_uring/mini_liburing.h b/tools/include/io_uring/mini_liburing.h
index 44be4446feda..b9163a41024f 100644
--- a/tools/include/io_uring/mini_liburing.h
+++ b/tools/include/io_uring/mini_liburing.h
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: MIT */
 
 #include <linux/io_uring.h>
+#include <signal.h>
 #include <sys/mman.h>
 #include <sys/syscall.h>
 #include <stdio.h>
@@ -284,6 +285,15 @@ static inline int io_uring_register_buffers(struct io_uring *ring,
 	return (ret < 0) ? -errno : ret;
 }
 
+static inline int io_uring_unregister_buffers(struct io_uring *ring)
+{
+	int ret;
+
+	ret = syscall(__NR_io_uring_register, ring->ring_fd,
+		      IORING_UNREGISTER_BUFFERS, NULL, 0);
+	return (ret < 0) ? -errno : ret;
+}
+
 static inline void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd,
 				      const void *buf, size_t len, int flags)
 {
-- 
2.53.0



* [PATCH V3 11/12] selftests/io_uring: add BPF struct_ops and kfunc tests
  2026-03-24 16:37 [PATCH v3 0/12] io_uring: add IORING_OP_BPF for extending io_uring Ming Lei
                   ` (9 preceding siblings ...)
  2026-03-24 16:37 ` [PATCH V3 10/12] selftests/io_uring: add io_uring_unregister_buffers() Ming Lei
@ 2026-03-24 16:37 ` Ming Lei
  2026-03-24 16:37 ` [PATCH V3 12/12] selftests/io_uring: add buffer iterator selftest with BPF arena Ming Lei
  11 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2026-03-24 16:37 UTC (permalink / raw)
  To: Jens Axboe, io-uring
  Cc: Caleb Sander Mateos, Akilesh Kailash, bpf, Xiao Ni,
	Alexei Starovoitov, Ming Lei

Add selftests for io_uring BPF struct_ops and kfunc functionality:

- bpf_ext_basic: Tests IORING_OP_BPF struct_ops registration and execution
  with multiple struct_ops support

The test framework includes:
- runner.c: Main test runner with auto-discovery
- iou_test.h: Common test infrastructure
- Makefile: Build system with BPF skeleton generation

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 tools/testing/selftests/io_uring/.gitignore   |   2 +
 tools/testing/selftests/io_uring/Makefile     | 173 ++++++++++++++
 .../selftests/io_uring/bpf_ext_basic.bpf.c    |  94 ++++++++
 .../selftests/io_uring/bpf_ext_basic.c        | 215 ++++++++++++++++++
 .../selftests/io_uring/include/iou_test.h     |  98 ++++++++
 tools/testing/selftests/io_uring/runner.c     | 206 +++++++++++++++++
 6 files changed, 788 insertions(+)
 create mode 100644 tools/testing/selftests/io_uring/.gitignore
 create mode 100644 tools/testing/selftests/io_uring/Makefile
 create mode 100644 tools/testing/selftests/io_uring/bpf_ext_basic.bpf.c
 create mode 100644 tools/testing/selftests/io_uring/bpf_ext_basic.c
 create mode 100644 tools/testing/selftests/io_uring/include/iou_test.h
 create mode 100644 tools/testing/selftests/io_uring/runner.c

diff --git a/tools/testing/selftests/io_uring/.gitignore b/tools/testing/selftests/io_uring/.gitignore
new file mode 100644
index 000000000000..c0e488dc0622
--- /dev/null
+++ b/tools/testing/selftests/io_uring/.gitignore
@@ -0,0 +1,2 @@
+/build/
+/runner
diff --git a/tools/testing/selftests/io_uring/Makefile b/tools/testing/selftests/io_uring/Makefile
new file mode 100644
index 000000000000..ad659e6f5361
--- /dev/null
+++ b/tools/testing/selftests/io_uring/Makefile
@@ -0,0 +1,173 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2025 Red Hat, Inc.
+include ../../../build/Build.include
+include ../../../scripts/Makefile.arch
+include ../../../scripts/Makefile.include
+
+TEST_GEN_PROGS := runner
+
+# override lib.mk's default rules
+OVERRIDE_TARGETS := 1
+include ../lib.mk
+
+CURDIR := $(abspath .)
+REPOROOT := $(abspath ../../../..)
+TOOLSDIR := $(REPOROOT)/tools
+LIBDIR := $(TOOLSDIR)/lib
+BPFDIR := $(LIBDIR)/bpf
+TOOLSINCDIR := $(TOOLSDIR)/include
+BPFTOOLDIR := $(TOOLSDIR)/bpf/bpftool
+APIDIR := $(TOOLSINCDIR)/uapi
+GENDIR := $(REPOROOT)/include/generated
+GENHDR := $(GENDIR)/autoconf.h
+
+OUTPUT_DIR := $(OUTPUT)/build
+OBJ_DIR := $(OUTPUT_DIR)/obj
+INCLUDE_DIR := $(OUTPUT_DIR)/include
+BPFOBJ_DIR := $(OBJ_DIR)/libbpf
+IOUOBJ_DIR := $(OBJ_DIR)/io_uring
+BPFOBJ := $(BPFOBJ_DIR)/libbpf.a
+LIBBPF_OUTPUT := $(OBJ_DIR)/libbpf/libbpf.a
+
+DEFAULT_BPFTOOL := $(OUTPUT_DIR)/host/sbin/bpftool
+HOST_OBJ_DIR := $(OBJ_DIR)/host/bpftool
+HOST_LIBBPF_OUTPUT := $(OBJ_DIR)/host/libbpf/
+HOST_LIBBPF_DESTDIR := $(OUTPUT_DIR)/host/
+HOST_DESTDIR := $(OUTPUT_DIR)/host/
+
+VMLINUX_BTF_PATHS ?= $(if $(O),$(O)/vmlinux)					\
+		     $(if $(KBUILD_OUTPUT),$(KBUILD_OUTPUT)/vmlinux)		\
+		     ../../../../vmlinux					\
+		     /sys/kernel/btf/vmlinux					\
+		     /boot/vmlinux-$(shell uname -r)
+VMLINUX_BTF ?= $(abspath $(firstword $(wildcard $(VMLINUX_BTF_PATHS))))
+ifeq ($(VMLINUX_BTF),)
+$(error Cannot find a vmlinux for VMLINUX_BTF at any of "$(VMLINUX_BTF_PATHS)")
+endif
+
+BPFTOOL ?= $(DEFAULT_BPFTOOL)
+
+ifneq ($(wildcard $(GENHDR)),)
+  GENFLAGS := -DHAVE_GENHDR
+endif
+
+CFLAGS += -g -O2 -rdynamic -pthread -Wall -Werror $(GENFLAGS)			\
+	  -I$(INCLUDE_DIR) -I$(GENDIR) -I$(LIBDIR)				\
+	  -I$(REPOROOT)/usr/include						\
+	  -I$(TOOLSINCDIR) -I$(APIDIR) -I$(CURDIR)/include
+
+# Silence some warnings when compiled with clang
+ifneq ($(LLVM),)
+CFLAGS += -Wno-unused-command-line-argument
+endif
+
+LDFLAGS = -lelf -lz -lpthread -lzstd
+
+IS_LITTLE_ENDIAN = $(shell $(CC) -dM -E - </dev/null |				\
+			grep 'define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__')
+
+# Get Clang's default includes on this system
+define get_sys_includes
+$(shell $(1) $(2) -v -E - </dev/null 2>&1 \
+	| sed -n '/<...> search starts here:/,/End of search list./{ s| \(/.*\)|-idirafter \1|p }') \
+$(shell $(1) $(2) -dM -E - </dev/null | grep '__riscv_xlen ' | awk '{printf("-D__riscv_xlen=%d -D__BITS_PER_LONG=%d", $$3, $$3)}')
+endef
+
+ifneq ($(CROSS_COMPILE),)
+CLANG_TARGET_ARCH = --target=$(notdir $(CROSS_COMPILE:%-=%))
+endif
+
+CLANG_SYS_INCLUDES = $(call get_sys_includes,$(CLANG),$(CLANG_TARGET_ARCH))
+
+BPF_CFLAGS = -g -D__TARGET_ARCH_$(SRCARCH)					\
+	     $(if $(IS_LITTLE_ENDIAN),-mlittle-endian,-mbig-endian)		\
+	     -I$(CURDIR)/include -I$(CURDIR)/include/bpf-compat			\
+	     -I$(INCLUDE_DIR) -I$(APIDIR)					\
+	     -I$(REPOROOT)/include						\
+	     $(CLANG_SYS_INCLUDES)						\
+	     -Wall -Wno-compare-distinct-pointer-types				\
+	     -Wno-incompatible-function-pointer-types				\
+	     -Wno-missing-declarations						\
+	     -O2 -mcpu=v3
+
+# sort removes libbpf duplicates when not cross-building
+MAKE_DIRS := $(sort $(OBJ_DIR)/libbpf $(OBJ_DIR)/libbpf				\
+	       $(OBJ_DIR)/bpftool $(OBJ_DIR)/resolve_btfids			\
+	       $(HOST_OBJ_DIR) $(INCLUDE_DIR) $(IOUOBJ_DIR))
+
+$(MAKE_DIRS):
+	$(call msg,MKDIR,,$@)
+	$(Q)mkdir -p $@
+
+$(BPFOBJ): $(wildcard $(BPFDIR)/*.[ch] $(BPFDIR)/Makefile)			\
+	   $(APIDIR)/linux/bpf.h						\
+	   | $(OBJ_DIR)/libbpf
+	$(Q)$(MAKE) $(submake_extras) -C $(BPFDIR) OUTPUT=$(OBJ_DIR)/libbpf/	\
+		    ARCH=$(ARCH) CC="$(CC)" CROSS_COMPILE=$(CROSS_COMPILE)	\
+		    EXTRA_CFLAGS='-g -O0 -fPIC'					\
+		    DESTDIR=$(OUTPUT_DIR) prefix= all install_headers
+
+$(DEFAULT_BPFTOOL): $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile)	\
+		    $(LIBBPF_OUTPUT) | $(HOST_OBJ_DIR)
+	$(Q)$(MAKE) $(submake_extras)  -C $(BPFTOOLDIR)				\
+		    ARCH= CROSS_COMPILE= CC=$(HOSTCC) LD=$(HOSTLD)		\
+		    EXTRA_CFLAGS='-g -O0'					\
+		    OUTPUT=$(HOST_OBJ_DIR)/					\
+		    LIBBPF_OUTPUT=$(HOST_LIBBPF_OUTPUT)				\
+		    LIBBPF_DESTDIR=$(HOST_LIBBPF_DESTDIR)			\
+		    prefix= DESTDIR=$(HOST_DESTDIR) install-bin
+
+$(INCLUDE_DIR)/vmlinux.h: $(VMLINUX_BTF) $(BPFTOOL) | $(INCLUDE_DIR)
+ifeq ($(VMLINUX_H),)
+	$(call msg,GEN,,$@)
+	$(Q)$(BPFTOOL) btf dump file $(VMLINUX_BTF) format c > $@
+else
+	$(call msg,CP,,$@)
+	$(Q)cp "$(VMLINUX_H)" $@
+endif
+
+$(IOUOBJ_DIR)/%.bpf.o: %.bpf.c $(INCLUDE_DIR)/vmlinux.h	| $(BPFOBJ) $(IOUOBJ_DIR)
+	$(call msg,CLNG-BPF,,$(notdir $@))
+	$(Q)$(CLANG) $(BPF_CFLAGS) -target bpf -c $< -o $@
+
+$(INCLUDE_DIR)/%.bpf.skel.h: $(IOUOBJ_DIR)/%.bpf.o $(INCLUDE_DIR)/vmlinux.h $(BPFTOOL) | $(INCLUDE_DIR)
+	$(eval skel=$(notdir $@))
+	$(call msg,GEN-SKEL,,$(skel))
+	$(Q)$(BPFTOOL) gen object $(<:.o=.linked1.o) $<
+	$(Q)$(BPFTOOL) gen object $(<:.o=.linked2.o) $(<:.o=.linked1.o)
+	$(Q)$(BPFTOOL) gen object $(<:.o=.linked3.o) $(<:.o=.linked2.o)
+	$(Q)diff $(<:.o=.linked2.o) $(<:.o=.linked3.o)
+	$(Q)$(BPFTOOL) gen skeleton $(<:.o=.linked3.o) name $(subst .bpf.skel.h,,$(skel)) > $@
+	$(Q)$(BPFTOOL) gen subskeleton $(<:.o=.linked3.o) name $(subst .bpf.skel.h,,$(skel)) > $(@:.skel.h=.subskel.h)
+
+override define CLEAN
+	rm -rf $(OUTPUT_DIR)
+	rm -f $(TEST_GEN_PROGS)
+endef
+
+# Every testcase takes all of the BPF progs as dependencies by default.
+all_test_bpfprogs := $(foreach prog,$(wildcard *.bpf.c),$(INCLUDE_DIR)/$(patsubst %.c,%.skel.h,$(prog)))
+
+auto-test-targets :=			\
+	bpf_ext_basic			\
+	bpf_ext_memcpy			\
+
+testcase-targets := $(addsuffix .o,$(addprefix $(IOUOBJ_DIR)/,$(auto-test-targets)))
+
+$(IOUOBJ_DIR)/runner.o: runner.c | $(IOUOBJ_DIR) $(BPFOBJ)
+	$(call msg,CC,,$@)
+	$(Q)$(CC) $(CFLAGS) -c $< -o $@
+
+$(testcase-targets): $(IOUOBJ_DIR)/%.o: %.c $(IOUOBJ_DIR)/runner.o $(all_test_bpfprogs) | $(IOUOBJ_DIR)
+	$(call msg,CC,,$@)
+	$(Q)$(CC) $(CFLAGS) -c $< -o $@
+
+$(OUTPUT)/runner: $(IOUOBJ_DIR)/runner.o $(BPFOBJ) $(testcase-targets)
+	$(call msg,LINK,,$@)
+	$(Q)$(CC) $(CFLAGS) -o $@ $^ $(LDFLAGS)
+
+.DEFAULT_GOAL := all
+
+.DELETE_ON_ERROR:
+
+.SECONDARY:
diff --git a/tools/testing/selftests/io_uring/bpf_ext_basic.bpf.c b/tools/testing/selftests/io_uring/bpf_ext_basic.bpf.c
new file mode 100644
index 000000000000..1fec378c8c62
--- /dev/null
+++ b/tools/testing/selftests/io_uring/bpf_ext_basic.bpf.c
@@ -0,0 +1,94 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2025 Red Hat, Inc.
+ * Basic io_uring BPF struct_ops test.
+ *
+ * This tests registering a minimal uring_bpf_ops struct_ops
+ * with prep/issue/cleanup callbacks.
+ */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char LICENSE[] SEC("license") = "GPL";
+
+/* Counters to verify callbacks are invoked */
+int prep_count = 0;
+int issue_count = 0;
+int cleanup_count = 0;
+
+/* Test result stored in pdu */
+#define PDU_MAGIC 0xdeadbeef
+
+SEC("struct_ops/basic_prep")
+int BPF_PROG(basic_prep, struct uring_bpf_data *data,
+	     const struct io_uring_sqe *sqe)
+{
+	__u32 *magic;
+
+	prep_count++;
+
+	/* Store magic value in pdu to verify data flow */
+	magic = (__u32 *)data->pdu;
+	*magic = PDU_MAGIC;
+
+	bpf_printk("basic_prep: count=%d", prep_count);
+	return 0;
+}
+
+extern void uring_bpf_set_result(struct uring_bpf_data *data, int res) __ksym;
+
+SEC("struct_ops/basic_issue")
+int BPF_PROG(basic_issue, struct uring_bpf_data *data)
+{
+	__u32 *magic;
+
+	issue_count++;
+
+	/* Verify pdu contains the magic value from prep */
+	magic = (__u32 *)data->pdu;
+	if (*magic != PDU_MAGIC) {
+		bpf_printk("basic_issue: pdu magic mismatch!");
+		uring_bpf_set_result(data, -22); /* -EINVAL */
+		return 0;
+	}
+
+	bpf_printk("basic_issue: count=%d, pdu_magic=0x%x", issue_count, *magic);
+
+	/* Set successful result */
+	uring_bpf_set_result(data, 42);
+	return 0;
+}
+
+SEC("struct_ops/basic_fail")
+void BPF_PROG(basic_fail, struct uring_bpf_data *data)
+{
+	bpf_printk("basic_fail: invoked");
+}
+
+SEC("struct_ops/basic_cleanup")
+void BPF_PROG(basic_cleanup, struct uring_bpf_data *data)
+{
+	cleanup_count++;
+	bpf_printk("basic_cleanup: count=%d", cleanup_count);
+}
+
+SEC(".struct_ops.link")
+struct uring_bpf_ops bpf_ext_basic = {
+	.id		= 0,
+	.prep_fn	= (void *)basic_prep,
+	.issue_fn	= (void *)basic_issue,
+	.fail_fn	= (void *)basic_fail,
+	.cleanup_fn	= (void *)basic_cleanup,
+};
+
+/* Second struct_ops to verify multiple registrations work */
+SEC(".struct_ops.link")
+struct uring_bpf_ops bpf_ext_basic_2 = {
+	.id		= 1,
+	.prep_fn	= (void *)basic_prep,
+	.issue_fn	= (void *)basic_issue,
+	.fail_fn	= (void *)basic_fail,
+	.cleanup_fn	= (void *)basic_cleanup,
+};
diff --git a/tools/testing/selftests/io_uring/bpf_ext_basic.c b/tools/testing/selftests/io_uring/bpf_ext_basic.c
new file mode 100644
index 000000000000..0591204a2c1e
--- /dev/null
+++ b/tools/testing/selftests/io_uring/bpf_ext_basic.c
@@ -0,0 +1,215 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2025 Red Hat, Inc.
+ * Basic io_uring BPF struct_ops test - userspace part.
+ */
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+#include <errno.h>
+#include <linux/io_uring.h>
+#include <io_uring/mini_liburing.h>
+
+#include "iou_test.h"
+#include "bpf_ext_basic.bpf.skel.h"
+
+struct test_ctx {
+	struct bpf_ext_basic *skel;
+	struct bpf_link *link;
+	struct bpf_link *link_2;
+	struct io_uring ring;
+	int nr_ops;
+};
+
+static enum iou_test_status bpf_setup(struct test_ctx *ctx)
+{
+	int ret;
+
+	/* Load BPF skeleton */
+	ctx->skel = bpf_ext_basic__open();
+	if (!ctx->skel) {
+		IOU_ERR("Failed to open BPF skeleton");
+		return IOU_TEST_FAIL;
+	}
+
+	/* Set ring_fd in struct_ops before loading (id is hardcoded in BPF) */
+	ctx->skel->struct_ops.bpf_ext_basic->ring_fd = ctx->ring.ring_fd;
+	ctx->skel->struct_ops.bpf_ext_basic_2->ring_fd = ctx->ring.ring_fd;
+
+	ret = bpf_ext_basic__load(ctx->skel);
+	if (ret) {
+		IOU_ERR("Failed to load BPF skeleton: %d", ret);
+		bpf_ext_basic__destroy(ctx->skel);
+		ctx->skel = NULL;
+		return IOU_TEST_FAIL;
+	}
+
+	/* Attach first struct_ops */
+	ctx->link = bpf_map__attach_struct_ops(ctx->skel->maps.bpf_ext_basic);
+	if (!ctx->link) {
+		IOU_ERR("Failed to attach struct_ops");
+		bpf_ext_basic__destroy(ctx->skel);
+		ctx->skel = NULL;
+		return IOU_TEST_FAIL;
+	}
+	ctx->nr_ops++;
+
+	/* Attach second struct_ops */
+	ctx->link_2 = bpf_map__attach_struct_ops(ctx->skel->maps.bpf_ext_basic_2);
+	if (!ctx->link_2) {
+		IOU_ERR("Failed to attach struct_ops_2");
+		bpf_link__destroy(ctx->link);
+		ctx->link = NULL;
+		bpf_ext_basic__destroy(ctx->skel);
+		ctx->skel = NULL;
+		return IOU_TEST_FAIL;
+	}
+	ctx->nr_ops++;
+
+	return IOU_TEST_PASS;
+}
+
+static enum iou_test_status setup(void **ctx_out)
+{
+	struct io_uring_params p;
+	struct test_ctx *ctx;
+	enum iou_test_status status;
+	int ret;
+
+	ctx = calloc(1, sizeof(*ctx));
+	if (!ctx) {
+		IOU_ERR("Failed to allocate context");
+		return IOU_TEST_FAIL;
+	}
+
+	/* Setup io_uring ring with BPF_OP flag */
+	memset(&p, 0, sizeof(p));
+	p.flags = IORING_SETUP_BPF_EXT | IORING_SETUP_NO_SQARRAY;
+
+	ret = io_uring_queue_init_params(8, &ctx->ring, &p);
+	if (ret < 0) {
+		IOU_ERR("io_uring_queue_init_params failed: %s (flags=0x%x)",
+			strerror(-ret), p.flags);
+		free(ctx);
+		return IOU_TEST_SKIP;
+	}
+
+	status = bpf_setup(ctx);
+	if (status != IOU_TEST_PASS) {
+		io_uring_queue_exit(&ctx->ring);
+		free(ctx);
+		return status;
+	}
+
+	*ctx_out = ctx;
+	return IOU_TEST_PASS;
+}
+
+static enum iou_test_status test_bpf_op(struct test_ctx *ctx, int op_id)
+{
+	struct io_uring_sqe *sqe;
+	struct io_uring_cqe *cqe;
+	__u64 user_data = 0x12345678 + op_id;
+	int ret;
+
+	sqe = io_uring_get_sqe(&ctx->ring);
+	if (!sqe) {
+		IOU_ERR("Failed to get SQE for op %d", op_id);
+		return IOU_TEST_FAIL;
+	}
+
+	memset(sqe, 0, sizeof(*sqe));
+	sqe->opcode = IORING_OP_BPF;
+	sqe->fd = -1;
+	sqe->bpf_op_flags = (op_id << IORING_BPF_OP_SHIFT);
+	sqe->user_data = user_data;
+
+	ret = io_uring_submit(&ctx->ring);
+	if (ret < 0) {
+		IOU_ERR("io_uring_submit for op %d failed: %d", op_id, ret);
+		return IOU_TEST_FAIL;
+	}
+
+	ret = io_uring_wait_cqe(&ctx->ring, &cqe);
+	if (ret < 0) {
+		IOU_ERR("io_uring_wait_cqe for op %d failed: %d", op_id, ret);
+		return IOU_TEST_FAIL;
+	}
+
+	if (cqe->user_data != user_data) {
+		IOU_ERR("CQE user_data mismatch for op %d: 0x%llx", op_id, cqe->user_data);
+		return IOU_TEST_FAIL;
+	}
+
+	if (cqe->res != 42) {
+		IOU_ERR("CQE result mismatch for op %d: %d (expected 42)", op_id, cqe->res);
+		return IOU_TEST_FAIL;
+	}
+
+	io_uring_cqe_seen(&ctx->ring);
+	return IOU_TEST_PASS;
+}
+
+static enum iou_test_status verify_counters(struct test_ctx *ctx, int expected)
+{
+	if (ctx->skel->bss->prep_count != expected) {
+		IOU_ERR("prep_count mismatch: %d (expected %d)",
+			ctx->skel->bss->prep_count, expected);
+		return IOU_TEST_FAIL;
+	}
+	if (ctx->skel->bss->issue_count != expected) {
+		IOU_ERR("issue_count mismatch: %d (expected %d)",
+			ctx->skel->bss->issue_count, expected);
+		return IOU_TEST_FAIL;
+	}
+	if (ctx->skel->bss->cleanup_count != expected) {
+		IOU_ERR("cleanup_count mismatch: %d (expected %d)",
+			ctx->skel->bss->cleanup_count, expected);
+		return IOU_TEST_FAIL;
+	}
+	return IOU_TEST_PASS;
+}
+
+static enum iou_test_status run(void *ctx_ptr)
+{
+	struct test_ctx *ctx = ctx_ptr;
+	enum iou_test_status status;
+	int i;
+
+	/* Test all registered struct_ops */
+	for (i = 0; i < ctx->nr_ops; i++) {
+		status = test_bpf_op(ctx, i);
+		if (status != IOU_TEST_PASS)
+			return status;
+
+		/* Verify counters after each op */
+		status = verify_counters(ctx, i + 1);
+		if (status != IOU_TEST_PASS)
+			return status;
+	}
+
+	IOU_INFO("IORING_OP_BPF multiple struct_ops test passed");
+	return IOU_TEST_PASS;
+}
+
+static void cleanup(void *ctx_ptr)
+{
+	struct test_ctx *ctx = ctx_ptr;
+
+	if (ctx->link_2)
+		bpf_link__destroy(ctx->link_2);
+	if (ctx->link)
+		bpf_link__destroy(ctx->link);
+	if (ctx->skel)
+		bpf_ext_basic__destroy(ctx->skel);
+	io_uring_queue_exit(&ctx->ring);
+	free(ctx);
+}
+
+struct iou_test bpf_ext_basic_test = {
+	.name = "bpf_ext_basic",
+	.description = "Test IORING_OP_BPF struct_ops registration and execution",
+	.setup = setup,
+	.run = run,
+	.cleanup = cleanup,
+};
+REGISTER_IOU_TEST(bpf_ext_basic_test)
diff --git a/tools/testing/selftests/io_uring/include/iou_test.h b/tools/testing/selftests/io_uring/include/iou_test.h
new file mode 100644
index 000000000000..8e7880e81314
--- /dev/null
+++ b/tools/testing/selftests/io_uring/include/iou_test.h
@@ -0,0 +1,98 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2025 Red Hat, Inc.
+ */
+
+#ifndef __IOU_TEST_H__
+#define __IOU_TEST_H__
+
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+enum iou_test_status {
+	IOU_TEST_PASS = 0,
+	IOU_TEST_SKIP,
+	IOU_TEST_FAIL,
+};
+
+struct iou_test {
+	/**
+	 * name - The name of the testcase.
+	 */
+	const char *name;
+
+	/**
+	 * description - A description of the testcase.
+	 */
+	const char *description;
+
+	/**
+	 * setup - Setup callback to initialize the test.
+	 * @ctx: A pointer to a context object that will be passed to run
+	 *       and cleanup.
+	 *
+	 * Return: IOU_TEST_PASS if setup was successful, IOU_TEST_SKIP
+	 *         if the test should be skipped, or IOU_TEST_FAIL if the
+	 *         test should be marked as failed.
+	 */
+	enum iou_test_status (*setup)(void **ctx);
+
+	/**
+	 * run - The main test function.
+	 * @ctx: Context object returned from setup().
+	 *
+	 * Return: IOU_TEST_PASS if the test passed, or IOU_TEST_FAIL
+	 *         if it failed.
+	 */
+	enum iou_test_status (*run)(void *ctx);
+
+	/**
+	 * cleanup - Cleanup callback.
+	 * @ctx: Context object returned from setup().
+	 */
+	void (*cleanup)(void *ctx);
+};
+
+void iou_test_register(struct iou_test *test);
+
+#define REGISTER_IOU_TEST(__test)					\
+	__attribute__((constructor))					\
+	static void __test##_register(void)				\
+	{								\
+		iou_test_register(&(__test));				\
+	}
+
+#define IOU_BUG(__cond, __fmt, ...)					\
+	do {								\
+		if (__cond) {						\
+			fprintf(stderr, "FATAL (%s:%d): " __fmt "\n",	\
+				__FILE__, __LINE__,			\
+				##__VA_ARGS__);				\
+			exit(1);					\
+		}							\
+	} while (0)
+
+#define IOU_BUG_ON(__cond) IOU_BUG(__cond, "BUG: %s", #__cond)
+
+#define IOU_FAIL(__fmt, ...)						\
+	do {								\
+		fprintf(stderr, "FAIL (%s:%d): " __fmt "\n",		\
+			__FILE__, __LINE__, ##__VA_ARGS__);		\
+		return IOU_TEST_FAIL;					\
+	} while (0)
+
+#define IOU_FAIL_IF(__cond, __fmt, ...)					\
+	do {								\
+		if (__cond)						\
+			IOU_FAIL(__fmt, ##__VA_ARGS__);			\
+	} while (0)
+
+#define IOU_ERR(__fmt, ...)						\
+	fprintf(stderr, "ERR: " __fmt "\n", ##__VA_ARGS__)
+
+#define IOU_INFO(__fmt, ...)						\
+	fprintf(stdout, "INFO: " __fmt "\n", ##__VA_ARGS__)
+
+#endif /* __IOU_TEST_H__ */
diff --git a/tools/testing/selftests/io_uring/runner.c b/tools/testing/selftests/io_uring/runner.c
new file mode 100644
index 000000000000..09ac1ac2d633
--- /dev/null
+++ b/tools/testing/selftests/io_uring/runner.c
@@ -0,0 +1,206 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2025 Red Hat, Inc.
+ * Test runner for io_uring BPF selftests.
+ */
+#include <stdio.h>
+#include <unistd.h>
+#include <signal.h>
+#include <libgen.h>
+#include <bpf/bpf.h>
+#include "iou_test.h"
+
+const char help_fmt[] =
+"The runner for io_uring BPF tests.\n"
+"\n"
+"The runner is statically linked against all testcases, and runs them all serially.\n"
+"\n"
+"Usage: %s [-t TEST] [-h]\n"
+"\n"
+"  -t TEST       Only run tests whose name includes this string\n"
+"  -s            Include print output for skipped tests\n"
+"  -l            List all available tests\n"
+"  -q            Don't print the test descriptions during run\n"
+"  -h            Display this help and exit\n";
+
+static volatile int exit_req;
+static bool quiet, print_skipped, list;
+
+#define MAX_IOU_TESTS 2048
+
+static struct iou_test __iou_tests[MAX_IOU_TESTS];
+static unsigned __iou_num_tests = 0;
+
+static void sigint_handler(int signum)
+{
+	exit_req = 1;
+}
+
+static void print_test_preamble(const struct iou_test *test, bool quiet)
+{
+	printf("===== START =====\n");
+	printf("TEST: %s\n", test->name);
+	if (!quiet)
+		printf("DESCRIPTION: %s\n", test->description);
+	printf("OUTPUT:\n");
+
+	fflush(stdout);
+	fflush(stderr);
+}
+
+static const char *status_to_result(enum iou_test_status status)
+{
+	switch (status) {
+	case IOU_TEST_PASS:
+	case IOU_TEST_SKIP:
+		return "ok";
+	case IOU_TEST_FAIL:
+		return "not ok";
+	default:
+		return "<UNKNOWN>";
+	}
+}
+
+static void print_test_result(const struct iou_test *test,
+			      enum iou_test_status status,
+			      unsigned int testnum)
+{
+	const char *result = status_to_result(status);
+	const char *directive = status == IOU_TEST_SKIP ? "SKIP " : "";
+
+	printf("%s %u %s # %s\n", result, testnum, test->name, directive);
+	printf("=====  END  =====\n");
+}
+
+static bool should_skip_test(const struct iou_test *test, const char *filter)
+{
+	return !strstr(test->name, filter);
+}
+
+static enum iou_test_status run_test(const struct iou_test *test)
+{
+	enum iou_test_status status;
+	void *context = NULL;
+
+	if (test->setup) {
+		status = test->setup(&context);
+		if (status != IOU_TEST_PASS)
+			return status;
+	}
+
+	status = test->run(context);
+
+	if (test->cleanup)
+		test->cleanup(context);
+
+	return status;
+}
+
+static bool test_valid(const struct iou_test *test)
+{
+	if (!test) {
+		fprintf(stderr, "NULL test detected\n");
+		return false;
+	}
+
+	if (!test->name) {
+		fprintf(stderr,
+			"Test with no name found. Must specify test name.\n");
+		return false;
+	}
+
+	if (!test->description) {
+		fprintf(stderr, "Test %s requires description.\n", test->name);
+		return false;
+	}
+
+	if (!test->run) {
+		fprintf(stderr, "Test %s has no run() callback\n", test->name);
+		return false;
+	}
+
+	return true;
+}
+
+int main(int argc, char **argv)
+{
+	const char *filter = NULL;
+	unsigned testnum = 0, i;
+	unsigned passed = 0, skipped = 0, failed = 0;
+	int opt;
+
+	signal(SIGINT, sigint_handler);
+	signal(SIGTERM, sigint_handler);
+
+	libbpf_set_strict_mode(LIBBPF_STRICT_ALL);
+
+	while ((opt = getopt(argc, argv, "qslt:h")) != -1) {
+		switch (opt) {
+		case 'q':
+			quiet = true;
+			break;
+		case 's':
+			print_skipped = true;
+			break;
+		case 'l':
+			list = true;
+			break;
+		case 't':
+			filter = optarg;
+			break;
+		default:
+			fprintf(stderr, help_fmt, basename(argv[0]));
+			return opt != 'h';
+		}
+	}
+
+	for (i = 0; i < __iou_num_tests; i++) {
+		enum iou_test_status status;
+		struct iou_test *test = &__iou_tests[i];
+
+		if (list) {
+			printf("%s\n", test->name);
+			if (i == (__iou_num_tests - 1))
+				return 0;
+			continue;
+		}
+
+		if (filter && should_skip_test(test, filter)) {
+			if (print_skipped) {
+				print_test_preamble(test, quiet);
+				print_test_result(test, IOU_TEST_SKIP, ++testnum);
+			}
+			continue;
+		}
+
+		print_test_preamble(test, quiet);
+		status = run_test(test);
+		print_test_result(test, status, ++testnum);
+		switch (status) {
+		case IOU_TEST_PASS:
+			passed++;
+			break;
+		case IOU_TEST_SKIP:
+			skipped++;
+			break;
+		case IOU_TEST_FAIL:
+			failed++;
+			break;
+		}
+	}
+	printf("\n\n=============================\n\n");
+	printf("RESULTS:\n\n");
+	printf("PASSED:  %u\n", passed);
+	printf("SKIPPED: %u\n", skipped);
+	printf("FAILED:  %u\n", failed);
+
+	return failed ? 1 : 0;
+}
+
+void iou_test_register(struct iou_test *test)
+{
+	IOU_BUG_ON(!test_valid(test));
+	IOU_BUG_ON(__iou_num_tests >= MAX_IOU_TESTS);
+
+	__iou_tests[__iou_num_tests++] = *test;
+}
-- 
2.53.0



* [PATCH V3 12/12] selftests/io_uring: add buffer iterator selftest with BPF arena
  2026-03-24 16:37 [PATCH v3 0/12] io_uring: add IORING_OP_BPF for extending io_uring Ming Lei
                   ` (10 preceding siblings ...)
  2026-03-24 16:37 ` [PATCH V3 11/12] selftests/io_uring: add BPF struct_ops and kfunc tests Ming Lei
@ 2026-03-24 16:37 ` Ming Lei
  11 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2026-03-24 16:37 UTC (permalink / raw)
  To: Jens Axboe, io-uring
  Cc: Caleb Sander Mateos, Akilesh Kailash, bpf, Xiao Ni,
	Alexei Starovoitov, Ming Lei

Add a bpf_ext_memcpy test for the per-buffer iterator kfuncs with dynptr.
It exercises both read (buffer -> arena) and write (arena -> buffer)
directions across all buffer types (USER, VEC, FIXED, REG_VEC)
with 1MB+ buffers.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 tools/testing/selftests/Makefile              |   3 +-
 .../selftests/io_uring/bpf_ext_memcpy.bpf.c   | 305 +++++++++++
 .../selftests/io_uring/bpf_ext_memcpy.c       | 517 ++++++++++++++++++
 .../io_uring/include/bpf_ext_memcpy_defs.h    |  18 +
 4 files changed, 842 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/io_uring/bpf_ext_memcpy.bpf.c
 create mode 100644 tools/testing/selftests/io_uring/bpf_ext_memcpy.c
 create mode 100644 tools/testing/selftests/io_uring/include/bpf_ext_memcpy_defs.h

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 450f13ba4cca..e8d01d62f1ac 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -45,6 +45,7 @@ TARGETS += futex
 TARGETS += gpio
 TARGETS += hid
 TARGETS += intel_pstate
+TARGETS += io_uring
 TARGETS += iommu
 TARGETS += ipc
 TARGETS += ir
@@ -148,7 +149,7 @@ endif
 # User can optionally provide a TARGETS skiplist. By default we skip
 # targets using BPF since it has cutting edge build time dependencies
 # which require more effort to install.
-SKIP_TARGETS ?= bpf sched_ext
+SKIP_TARGETS ?= bpf io_uring sched_ext
 ifneq ($(SKIP_TARGETS),)
 	TMP := $(filter-out $(SKIP_TARGETS), $(TARGETS))
 	override TARGETS := $(TMP)
diff --git a/tools/testing/selftests/io_uring/bpf_ext_memcpy.bpf.c b/tools/testing/selftests/io_uring/bpf_ext_memcpy.bpf.c
new file mode 100644
index 000000000000..451c04f82e93
--- /dev/null
+++ b/tools/testing/selftests/io_uring/bpf_ext_memcpy.bpf.c
@@ -0,0 +1,305 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2025 Red Hat, Inc.
+ * Test for per-buffer iterator kfuncs (KF_ITER pattern) with dynptr.
+ *
+ * Two operations registered as struct_ops:
+ *   op_id=0: copy buffer → arena  (ITER_SOURCE + bpf_dynptr_slice read)
+ *   op_id=1: copy arena → buffer  (ITER_DEST   + bpf_dynptr_data write)
+ *
+ * sqe->addr points to a single io_bpf_buf_desc.
+ * sqe->len = 1.
+ */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <asm-generic/errno.h>
+
+char LICENSE[] SEC("license") = "GPL";
+
+#if defined(__BPF_FEATURE_ADDR_SPACE_CAST)
+#define __arena __attribute__((address_space(1)))
+#define cast_kern(ptr) /* nop for bpf prog */
+#define cast_user(ptr) /* nop for bpf prog */
+#else
+#define __arena
+#define cast_kern(ptr) bpf_addr_space_cast(ptr, 0, 1)
+#define cast_user(ptr) bpf_addr_space_cast(ptr, 1, 0)
+#endif
+
+#ifndef PAGE_SIZE
+#define PAGE_SIZE __PAGE_SIZE
+#endif
+
+#include "include/bpf_ext_memcpy_defs.h"
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARENA);
+	__uint(map_flags, BPF_F_MMAPABLE);
+	__uint(max_entries, 2048);
+#ifdef __TARGET_ARCH_arm64
+	__ulong(map_extra, 0x1ull << 32);
+#else
+	__ulong(map_extra, 0x1ull << 44);
+#endif
+} arena SEC(".maps");
+
+/* Arena buffer — set by userspace, shared by read and write ops */
+unsigned char __arena *arena_buf;
+
+/* PDU layout — shared by both ops */
+struct memcpy_pdu {
+	struct io_bpf_buf_desc desc;
+};
+
+/* kfunc declarations */
+
+extern void uring_bpf_set_result(struct uring_bpf_data *data, int res) __ksym;
+extern int bpf_iter_uring_buf_new(struct bpf_iter_uring_buf *iter,
+				  struct uring_bpf_data *data,
+				  struct io_bpf_buf_desc *desc,
+				  int direction) __ksym;
+extern int *bpf_iter_uring_buf_next(
+				  struct bpf_iter_uring_buf *iter) __ksym;
+extern void bpf_iter_uring_buf_destroy(struct bpf_iter_uring_buf *iter) __ksym;
+extern int bpf_uring_buf_dynptr(struct bpf_iter_uring_buf *it__iter,
+				struct bpf_dynptr *ptr__uninit) __ksym;
+extern int bpf_uring_buf_dynptr_rdwr(struct bpf_iter_uring_buf *it__iter,
+				     struct bpf_dynptr *ptr__uninit) __ksym;
+extern __u64 bpf_dynptr_size(const struct bpf_dynptr *ptr) __ksym;
+extern void *bpf_dynptr_slice(const struct bpf_dynptr *ptr, __u64 offset,
+			      void *buffer, __u64 buffer__szk) __ksym;
+void __arena *bpf_arena_alloc_pages(void *map, void __arena *addr,
+				    __u32 page_cnt, int node_id,
+				    __u64 flags) __ksym __weak;
+
+/* Word-copy helpers (__noinline: verified once by verifier) */
+
+static __noinline int
+copy_to_arena(unsigned char *src, unsigned char __arena *dst, int len)
+{
+	__u64 *s = (__u64 *)src;
+	__u64 *d = (__u64 *)dst;
+	int j;
+
+	for (j = 0; j < len / 8 && j < PAGE_SIZE / 8; j++)
+		d[j] = s[j];
+	return 0;
+}
+
+static __noinline int
+copy_from_arena(unsigned char __arena *src, unsigned char *dst, int len)
+{
+	__u64 *s = (__u64 *)src;
+	__u64 *d = (__u64 *)dst;
+	int j;
+
+	for (j = 0; j < len / 8 && j < PAGE_SIZE / 8; j++)
+		d[j] = s[j];
+	return 0;
+}
+
+/* Shared prep: both ops read the same descriptor */
+
+SEC("struct_ops/common_prep")
+int BPF_PROG(common_prep, struct uring_bpf_data *data,
+	     const struct io_uring_sqe *sqe)
+{
+	struct memcpy_pdu *pdu = (struct memcpy_pdu *)data->pdu;
+	struct io_bpf_buf_desc desc;
+	int ret;
+
+	if (sqe->len != 1)
+		return -EINVAL;
+
+	ret = bpf_probe_read_user(&desc, sizeof(desc), (void *)sqe->addr);
+	if (ret)
+		return ret;
+
+	__builtin_memcpy(&pdu->desc, &desc, sizeof(desc));
+	return 0;
+}
+
+/* Read: buffer -> arena (ITER_SOURCE) */
+
+struct iter_ctx {
+	struct bpf_iter_uring_buf *it;
+	unsigned char __arena *arena;
+	int total;
+};
+
+static int read_one_page(u32 idx, struct iter_ctx *ctx)
+{
+	struct bpf_dynptr dynptr;
+	unsigned char *p;
+	int copied = 0;
+	int *avail_ptr;
+	int avail, i;
+
+	avail_ptr = bpf_iter_uring_buf_next(ctx->it);
+	if (!avail_ptr)
+		return 1;
+	avail = *avail_ptr;
+
+	if (bpf_uring_buf_dynptr(ctx->it, &dynptr))
+		return 1;
+
+	/* Fast path: full page */
+	if (avail >= PAGE_SIZE) {
+		p = bpf_dynptr_slice(&dynptr, 0, NULL, PAGE_SIZE);
+		if (p) {
+			copy_to_arena(p, ctx->arena + ctx->total, PAGE_SIZE);
+			ctx->total += PAGE_SIZE;
+			return 0;
+		}
+	}
+
+	/* Slow path: CHUNK_SIZE slices */
+	for (i = 0; i < PAGE_SIZE / CHUNK_SIZE; i++) {
+		p = bpf_dynptr_slice(&dynptr, copied, NULL, CHUNK_SIZE);
+		if (!p)
+			break;
+		copy_to_arena(p, ctx->arena + ctx->total + copied, CHUNK_SIZE);
+		copied += CHUNK_SIZE;
+	}
+
+	ctx->total += copied;
+	return copied ? 0 : 1;
+}
+
+SEC("struct_ops/copy_to_arena_issue")
+int BPF_PROG(copy_to_arena_issue, struct uring_bpf_data *data)
+{
+	struct memcpy_pdu *pdu = (struct memcpy_pdu *)data->pdu;
+	struct io_bpf_buf_desc desc;
+	struct bpf_iter_uring_buf it;
+	unsigned char __arena *ptr = arena_buf;
+	int total = 0;
+
+	desc = pdu->desc;
+	bpf_arena_alloc_pages(&arena, NULL, 0, 0, 0);
+
+	if (!ptr) {
+		uring_bpf_set_result(data, -ENOMEM);
+		return 0;
+	}
+	cast_kern(ptr);
+
+	bpf_iter_uring_buf_new(&it, data, &desc, 1 /* ITER_SOURCE */);
+	{
+		struct iter_ctx ctx = { .it = &it, .arena = ptr };
+
+		bpf_loop(TEST_BUF_SIZE / PAGE_SIZE + 16, read_one_page,
+			 &ctx, 0);
+		total = ctx.total;
+	}
+	bpf_iter_uring_buf_destroy(&it);
+
+	uring_bpf_set_result(data, total);
+	return 0;
+}
+
+/* Write: arena -> buffer (ITER_DEST) */
+
+static int write_one_page(u32 idx, struct iter_ctx *ctx)
+{
+	struct bpf_dynptr dynptr;
+	unsigned char *p;
+	int copied = 0;
+	int *avail_ptr;
+	int avail, i;
+
+	avail_ptr = bpf_iter_uring_buf_next(ctx->it);
+	if (!avail_ptr)
+		return 1;
+	avail = *avail_ptr;
+
+	if (bpf_uring_buf_dynptr_rdwr(ctx->it, &dynptr))
+		return 1;
+
+	/* Fast path: full page */
+	if (avail >= PAGE_SIZE) {
+		p = bpf_dynptr_data(&dynptr, 0, PAGE_SIZE);
+		if (p) {
+			copy_from_arena(ctx->arena + ctx->total, p, PAGE_SIZE);
+			ctx->total += PAGE_SIZE;
+			return 0;
+		}
+	}
+
+	/* Slow path: CHUNK_SIZE blocks */
+	for (i = 0; i < PAGE_SIZE / CHUNK_SIZE; i++) {
+		p = bpf_dynptr_data(&dynptr, copied, CHUNK_SIZE);
+		if (!p)
+			break;
+		copy_from_arena(ctx->arena + ctx->total + copied, p,
+				CHUNK_SIZE);
+		copied += CHUNK_SIZE;
+	}
+
+	ctx->total += copied;
+	return copied ? 0 : 1;
+}
+
+SEC("struct_ops/copy_from_arena_issue")
+int BPF_PROG(copy_from_arena_issue, struct uring_bpf_data *data)
+{
+	struct memcpy_pdu *pdu = (struct memcpy_pdu *)data->pdu;
+	struct io_bpf_buf_desc desc;
+	struct bpf_iter_uring_buf it;
+	unsigned char __arena *ptr = arena_buf;
+	int total = 0;
+
+	desc = pdu->desc;
+	bpf_arena_alloc_pages(&arena, NULL, 0, 0, 0);
+
+	if (!ptr) {
+		uring_bpf_set_result(data, -ENOMEM);
+		return 0;
+	}
+	cast_kern(ptr);
+
+	bpf_iter_uring_buf_new(&it, data, &desc, 0 /* ITER_DEST */);
+	{
+		struct iter_ctx ctx = { .it = &it, .arena = ptr };
+
+		bpf_loop(TEST_BUF_SIZE / PAGE_SIZE + 16, write_one_page,
+			 &ctx, 0);
+		total = ctx.total;
+	}
+	bpf_iter_uring_buf_destroy(&it);
+
+	uring_bpf_set_result(data, total);
+	return 0;
+}
+
+/* Shared no-op callbacks */
+
+SEC("struct_ops/nop_fail")
+void BPF_PROG(nop_fail, struct uring_bpf_data *data)
+{
+}
+
+SEC("struct_ops/nop_cleanup")
+void BPF_PROG(nop_cleanup, struct uring_bpf_data *data)
+{
+}
+
+/* Struct ops registration */
+
+SEC(".struct_ops.link")
+struct uring_bpf_ops bpf_copy_to_arena_ops = {
+	.prep_fn	= (void *)common_prep,
+	.issue_fn	= (void *)copy_to_arena_issue,
+	.fail_fn	= (void *)nop_fail,
+	.cleanup_fn	= (void *)nop_cleanup,
+};
+
+SEC(".struct_ops.link")
+struct uring_bpf_ops bpf_copy_from_arena_ops = {
+	.prep_fn	= (void *)common_prep,
+	.issue_fn	= (void *)copy_from_arena_issue,
+	.fail_fn	= (void *)nop_fail,
+	.cleanup_fn	= (void *)nop_cleanup,
+};
diff --git a/tools/testing/selftests/io_uring/bpf_ext_memcpy.c b/tools/testing/selftests/io_uring/bpf_ext_memcpy.c
new file mode 100644
index 000000000000..c589eda27aae
--- /dev/null
+++ b/tools/testing/selftests/io_uring/bpf_ext_memcpy.c
@@ -0,0 +1,517 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2025 Red Hat, Inc.
+ * Test for buffer iterator kfuncs - userspace part.
+ *
+ * Copies each supported buffer type to the BPF arena and back via
+ * dynptr access in the BPF program, verifying the copied contents.
+ */
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+#include <errno.h>
+#include <linux/io_uring.h>
+#include <sys/mman.h>
+#include <sys/uio.h>
+#include <io_uring/mini_liburing.h>
+
+#include "iou_test.h"
+#include "bpf_ext_memcpy.bpf.skel.h"
+
+#include "include/bpf_ext_memcpy_defs.h"
+#define TEST_PATTERN		0xAB
+#define MAX_VECS		32
+
+struct test_ctx {
+	struct bpf_ext_memcpy *skel;
+	struct bpf_link *link_read;	/* copy_to_arena ops (op_id=0) */
+	struct bpf_link *link_write;	/* copy_from_arena ops (op_id=1) */
+	void *arena_base;
+	size_t arena_sz;
+	struct io_uring ring;
+
+	/* Buffer under test */
+	struct io_bpf_buf_desc desc;
+	char *buf;
+	size_t buf_size;
+	__u8 buf_type;
+	const char *test_desc;
+
+	/* Vectored buffer support */
+	struct iovec vecs[MAX_VECS];
+	int nr_vec;
+
+	/* Fixed buffer support */
+	__u16 buf_index;
+};
+
+static enum iou_test_status bpf_setup(struct test_ctx *ctx)
+{
+	int arena_fd;
+	__u64 map_extra;
+	int ret;
+
+	ctx->skel = bpf_ext_memcpy__open();
+	if (!ctx->skel) {
+		IOU_ERR("Failed to open BPF skeleton");
+		return IOU_TEST_FAIL;
+	}
+
+	/* op_id 0 = copy_to_arena, op_id 1 = copy_from_arena */
+	ctx->skel->struct_ops.bpf_copy_to_arena_ops->ring_fd = ctx->ring.ring_fd;
+	ctx->skel->struct_ops.bpf_copy_to_arena_ops->id = 0;
+	ctx->skel->struct_ops.bpf_copy_from_arena_ops->ring_fd = ctx->ring.ring_fd;
+	ctx->skel->struct_ops.bpf_copy_from_arena_ops->id = 1;
+
+	ret = bpf_ext_memcpy__load(ctx->skel);
+	if (ret) {
+		IOU_ERR("Failed to load BPF skeleton: %d", ret);
+		bpf_ext_memcpy__destroy(ctx->skel);
+		ctx->skel = NULL;
+		return IOU_TEST_FAIL;
+	}
+
+	/* Pre-allocate arena pages from userspace */
+	arena_fd = bpf_map__fd(ctx->skel->maps.arena);
+	map_extra = bpf_map__map_extra(ctx->skel->maps.arena);
+	ctx->arena_sz = bpf_map__max_entries(ctx->skel->maps.arena)
+			* getpagesize();
+
+	ctx->arena_base = mmap(map_extra ? (void *)map_extra : NULL,
+			       ctx->arena_sz, PROT_READ | PROT_WRITE,
+			       MAP_SHARED | (map_extra ? MAP_FIXED : 0),
+			       arena_fd, 0);
+	if (ctx->arena_base == MAP_FAILED) {
+		IOU_ERR("Failed to mmap arena: %s", strerror(errno));
+		bpf_ext_memcpy__destroy(ctx->skel);
+		ctx->skel = NULL;
+		return IOU_TEST_FAIL;
+	}
+	memset(ctx->arena_base, 0, ctx->arena_sz);
+	ctx->skel->bss->arena_buf = ctx->arena_base;
+
+	ctx->link_read = bpf_map__attach_struct_ops(
+				ctx->skel->maps.bpf_copy_to_arena_ops);
+	if (!ctx->link_read) {
+		IOU_ERR("Failed to attach copy_to_arena struct_ops");
+		goto err;
+	}
+
+	ctx->link_write = bpf_map__attach_struct_ops(
+				ctx->skel->maps.bpf_copy_from_arena_ops);
+	if (!ctx->link_write) {
+		IOU_ERR("Failed to attach copy_from_arena struct_ops");
+		goto err;
+	}
+
+	return IOU_TEST_PASS;
+err:
+	bpf_ext_memcpy__destroy(ctx->skel);
+	ctx->skel = NULL;
+	return IOU_TEST_FAIL;
+}
+
+static enum iou_test_status setup(void **ctx_out)
+{
+	struct io_uring_params p;
+	struct test_ctx *ctx;
+	enum iou_test_status status;
+	int ret;
+
+	ctx = calloc(1, sizeof(*ctx));
+	if (!ctx) {
+		IOU_ERR("Failed to allocate context");
+		return IOU_TEST_FAIL;
+	}
+
+	memset(&p, 0, sizeof(p));
+	p.flags = IORING_SETUP_BPF_EXT | IORING_SETUP_NO_SQARRAY;
+
+	ret = io_uring_queue_init_params(8, &ctx->ring, &p);
+	if (ret < 0) {
+		IOU_ERR("io_uring_queue_init_params failed: %s (flags=0x%x)",
+			strerror(-ret), p.flags);
+		free(ctx);
+		return IOU_TEST_SKIP;
+	}
+
+	status = bpf_setup(ctx);
+	if (status != IOU_TEST_PASS) {
+		io_uring_queue_exit(&ctx->ring);
+		free(ctx);
+		return status;
+	}
+
+	*ctx_out = ctx;
+	return IOU_TEST_PASS;
+}
+
+static int allocate_buf(struct test_ctx *ctx)
+{
+	char *p;
+	int i;
+
+	switch (ctx->buf_type) {
+	case IO_BPF_BUF_USER:
+	case IO_BPF_BUF_FIXED:
+		p = aligned_alloc(4096, ctx->buf_size);
+		if (!p)
+			return -ENOMEM;
+		ctx->buf = p;
+		return 0;
+	case IO_BPF_BUF_VEC:
+	case IO_BPF_BUF_REG_VEC:
+		if (ctx->nr_vec <= 0 || ctx->nr_vec > MAX_VECS)
+			return -EINVAL;
+		p = aligned_alloc(4096, ctx->buf_size);
+		if (!p)
+			return -ENOMEM;
+		ctx->buf = p;
+		for (i = 0; i < ctx->nr_vec; i++) {
+			size_t chunk = ctx->buf_size / ctx->nr_vec;
+
+			ctx->vecs[i].iov_base = p + i * chunk;
+			ctx->vecs[i].iov_len = chunk;
+		}
+		ctx->vecs[ctx->nr_vec - 1].iov_len +=
+			ctx->buf_size % ctx->nr_vec;
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
+static enum iou_test_status register_fixed(struct test_ctx *ctx)
+{
+	struct iovec iov;
+	int ret;
+
+	if (ctx->buf_type != IO_BPF_BUF_FIXED &&
+	    ctx->buf_type != IO_BPF_BUF_REG_VEC)
+		return IOU_TEST_PASS;
+
+	ctx->buf_index = 0;
+	iov.iov_base = ctx->buf;
+	iov.iov_len = ctx->buf_size;
+
+	ret = io_uring_register_buffers(&ctx->ring, &iov, 1);
+	if (ret) {
+		IOU_ERR("Failed to register buffers: %d", ret);
+		return IOU_TEST_FAIL;
+	}
+
+	return IOU_TEST_PASS;
+}
+
+static void build_desc(struct test_ctx *ctx)
+{
+	memset(&ctx->desc, 0, sizeof(ctx->desc));
+	ctx->desc.type = ctx->buf_type;
+
+	switch (ctx->buf_type) {
+	case IO_BPF_BUF_VEC:
+		ctx->desc.addr = (__u64)(uintptr_t)ctx->vecs;
+		ctx->desc.len = ctx->nr_vec;
+		break;
+	case IO_BPF_BUF_FIXED:
+		ctx->desc.addr = (__u64)(uintptr_t)ctx->buf;
+		ctx->desc.len = ctx->buf_size;
+		ctx->desc.buf_index = ctx->buf_index;
+		break;
+	case IO_BPF_BUF_REG_VEC:
+		ctx->desc.addr = (__u64)(uintptr_t)ctx->vecs;
+		ctx->desc.len = ctx->nr_vec;
+		ctx->desc.buf_index = ctx->buf_index;
+		break;
+	default: /* USER */
+		ctx->desc.addr = (__u64)(uintptr_t)ctx->buf;
+		ctx->desc.len = ctx->buf_size;
+		break;
+	}
+}
+
+/* Submit a BPF op and wait for completion.  op_id selects the struct_ops. */
+static enum iou_test_status submit_op(struct test_ctx *ctx, int op_id)
+{
+	struct io_uring_sqe *sqe;
+	struct io_uring_cqe *cqe;
+	int ret;
+
+	sqe = io_uring_get_sqe(&ctx->ring);
+	if (!sqe) {
+		IOU_ERR("Failed to get SQE");
+		return IOU_TEST_FAIL;
+	}
+
+	memset(sqe, 0, sizeof(*sqe));
+	sqe->opcode = IORING_OP_BPF;
+	sqe->fd = -1;
+	sqe->bpf_op_flags = (op_id << IORING_BPF_OP_SHIFT);
+	sqe->addr = (__u64)(uintptr_t)&ctx->desc;
+	sqe->len = 1;
+	sqe->user_data = 0xCAFEBABE;
+
+	ret = io_uring_submit(&ctx->ring);
+	if (ret < 0) {
+		IOU_ERR("io_uring_submit failed: %d", ret);
+		return IOU_TEST_FAIL;
+	}
+
+	ret = io_uring_wait_cqe(&ctx->ring, &cqe);
+	if (ret < 0) {
+		IOU_ERR("io_uring_wait_cqe failed: %d", ret);
+		return IOU_TEST_FAIL;
+	}
+
+	if (cqe->user_data != 0xCAFEBABE) {
+		IOU_ERR("CQE user_data mismatch: 0x%llx", cqe->user_data);
+		return IOU_TEST_FAIL;
+	}
+
+	if (cqe->res != (int)ctx->buf_size) {
+		IOU_ERR("CQE result mismatch: %d (expected %zu)",
+			cqe->res, ctx->buf_size);
+		if (cqe->res < 0)
+			IOU_ERR("Error: %s", strerror(-cqe->res));
+		return IOU_TEST_FAIL;
+	}
+
+	io_uring_cqe_seen(&ctx->ring);
+	return IOU_TEST_PASS;
+}
+
+static enum iou_test_status verify_arena(struct test_ctx *ctx, __u8 pattern)
+{
+	unsigned char *arena_data = ctx->skel->bss->arena_buf;
+
+	if (!arena_data) {
+		IOU_ERR("arena_buf pointer is NULL");
+		return IOU_TEST_FAIL;
+	}
+
+	for (size_t i = 0; i < ctx->buf_size; i++) {
+		if (arena_data[i] != pattern) {
+			IOU_ERR("Arena mismatch at offset %zu: 0x%02x (expected 0x%02x)",
+				i, arena_data[i], pattern);
+			return IOU_TEST_FAIL;
+		}
+	}
+	return IOU_TEST_PASS;
+}
+
+static enum iou_test_status verify_buf(struct test_ctx *ctx, __u8 pattern)
+{
+	for (size_t i = 0; i < ctx->buf_size; i++) {
+		if ((unsigned char)ctx->buf[i] != pattern) {
+			IOU_ERR("Buf mismatch at offset %zu: 0x%02x (expected 0x%02x)",
+				i, (unsigned char)ctx->buf[i], pattern);
+			return IOU_TEST_FAIL;
+		}
+	}
+	return IOU_TEST_PASS;
+}
+
+/* Read test: buf -> arena (op_id=0, ITER_SOURCE) */
+static enum iou_test_status test_read(struct test_ctx *ctx)
+{
+	enum iou_test_status status;
+
+	if (allocate_buf(ctx))
+		return IOU_TEST_FAIL;
+
+	memset(ctx->buf, TEST_PATTERN, ctx->buf_size);
+
+	status = register_fixed(ctx);
+	if (status != IOU_TEST_PASS)
+		goto out;
+
+	build_desc(ctx);
+	status = submit_op(ctx, 0);
+	if (status == IOU_TEST_PASS)
+		status = verify_arena(ctx, TEST_PATTERN);
+
+	if (ctx->buf_type == IO_BPF_BUF_FIXED ||
+	    ctx->buf_type == IO_BPF_BUF_REG_VEC)
+		io_uring_unregister_buffers(&ctx->ring);
+
+out:
+	free(ctx->buf);
+	ctx->buf = NULL;
+
+	if (status == IOU_TEST_PASS)
+		IOU_INFO("%s: %zu bytes", ctx->test_desc, ctx->buf_size);
+	return status;
+}
+
+/* Write test: arena -> buf (op_id=1, ITER_DEST) */
+static enum iou_test_status test_write(struct test_ctx *ctx)
+{
+	enum iou_test_status status;
+	unsigned char *arena_data;
+
+	if (allocate_buf(ctx))
+		return IOU_TEST_FAIL;
+
+	/* Clear destination buffer, fill arena with pattern */
+	memset(ctx->buf, 0, ctx->buf_size);
+	arena_data = ctx->skel->bss->arena_buf;
+	if (!arena_data) {
+		free(ctx->buf);
+		ctx->buf = NULL;
+		return IOU_TEST_FAIL;
+	}
+	memset(arena_data, TEST_PATTERN, ctx->buf_size);
+
+	status = register_fixed(ctx);
+	if (status != IOU_TEST_PASS)
+		goto out;
+
+	build_desc(ctx);
+	status = submit_op(ctx, 1);
+	if (status == IOU_TEST_PASS)
+		status = verify_buf(ctx, TEST_PATTERN);
+
+	if (ctx->buf_type == IO_BPF_BUF_FIXED ||
+	    ctx->buf_type == IO_BPF_BUF_REG_VEC)
+		io_uring_unregister_buffers(&ctx->ring);
+
+out:
+	free(ctx->buf);
+	ctx->buf = NULL;
+
+	if (status == IOU_TEST_PASS)
+		IOU_INFO("%s: %zu bytes", ctx->test_desc, ctx->buf_size);
+	return status;
+}
+
+static enum iou_test_status read_user(struct test_ctx *ctx)
+{
+	ctx->buf_type = IO_BPF_BUF_USER;
+	ctx->buf_size = TEST_BUF_SIZE;
+	ctx->test_desc = "USER -> arena";
+	return test_read(ctx);
+}
+
+static enum iou_test_status read_vec(struct test_ctx *ctx)
+{
+	ctx->buf_type = IO_BPF_BUF_VEC;
+	ctx->buf_size = TEST_BUF_SIZE;
+	ctx->nr_vec = 5;
+	ctx->test_desc = "VEC -> arena";
+	return test_read(ctx);
+}
+
+static enum iou_test_status read_fixed(struct test_ctx *ctx)
+{
+	ctx->buf_type = IO_BPF_BUF_FIXED;
+	ctx->buf_size = TEST_BUF_SIZE;
+	ctx->test_desc = "FIXED -> arena";
+	return test_read(ctx);
+}
+
+static enum iou_test_status read_reg_vec(struct test_ctx *ctx)
+{
+	ctx->buf_type = IO_BPF_BUF_REG_VEC;
+	ctx->buf_size = TEST_BUF_SIZE;
+	ctx->nr_vec = 5;
+	ctx->test_desc = "REG_VEC -> arena";
+	return test_read(ctx);
+}
+
+static enum iou_test_status write_user(struct test_ctx *ctx)
+{
+	ctx->buf_type = IO_BPF_BUF_USER;
+	ctx->buf_size = TEST_BUF_SIZE;
+	ctx->test_desc = "arena -> USER";
+	return test_write(ctx);
+}
+
+static enum iou_test_status write_vec(struct test_ctx *ctx)
+{
+	ctx->buf_type = IO_BPF_BUF_VEC;
+	ctx->buf_size = TEST_BUF_SIZE;
+	ctx->nr_vec = 5;
+	ctx->test_desc = "arena -> VEC";
+	return test_write(ctx);
+}
+
+static enum iou_test_status write_fixed(struct test_ctx *ctx)
+{
+	ctx->buf_type = IO_BPF_BUF_FIXED;
+	ctx->buf_size = TEST_BUF_SIZE;
+	ctx->test_desc = "arena -> FIXED";
+	return test_write(ctx);
+}
+
+static enum iou_test_status write_reg_vec(struct test_ctx *ctx)
+{
+	ctx->buf_type = IO_BPF_BUF_REG_VEC;
+	ctx->buf_size = TEST_BUF_SIZE;
+	ctx->nr_vec = 5;
+	ctx->test_desc = "arena -> REG_VEC";
+	return test_write(ctx);
+}
+
+static enum iou_test_status run(void *ctx_ptr)
+{
+	struct test_ctx *ctx = ctx_ptr;
+	enum iou_test_status status;
+
+	/* Read tests: buffer -> arena */
+	status = read_user(ctx);
+	if (status != IOU_TEST_PASS)
+		return status;
+
+	status = read_vec(ctx);
+	if (status != IOU_TEST_PASS)
+		return status;
+
+	status = read_fixed(ctx);
+	if (status != IOU_TEST_PASS)
+		return status;
+
+	status = read_reg_vec(ctx);
+	if (status != IOU_TEST_PASS)
+		return status;
+
+	/* Write tests: arena -> buffer */
+	status = write_user(ctx);
+	if (status != IOU_TEST_PASS)
+		return status;
+
+	status = write_vec(ctx);
+	if (status != IOU_TEST_PASS)
+		return status;
+
+	status = write_fixed(ctx);
+	if (status != IOU_TEST_PASS)
+		return status;
+
+	status = write_reg_vec(ctx);
+	if (status != IOU_TEST_PASS)
+		return status;
+
+	return IOU_TEST_PASS;
+}
+
+static void cleanup(void *ctx_ptr)
+{
+	struct test_ctx *ctx = ctx_ptr;
+
+	if (ctx->link_write)
+		bpf_link__destroy(ctx->link_write);
+	if (ctx->link_read)
+		bpf_link__destroy(ctx->link_read);
+	if (ctx->skel)
+		bpf_ext_memcpy__destroy(ctx->skel);
+	io_uring_queue_exit(&ctx->ring);
+	free(ctx);
+}
+
+struct iou_test bpf_ext_memcpy_test = {
+	.name = "bpf_ext_memcpy",
+	.description = "Test buffer iterator copy to/from BPF arena",
+	.setup = setup,
+	.run = run,
+	.cleanup = cleanup,
+};
+REGISTER_IOU_TEST(bpf_ext_memcpy_test)
diff --git a/tools/testing/selftests/io_uring/include/bpf_ext_memcpy_defs.h b/tools/testing/selftests/io_uring/include/bpf_ext_memcpy_defs.h
new file mode 100644
index 000000000000..f924ca834865
--- /dev/null
+++ b/tools/testing/selftests/io_uring/include/bpf_ext_memcpy_defs.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef BPF_EXT_MEMCPY_DEFS_H
+#define BPF_EXT_MEMCPY_DEFS_H
+
+/*
+ * Shared definitions between bpf_ext_memcpy.bpf.c (BPF program) and
+ * bpf_ext_memcpy.c (userspace test).
+ *
+ * Buffer size must be CHUNK_SIZE-aligned so every dynptr slice is a
+ * clean multiple of CHUNK_SIZE with no partial tail.
+ */
+#define CHUNK_SIZE	512
+/* Must be divisible by CHUNK_SIZE and nr_vecs (5).
+ * 512 * 2050 = 1049600, 1049600 / 5 = 209920 = 512 * 410.
+ */
+#define TEST_BUF_SIZE	(CHUNK_SIZE * 2050)		/* ~1MB */
+
+#endif /* BPF_EXT_MEMCPY_DEFS_H */
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH V3 05/12] io_uring: bpf: extend io_uring with bpf struct_ops
  2026-03-24 16:37 ` [PATCH V3 05/12] io_uring: bpf: extend io_uring with bpf struct_ops Ming Lei
@ 2026-03-26  1:49   ` Jens Axboe
  2026-03-26  2:09   ` Jens Axboe
  1 sibling, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2026-03-26  1:49 UTC (permalink / raw)
  To: Ming Lei, io-uring
  Cc: Caleb Sander Mateos, Akilesh Kailash, bpf, Xiao Ni,
	Alexei Starovoitov

On 3/24/26 10:37 AM, Ming Lei wrote:
> @@ -493,7 +494,16 @@ struct io_ring_ctx {
>  	DECLARE_HASHTABLE(napi_ht, 4);
>  #endif
>  
> -	struct io_uring_bpf_ops		*bpf_ops;
> +	/*
> +	 * bpf_ops and bpf_ext_ops are mutually exclusive: bpf_ops is used
> +	 * for io_uring_bpf_ops struct_ops, while bpf_ext_ops provides
> +	 * per-opcode BPF extension operations (IORING_SETUP_BPF_EXT).
> +	 * The two cannot be active at the same time on the same ring.
> +	 */
> +	union {
> +		struct io_uring_bpf_ops		*bpf_ops;
> +		struct uring_bpf_ops_kern	*bpf_ext_ops;
> +	};

What am I missing here, why is this the case? What makes the use of both
at the same time impossible?

> diff --git a/io_uring/bpf-ops.c b/io_uring/bpf-ops.c
> index e4b244337aa9..e91c6964405c 100644
> --- a/io_uring/bpf-ops.c
> +++ b/io_uring/bpf-ops.c
> @@ -162,7 +162,6 @@ static int io_install_bpf(struct io_ring_ctx *ctx, struct io_uring_bpf_ops *ops)
>  		return -EOPNOTSUPP;
>  	if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN))
>  		return -EOPNOTSUPP;
> -
>  	if (ctx->bpf_ops)
>  		return -EBUSY;
>  	if (WARN_ON_ONCE(!ops->loop_step))

Spurious whitespace change.

> diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
> index 91cf67b5d85b..1af33a89ed2f 100644
> --- a/io_uring/io_uring.h
> +++ b/io_uring/io_uring.h
> @@ -49,7 +49,8 @@ struct io_ctx_config {
>  			IORING_FEAT_RECVSEND_BUNDLE |\
>  			IORING_FEAT_MIN_TIMEOUT |\
>  			IORING_FEAT_RW_ATTR |\
> -			IORING_FEAT_NO_IOWAIT)
> +			IORING_FEAT_NO_IOWAIT |\
> +			IORING_FEAT_BPF)

Do we need this FEAT flag? If you think so, then it should at least be
dependent on whether the kernel supports this feature, e.g. only set it
if CONFIG_IO_URING_BPF_EXT is enabled.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V3 05/12] io_uring: bpf: extend io_uring with bpf struct_ops
  2026-03-24 16:37 ` [PATCH V3 05/12] io_uring: bpf: extend io_uring with bpf struct_ops Ming Lei
  2026-03-26  1:49   ` Jens Axboe
@ 2026-03-26  2:09   ` Jens Axboe
  1 sibling, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2026-03-26  2:09 UTC (permalink / raw)
  To: Ming Lei, io-uring
  Cc: Caleb Sander Mateos, Akilesh Kailash, bpf, Xiao Ni,
	Alexei Starovoitov

On 3/24/26 10:37 AM, Ming Lei wrote:
>  int io_uring_bpf_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>  {
> +	struct uring_bpf_data *data = io_kiocb_to_cmd(req, struct uring_bpf_data);
> +	u32 opf = READ_ONCE(sqe->bpf_op_flags);
> +	unsigned char bpf_op = uring_bpf_get_op(opf);
> +	const struct uring_bpf_ops *ops;
> +
> +	if (unlikely(!(req->ctx->flags & IORING_SETUP_BPF_EXT)))
> +		goto fail;
> +
> +	if (bpf_op >= IO_RING_MAX_BPF_OPS)
> +		return -EINVAL;
> +
> +	ops = req->ctx->bpf_ext_ops[bpf_op].ops;
> +	data->opf = opf;
> +	data->ops = ops;
> +	if (ops && ops->prep_fn)
> +		return ops->prep_fn(data, sqe);
> +fail:
>  	return -EOPNOTSUPP;
>  }

Any early exit should ensure 'data' is sane, so that the cleanup doesn't
potentially touch uninitialized crap. This is something that has bit us
in the past. Not an issue for this patch that adds the code, but it will
be once the next patch is applied. Better to clear ->opf/ops here
upfront, so that we never leave this function without 'data' being fully
initialized.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2026-03-26  2:09 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-24 16:37 [PATCH v3 0/12] io_uring: add IORING_OP_BPF for extending io_uring Ming Lei
2026-03-24 16:37 ` [PATCH V3 01/12] io_uring: make io_import_fixed() global Ming Lei
2026-03-24 16:37 ` [PATCH V3 02/12] io_uring: refactor io_prep_reg_iovec() for BPF kfunc use Ming Lei
2026-03-24 16:37 ` [PATCH V3 03/12] io_uring: refactor io_import_reg_vec() " Ming Lei
2026-03-24 16:37 ` [PATCH V3 04/12] io_uring: prepare for extending io_uring with bpf Ming Lei
2026-03-24 16:37 ` [PATCH V3 05/12] io_uring: bpf: extend io_uring with bpf struct_ops Ming Lei
2026-03-26  1:49   ` Jens Axboe
2026-03-26  2:09   ` Jens Axboe
2026-03-24 16:37 ` [PATCH V3 06/12] io_uring: bpf: implement struct_ops registration Ming Lei
2026-03-24 16:37 ` [PATCH V3 07/12] io_uring: bpf: add BPF buffer descriptor for IORING_OP_BPF Ming Lei
2026-03-24 16:37 ` [PATCH V3 08/12] io_uring: bpf: add per-buffer iterator kfuncs Ming Lei
2026-03-24 16:37 ` [PATCH V3 09/12] bpf: add bpf_uring_buf_dynptr to special_kfunc_list Ming Lei
2026-03-24 16:37 ` [PATCH V3 10/12] selftests/io_uring: add io_uring_unregister_buffers() Ming Lei
2026-03-24 16:37 ` [PATCH V3 11/12] selftests/io_uring: add BPF struct_ops and kfunc tests Ming Lei
2026-03-24 16:37 ` [PATCH V3 12/12] selftests/io_uring: add buffer iterator selftest with BPF arena Ming Lei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox