* [RFC PATCH 00/22] ublk: support bpf
@ 2025-01-07 12:03 Ming Lei
2025-01-07 12:03 ` [RFC PATCH 01/22] ublk: remove two unused fields from 'struct ublk_queue' Ming Lei
` (21 more replies)
0 siblings, 22 replies; 28+ messages in thread
From: Ming Lei @ 2025-01-07 12:03 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
Ming Lei
Hello,
Patches 1-6 clean up and prepare for supporting ublk-bpf; they should be
ready to go.
Patches 7-14 add ublk-bpf support over struct_ops, together with selftest
code. Please see the detailed motivation in the commit log of "ublk: bpf:
add bpf struct_ops" and in the last documentation patch.
Patches 15-21 add bpf aio over struct_ops, apply it to ublk-bpf, and add
selftest code.
Patch 22 adds documentation for ublk-bpf.
Git tree:
https://github.com/ming1/linux.git ublk_bpf_rfc
https://github.com/ming1/linux/commits/ublk_bpf_rfc/
Kernel selftest:
make -C tools/testing/selftests TARGETS=ublk run_tests
Comments are welcome!
Ming Lei (22):
ublk: remove two unused fields from 'struct ublk_queue'
ublk: convert several bool type fields into bitfield of `ublk_queue`
ublk: add helper of ublk_need_map_io()
ublk: move ublk into one standalone directory
ublk: move private definitions into private header
ublk: move several helpers to private header
ublk: bpf: add bpf prog attach helpers
ublk: bpf: add bpf struct_ops
ublk: bpf: attach bpf prog to ublk device
ublk: bpf: add kfunc for ublk bpf prog
ublk: bpf: enable ublk-bpf
selftests: ublk: add tests for the ublk-bpf initial implementation
selftests: ublk: add tests for covering io split
selftests: ublk: add tests for covering redirecting to userspace
ublk: bpf: add bpf aio kfunc
ublk: bpf: add bpf aio struct_ops
ublk: bpf: attach bpf aio prog to ublk device
ublk: bpf: add several ublk bpf aio kfuncs
ublk: bpf: wire bpf aio with ublk io handling
selftests: add tests for ublk bpf aio
selftests: add tests for covering both bpf aio and split
ublk: document ublk-bpf & bpf-aio
Documentation/block/ublk.rst | 170 ++
MAINTAINERS | 3 +-
drivers/block/Kconfig | 32 +-
drivers/block/Makefile | 2 +-
drivers/block/ublk/Kconfig | 52 +
drivers/block/ublk/Makefile | 10 +
drivers/block/ublk/bpf.c | 370 ++++
drivers/block/ublk/bpf.h | 231 +++
drivers/block/ublk/bpf_aio.c | 266 +++
drivers/block/ublk/bpf_aio.h | 118 ++
drivers/block/ublk/bpf_aio_ops.c | 174 ++
drivers/block/ublk/bpf_ops.c | 344 ++++
drivers/block/ublk/bpf_reg.h | 77 +
drivers/block/{ublk_drv.c => ublk/main.c} | 267 +--
drivers/block/ublk/ublk.h | 237 +++
include/uapi/linux/ublk_cmd.h | 16 +-
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/ublk/.gitignore | 4 +
tools/testing/selftests/ublk/Makefile | 236 +++
tools/testing/selftests/ublk/config | 2 +
tools/testing/selftests/ublk/progs/ublk_bpf.h | 13 +
.../selftests/ublk/progs/ublk_bpf_kfunc.h | 44 +
.../testing/selftests/ublk/progs/ublk_loop.c | 166 ++
.../testing/selftests/ublk/progs/ublk_null.c | 177 ++
.../selftests/ublk/progs/ublk_stripe.c | 319 ++++
tools/testing/selftests/ublk/test_common.sh | 119 ++
tools/testing/selftests/ublk/test_loop_01.sh | 33 +
tools/testing/selftests/ublk/test_loop_02.sh | 24 +
tools/testing/selftests/ublk/test_null_01.sh | 19 +
tools/testing/selftests/ublk/test_null_02.sh | 23 +
tools/testing/selftests/ublk/test_null_03.sh | 21 +
tools/testing/selftests/ublk/test_null_04.sh | 21 +
.../testing/selftests/ublk/test_stripe_01.sh | 35 +
.../testing/selftests/ublk/test_stripe_02.sh | 26 +
tools/testing/selftests/ublk/ublk_bpf.c | 1673 +++++++++++++++++
35 files changed, 5101 insertions(+), 224 deletions(-)
create mode 100644 drivers/block/ublk/Kconfig
create mode 100644 drivers/block/ublk/Makefile
create mode 100644 drivers/block/ublk/bpf.c
create mode 100644 drivers/block/ublk/bpf.h
create mode 100644 drivers/block/ublk/bpf_aio.c
create mode 100644 drivers/block/ublk/bpf_aio.h
create mode 100644 drivers/block/ublk/bpf_aio_ops.c
create mode 100644 drivers/block/ublk/bpf_ops.c
create mode 100644 drivers/block/ublk/bpf_reg.h
rename drivers/block/{ublk_drv.c => ublk/main.c} (93%)
create mode 100644 drivers/block/ublk/ublk.h
create mode 100644 tools/testing/selftests/ublk/.gitignore
create mode 100644 tools/testing/selftests/ublk/Makefile
create mode 100644 tools/testing/selftests/ublk/config
create mode 100644 tools/testing/selftests/ublk/progs/ublk_bpf.h
create mode 100644 tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h
create mode 100644 tools/testing/selftests/ublk/progs/ublk_loop.c
create mode 100644 tools/testing/selftests/ublk/progs/ublk_null.c
create mode 100644 tools/testing/selftests/ublk/progs/ublk_stripe.c
create mode 100755 tools/testing/selftests/ublk/test_common.sh
create mode 100755 tools/testing/selftests/ublk/test_loop_01.sh
create mode 100755 tools/testing/selftests/ublk/test_loop_02.sh
create mode 100755 tools/testing/selftests/ublk/test_null_01.sh
create mode 100755 tools/testing/selftests/ublk/test_null_02.sh
create mode 100755 tools/testing/selftests/ublk/test_null_03.sh
create mode 100755 tools/testing/selftests/ublk/test_null_04.sh
create mode 100755 tools/testing/selftests/ublk/test_stripe_01.sh
create mode 100755 tools/testing/selftests/ublk/test_stripe_02.sh
create mode 100644 tools/testing/selftests/ublk/ublk_bpf.c
--
2.47.0
^ permalink raw reply [flat|nested] 28+ messages in thread
* [RFC PATCH 01/22] ublk: remove two unused fields from 'struct ublk_queue'
2025-01-07 12:03 [RFC PATCH 00/22] ublk: support bpf Ming Lei
@ 2025-01-07 12:03 ` Ming Lei
2025-01-07 12:03 ` [RFC PATCH 02/22] ublk: convert several bool type fields into bitfield of `ublk_queue` Ming Lei
` (20 subsequent siblings)
21 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-01-07 12:03 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
Ming Lei
Remove two unused fields (`io_addr` and `max_io_sz`) from `struct ublk_queue`.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/block/ublk_drv.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 934ab9332c80..77ce3231eba4 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -143,8 +143,6 @@ struct ublk_queue {
struct llist_head io_cmds;
- unsigned long io_addr; /* mapped vm address */
- unsigned int max_io_sz;
bool force_abort;
bool timeout;
bool canceling;
--
2.47.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC PATCH 02/22] ublk: convert several bool type fields into bitfield of `ublk_queue`
2025-01-07 12:03 [RFC PATCH 00/22] ublk: support bpf Ming Lei
2025-01-07 12:03 ` [RFC PATCH 01/22] ublk: remove two unused fields from 'struct ublk_queue' Ming Lei
@ 2025-01-07 12:03 ` Ming Lei
2025-01-07 12:03 ` [RFC PATCH 03/22] ublk: add helper of ublk_need_map_io() Ming Lei
` (19 subsequent siblings)
21 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-01-07 12:03 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
Ming Lei
Convert several `bool` fields of `ublk_queue` into bitfields, so that we
can remove padding and save 4 bytes in `ublk_queue`.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/block/ublk_drv.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 77ce3231eba4..00363e8affc6 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -143,10 +143,10 @@ struct ublk_queue {
struct llist_head io_cmds;
- bool force_abort;
- bool timeout;
- bool canceling;
- bool fail_io; /* copy of dev->state == UBLK_S_DEV_FAIL_IO */
+ unsigned short force_abort:1;
+ unsigned short timeout:1;
+ unsigned short canceling:1;
+ unsigned short fail_io:1; /* copy of dev->state == UBLK_S_DEV_FAIL_IO */
unsigned short nr_io_ready; /* how many ios setup */
spinlock_t cancel_lock;
struct ublk_device *dev;
@@ -1257,7 +1257,7 @@ static enum blk_eh_timer_return ublk_timeout(struct request *rq)
if (ubq->flags & UBLK_F_UNPRIVILEGED_DEV) {
if (!ubq->timeout) {
send_sig(SIGKILL, ubq->ubq_daemon, 0);
- ubq->timeout = true;
+ ubq->timeout = 1;
}
return BLK_EH_DONE;
@@ -1459,7 +1459,7 @@ static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq)
spin_unlock(&ubq->cancel_lock);
return false;
}
- ubq->canceling = true;
+ ubq->canceling = 1;
spin_unlock(&ubq->cancel_lock);
spin_lock(&ub->lock);
@@ -1609,7 +1609,7 @@ static void ublk_unquiesce_dev(struct ublk_device *ub)
* can move on.
*/
for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
- ublk_get_queue(ub, i)->force_abort = true;
+ ublk_get_queue(ub, i)->force_abort = 1;
blk_mq_unquiesce_queue(ub->ub_disk->queue);
/* We may have requeued some rqs in ublk_quiesce_queue() */
@@ -1672,7 +1672,7 @@ static void ublk_nosrv_work(struct work_struct *work)
blk_mq_quiesce_queue(ub->ub_disk->queue);
ub->dev_info.state = UBLK_S_DEV_FAIL_IO;
for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
- ublk_get_queue(ub, i)->fail_io = true;
+ ublk_get_queue(ub, i)->fail_io = 1;
}
blk_mq_unquiesce_queue(ub->ub_disk->queue);
}
@@ -2744,8 +2744,8 @@ static void ublk_queue_reinit(struct ublk_device *ub, struct ublk_queue *ubq)
put_task_struct(ubq->ubq_daemon);
/* We have to reset it to NULL, otherwise ub won't accept new FETCH_REQ */
ubq->ubq_daemon = NULL;
- ubq->timeout = false;
- ubq->canceling = false;
+ ubq->timeout = 0;
+ ubq->canceling = 0;
for (i = 0; i < ubq->q_depth; i++) {
struct ublk_io *io = &ubq->ios[i];
@@ -2844,7 +2844,7 @@ static int ublk_ctrl_end_recovery(struct ublk_device *ub,
blk_mq_quiesce_queue(ub->ub_disk->queue);
ub->dev_info.state = UBLK_S_DEV_LIVE;
for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
- ublk_get_queue(ub, i)->fail_io = false;
+ ublk_get_queue(ub, i)->fail_io = 0;
}
blk_mq_unquiesce_queue(ub->ub_disk->queue);
}
--
2.47.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC PATCH 03/22] ublk: add helper of ublk_need_map_io()
2025-01-07 12:03 [RFC PATCH 00/22] ublk: support bpf Ming Lei
2025-01-07 12:03 ` [RFC PATCH 01/22] ublk: remove two unused fields from 'struct ublk_queue' Ming Lei
2025-01-07 12:03 ` [RFC PATCH 02/22] ublk: convert several bool type fields into bitfield of `ublk_queue` Ming Lei
@ 2025-01-07 12:03 ` Ming Lei
2025-01-07 12:03 ` [RFC PATCH 04/22] ublk: move ublk into one standalone directory Ming Lei
` (18 subsequent siblings)
21 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-01-07 12:03 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
Ming Lei
ublk_need_map_io() is more readable, and it can also cover the upcoming
UBLK_BPF support.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/block/ublk_drv.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 00363e8affc6..1a63a1aa99ed 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -582,6 +582,11 @@ static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
return ubq->flags & UBLK_F_USER_COPY;
}
+static inline bool ublk_need_map_io(const struct ublk_queue *ubq)
+{
+ return !ublk_support_user_copy(ubq);
+}
+
static inline bool ublk_need_req_ref(const struct ublk_queue *ubq)
{
/*
@@ -909,7 +914,7 @@ static int ublk_map_io(const struct ublk_queue *ubq, const struct request *req,
{
const unsigned int rq_bytes = blk_rq_bytes(req);
- if (ublk_support_user_copy(ubq))
+ if (!ublk_need_map_io(ubq))
return rq_bytes;
/*
@@ -933,7 +938,7 @@ static int ublk_unmap_io(const struct ublk_queue *ubq,
{
const unsigned int rq_bytes = blk_rq_bytes(req);
- if (ublk_support_user_copy(ubq))
+ if (!ublk_need_map_io(ubq))
return rq_bytes;
if (ublk_need_unmap_req(req)) {
@@ -1809,7 +1814,7 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
if (io->flags & UBLK_IO_FLAG_OWNED_BY_SRV)
goto out;
- if (!ublk_support_user_copy(ubq)) {
+ if (ublk_need_map_io(ubq)) {
/*
* FETCH_RQ has to provide IO buffer if NEED GET
* DATA is not enabled
@@ -1831,7 +1836,7 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV))
goto out;
- if (!ublk_support_user_copy(ubq)) {
+ if (ublk_need_map_io(ubq)) {
/*
* COMMIT_AND_FETCH_REQ has to provide IO buffer if
* NEED GET DATA is not enabled or it is Read IO.
--
2.47.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC PATCH 04/22] ublk: move ublk into one standalone directory
2025-01-07 12:03 [RFC PATCH 00/22] ublk: support bpf Ming Lei
` (2 preceding siblings ...)
2025-01-07 12:03 ` [RFC PATCH 03/22] ublk: add helper of ublk_need_map_io() Ming Lei
@ 2025-01-07 12:03 ` Ming Lei
2025-01-07 12:03 ` [RFC PATCH 05/22] ublk: move private definitions into private header Ming Lei
` (17 subsequent siblings)
21 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-01-07 12:03 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
Ming Lei
Prepare for supporting ublk-bpf, which will add more source files, so
create drivers/block/ublk/ to avoid polluting drivers/block/.
Meantime rename the source file to ublk/main.c.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
MAINTAINERS | 2 +-
drivers/block/Kconfig | 32 +-------------------
drivers/block/Makefile | 2 +-
drivers/block/ublk/Kconfig | 36 +++++++++++++++++++++++
drivers/block/ublk/Makefile | 7 +++++
drivers/block/{ublk_drv.c => ublk/main.c} | 0
6 files changed, 46 insertions(+), 33 deletions(-)
create mode 100644 drivers/block/ublk/Kconfig
create mode 100644 drivers/block/ublk/Makefile
rename drivers/block/{ublk_drv.c => ublk/main.c} (100%)
diff --git a/MAINTAINERS b/MAINTAINERS
index c575de4903db..890f6195d03f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23982,7 +23982,7 @@ M: Ming Lei <ming.lei@redhat.com>
L: linux-block@vger.kernel.org
S: Maintained
F: Documentation/block/ublk.rst
-F: drivers/block/ublk_drv.c
+F: drivers/block/ublk/
F: include/uapi/linux/ublk_cmd.h
UBSAN
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index a97f2c40c640..4e5144183ade 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -379,37 +379,7 @@ config BLK_DEV_RBD
If unsure, say N.
-config BLK_DEV_UBLK
- tristate "Userspace block driver (Experimental)"
- select IO_URING
- help
- io_uring based userspace block driver. Together with ublk server, ublk
- has been working well, but interface with userspace or command data
- definition isn't finalized yet, and might change according to future
- requirement, so mark is as experimental now.
-
- Say Y if you want to get better performance because task_work_add()
- can be used in IO path for replacing io_uring cmd, which will become
- shared between IO tasks and ubq daemon, meantime task_work_add() can
- can handle batch more effectively, but task_work_add() isn't exported
- for module, so ublk has to be built to kernel.
-
-config BLKDEV_UBLK_LEGACY_OPCODES
- bool "Support legacy command opcode"
- depends on BLK_DEV_UBLK
- default y
- help
- ublk driver started to take plain command encoding, which turns out
- one bad way. The traditional ioctl command opcode encodes more
- info and basically defines each code uniquely, so opcode conflict
- is avoided, and driver can handle wrong command easily, meantime it
- may help security subsystem to audit io_uring command.
-
- Say Y if your application still uses legacy command opcode.
-
- Say N if you don't want to support legacy command opcode. It is
- suggested to enable N if your application(ublk server) switches to
- ioctl command encoding.
+source "drivers/block/ublk/Kconfig"
source "drivers/block/rnbd/Kconfig"
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index 1105a2d4fdcb..a6fdc62b817c 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -40,6 +40,6 @@ obj-$(CONFIG_BLK_DEV_RNBD) += rnbd/
obj-$(CONFIG_BLK_DEV_NULL_BLK) += null_blk/
-obj-$(CONFIG_BLK_DEV_UBLK) += ublk_drv.o
+obj-$(CONFIG_BLK_DEV_UBLK) += ublk/
swim_mod-y := swim.o swim_asm.o
diff --git a/drivers/block/ublk/Kconfig b/drivers/block/ublk/Kconfig
new file mode 100644
index 000000000000..b06e3df09779
--- /dev/null
+++ b/drivers/block/ublk/Kconfig
@@ -0,0 +1,36 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# ublk block device driver configuration
+#
+
+config BLK_DEV_UBLK
+ tristate "Userspace block driver (Experimental)"
+ select IO_URING
+ help
+ io_uring based userspace block driver. Together with ublk server, ublk
+ has been working well, but interface with userspace or command data
+ definition isn't finalized yet, and might change according to future
+ requirement, so mark is as experimental now.
+
+ Say Y if you want to get better performance because task_work_add()
+ can be used in IO path for replacing io_uring cmd, which will become
+ shared between IO tasks and ubq daemon, meantime task_work_add() can
+ can handle batch more effectively, but task_work_add() isn't exported
+ for module, so ublk has to be built to kernel.
+
+config BLKDEV_UBLK_LEGACY_OPCODES
+ bool "Support legacy command opcode"
+ depends on BLK_DEV_UBLK
+ default y
+ help
+ ublk driver started to take plain command encoding, which turns out
+ one bad way. The traditional ioctl command opcode encodes more
+ info and basically defines each code uniquely, so opcode conflict
+ is avoided, and driver can handle wrong command easily, meantime it
+ may help security subsystem to audit io_uring command.
+
+ Say Y if your application still uses legacy command opcode.
+
+ Say N if you don't want to support legacy command opcode. It is
+ suggested to enable N if your application(ublk server) switches to
+ ioctl command encoding.
diff --git a/drivers/block/ublk/Makefile b/drivers/block/ublk/Makefile
new file mode 100644
index 000000000000..30e06b74dd82
--- /dev/null
+++ b/drivers/block/ublk/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0
+
+# needed for trace events
+ccflags-y += -I$(src)
+
+ublk_drv-$(CONFIG_BLK_DEV_UBLK) := main.o
+obj-$(CONFIG_BLK_DEV_UBLK) += ublk_drv.o
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk/main.c
similarity index 100%
rename from drivers/block/ublk_drv.c
rename to drivers/block/ublk/main.c
--
2.47.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC PATCH 05/22] ublk: move private definitions into private header
2025-01-07 12:03 [RFC PATCH 00/22] ublk: support bpf Ming Lei
` (3 preceding siblings ...)
2025-01-07 12:03 ` [RFC PATCH 04/22] ublk: move ublk into one standalone directory Ming Lei
@ 2025-01-07 12:03 ` Ming Lei
2025-01-07 12:03 ` [RFC PATCH 06/22] ublk: move several helpers to " Ming Lei
` (16 subsequent siblings)
21 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-01-07 12:03 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
Ming Lei
Add a private header file and move private definitions into it.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/block/ublk/main.c | 150 +-----------------------------------
drivers/block/ublk/ublk.h | 157 ++++++++++++++++++++++++++++++++++++++
2 files changed, 158 insertions(+), 149 deletions(-)
create mode 100644 drivers/block/ublk/ublk.h
diff --git a/drivers/block/ublk/main.c b/drivers/block/ublk/main.c
index 1a63a1aa99ed..2510193303bb 100644
--- a/drivers/block/ublk/main.c
+++ b/drivers/block/ublk/main.c
@@ -19,7 +19,6 @@
#include <linux/errno.h>
#include <linux/major.h>
#include <linux/wait.h>
-#include <linux/blkdev.h>
#include <linux/init.h>
#include <linux/swap.h>
#include <linux/slab.h>
@@ -35,162 +34,15 @@
#include <linux/ioprio.h>
#include <linux/sched/mm.h>
#include <linux/uaccess.h>
-#include <linux/cdev.h>
#include <linux/io_uring/cmd.h>
-#include <linux/blk-mq.h>
#include <linux/delay.h>
#include <linux/mm.h>
#include <asm/page.h>
#include <linux/task_work.h>
#include <linux/namei.h>
#include <linux/kref.h>
-#include <uapi/linux/ublk_cmd.h>
-
-#define UBLK_MINORS (1U << MINORBITS)
-
-/* private ioctl command mirror */
-#define UBLK_CMD_DEL_DEV_ASYNC _IOC_NR(UBLK_U_CMD_DEL_DEV_ASYNC)
-
-/* All UBLK_F_* have to be included into UBLK_F_ALL */
-#define UBLK_F_ALL (UBLK_F_SUPPORT_ZERO_COPY \
- | UBLK_F_URING_CMD_COMP_IN_TASK \
- | UBLK_F_NEED_GET_DATA \
- | UBLK_F_USER_RECOVERY \
- | UBLK_F_USER_RECOVERY_REISSUE \
- | UBLK_F_UNPRIVILEGED_DEV \
- | UBLK_F_CMD_IOCTL_ENCODE \
- | UBLK_F_USER_COPY \
- | UBLK_F_ZONED \
- | UBLK_F_USER_RECOVERY_FAIL_IO)
-
-#define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
- | UBLK_F_USER_RECOVERY_REISSUE \
- | UBLK_F_USER_RECOVERY_FAIL_IO)
-
-/* All UBLK_PARAM_TYPE_* should be included here */
-#define UBLK_PARAM_TYPE_ALL \
- (UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
- UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED)
-
-struct ublk_rq_data {
- struct llist_node node;
-
- struct kref ref;
-};
-
-struct ublk_uring_cmd_pdu {
- struct ublk_queue *ubq;
- u16 tag;
-};
-
-/*
- * io command is active: sqe cmd is received, and its cqe isn't done
- *
- * If the flag is set, the io command is owned by ublk driver, and waited
- * for incoming blk-mq request from the ublk block device.
- *
- * If the flag is cleared, the io command will be completed, and owned by
- * ublk server.
- */
-#define UBLK_IO_FLAG_ACTIVE 0x01
-
-/*
- * IO command is completed via cqe, and it is being handled by ublksrv, and
- * not committed yet
- *
- * Basically exclusively with UBLK_IO_FLAG_ACTIVE, so can be served for
- * cross verification
- */
-#define UBLK_IO_FLAG_OWNED_BY_SRV 0x02
-
-/*
- * IO command is aborted, so this flag is set in case of
- * !UBLK_IO_FLAG_ACTIVE.
- *
- * After this flag is observed, any pending or new incoming request
- * associated with this io command will be failed immediately
- */
-#define UBLK_IO_FLAG_ABORTED 0x04
-
-/*
- * UBLK_IO_FLAG_NEED_GET_DATA is set because IO command requires
- * get data buffer address from ublksrv.
- *
- * Then, bio data could be copied into this data buffer for a WRITE request
- * after the IO command is issued again and UBLK_IO_FLAG_NEED_GET_DATA is unset.
- */
-#define UBLK_IO_FLAG_NEED_GET_DATA 0x08
-
-/* atomic RW with ubq->cancel_lock */
-#define UBLK_IO_FLAG_CANCELED 0x80000000
-struct ublk_io {
- /* userspace buffer address from io cmd */
- __u64 addr;
- unsigned int flags;
- int res;
-
- struct io_uring_cmd *cmd;
-};
-
-struct ublk_queue {
- int q_id;
- int q_depth;
-
- unsigned long flags;
- struct task_struct *ubq_daemon;
- char *io_cmd_buf;
-
- struct llist_head io_cmds;
-
- unsigned short force_abort:1;
- unsigned short timeout:1;
- unsigned short canceling:1;
- unsigned short fail_io:1; /* copy of dev->state == UBLK_S_DEV_FAIL_IO */
- unsigned short nr_io_ready; /* how many ios setup */
- spinlock_t cancel_lock;
- struct ublk_device *dev;
- struct ublk_io ios[];
-};
-
-struct ublk_device {
- struct gendisk *ub_disk;
-
- char *__queues;
-
- unsigned int queue_size;
- struct ublksrv_ctrl_dev_info dev_info;
-
- struct blk_mq_tag_set tag_set;
-
- struct cdev cdev;
- struct device cdev_dev;
-
-#define UB_STATE_OPEN 0
-#define UB_STATE_USED 1
-#define UB_STATE_DELETED 2
- unsigned long state;
- int ub_number;
-
- struct mutex mutex;
-
- spinlock_t lock;
- struct mm_struct *mm;
-
- struct ublk_params params;
-
- struct completion completion;
- unsigned int nr_queues_ready;
- unsigned int nr_privileged_daemon;
-
- struct work_struct nosrv_work;
-};
-
-/* header of ublk_params */
-struct ublk_params_header {
- __u32 len;
- __u32 types;
-};
+#include "ublk.h"
static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq);
diff --git a/drivers/block/ublk/ublk.h b/drivers/block/ublk/ublk.h
new file mode 100644
index 000000000000..12e39a33015a
--- /dev/null
+++ b/drivers/block/ublk/ublk.h
@@ -0,0 +1,157 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#ifndef UBLK_INTERNAL_HEADER
+#define UBLK_INTERNAL_HEADER
+
+#include <linux/blkdev.h>
+#include <linux/blk-mq.h>
+#include <linux/cdev.h>
+#include <uapi/linux/ublk_cmd.h>
+
+#define UBLK_MINORS (1U << MINORBITS)
+
+/* private ioctl command mirror */
+#define UBLK_CMD_DEL_DEV_ASYNC _IOC_NR(UBLK_U_CMD_DEL_DEV_ASYNC)
+
+/* All UBLK_F_* have to be included into UBLK_F_ALL */
+#define UBLK_F_ALL (UBLK_F_SUPPORT_ZERO_COPY \
+ | UBLK_F_URING_CMD_COMP_IN_TASK \
+ | UBLK_F_NEED_GET_DATA \
+ | UBLK_F_USER_RECOVERY \
+ | UBLK_F_USER_RECOVERY_REISSUE \
+ | UBLK_F_UNPRIVILEGED_DEV \
+ | UBLK_F_CMD_IOCTL_ENCODE \
+ | UBLK_F_USER_COPY \
+ | UBLK_F_ZONED \
+ | UBLK_F_USER_RECOVERY_FAIL_IO)
+
+#define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
+ | UBLK_F_USER_RECOVERY_REISSUE \
+ | UBLK_F_USER_RECOVERY_FAIL_IO)
+
+/* All UBLK_PARAM_TYPE_* should be included here */
+#define UBLK_PARAM_TYPE_ALL \
+ (UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
+ UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED)
+
+struct ublk_rq_data {
+ struct llist_node node;
+
+ struct kref ref;
+};
+
+struct ublk_uring_cmd_pdu {
+ struct ublk_queue *ubq;
+ u16 tag;
+};
+
+/*
+ * io command is active: sqe cmd is received, and its cqe isn't done
+ *
+ * If the flag is set, the io command is owned by ublk driver, and waited
+ * for incoming blk-mq request from the ublk block device.
+ *
+ * If the flag is cleared, the io command will be completed, and owned by
+ * ublk server.
+ */
+#define UBLK_IO_FLAG_ACTIVE 0x01
+
+/*
+ * IO command is completed via cqe, and it is being handled by ublksrv, and
+ * not committed yet
+ *
+ * Basically exclusively with UBLK_IO_FLAG_ACTIVE, so can be served for
+ * cross verification
+ */
+#define UBLK_IO_FLAG_OWNED_BY_SRV 0x02
+
+/*
+ * IO command is aborted, so this flag is set in case of
+ * !UBLK_IO_FLAG_ACTIVE.
+ *
+ * After this flag is observed, any pending or new incoming request
+ * associated with this io command will be failed immediately
+ */
+#define UBLK_IO_FLAG_ABORTED 0x04
+
+/*
+ * UBLK_IO_FLAG_NEED_GET_DATA is set because IO command requires
+ * get data buffer address from ublksrv.
+ *
+ * Then, bio data could be copied into this data buffer for a WRITE request
+ * after the IO command is issued again and UBLK_IO_FLAG_NEED_GET_DATA is unset.
+ */
+#define UBLK_IO_FLAG_NEED_GET_DATA 0x08
+
+/* atomic RW with ubq->cancel_lock */
+#define UBLK_IO_FLAG_CANCELED 0x80000000
+
+struct ublk_io {
+ /* userspace buffer address from io cmd */
+ __u64 addr;
+ unsigned int flags;
+ int res;
+
+ struct io_uring_cmd *cmd;
+};
+
+struct ublk_queue {
+ int q_id;
+ int q_depth;
+
+ unsigned long flags;
+ struct task_struct *ubq_daemon;
+ char *io_cmd_buf;
+
+ struct llist_head io_cmds;
+
+ unsigned short force_abort:1;
+ unsigned short timeout:1;
+ unsigned short canceling:1;
+ unsigned short fail_io:1; /* copy of dev->state == UBLK_S_DEV_FAIL_IO */
+ unsigned short nr_io_ready; /* how many ios setup */
+ spinlock_t cancel_lock;
+ struct ublk_device *dev;
+ struct ublk_io ios[];
+};
+
+struct ublk_device {
+ struct gendisk *ub_disk;
+
+ char *__queues;
+
+ unsigned int queue_size;
+ struct ublksrv_ctrl_dev_info dev_info;
+
+ struct blk_mq_tag_set tag_set;
+
+ struct cdev cdev;
+ struct device cdev_dev;
+
+#define UB_STATE_OPEN 0
+#define UB_STATE_USED 1
+#define UB_STATE_DELETED 2
+ unsigned long state;
+ int ub_number;
+
+ struct mutex mutex;
+
+ spinlock_t lock;
+ struct mm_struct *mm;
+
+ struct ublk_params params;
+
+ struct completion completion;
+ unsigned int nr_queues_ready;
+ unsigned int nr_privileged_daemon;
+
+ struct work_struct nosrv_work;
+};
+
+/* header of ublk_params */
+struct ublk_params_header {
+ __u32 len;
+ __u32 types;
+};
+
+
+#endif
--
2.47.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC PATCH 06/22] ublk: move several helpers to private header
2025-01-07 12:03 [RFC PATCH 00/22] ublk: support bpf Ming Lei
` (4 preceding siblings ...)
2025-01-07 12:03 ` [RFC PATCH 05/22] ublk: move private definitions into private header Ming Lei
@ 2025-01-07 12:03 ` Ming Lei
2025-01-07 12:03 ` [RFC PATCH 07/22] ublk: bpf: add bpf prog attach helpers Ming Lei
` (15 subsequent siblings)
21 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-01-07 12:03 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
Ming Lei
Move several helpers into the private header to make them visible to the
whole driver, preparing for ublk-bpf support.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/block/ublk/main.c | 16 +++-------------
drivers/block/ublk/ublk.h | 11 +++++++++++
2 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/drivers/block/ublk/main.c b/drivers/block/ublk/main.c
index 2510193303bb..aefb414ebf6c 100644
--- a/drivers/block/ublk/main.c
+++ b/drivers/block/ublk/main.c
@@ -47,8 +47,6 @@
static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq);
static inline unsigned int ublk_req_build_flags(struct request *req);
-static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq,
- int tag);
static inline bool ublk_dev_is_user_copy(const struct ublk_device *ub)
{
return ub->dev_info.flags & UBLK_F_USER_COPY;
@@ -325,7 +323,6 @@ static blk_status_t ublk_setup_iod_zoned(struct ublk_queue *ubq,
#endif
-static inline void __ublk_complete_rq(struct request *req);
static void ublk_complete_rq(struct kref *ref);
static dev_t ublk_chr_devt;
@@ -496,7 +493,7 @@ static noinline struct ublk_device *ublk_get_device(struct ublk_device *ub)
}
/* Called in slow path only, keep it noinline for trace purpose */
-static noinline void ublk_put_device(struct ublk_device *ub)
+void ublk_put_device(struct ublk_device *ub)
{
put_device(&ub->cdev_dev);
}
@@ -512,13 +509,6 @@ static inline bool ublk_rq_has_data(const struct request *rq)
return bio_has_data(rq->bio);
}
-static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq,
- int tag)
-{
- return (struct ublksrv_io_desc *)
- &(ubq->io_cmd_buf[tag * sizeof(struct ublksrv_io_desc)]);
-}
-
static inline char *ublk_queue_cmd_buf(struct ublk_device *ub, int q_id)
{
return ublk_get_queue(ub, q_id)->io_cmd_buf;
@@ -887,7 +877,7 @@ static inline bool ubq_daemon_is_dying(struct ublk_queue *ubq)
}
/* todo: handle partial completion */
-static inline void __ublk_complete_rq(struct request *req)
+void __ublk_complete_rq(struct request *req)
{
struct ublk_queue *ubq = req->mq_hctx->driver_data;
struct ublk_io *io = &ubq->ios[req->tag];
@@ -2082,7 +2072,7 @@ static void ublk_remove(struct ublk_device *ub)
ublks_added--;
}
-static struct ublk_device *ublk_get_device_from_id(int idx)
+struct ublk_device *ublk_get_device_from_id(int idx)
{
struct ublk_device *ub = NULL;
diff --git a/drivers/block/ublk/ublk.h b/drivers/block/ublk/ublk.h
index 12e39a33015a..76aee4225c78 100644
--- a/drivers/block/ublk/ublk.h
+++ b/drivers/block/ublk/ublk.h
@@ -154,4 +154,15 @@ struct ublk_params_header {
};
+static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq,
+ int tag)
+{
+ return (struct ublksrv_io_desc *)
+ &(ubq->io_cmd_buf[tag * sizeof(struct ublksrv_io_desc)]);
+}
+
+struct ublk_device *ublk_get_device_from_id(int idx);
+void ublk_put_device(struct ublk_device *ub);
+void __ublk_complete_rq(struct request *req);
+
#endif
--
2.47.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC PATCH 07/22] ublk: bpf: add bpf prog attach helpers
2025-01-07 12:03 [RFC PATCH 00/22] ublk: support bpf Ming Lei
` (5 preceding siblings ...)
2025-01-07 12:03 ` [RFC PATCH 06/22] ublk: move several helpers to " Ming Lei
@ 2025-01-07 12:03 ` Ming Lei
2025-01-07 12:03 ` [RFC PATCH 08/22] ublk: bpf: add bpf struct_ops Ming Lei
` (14 subsequent siblings)
21 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-01-07 12:03 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
Ming Lei
Add bpf prog attach helpers to prepare for supporting ublk-bpf, where
multiple ublk devices may attach to the same bpf prog, and there can be
multiple bpf progs.
`bpf_prog_consumer` is embedded on the consumer side of the bpf prog, such
as a ublk device, while `bpf_prog_provider` is embedded on the bpf
struct_ops prog side; a usage sketch is included below.
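For illustration only (not part of this patch), a consumer is expected to
be wired up roughly as in the following sketch. The helper and type names
come from bpf_reg.h added below, while all demo_* names are made up; the
real ublk usage lands in "ublk: bpf: attach bpf prog to ublk device":

  /* consumer side sketch: a device embeds a bpf_prog_consumer and binds
   * it to the bpf_prog_provider embedded in a struct_ops instance
   */
  static int demo_attach(struct bpf_prog_consumer *consumer,
                         struct bpf_prog_provider *provider)
  {
          /* device-specific setup, e.g. cache the struct_ops pointer */
          return 0;
  }

  static void demo_detach(struct bpf_prog_consumer *consumer, bool unreg)
  {
          /* undo demo_attach(); 'unreg' means the provider is going away */
  }

  static const struct bpf_prog_consumer_ops demo_consumer_ops = {
          .attach_fn = demo_attach,
          .detach_fn = demo_detach,
  };

  static int demo_bind(struct bpf_prog_consumer *consumer,
                       struct bpf_prog_provider *provider)
  {
          consumer->ops = &demo_consumer_ops;
          return bpf_prog_consumer_attach(consumer, provider);
  }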
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/block/ublk/bpf_reg.h | 77 ++++++++++++++++++++++++++++++++++++
1 file changed, 77 insertions(+)
create mode 100644 drivers/block/ublk/bpf_reg.h
diff --git a/drivers/block/ublk/bpf_reg.h b/drivers/block/ublk/bpf_reg.h
new file mode 100644
index 000000000000..79d02e93aea8
--- /dev/null
+++ b/drivers/block/ublk/bpf_reg.h
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#ifndef UBLK_INT_BPF_REG_HEADER
+#define UBLK_INT_BPF_REG_HEADER
+
+#include <linux/types.h>
+
+struct bpf_prog_consumer;
+struct bpf_prog_provider;
+
+typedef int (*bpf_prog_attach_t)(struct bpf_prog_consumer *consumer,
+ struct bpf_prog_provider *provider);
+typedef void (*bpf_prog_detach_t)(struct bpf_prog_consumer *consumer,
+ bool unreg);
+
+struct bpf_prog_consumer_ops {
+ bpf_prog_attach_t attach_fn;
+ bpf_prog_detach_t detach_fn;
+};
+
+struct bpf_prog_consumer {
+ const struct bpf_prog_consumer_ops *ops;
+ unsigned int prog_id;
+ struct list_head node;
+ struct bpf_prog_provider *provider;
+};
+
+struct bpf_prog_provider {
+ struct list_head list;
+};
+
+static inline void bpf_prog_provider_init(struct bpf_prog_provider *provider)
+{
+ INIT_LIST_HEAD(&provider->list);
+}
+
+static inline bool bpf_prog_provider_is_empty(
+ struct bpf_prog_provider *provider)
+{
+ return list_empty(&provider->list);
+}
+
+static inline int bpf_prog_consumer_attach(struct bpf_prog_consumer *consumer,
+ struct bpf_prog_provider *provider)
+{
+ const struct bpf_prog_consumer_ops *ops = consumer->ops;
+
+ if (!ops || !ops->attach_fn)
+ return -EINVAL;
+
+ if (ops->attach_fn) {
+ int ret = ops->attach_fn(consumer, provider);
+
+ if (ret)
+ return ret;
+ }
+ consumer->provider = provider;
+ list_add(&consumer->node, &provider->list);
+ return 0;
+}
+
+static inline void bpf_prog_consumer_detach(struct bpf_prog_consumer *consumer,
+ bool unreg)
+{
+ const struct bpf_prog_consumer_ops *ops = consumer->ops;
+
+ if (!consumer->provider)
+ return;
+
+ if (!list_empty(&consumer->node)) {
+ if (ops && ops->detach_fn)
+ ops->detach_fn(consumer, unreg);
+ list_del_init(&consumer->node);
+ consumer->provider = NULL;
+ }
+}
+
+#endif
--
2.47.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC PATCH 08/22] ublk: bpf: add bpf struct_ops
2025-01-07 12:03 [RFC PATCH 00/22] ublk: support bpf Ming Lei
` (6 preceding siblings ...)
2025-01-07 12:03 ` [RFC PATCH 07/22] ublk: bpf: add bpf prog attach helpers Ming Lei
@ 2025-01-07 12:03 ` Ming Lei
2025-01-10 1:43 ` Alexei Starovoitov
2025-01-07 12:04 ` [RFC PATCH 09/22] ublk: bpf: attach bpf prog to ublk device Ming Lei
` (13 subsequent siblings)
21 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2025-01-07 12:03 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
Ming Lei
Add struct_ops support for ublk, so that a struct_ops bpf prog can handle
ublk IO commands with an application-defined struct_ops.
The motivation for ublk-bpf follows:
1) support stacking ublk
- there are many 3rd party volume managers; ublk may be built over another
ublk device to simplify the implementation, however multiple
userspace-kernel context switches for handling one single IO can't be
accepted from a performance point of view
- ublk-bpf can avoid the user-kernel context switch in most of the fast IO
path, which makes ublk over ublk possible
2) complicated virtual block devices
- many complicated virtual block devices have an admin & meta code path
and a normal IO fast path; meta & admin IO handling is usually
complicated, so it can be moved to userspace to relieve the development
burden, while the IO fast path is kept in kernel space for the sake of
high performance
- bpf provides rich maps, which help a lot with communication between
userspace and prog, or between prog and prog
- one typical example is qcow2: its meta IO handling can be moved to
userspace, while the fast IO path is implemented with ublk-bpf; an
efficient bpf map can be looked up first to see whether the virtual LBA to
host LBA mapping is present, and the IO is handled with ublk-bpf if the
mapping is hit, otherwise it is forwarded to userspace to deal with the
meta IO
3) simple high performance virtual devices
- such as null & loop, whose whole implementation can be done in the bpf prog
Export `struct ublk_bpf_ops` as a bpf struct_ops, so that a bpf prog can
implement callbacks for handling ublk io commands:
- if `UBLK_BPF_IO_QUEUED` is returned from ->queue_io_cmd() or
->queue_io_cmd_daemon(), this io command has been queued completely by the
bpf prog, so it won't be forwarded to userspace
- if `UBLK_BPF_IO_REDIRECT` is returned from ->queue_io_cmd() or
->queue_io_cmd_daemon(), this io command will be forwarded to userspace
- if `UBLK_BPF_IO_CONTINUE` is returned from ->queue_io_cmd() or
->queue_io_cmd_daemon(), part of this io command has been queued, and
`ublk_bpf_return_t` carries how many bytes were queued, so the ublk driver
will keep calling the callback to queue the remaining bytes of this io
command; this is helpful for implementing stacking devices by splitting
the io command
->release_io_cmd() is also added to give the bpf prog a chance to be
notified that the io command is going to be released. A minimal prog
sketch is shown below.
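Illustration only, not part of this patch: a minimal struct_ops prog which
redirects every io command to userspace could look roughly like the sketch
below. It is modeled on the selftests added later in this series
(progs/ublk_null.c); the selftest header name, the SEC() names and the
ublk_bpf_return_val()/BPF_PROG usage are assumptions here:

  #include "ublk_bpf.h"	/* selftest helper header from this series */

  /* non-sleepable: runs in ublk io context, redirect everything to daemon */
  SEC("struct_ops/ublk_bpf_queue_io_cmd")
  ublk_bpf_return_t BPF_PROG(null_queue_io_cmd, struct ublk_bpf_io *io,
                             unsigned int offset)
  {
          return ublk_bpf_return_val(UBLK_BPF_IO_REDIRECT, 0);
  }

  SEC(".struct_ops.link")
  struct ublk_bpf_ops null_ops = {
          .id           = 0,	/* struct_ops id used by ublk dev to attach */
          .queue_io_cmd = (void *)null_queue_io_cmd,
  };

  char LICENSE[] SEC("license") = "GPL";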
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/block/ublk/Kconfig | 16 +++
drivers/block/ublk/Makefile | 3 +
drivers/block/ublk/bpf.h | 184 ++++++++++++++++++++++++
drivers/block/ublk/bpf_ops.c | 261 +++++++++++++++++++++++++++++++++++
drivers/block/ublk/main.c | 29 +++-
drivers/block/ublk/ublk.h | 33 +++++
6 files changed, 524 insertions(+), 2 deletions(-)
create mode 100644 drivers/block/ublk/bpf.h
create mode 100644 drivers/block/ublk/bpf_ops.c
diff --git a/drivers/block/ublk/Kconfig b/drivers/block/ublk/Kconfig
index b06e3df09779..23aa97d51956 100644
--- a/drivers/block/ublk/Kconfig
+++ b/drivers/block/ublk/Kconfig
@@ -34,3 +34,19 @@ config BLKDEV_UBLK_LEGACY_OPCODES
Say N if you don't want to support legacy command opcode. It is
suggested to enable N if your application(ublk server) switches to
ioctl command encoding.
+
+config UBLK_BPF
+ bool "UBLK-BPF support"
+ depends on BPF
+ depends on BLK_DEV_UBLK
+ help
+ This option allows eBPF programs to be supported by the UBLK subsystem.
+ eBPF programs can handle the fast IO code path directly in kernel space,
+ avoiding the switch to the ublk daemon userspace context, and meantime
+ zero copy can be supported directly.
+
+ Usually the target code needs to be partitioned into two parts: the fast
+ IO code path, which runs as an eBPF prog in kernel context, and the slow
+ & complicated meta/admin code path, which runs in the ublk daemon
+ userspace context. Efficient bpf maps can be used for communication
+ between user mode and the kernel bpf prog.
diff --git a/drivers/block/ublk/Makefile b/drivers/block/ublk/Makefile
index 30e06b74dd82..7058b0fc13bf 100644
--- a/drivers/block/ublk/Makefile
+++ b/drivers/block/ublk/Makefile
@@ -4,4 +4,7 @@
ccflags-y += -I$(src)
ublk_drv-$(CONFIG_BLK_DEV_UBLK) := main.o
+ifeq ($(CONFIG_UBLK_BPF), y)
+ublk_drv-$(CONFIG_BLK_DEV_UBLK) += bpf_ops.o
+endif
obj-$(CONFIG_BLK_DEV_UBLK) += ublk_drv.o
diff --git a/drivers/block/ublk/bpf.h b/drivers/block/ublk/bpf.h
new file mode 100644
index 000000000000..e3505c9ab86a
--- /dev/null
+++ b/drivers/block/ublk/bpf.h
@@ -0,0 +1,184 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#ifndef UBLK_INT_BPF_HEADER
+#define UBLK_INT_BPF_HEADER
+
+#include "bpf_reg.h"
+
+typedef unsigned long ublk_bpf_return_t;
+typedef ublk_bpf_return_t (*queue_io_cmd_t)(struct ublk_bpf_io *io, unsigned int);
+typedef void (*release_io_cmd_t)(struct ublk_bpf_io *io);
+
+#ifdef CONFIG_UBLK_BPF
+#include <linux/filter.h>
+
+/*
+ * enum ublk_bpf_disposition - how to dispose the bpf io command
+ *
+ * @UBLK_BPF_IO_QUEUED: io command queued completely by bpf prog, so this
+ * cmd needn't to be forwarded to ublk daemon any more
+ * @UBLK_BPF_IO_REDIRECT: io command can't be queued by bpf prog, so this
+ * cmd will be forwarded to ublk daemon
+ * @UBLK_BPF_IO_CONTINUE: io command is being queued, and can be disposed
+ * further by bpf prog, so bpf callback will be called further
+ */
+enum ublk_bpf_disposition {
+ UBLK_BPF_IO_QUEUED = 0,
+ UBLK_BPF_IO_REDIRECT,
+ UBLK_BPF_IO_CONTINUE,
+};
+
+/**
+ * struct ublk_bpf_ops - A BPF struct_ops of callbacks allowing to implement
+ * ublk target from bpf program
+ * @id: ops id
+ * @queue_io_cmd: callback for queuing io command in ublk io context
+ * @queue_io_cmd_daemon: callback for queuing io command in ublk daemon
+ */
+struct ublk_bpf_ops {
+ /* struct_ops id, used for ublk device to attach prog */
+ unsigned id;
+
+ /* queue io command from ublk io context, can't be sleepable */
+ queue_io_cmd_t queue_io_cmd;
+
+ /* queue io command from target io daemon context, can be sleepable */
+ queue_io_cmd_t queue_io_cmd_daemon;
+
+ /* called when the io command reference drops to zero, can't be sleepable */
+ release_io_cmd_t release_io_cmd;
+
+ /* private: don't show in doc, must be the last field */
+ struct bpf_prog_provider provider;
+};
+
+#define UBLK_BPF_DISPOSITION_BITS (4)
+#define UBLK_BPF_DISPOSITION_SHIFT (BITS_PER_LONG - UBLK_BPF_DISPOSITION_BITS)
+
+static inline enum ublk_bpf_disposition ublk_bpf_get_disposition(ublk_bpf_return_t ret)
+{
+ return ret >> UBLK_BPF_DISPOSITION_SHIFT;
+}
+
+static inline unsigned int ublk_bpf_get_return_bytes(ublk_bpf_return_t ret)
+{
+ return ret & ((1UL << UBLK_BPF_DISPOSITION_SHIFT) - 1);
+}
+
+static inline ublk_bpf_return_t ublk_bpf_return_val(enum ublk_bpf_disposition rc,
+ unsigned int bytes)
+{
+ return (ublk_bpf_return_t) ((unsigned long)rc << UBLK_BPF_DISPOSITION_SHIFT) | bytes;
+}
+
+static inline struct request *ublk_bpf_get_req(const struct ublk_bpf_io *io)
+{
+ struct ublk_rq_data *data = container_of(io, struct ublk_rq_data, bpf_data);
+ struct request *req = blk_mq_rq_from_pdu(data);
+
+ return req;
+}
+
+static inline void ublk_bpf_io_dec_ref(struct ublk_bpf_io *io)
+{
+ if (refcount_dec_and_test(&io->ref)) {
+ struct request *req = ublk_bpf_get_req(io);
+
+ if (req->mq_hctx) {
+ const struct ublk_queue *ubq = req->mq_hctx->driver_data;
+
+ if (ubq->bpf_ops && ubq->bpf_ops->release_io_cmd)
+ ubq->bpf_ops->release_io_cmd(io);
+ }
+
+ if (test_bit(UBLK_BPF_IO_COMPLETED, &io->flags)) {
+ smp_rmb();
+ __clear_bit(UBLK_BPF_IO_PREP, &io->flags);
+ __ublk_complete_rq_with_res(req, io->res);
+ }
+ }
+}
+
+static inline void ublk_bpf_complete_io_cmd(struct ublk_bpf_io *io, int res)
+{
+ io->res = res;
+ smp_wmb();
+ set_bit(UBLK_BPF_IO_COMPLETED, &io->flags);
+ ublk_bpf_io_dec_ref(io);
+}
+
+
+bool ublk_run_bpf_handler(struct ublk_queue *ubq, struct request *req,
+ queue_io_cmd_t cb);
+
+/*
+ * Return true if bpf prog handled this io command, otherwise return false
+ * so that this io command will be forwarded to userspace
+ */
+static inline bool ublk_run_bpf_prog(struct ublk_queue *ubq,
+ struct request *req,
+ queue_io_cmd_t cb,
+ bool fail_on_null)
+{
+ if (likely(cb))
+ return ublk_run_bpf_handler(ubq, req, cb);
+
+ /* bpf prog is un-registered */
+ if (fail_on_null && !ubq->bpf_ops) {
+ __ublk_complete_rq_with_res(req, -EOPNOTSUPP);
+ return true;
+ }
+
+ return false;
+}
+
+static inline queue_io_cmd_t ublk_get_bpf_io_cb(struct ublk_queue *ubq)
+{
+ return ubq->bpf_ops ? ubq->bpf_ops->queue_io_cmd : NULL;
+}
+
+static inline queue_io_cmd_t ublk_get_bpf_io_cb_daemon(struct ublk_queue *ubq)
+{
+ return ubq->bpf_ops ? ubq->bpf_ops->queue_io_cmd_daemon : NULL;
+}
+
+static inline queue_io_cmd_t ublk_get_bpf_any_io_cb(struct ublk_queue *ubq)
+{
+ if (ublk_get_bpf_io_cb(ubq))
+ return ublk_get_bpf_io_cb(ubq);
+
+ return ublk_get_bpf_io_cb_daemon(ubq);
+}
+
+int ublk_bpf_struct_ops_init(void);
+
+#else
+
+static inline bool ublk_run_bpf_prog(struct ublk_queue *ubq,
+ struct request *req,
+ queue_io_cmd_t cb,
+ bool fail_on_null)
+{
+ return false;
+}
+
+static inline queue_io_cmd_t ublk_get_bpf_io_cb(struct ublk_queue *ubq)
+{
+ return NULL;
+}
+
+static inline queue_io_cmd_t ublk_get_bpf_io_cb_daemon(struct ublk_queue *ubq)
+{
+ return NULL;
+}
+
+static inline queue_io_cmd_t ublk_get_bpf_any_io_cb(struct ublk_queue *ubq)
+{
+ return NULL;
+}
+
+static inline int ublk_bpf_struct_ops_init(void)
+{
+ return 0;
+}
+#endif
+#endif
diff --git a/drivers/block/ublk/bpf_ops.c b/drivers/block/ublk/bpf_ops.c
new file mode 100644
index 000000000000..6ac2aebd477e
--- /dev/null
+++ b/drivers/block/ublk/bpf_ops.c
@@ -0,0 +1,261 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Red Hat */
+
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/bpf_verifier.h>
+#include <linux/bpf.h>
+#include <linux/btf.h>
+#include <linux/btf_ids.h>
+#include <linux/filter.h>
+#include <linux/xarray.h>
+
+#include "ublk.h"
+#include "bpf.h"
+
+static DEFINE_XARRAY(ublk_ops);
+static DEFINE_MUTEX(ublk_bpf_ops_lock);
+
+static bool ublk_bpf_ops_is_valid_access(int off, int size,
+ enum bpf_access_type type,
+ const struct bpf_prog *prog,
+ struct bpf_insn_access_aux *info)
+{
+ return bpf_tracing_btf_ctx_access(off, size, type, prog, info);
+}
+
+static int ublk_bpf_ops_btf_struct_access(struct bpf_verifier_log *log,
+ const struct bpf_reg_state *reg,
+ int off, int size)
+{
+ /* ublk prog can change nothing */
+ if (size > 0)
+ return -EACCES;
+
+ return NOT_INIT;
+}
+
+static const struct bpf_verifier_ops ublk_bpf_verifier_ops = {
+ .get_func_proto = bpf_base_func_proto,
+ .is_valid_access = ublk_bpf_ops_is_valid_access,
+ .btf_struct_access = ublk_bpf_ops_btf_struct_access,
+};
+
+static int ublk_bpf_ops_init(struct btf *btf)
+{
+ return 0;
+}
+
+static int ublk_bpf_ops_check_member(const struct btf_type *t,
+ const struct btf_member *member,
+ const struct bpf_prog *prog)
+{
+ u32 moff = __btf_member_bit_offset(t, member) / 8;
+
+ switch (moff) {
+ case offsetof(struct ublk_bpf_ops, queue_io_cmd):
+ case offsetof(struct ublk_bpf_ops, release_io_cmd):
+ if (prog->sleepable)
+ return -EINVAL;
+ case offsetof(struct ublk_bpf_ops, queue_io_cmd_daemon):
+ break;
+ default:
+ if (prog->sleepable)
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int ublk_bpf_ops_init_member(const struct btf_type *t,
+ const struct btf_member *member,
+ void *kdata, const void *udata)
+{
+ const struct ublk_bpf_ops *uops;
+ struct ublk_bpf_ops *kops;
+ u32 moff;
+
+ uops = (const struct ublk_bpf_ops *)udata;
+ kops = (struct ublk_bpf_ops *)kdata;
+
+ moff = __btf_member_bit_offset(t, member) / 8;
+
+ switch (moff) {
+ case offsetof(struct ublk_bpf_ops, id):
+ /* For dev_id, this function has to copy it and return 1 to
+ * indicate that the data has been handled by the struct_ops
+ * type, or the verifier will reject the map if the value of
+ * those fields is not zero.
+ */
+ kops->id = uops->id;
+ return 1;
+ }
+ return 0;
+}
+
+static int ublk_bpf_reg(void *kdata, struct bpf_link *link)
+{
+ struct ublk_bpf_ops *ops = kdata;
+ struct ublk_bpf_ops *curr;
+ int ret = -EBUSY;
+
+ mutex_lock(&ublk_bpf_ops_lock);
+ if (!xa_load(&ublk_ops, ops->id)) {
+ curr = kmalloc(sizeof(*curr), GFP_KERNEL);
+ if (curr) {
+ *curr = *ops;
+ bpf_prog_provider_init(&curr->provider);
+ ret = xa_err(xa_store(&ublk_ops, ops->id, curr, GFP_KERNEL));
+ } else {
+ ret = -ENOMEM;
+ }
+ }
+ mutex_unlock(&ublk_bpf_ops_lock);
+
+ return ret;
+}
+
+static void ublk_bpf_unreg(void *kdata, struct bpf_link *link)
+{
+ struct ublk_bpf_ops *ops = kdata;
+ struct ublk_bpf_ops *curr;
+ LIST_HEAD(consumer_list);
+ struct bpf_prog_consumer *consumer, *tmp;
+
+ mutex_lock(&ublk_bpf_ops_lock);
+ curr = xa_erase(&ublk_ops, ops->id);
+ if (curr)
+ list_splice_init(&curr->provider.list, &consumer_list);
+ mutex_unlock(&ublk_bpf_ops_lock);
+
+ list_for_each_entry_safe(consumer, tmp, &consumer_list, node)
+ bpf_prog_consumer_detach(consumer, true);
+ kfree(curr);
+}
+
+static void ublk_bpf_prep_io(struct ublk_bpf_io *io,
+ const struct ublksrv_io_desc *iod)
+{
+ io->flags = 0;
+ io->res = 0;
+ io->iod = iod;
+ __set_bit(UBLK_BPF_IO_PREP, &io->flags);
+ /* one is for submission, another is for completion */
+ refcount_set(&io->ref, 2);
+}
+
+/* Return true if io cmd is queued, otherwise forward it to userspace */
+bool ublk_run_bpf_handler(struct ublk_queue *ubq, struct request *req,
+ queue_io_cmd_t cb)
+{
+ ublk_bpf_return_t ret;
+ struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
+ struct ublksrv_io_desc *iod = ublk_get_iod(ubq, req->tag);
+ struct ublk_bpf_io *bpf_io = &data->bpf_data;
+ const unsigned long total = iod->nr_sectors << 9;
+ unsigned int done = 0;
+ bool res = true;
+ int err;
+
+ if (!test_bit(UBLK_BPF_IO_PREP, &bpf_io->flags))
+ ublk_bpf_prep_io(bpf_io, iod);
+
+ do {
+ enum ublk_bpf_disposition rc;
+ unsigned int bytes;
+
+ ret = cb(bpf_io, done);
+ rc = ublk_bpf_get_disposition(ret);
+
+ if (rc == UBLK_BPF_IO_QUEUED)
+ goto exit;
+
+ if (rc == UBLK_BPF_IO_REDIRECT)
+ break;
+
+ if (unlikely(rc != UBLK_BPF_IO_CONTINUE)) {
+ printk_ratelimited(KERN_ERR "%s: unknown rc code %d\n",
+ __func__, rc);
+ err = -EINVAL;
+ goto fail;
+ }
+
+ bytes = ublk_bpf_get_return_bytes(ret);
+ if (unlikely((bytes & 511) || !bytes)) {
+ err = -EREMOTEIO;
+ goto fail;
+ } else if (unlikely(bytes > total - done)) {
+ err = -ENOSPC;
+ goto fail;
+ } else {
+ done += bytes;
+ }
+ } while (done < total);
+
+ /*
+ * If any bytes are queued, we can't forward to userspace
+ * immediately because it is too complicated to support two side
+ * completion.
+ *
+ * But the request will be updated and retried after the queued
+ * part is completed, then it can be forwarded to userspace too.
+ */
+ res = done > 0;
+ if (!res) {
+ /* will redirect to userspace, so forget bpf handling */
+ __clear_bit(UBLK_BPF_IO_PREP, &bpf_io->flags);
+ refcount_dec(&bpf_io->ref);
+ }
+ goto exit;
+fail:
+ res = true;
+ ublk_bpf_complete_io_cmd(bpf_io, err);
+exit:
+ ublk_bpf_io_dec_ref(bpf_io);
+ return res;
+}
+
+static ublk_bpf_return_t ublk_bpf_run_io_task(struct ublk_bpf_io *io,
+ unsigned int offset)
+{
+ return ublk_bpf_return_val(UBLK_BPF_IO_REDIRECT, 0);
+}
+
+static ublk_bpf_return_t ublk_bpf_queue_io_cmd(struct ublk_bpf_io *io,
+ unsigned int offset)
+{
+ return ublk_bpf_return_val(UBLK_BPF_IO_REDIRECT, 0);
+}
+
+static void ublk_bpf_release_io_cmd(struct ublk_bpf_io *io)
+{
+}
+
+static struct ublk_bpf_ops __bpf_ublk_bpf_ops = {
+ .queue_io_cmd = ublk_bpf_queue_io_cmd,
+ .queue_io_cmd_daemon = ublk_bpf_run_io_task,
+ .release_io_cmd = ublk_bpf_release_io_cmd,
+};
+
+static struct bpf_struct_ops bpf_ublk_bpf_ops = {
+ .verifier_ops = &ublk_bpf_verifier_ops,
+ .init = ublk_bpf_ops_init,
+ .check_member = ublk_bpf_ops_check_member,
+ .init_member = ublk_bpf_ops_init_member,
+ .reg = ublk_bpf_reg,
+ .unreg = ublk_bpf_unreg,
+ .name = "ublk_bpf_ops",
+ .cfi_stubs = &__bpf_ublk_bpf_ops,
+ .owner = THIS_MODULE,
+};
+
+int __init ublk_bpf_struct_ops_init(void)
+{
+ int err;
+
+ err = register_bpf_struct_ops(&bpf_ublk_bpf_ops, ublk_bpf_ops);
+ if (err)
+ pr_warn("error while registering ublk bpf struct ops: %d", err);
+
+ return 0;
+}
diff --git a/drivers/block/ublk/main.c b/drivers/block/ublk/main.c
index aefb414ebf6c..29d3e7f656a7 100644
--- a/drivers/block/ublk/main.c
+++ b/drivers/block/ublk/main.c
@@ -43,6 +43,7 @@
#include <linux/kref.h>
#include "ublk.h"
+#include "bpf.h"
static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq);
@@ -1061,6 +1062,10 @@ static inline void __ublk_rq_task_work(struct request *req,
mapped_bytes >> 9;
}
+ if (ublk_support_bpf(ubq) && ublk_run_bpf_prog(ubq, req,
+ ublk_get_bpf_io_cb_daemon(ubq), true))
+ return;
+
ublk_init_req_ref(ubq, req);
ubq_complete_io_cmd(io, UBLK_IO_RES_OK, issue_flags);
}
@@ -1088,6 +1093,10 @@ static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq)
{
struct ublk_rq_data *data = blk_mq_rq_to_pdu(rq);
+ if (ublk_support_bpf(ubq) && ublk_run_bpf_prog(ubq, rq,
+ ublk_get_bpf_io_cb(ubq), false))
+ return;
+
if (llist_add(&data->node, &ubq->io_cmds)) {
struct ublk_io *io = &ubq->ios[rq->tag];
@@ -1265,8 +1274,24 @@ static void ublk_commit_completion(struct ublk_device *ub,
if (req_op(req) == REQ_OP_ZONE_APPEND)
req->__sector = ub_cmd->zone_append_lba;
- if (likely(!blk_should_fake_timeout(req->q)))
- ublk_put_req_ref(ubq, req);
+ if (likely(!blk_should_fake_timeout(req->q))) {
+ /*
+ * userspace may have setup everything, but still let bpf
+ * prog to handle io by returning -EAGAIN, this way provides
+ * single bpf io handle fast path, and should simplify things
+ * a lot.
+ */
+ if (ublk_support_bpf(ubq) && io->res == -EAGAIN) {
+ if (!ublk_run_bpf_prog(ubq, req,
+ ublk_get_bpf_any_io_cb(ubq), true)) {
+ /* give up now */
+ io->res = -EIO;
+ ublk_put_req_ref(ubq, req);
+ }
+ } else {
+ ublk_put_req_ref(ubq, req);
+ }
+ }
}
/*
diff --git a/drivers/block/ublk/ublk.h b/drivers/block/ublk/ublk.h
index 76aee4225c78..e9ceadbc616d 100644
--- a/drivers/block/ublk/ublk.h
+++ b/drivers/block/ublk/ublk.h
@@ -33,10 +33,26 @@
(UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED)
+enum {
+ UBLK_BPF_IO_PREP = 0,
+ UBLK_BPF_IO_COMPLETED = 1,
+};
+
+struct ublk_bpf_io {
+ const struct ublksrv_io_desc *iod;
+ unsigned long flags;
+ refcount_t ref;
+ int res;
+};
+
struct ublk_rq_data {
struct llist_node node;
struct kref ref;
+
+#ifdef CONFIG_UBLK_BPF
+ struct ublk_bpf_io bpf_data;
+#endif
};
struct ublk_uring_cmd_pdu {
@@ -104,6 +120,10 @@ struct ublk_queue {
struct llist_head io_cmds;
+#ifdef CONFIG_UBLK_BPF
+ struct ublk_bpf_ops *bpf_ops;
+#endif
+
unsigned short force_abort:1;
unsigned short timeout:1;
unsigned short canceling:1;
@@ -161,8 +181,21 @@ static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq,
&(ubq->io_cmd_buf[tag * sizeof(struct ublksrv_io_desc)]);
}
+static inline bool ublk_support_bpf(const struct ublk_queue *ubq)
+{
+ return false;
+}
+
struct ublk_device *ublk_get_device_from_id(int idx);
void ublk_put_device(struct ublk_device *ub);
void __ublk_complete_rq(struct request *req);
+static inline void __ublk_complete_rq_with_res(struct request *req, int res)
+{
+ struct ublk_queue *ubq = req->mq_hctx->driver_data;
+ struct ublk_io *io = &ubq->ios[req->tag];
+
+ io->res = res;
+ __ublk_complete_rq(req);
+}
#endif
--
2.47.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC PATCH 09/22] ublk: bpf: attach bpf prog to ublk device
2025-01-07 12:03 [RFC PATCH 00/22] ublk: support bpf Ming Lei
` (7 preceding siblings ...)
2025-01-07 12:03 ` [RFC PATCH 08/22] ublk: bpf: add bpf struct_ops Ming Lei
@ 2025-01-07 12:04 ` Ming Lei
2025-01-07 12:04 ` [RFC PATCH 10/22] ublk: bpf: add kfunc for ublk bpf prog Ming Lei
` (12 subsequent siblings)
21 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-01-07 12:04 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
Ming Lei
Attach the bpf program to the ublk device before adding the ublk disk, and
detach it after the disk is removed.
The ublk device needs to provide the struct_ops ID for attaching the
specific prog, and each ublk device can attach to only a single bpf prog,
so that the attached bpf prog can be used for handling ublk IO commands.
Meantime add two ublk bpf callbacks so that the prog is notified when a
ublk device attaches to or detaches from it; a prog-side sketch follows.
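Illustration only (prog side, not part of this patch): with these
callbacks a struct_ops prog can keep per-device state, building on the
minimal prog sketched in the previous patch; the SEC() names below are
assumptions modeled on the selftests in this series:

  /* called when a ublk device attaches to this struct_ops */
  SEC("struct_ops/ublk_bpf_attach_dev")
  int BPF_PROG(demo_attach_dev, int dev_id)
  {
          /* e.g. set up per-device state in a bpf map keyed by dev_id */
          return 0;
  }

  /* called when the device detaches or the struct_ops is unregistered */
  SEC("struct_ops/ublk_bpf_detach_dev")
  void BPF_PROG(demo_detach_dev, int dev_id)
  {
          /* drop the per-device state */
  }

These would be wired into the `struct ublk_bpf_ops` map through its
.attach_dev and .detach_dev members, next to the queue_io_cmd callbacks.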
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/block/ublk/Makefile | 2 +-
drivers/block/ublk/bpf.c | 99 ++++++++++++++++++++++++++++++++++++
drivers/block/ublk/bpf.h | 33 ++++++++++++
drivers/block/ublk/bpf_ops.c | 34 +++++++++++++
drivers/block/ublk/main.c | 25 ++++++---
drivers/block/ublk/ublk.h | 16 ++++++
6 files changed, 200 insertions(+), 9 deletions(-)
create mode 100644 drivers/block/ublk/bpf.c
diff --git a/drivers/block/ublk/Makefile b/drivers/block/ublk/Makefile
index 7058b0fc13bf..f843a9005cdb 100644
--- a/drivers/block/ublk/Makefile
+++ b/drivers/block/ublk/Makefile
@@ -5,6 +5,6 @@ ccflags-y += -I$(src)
ublk_drv-$(CONFIG_BLK_DEV_UBLK) := main.o
ifeq ($(CONFIG_UBLK_BPF), y)
-ublk_drv-$(CONFIG_BLK_DEV_UBLK) += bpf_ops.o
+ublk_drv-$(CONFIG_BLK_DEV_UBLK) += bpf_ops.o bpf.o
endif
obj-$(CONFIG_BLK_DEV_UBLK) += ublk_drv.o
diff --git a/drivers/block/ublk/bpf.c b/drivers/block/ublk/bpf.c
new file mode 100644
index 000000000000..479045a5f0d9
--- /dev/null
+++ b/drivers/block/ublk/bpf.c
@@ -0,0 +1,99 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Red Hat */
+
+#include "ublk.h"
+#include "bpf.h"
+
+static int ublk_set_bpf_ops(struct ublk_device *ub,
+ struct ublk_bpf_ops *ops)
+{
+ int i;
+
+ for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
+ if (ops && ublk_get_queue(ub, i)->bpf_ops) {
+ ublk_set_bpf_ops(ub, NULL);
+ return -EBUSY;
+ }
+ ublk_get_queue(ub, i)->bpf_ops = ops;
+ }
+ return 0;
+}
+
+static int ublk_bpf_prog_attach_cb(struct bpf_prog_consumer *consumer,
+ struct bpf_prog_provider *provider)
+{
+ struct ublk_device *ub = container_of(consumer, struct ublk_device,
+ prog);
+ struct ublk_bpf_ops *ops = container_of(provider,
+ struct ublk_bpf_ops, provider);
+ int ret;
+
+ if (!ublk_get_device(ub))
+ return -ENODEV;
+
+ ret = ublk_set_bpf_ops(ub, ops);
+ if (ret)
+ goto fail_put_dev;
+
+ if (ops->attach_dev) {
+ ret = ops->attach_dev(ub->dev_info.dev_id);
+ if (ret)
+ goto fail_reset_ops;
+ }
+ return 0;
+
+fail_reset_ops:
+ ublk_set_bpf_ops(ub, NULL);
+fail_put_dev:
+ ublk_put_device(ub);
+ return ret;
+}
+
+static void ublk_bpf_prog_detach_cb(struct bpf_prog_consumer *consumer,
+ bool unreg)
+{
+ struct ublk_device *ub = container_of(consumer, struct ublk_device,
+ prog);
+ struct ublk_bpf_ops *ops = container_of(consumer->provider,
+ struct ublk_bpf_ops, provider);
+
+ if (unreg) {
+ blk_mq_freeze_queue(ub->ub_disk->queue);
+ ublk_set_bpf_ops(ub, NULL);
+ blk_mq_unfreeze_queue(ub->ub_disk->queue);
+ } else {
+ ublk_set_bpf_ops(ub, NULL);
+ }
+ if (ops->detach_dev)
+ ops->detach_dev(ub->dev_info.dev_id);
+ ublk_put_device(ub);
+}
+
+static const struct bpf_prog_consumer_ops ublk_prog_consumer_ops = {
+ .attach_fn = ublk_bpf_prog_attach_cb,
+ .detach_fn = ublk_bpf_prog_detach_cb,
+};
+
+int ublk_bpf_attach(struct ublk_device *ub)
+{
+ if (!ublk_dev_support_bpf(ub))
+ return 0;
+
+ /* todo: ublk device need to provide struct_ops prog id */
+ ub->prog.prog_id = 0;
+ ub->prog.ops = &ublk_prog_consumer_ops;
+
+ return ublk_bpf_prog_attach(&ub->prog);
+}
+
+void ublk_bpf_detach(struct ublk_device *ub)
+{
+ if (!ublk_dev_support_bpf(ub))
+ return;
+ ublk_bpf_prog_detach(&ub->prog);
+}
+
+int __init ublk_bpf_init(void)
+{
+ return ublk_bpf_struct_ops_init();
+}
diff --git a/drivers/block/ublk/bpf.h b/drivers/block/ublk/bpf.h
index e3505c9ab86a..4e178cbecb74 100644
--- a/drivers/block/ublk/bpf.h
+++ b/drivers/block/ublk/bpf.h
@@ -7,6 +7,8 @@
typedef unsigned long ublk_bpf_return_t;
typedef ublk_bpf_return_t (*queue_io_cmd_t)(struct ublk_bpf_io *io, unsigned int);
typedef void (*release_io_cmd_t)(struct ublk_bpf_io *io);
+typedef int (*attach_dev_t)(int dev_id);
+typedef void (*detach_dev_t)(int dev_id);
#ifdef CONFIG_UBLK_BPF
#include <linux/filter.h>
@@ -47,6 +49,12 @@ struct ublk_bpf_ops {
/* called when the io command reference drops to zero, can't be sleepable */
release_io_cmd_t release_io_cmd;
+ /* called when attaching bpf prog to this ublk dev */
+ attach_dev_t attach_dev;
+
+ /* called when detaching bpf prog from this ublk dev */
+ detach_dev_t detach_dev;
+
/* private: don't show in doc, must be the last field */
struct bpf_prog_provider provider;
};
@@ -149,7 +157,12 @@ static inline queue_io_cmd_t ublk_get_bpf_any_io_cb(struct ublk_queue *ubq)
return ublk_get_bpf_io_cb_daemon(ubq);
}
+int ublk_bpf_init(void);
int ublk_bpf_struct_ops_init(void);
+int ublk_bpf_prog_attach(struct bpf_prog_consumer *consumer);
+void ublk_bpf_prog_detach(struct bpf_prog_consumer *consumer);
+int ublk_bpf_attach(struct ublk_device *ub);
+void ublk_bpf_detach(struct ublk_device *ub);
#else
@@ -176,9 +189,29 @@ static inline queue_io_cmd_t ublk_get_bpf_any_io_cb(struct ublk_queue *ubq)
return NULL;
}
+static inline int ublk_bpf_init(void)
+{
+ return 0;
+}
+
static inline int ublk_bpf_struct_ops_init(void)
{
return 0;
}
+
+static inline int ublk_bpf_prog_attach(struct bpf_prog_consumer *consumer)
+{
+ return 0;
+}
+static inline void ublk_bpf_prog_detach(struct bpf_prog_consumer *consumer)
+{
+}
+static inline int ublk_bpf_attach(struct ublk_device *ub)
+{
+ return 0;
+}
+static inline void ublk_bpf_detach(struct ublk_device *ub)
+{
+}
#endif
#endif
diff --git a/drivers/block/ublk/bpf_ops.c b/drivers/block/ublk/bpf_ops.c
index 6ac2aebd477e..05d8d415b30d 100644
--- a/drivers/block/ublk/bpf_ops.c
+++ b/drivers/block/ublk/bpf_ops.c
@@ -133,6 +133,29 @@ static void ublk_bpf_unreg(void *kdata, struct bpf_link *link)
kfree(curr);
}
+int ublk_bpf_prog_attach(struct bpf_prog_consumer *consumer)
+{
+ unsigned id = consumer->prog_id;
+ struct ublk_bpf_ops *ops;
+ int ret = -EINVAL;
+
+ mutex_lock(&ublk_bpf_ops_lock);
+ ops = xa_load(&ublk_ops, id);
+ if (ops && ops->id == id)
+ ret = bpf_prog_consumer_attach(consumer, &ops->provider);
+ mutex_unlock(&ublk_bpf_ops_lock);
+
+ return ret;
+}
+
+void ublk_bpf_prog_detach(struct bpf_prog_consumer *consumer)
+{
+ mutex_lock(&ublk_bpf_ops_lock);
+ bpf_prog_consumer_detach(consumer, false);
+ mutex_unlock(&ublk_bpf_ops_lock);
+}
+
+
static void ublk_bpf_prep_io(struct ublk_bpf_io *io,
const struct ublksrv_io_desc *iod)
{
@@ -231,10 +254,21 @@ static void ublk_bpf_release_io_cmd(struct ublk_bpf_io *io)
{
}
+static int ublk_bpf_attach_dev(int dev_id)
+{
+ return 0;
+}
+
+static void ublk_bpf_detach_dev(int dev_id)
+{
+}
+
static struct ublk_bpf_ops __bpf_ublk_bpf_ops = {
.queue_io_cmd = ublk_bpf_queue_io_cmd,
.queue_io_cmd_daemon = ublk_bpf_run_io_task,
.release_io_cmd = ublk_bpf_release_io_cmd,
+ .attach_dev = ublk_bpf_attach_dev,
+ .detach_dev = ublk_bpf_detach_dev,
};
static struct bpf_struct_ops bpf_ublk_bpf_ops = {
diff --git a/drivers/block/ublk/main.c b/drivers/block/ublk/main.c
index 29d3e7f656a7..0b136bc5247f 100644
--- a/drivers/block/ublk/main.c
+++ b/drivers/block/ublk/main.c
@@ -486,7 +486,7 @@ static inline bool ublk_need_get_data(const struct ublk_queue *ubq)
}
/* Called in slow path only, keep it noinline for trace purpose */
-static noinline struct ublk_device *ublk_get_device(struct ublk_device *ub)
+struct ublk_device *ublk_get_device(struct ublk_device *ub)
{
if (kobject_get_unless_zero(&ub->cdev_dev.kobj))
return ub;
@@ -499,12 +499,6 @@ void ublk_put_device(struct ublk_device *ub)
put_device(&ub->cdev_dev);
}
-static inline struct ublk_queue *ublk_get_queue(struct ublk_device *dev,
- int qid)
-{
- return (struct ublk_queue *)&(dev->__queues[qid * dev->queue_size]);
-}
-
static inline bool ublk_rq_has_data(const struct request *rq)
{
return bio_has_data(rq->bio);
@@ -1492,6 +1486,8 @@ static struct gendisk *ublk_detach_disk(struct ublk_device *ub)
{
struct gendisk *disk;
+ ublk_bpf_detach(ub);
+
/* Sync with ublk_abort_queue() by holding the lock */
spin_lock(&ub->lock);
disk = ub->ub_disk;
@@ -2206,12 +2202,19 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, struct io_uring_cmd *cmd)
goto out_put_cdev;
}
- ret = add_disk(disk);
+ ret = ublk_bpf_attach(ub);
if (ret)
goto out_put_cdev;
+ ret = add_disk(disk);
+ if (ret)
+ goto out_put_bpf;
+
set_bit(UB_STATE_USED, &ub->state);
+out_put_bpf:
+ if (ret)
+ ublk_bpf_detach(ub);
out_put_cdev:
if (ret) {
ublk_detach_disk(ub);
@@ -2967,8 +2970,14 @@ static int __init ublk_init(void)
if (ret)
goto free_chrdev_region;
+ ret = ublk_bpf_init();
+ if (ret)
+ goto unregister_class;
+
return 0;
+unregister_class:
+ class_unregister(&ublk_chr_class);
free_chrdev_region:
unregister_chrdev_region(ublk_chr_devt, UBLK_MINORS);
unregister_mis:
diff --git a/drivers/block/ublk/ublk.h b/drivers/block/ublk/ublk.h
index e9ceadbc616d..7579b0032a3c 100644
--- a/drivers/block/ublk/ublk.h
+++ b/drivers/block/ublk/ublk.h
@@ -7,6 +7,8 @@
#include <linux/cdev.h>
#include <uapi/linux/ublk_cmd.h>
+#include "bpf_reg.h"
+
#define UBLK_MINORS (1U << MINORBITS)
/* private ioctl command mirror */
@@ -153,6 +155,9 @@ struct ublk_device {
unsigned long state;
int ub_number;
+#ifdef CONFIG_UBLK_BPF
+ struct bpf_prog_consumer prog;
+#endif
struct mutex mutex;
spinlock_t lock;
@@ -173,6 +178,11 @@ struct ublk_params_header {
__u32 types;
};
+static inline struct ublk_queue *ublk_get_queue(struct ublk_device *dev,
+ int qid)
+{
+ return (struct ublk_queue *)&(dev->__queues[qid * dev->queue_size]);
+}
static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq,
int tag)
@@ -186,6 +196,12 @@ static inline bool ublk_support_bpf(const struct ublk_queue *ubq)
return false;
}
+static inline bool ublk_dev_support_bpf(const struct ublk_device *ub)
+{
+ return false;
+}
+
+struct ublk_device *ublk_get_device(struct ublk_device *ub);
struct ublk_device *ublk_get_device_from_id(int idx);
void ublk_put_device(struct ublk_device *ub);
void __ublk_complete_rq(struct request *req);
--
2.47.0
* [RFC PATCH 10/22] ublk: bpf: add kfunc for ublk bpf prog
2025-01-07 12:03 [RFC PATCH 00/22] ublk: support bpf Ming Lei
` (8 preceding siblings ...)
2025-01-07 12:04 ` [RFC PATCH 09/22] ublk: bpf: attach bpf prog to ublk device Ming Lei
@ 2025-01-07 12:04 ` Ming Lei
2025-01-07 12:04 ` [RFC PATCH 11/22] ublk: bpf: enable ublk-bpf Ming Lei
` (11 subsequent siblings)
21 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-01-07 12:04 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
Ming Lei
Define several kfuncs for ublk bpf progs so that ublk IO commands can be
handled in application (bpf prog) code.
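As a rough sketch of how the kfuncs are consumed (illustrative only, mirroring
the `ublk_null` selftest added later in this series; `ublk_bpf_return_val()`
and the kfunc declarations come from the selftests' ublk_bpf_kfunc.h, and the
`demo_*` name is made up):

    extern const struct ublksrv_io_desc *ublk_bpf_get_iod(const struct ublk_bpf_io *io) __ksym;
    extern void ublk_bpf_complete_io(struct ublk_bpf_io *io, int res) __ksym;

    SEC("struct_ops/ublk_bpf_queue_io_cmd")
    ublk_bpf_return_t BPF_PROG(demo_handle_io, struct ublk_bpf_io *io, unsigned int off)
    {
            const struct ublksrv_io_desc *iod = ublk_bpf_get_iod(io);
            /* null-target style: claim the whole request done without touching data */
            int res = iod ? iod->nr_sectors << 9 : -EINVAL;

            ublk_bpf_complete_io(io, res);
            return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0);
    }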
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/block/ublk/bpf.c | 78 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 78 insertions(+)
diff --git a/drivers/block/ublk/bpf.c b/drivers/block/ublk/bpf.c
index 479045a5f0d9..4179b7f61e92 100644
--- a/drivers/block/ublk/bpf.c
+++ b/drivers/block/ublk/bpf.c
@@ -93,7 +93,85 @@ void ublk_bpf_detach(struct ublk_device *ub)
ublk_bpf_prog_detach(&ub->prog);
}
+
+__bpf_kfunc_start_defs();
+__bpf_kfunc const struct ublksrv_io_desc *
+ublk_bpf_get_iod(const struct ublk_bpf_io *io)
+{
+ if (io)
+ return io->iod;
+ return NULL;
+}
+
+__bpf_kfunc unsigned int
+ublk_bpf_get_io_tag(const struct ublk_bpf_io *io)
+{
+ if (io) {
+ const struct request *req = ublk_bpf_get_req(io);
+
+ return req->tag;
+ }
+ return -1;
+}
+
+__bpf_kfunc unsigned int
+ublk_bpf_get_queue_id(const struct ublk_bpf_io *io)
+{
+ if (io) {
+ const struct request *req = ublk_bpf_get_req(io);
+
+ if (req->mq_hctx) {
+ const struct ublk_queue *ubq = req->mq_hctx->driver_data;
+
+ return ubq->q_id;
+ }
+ }
+ return -1;
+}
+
+__bpf_kfunc unsigned int
+ublk_bpf_get_dev_id(const struct ublk_bpf_io *io)
+{
+ if (io) {
+ const struct request *req = ublk_bpf_get_req(io);
+
+ if (req->mq_hctx) {
+ const struct ublk_queue *ubq = req->mq_hctx->driver_data;
+
+ return ubq->dev->dev_info.dev_id;
+ }
+ }
+ return -1;
+}
+
+__bpf_kfunc void
+ublk_bpf_complete_io(struct ublk_bpf_io *io, int res)
+{
+ ublk_bpf_complete_io_cmd(io, res);
+}
+
+BTF_KFUNCS_START(ublk_bpf_kfunc_ids)
+BTF_ID_FLAGS(func, ublk_bpf_complete_io, KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, ublk_bpf_get_iod, KF_TRUSTED_ARGS | KF_RET_NULL)
+BTF_ID_FLAGS(func, ublk_bpf_get_io_tag, KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, ublk_bpf_get_queue_id, KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, ublk_bpf_get_dev_id, KF_TRUSTED_ARGS)
+BTF_KFUNCS_END(ublk_bpf_kfunc_ids)
+
+static const struct btf_kfunc_id_set ublk_bpf_kfunc_set = {
+ .owner = THIS_MODULE,
+ .set = &ublk_bpf_kfunc_ids,
+};
+
int __init ublk_bpf_init(void)
{
+ int err;
+
+ err = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS,
+ &ublk_bpf_kfunc_set);
+ if (err) {
+ pr_warn("error while registering ublk bpf kfuncs: %d\n", err);
+ return err;
+ }
return ublk_bpf_struct_ops_init();
}
--
2.47.0
* [RFC PATCH 11/22] ublk: bpf: enable ublk-bpf
2025-01-07 12:03 [RFC PATCH 00/22] ublk: support bpf Ming Lei
` (9 preceding siblings ...)
2025-01-07 12:04 ` [RFC PATCH 10/22] ublk: bpf: add kfunc for ublk bpf prog Ming Lei
@ 2025-01-07 12:04 ` Ming Lei
2025-01-07 12:04 ` [RFC PATCH 12/22] selftests: ublk: add tests for the ublk-bpf initial implementation Ming Lei
` (10 subsequent siblings)
21 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-01-07 12:04 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
Ming Lei
Add the UBLK_F_BPF feature flag, and meanwhile pass the bpf struct_ops prog
id from userspace via the ublk parameters.
ublk-bpf doesn't need to copy data between ublk request pages and the
userspace buffer any more, so let ublk_need_map_io() return false for
UBLK_F_BPF too.
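For example (a userspace sketch only; the field names follow the uapi change
below, and the surrounding SET_PARAMS/START_DEV plumbing is assumed to exist
as in the selftests):

    struct ublk_params p = {
            .len   = sizeof(p),
            .types = UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_BPF,
    };

    /* the device must also be created with UBLK_F_BPF in dev_info.flags,
     * otherwise ublk_validate_params() rejects UBLK_PARAM_TYPE_BPF */
    p.bpf.flags  = UBLK_BPF_HAS_OPS_ID;
    p.bpf.ops_id = 0;       /* struct_ops ID registered by the attached bpf prog */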
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/block/ublk/bpf.c | 3 +--
drivers/block/ublk/main.c | 15 ++++++++++++++-
drivers/block/ublk/ublk.h | 10 ++++++----
include/uapi/linux/ublk_cmd.h | 14 +++++++++++++-
4 files changed, 34 insertions(+), 8 deletions(-)
diff --git a/drivers/block/ublk/bpf.c b/drivers/block/ublk/bpf.c
index 4179b7f61e92..ef1546a7ccda 100644
--- a/drivers/block/ublk/bpf.c
+++ b/drivers/block/ublk/bpf.c
@@ -79,8 +79,7 @@ int ublk_bpf_attach(struct ublk_device *ub)
if (!ublk_dev_support_bpf(ub))
return 0;
- /* todo: ublk device need to provide struct_ops prog id */
- ub->prog.prog_id = 0;
+ ub->prog.prog_id = ub->params.bpf.ops_id;
ub->prog.ops = &ublk_prog_consumer_ops;
return ublk_bpf_prog_attach(&ub->prog);
diff --git a/drivers/block/ublk/main.c b/drivers/block/ublk/main.c
index 0b136bc5247f..3c2ed9bf924d 100644
--- a/drivers/block/ublk/main.c
+++ b/drivers/block/ublk/main.c
@@ -416,6 +416,19 @@ static int ublk_validate_params(const struct ublk_device *ub)
else if (ublk_dev_is_zoned(ub))
return -EINVAL;
+ if (ub->params.types & UBLK_PARAM_TYPE_BPF) {
+ const struct ublk_param_bpf *p = &ub->params.bpf;
+
+ if (!ublk_dev_support_bpf(ub))
+ return -EINVAL;
+
+ if (!(p->flags & UBLK_BPF_HAS_OPS_ID))
+ return -EINVAL;
+ } else {
+ if (ublk_dev_support_bpf(ub))
+ return -EINVAL;
+ }
+
return 0;
}
@@ -434,7 +447,7 @@ static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
static inline bool ublk_need_map_io(const struct ublk_queue *ubq)
{
- return !ublk_support_user_copy(ubq);
+ return !(ublk_support_user_copy(ubq) || ublk_support_bpf(ubq));
}
static inline bool ublk_need_req_ref(const struct ublk_queue *ubq)
diff --git a/drivers/block/ublk/ublk.h b/drivers/block/ublk/ublk.h
index 7579b0032a3c..8343e70bd723 100644
--- a/drivers/block/ublk/ublk.h
+++ b/drivers/block/ublk/ublk.h
@@ -24,7 +24,8 @@
| UBLK_F_CMD_IOCTL_ENCODE \
| UBLK_F_USER_COPY \
| UBLK_F_ZONED \
- | UBLK_F_USER_RECOVERY_FAIL_IO)
+ | UBLK_F_USER_RECOVERY_FAIL_IO \
+ | UBLK_F_BPF)
#define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \
| UBLK_F_USER_RECOVERY_REISSUE \
@@ -33,7 +34,8 @@
/* All UBLK_PARAM_TYPE_* should be included here */
#define UBLK_PARAM_TYPE_ALL \
(UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
- UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED)
+ UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED | \
+ UBLK_PARAM_TYPE_BPF)
enum {
UBLK_BPF_IO_PREP = 0,
@@ -193,12 +195,12 @@ static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq,
static inline bool ublk_support_bpf(const struct ublk_queue *ubq)
{
- return false;
+ return ubq->flags & UBLK_F_BPF;
}
static inline bool ublk_dev_support_bpf(const struct ublk_device *ub)
{
- return false;
+ return ub->dev_info.flags & UBLK_F_BPF;
}
struct ublk_device *ublk_get_device(struct ublk_device *ub);
diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
index a8bc98bb69fc..27cf14e65cbc 100644
--- a/include/uapi/linux/ublk_cmd.h
+++ b/include/uapi/linux/ublk_cmd.h
@@ -207,6 +207,9 @@
*/
#define UBLK_F_USER_RECOVERY_FAIL_IO (1ULL << 9)
+/* ublk IO is handled by bpf prog */
+#define UBLK_F_BPF (1ULL << 10)
+
/* device state */
#define UBLK_S_DEV_DEAD 0
#define UBLK_S_DEV_LIVE 1
@@ -401,6 +404,13 @@ struct ublk_param_zoned {
__u8 reserved[20];
};
+struct ublk_param_bpf {
+#define UBLK_BPF_HAS_OPS_ID (1 << 0)
+ __u8 flags;
+ __u8 ops_id;
+ __u8 reserved[6];
+};
+
struct ublk_params {
/*
* Total length of parameters, userspace has to set 'len' for both
@@ -413,12 +423,14 @@ struct ublk_params {
#define UBLK_PARAM_TYPE_DISCARD (1 << 1)
#define UBLK_PARAM_TYPE_DEVT (1 << 2)
#define UBLK_PARAM_TYPE_ZONED (1 << 3)
+#define UBLK_PARAM_TYPE_BPF (1 << 4)
__u32 types; /* types of parameter included */
struct ublk_param_basic basic;
struct ublk_param_discard discard;
struct ublk_param_devt devt;
- struct ublk_param_zoned zoned;
+ struct ublk_param_zoned zoned;
+ struct ublk_param_bpf bpf;
};
#endif
--
2.47.0
* [RFC PATCH 12/22] selftests: ublk: add tests for the ublk-bpf initial implementation
2025-01-07 12:03 [RFC PATCH 00/22] ublk: support bpf Ming Lei
` (10 preceding siblings ...)
2025-01-07 12:04 ` [RFC PATCH 11/22] ublk: bpf: enable ublk-bpf Ming Lei
@ 2025-01-07 12:04 ` Ming Lei
2025-01-07 12:04 ` [RFC PATCH 13/22] selftests: ublk: add tests for covering io split Ming Lei
` (9 subsequent siblings)
21 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-01-07 12:04 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
Ming Lei
Create one ublk null target over ublk-bpf, in which every block IO is
handled by the `ublk_null` bpf prog. The whole ublk userspace implementation
requires liburing.
Meanwhile add a basic read/write IO test over this ublk null disk to make
sure basic IO works as expected.
ublk/Makefile is stolen from tools/testing/selftests/hid/Makefile
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
MAINTAINERS | 1 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/ublk/.gitignore | 4 +
tools/testing/selftests/ublk/Makefile | 228 +++
tools/testing/selftests/ublk/config | 2 +
tools/testing/selftests/ublk/progs/ublk_bpf.h | 13 +
.../selftests/ublk/progs/ublk_bpf_kfunc.h | 23 +
.../testing/selftests/ublk/progs/ublk_null.c | 63 +
tools/testing/selftests/ublk/test_common.sh | 72 +
tools/testing/selftests/ublk/test_null_01.sh | 19 +
tools/testing/selftests/ublk/test_null_02.sh | 23 +
tools/testing/selftests/ublk/ublk_bpf.c | 1429 +++++++++++++++++
12 files changed, 1878 insertions(+)
create mode 100644 tools/testing/selftests/ublk/.gitignore
create mode 100644 tools/testing/selftests/ublk/Makefile
create mode 100644 tools/testing/selftests/ublk/config
create mode 100644 tools/testing/selftests/ublk/progs/ublk_bpf.h
create mode 100644 tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h
create mode 100644 tools/testing/selftests/ublk/progs/ublk_null.c
create mode 100755 tools/testing/selftests/ublk/test_common.sh
create mode 100755 tools/testing/selftests/ublk/test_null_01.sh
create mode 100755 tools/testing/selftests/ublk/test_null_02.sh
create mode 100644 tools/testing/selftests/ublk/ublk_bpf.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 890f6195d03f..8ff8773377c4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23984,6 +23984,7 @@ S: Maintained
F: Documentation/block/ublk.rst
F: drivers/block/ublk/
F: include/uapi/linux/ublk_cmd.h
+F: tools/testing/selftests/ublk/
UBSAN
M: Kees Cook <kees@kernel.org>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 2401e973c359..1c20256e662b 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -111,6 +111,7 @@ endif
TARGETS += tmpfs
TARGETS += tpm2
TARGETS += tty
+TARGETS += ublk
TARGETS += uevent
TARGETS += user_events
TARGETS += vDSO
diff --git a/tools/testing/selftests/ublk/.gitignore b/tools/testing/selftests/ublk/.gitignore
new file mode 100644
index 000000000000..865dca93cf75
--- /dev/null
+++ b/tools/testing/selftests/ublk/.gitignore
@@ -0,0 +1,4 @@
+ublk_bpf
+*.skel.h
+/tools
+*-verify.state
diff --git a/tools/testing/selftests/ublk/Makefile b/tools/testing/selftests/ublk/Makefile
new file mode 100644
index 000000000000..a95f317211e7
--- /dev/null
+++ b/tools/testing/selftests/ublk/Makefile
@@ -0,0 +1,228 @@
+# SPDX-License-Identifier: GPL-2.0
+
+# based on tools/testing/selftest/bpf/Makefile
+include ../../../build/Build.include
+include ../../../scripts/Makefile.arch
+include ../../../scripts/Makefile.include
+
+CXX ?= $(CROSS_COMPILE)g++
+
+HOSTPKG_CONFIG := pkg-config
+
+CFLAGS += -g -O0 -rdynamic -Wall -Werror -I$(OUTPUT)
+CFLAGS += -I$(OUTPUT)/tools/include
+
+LDLIBS += -lelf -lz -lrt -lpthread -luring
+
+# Silence some warnings when compiled with clang
+ifneq ($(LLVM),)
+CFLAGS += -Wno-unused-command-line-argument
+endif
+
+TEST_PROGS := test_null_01.sh
+TEST_PROGS += test_null_02.sh
+
+# Order correspond to 'make run_tests' order
+TEST_GEN_PROGS_EXTENDED = ublk_bpf
+
+# Emit succinct information message describing current building step
+# $1 - generic step name (e.g., CC, LINK, etc);
+# $2 - optional "flavor" specifier; if provided, will be emitted as [flavor];
+# $3 - target (assumed to be file); only file name will be emitted;
+# $4 - optional extra arg, emitted as-is, if provided.
+ifeq ($(V),1)
+Q =
+msg =
+else
+Q = @
+msg = @printf ' %-8s%s %s%s\n' "$(1)" "$(if $(2), [$(2)])" "$(notdir $(3))" "$(if $(4), $(4))";
+MAKEFLAGS += --no-print-directory
+submake_extras := feature_display=0
+endif
+
+# override lib.mk's default rules
+OVERRIDE_TARGETS := 1
+override define CLEAN
+ $(call msg,CLEAN)
+ $(Q)$(RM) -r $(TEST_GEN_PROGS)
+ $(Q)$(RM) -r $(EXTRA_CLEAN)
+endef
+
+include ../lib.mk
+
+TOOLSDIR := $(top_srcdir)/tools
+LIBDIR := $(TOOLSDIR)/lib
+BPFDIR := $(LIBDIR)/bpf
+TOOLSINCDIR := $(TOOLSDIR)/include
+BPFTOOLDIR := $(TOOLSDIR)/bpf/bpftool
+SCRATCH_DIR := $(OUTPUT)/tools
+BUILD_DIR := $(SCRATCH_DIR)/build
+INCLUDE_DIR := $(SCRATCH_DIR)/include
+BPFOBJ := $(BUILD_DIR)/libbpf/libbpf.a
+ifneq ($(CROSS_COMPILE),)
+HOST_BUILD_DIR := $(BUILD_DIR)/host
+HOST_SCRATCH_DIR := $(OUTPUT)/host-tools
+HOST_INCLUDE_DIR := $(HOST_SCRATCH_DIR)/include
+else
+HOST_BUILD_DIR := $(BUILD_DIR)
+HOST_SCRATCH_DIR := $(SCRATCH_DIR)
+HOST_INCLUDE_DIR := $(INCLUDE_DIR)
+endif
+HOST_BPFOBJ := $(HOST_BUILD_DIR)/libbpf/libbpf.a
+RESOLVE_BTFIDS := $(HOST_BUILD_DIR)/resolve_btfids/resolve_btfids
+
+VMLINUX_BTF_PATHS ?= /sys/kernel/btf/ublk_drv
+VMLINUX_BTF ?= $(abspath $(firstword $(wildcard $(VMLINUX_BTF_PATHS))))
+ifeq ($(VMLINUX_BTF),)
+$(error Cannot find a vmlinux for VMLINUX_BTF at any of "$(VMLINUX_BTF_PATHS)")
+endif
+
+# Define simple and short `make test_progs`, `make test_sysctl`, etc targets
+# to build individual tests.
+# NOTE: Semicolon at the end is critical to override lib.mk's default static
+# rule for binaries.
+$(notdir $(TEST_GEN_PROGS)): %: $(OUTPUT)/% ;
+
+# sort removes libbpf duplicates when not cross-building
+MAKE_DIRS := $(sort $(BUILD_DIR)/libbpf $(HOST_BUILD_DIR)/libbpf \
+ $(HOST_BUILD_DIR)/bpftool $(HOST_BUILD_DIR)/resolve_btfids \
+ $(INCLUDE_DIR))
+$(MAKE_DIRS):
+ $(call msg,MKDIR,,$@)
+ $(Q)mkdir -p $@
+
+# LLVM's ld.lld doesn't support all the architectures, so use it only on x86
+ifeq ($(SRCARCH),x86)
+LLD := lld
+else
+LLD := ld
+endif
+
+DEFAULT_BPFTOOL := $(HOST_SCRATCH_DIR)/sbin/bpftool
+
+TEST_GEN_PROGS_EXTENDED += $(DEFAULT_BPFTOOL)
+
+$(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED): $(BPFOBJ)
+
+BPFTOOL ?= $(DEFAULT_BPFTOOL)
+$(DEFAULT_BPFTOOL): $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile) \
+ $(HOST_BPFOBJ) | $(HOST_BUILD_DIR)/bpftool
+ $(Q)$(MAKE) $(submake_extras) -C $(BPFTOOLDIR) \
+ ARCH= CROSS_COMPILE= CC=$(HOSTCC) LD=$(HOSTLD) \
+ EXTRA_CFLAGS='-g -O0' \
+ OUTPUT=$(HOST_BUILD_DIR)/bpftool/ \
+ LIBBPF_OUTPUT=$(HOST_BUILD_DIR)/libbpf/ \
+ LIBBPF_DESTDIR=$(HOST_SCRATCH_DIR)/ \
+ prefix= DESTDIR=$(HOST_SCRATCH_DIR)/ install-bin
+
+$(BPFOBJ): $(wildcard $(BPFDIR)/*.[ch] $(BPFDIR)/Makefile) \
+ | $(BUILD_DIR)/libbpf
+ $(Q)$(MAKE) $(submake_extras) -C $(BPFDIR) OUTPUT=$(BUILD_DIR)/libbpf/ \
+ EXTRA_CFLAGS='-g -O0' \
+ DESTDIR=$(SCRATCH_DIR) prefix= all install_headers
+
+ifneq ($(BPFOBJ),$(HOST_BPFOBJ))
+$(HOST_BPFOBJ): $(wildcard $(BPFDIR)/*.[ch] $(BPFDIR)/Makefile) \
+ | $(HOST_BUILD_DIR)/libbpf
+ $(Q)$(MAKE) $(submake_extras) -C $(BPFDIR) \
+ EXTRA_CFLAGS='-g -O0' ARCH= CROSS_COMPILE= \
+ OUTPUT=$(HOST_BUILD_DIR)/libbpf/ CC=$(HOSTCC) LD=$(HOSTLD) \
+ DESTDIR=$(HOST_SCRATCH_DIR)/ prefix= all install_headers
+endif
+
+$(INCLUDE_DIR)/vmlinux.h: $(VMLINUX_BTF) $(BPFTOOL) | $(INCLUDE_DIR)
+ifeq ($(VMLINUX_H),)
+ $(call msg,GEN,,$@)
+ $(Q)$(BPFTOOL) btf dump file $(VMLINUX_BTF) format c > $@
+else
+ $(call msg,CP,,$@)
+ $(Q)cp "$(VMLINUX_H)" $@
+endif
+
+$(RESOLVE_BTFIDS): $(HOST_BPFOBJ) | $(HOST_BUILD_DIR)/resolve_btfids \
+ $(TOOLSDIR)/bpf/resolve_btfids/main.c \
+ $(TOOLSDIR)/lib/rbtree.c \
+ $(TOOLSDIR)/lib/zalloc.c \
+ $(TOOLSDIR)/lib/string.c \
+ $(TOOLSDIR)/lib/ctype.c \
+ $(TOOLSDIR)/lib/str_error_r.c
+ $(Q)$(MAKE) $(submake_extras) -C $(TOOLSDIR)/bpf/resolve_btfids \
+ CC=$(HOSTCC) LD=$(HOSTLD) AR=$(HOSTAR) \
+ LIBBPF_INCLUDE=$(HOST_INCLUDE_DIR) \
+ OUTPUT=$(HOST_BUILD_DIR)/resolve_btfids/ BPFOBJ=$(HOST_BPFOBJ)
+
+# Get Clang's default includes on this system, as opposed to those seen by
+# '--target=bpf'. This fixes "missing" files on some architectures/distros,
+# such as asm/byteorder.h, asm/socket.h, asm/sockios.h, sys/cdefs.h etc.
+#
+# Use '-idirafter': Don't interfere with include mechanics except where the
+# build would have failed anyways.
+define get_sys_includes
+$(shell $(1) -v -E - </dev/null 2>&1 \
+ | sed -n '/<...> search starts here:/,/End of search list./{ s| \(/.*\)|-idirafter \1|p }') \
+$(shell $(1) -dM -E - </dev/null | grep '__riscv_xlen ' | awk '{printf("-D__riscv_xlen=%d -D__BITS_PER_LONG=%d", $$3, $$3)}')
+endef
+
+# Determine target endianness.
+IS_LITTLE_ENDIAN = $(shell $(CC) -dM -E - </dev/null | \
+ grep 'define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__')
+MENDIAN=$(if $(IS_LITTLE_ENDIAN),-mlittle-endian,-mbig-endian)
+
+CLANG_SYS_INCLUDES = $(call get_sys_includes,$(CLANG))
+BPF_CFLAGS = -g -Werror -D__TARGET_ARCH_$(SRCARCH) $(MENDIAN) \
+ -I$(INCLUDE_DIR)
+
+CLANG_CFLAGS = $(CLANG_SYS_INCLUDES) \
+ -Wno-compare-distinct-pointer-types
+
+# Build BPF object using Clang
+# $1 - input .c file
+# $2 - output .o file
+# $3 - CFLAGS
+define CLANG_BPF_BUILD_RULE
+ $(call msg,CLNG-BPF,$(TRUNNER_BINARY),$2)
+ $(Q)$(CLANG) $3 -O2 --target=bpf -c $1 -mcpu=v3 -o $2
+endef
+# Similar to CLANG_BPF_BUILD_RULE, but with disabled alu32
+define CLANG_NOALU32_BPF_BUILD_RULE
+ $(call msg,CLNG-BPF,$(TRUNNER_BINARY),$2)
+ $(Q)$(CLANG) $3 -O2 --target=bpf -c $1 -mcpu=v2 -o $2
+endef
+# Build BPF object using GCC
+define GCC_BPF_BUILD_RULE
+ $(call msg,GCC-BPF,$(TRUNNER_BINARY),$2)
+ $(Q)$(BPF_GCC) $3 -O2 -c $1 -o $2
+endef
+
+BPF_PROGS_DIR := progs
+BPF_BUILD_RULE := CLANG_BPF_BUILD_RULE
+BPF_SRCS := $(notdir $(wildcard $(BPF_PROGS_DIR)/*.c))
+BPF_OBJS := $(patsubst %.c,$(OUTPUT)/%.bpf.o, $(BPF_SRCS))
+BPF_SKELS := $(patsubst %.c,$(OUTPUT)/%.skel.h, $(BPF_SRCS))
+TEST_GEN_FILES += $(BPF_OBJS)
+
+$(BPF_PROGS_DIR)-bpfobjs := y
+$(BPF_OBJS): $(OUTPUT)/%.bpf.o: \
+ $(BPF_PROGS_DIR)/%.c \
+ $(wildcard $(BPF_PROGS_DIR)/*.h) \
+ $(INCLUDE_DIR)/vmlinux.h \
+ $(wildcard $(BPFDIR)/ublk_bpf_*.h) \
+ $(wildcard $(BPFDIR)/*.bpf.h) \
+ | $(OUTPUT) $(BPFOBJ)
+ $(call $(BPF_BUILD_RULE),$<,$@, $(BPF_CFLAGS))
+
+$(BPF_SKELS): %.skel.h: %.bpf.o $(BPFTOOL) | $(OUTPUT)
+ $(call msg,GEN-SKEL,$(BINARY),$@)
+ $(Q)$(BPFTOOL) gen object $(<:.o=.linked1.o) $<
+ $(Q)$(BPFTOOL) gen skeleton $(<:.o=.linked1.o) name $(notdir $(<:.bpf.o=)) > $@
+
+$(OUTPUT)/%.o: %.c $(BPF_SKELS)
+ $(call msg,CC,,$@)
+ $(Q)$(CC) $(CFLAGS) -c $(filter %.c,$^) $(LDLIBS) -o $@
+
+$(OUTPUT)/%: $(OUTPUT)/%.o
+ $(call msg,BINARY,,$@)
+ $(Q)$(LINK.c) $^ $(LDLIBS) -o $@
+
+EXTRA_CLEAN := $(SCRATCH_DIR) $(HOST_SCRATCH_DIR) feature bpftool \
+ $(addprefix $(OUTPUT)/,*.o *.skel.h no_alu32)
diff --git a/tools/testing/selftests/ublk/config b/tools/testing/selftests/ublk/config
new file mode 100644
index 000000000000..295b1f5c6c6c
--- /dev/null
+++ b/tools/testing/selftests/ublk/config
@@ -0,0 +1,2 @@
+CONFIG_BLK_DEV_UBLK=m
+CONFIG_UBLK_BPF=y
diff --git a/tools/testing/selftests/ublk/progs/ublk_bpf.h b/tools/testing/selftests/ublk/progs/ublk_bpf.h
new file mode 100644
index 000000000000..a302a645b096
--- /dev/null
+++ b/tools/testing/selftests/ublk/progs/ublk_bpf.h
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef UBLK_BPF_GEN_H
+#define UBLK_BPF_GEN_H
+
+#include "ublk_bpf_kfunc.h"
+
+#ifdef DEBUG
+#define BPF_DBG(...) bpf_printk(__VA_ARGS__)
+#else
+#define BPF_DBG(...)
+#endif
+
+#endif
diff --git a/tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h b/tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h
new file mode 100644
index 000000000000..acab490d933c
--- /dev/null
+++ b/tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef UBLK_BPF_INTERNAL_H
+#define UBLK_BPF_INTERNAL_H
+
+#ifndef BITS_PER_LONG
+#define BITS_PER_LONG (sizeof(unsigned long) * 8)
+#endif
+
+#define UBLK_BPF_DISPOSITION_BITS (4)
+#define UBLK_BPF_DISPOSITION_SHIFT (BITS_PER_LONG - UBLK_BPF_DISPOSITION_BITS)
+
+static inline ublk_bpf_return_t ublk_bpf_return_val(enum ublk_bpf_disposition rc,
+ unsigned int bytes)
+{
+ return (ublk_bpf_return_t) ((unsigned long)rc << UBLK_BPF_DISPOSITION_SHIFT) | bytes;
+}
+
+extern const struct ublksrv_io_desc *ublk_bpf_get_iod(const struct ublk_bpf_io *io) __ksym;
+extern void ublk_bpf_complete_io(const struct ublk_bpf_io *io, int res) __ksym;
+extern int ublk_bpf_get_dev_id(const struct ublk_bpf_io *io) __ksym;
+extern int ublk_bpf_get_queue_id(const struct ublk_bpf_io *io) __ksym;
+extern int ublk_bpf_get_io_tag(const struct ublk_bpf_io *io) __ksym;
+#endif
diff --git a/tools/testing/selftests/ublk/progs/ublk_null.c b/tools/testing/selftests/ublk/progs/ublk_null.c
new file mode 100644
index 000000000000..3225b52dcd24
--- /dev/null
+++ b/tools/testing/selftests/ublk/progs/ublk_null.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <linux/const.h>
+#include <linux/errno.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+//#define DEBUG
+#include "ublk_bpf.h"
+
+/* libbpf v1.4.5 is required for struct_ops to work */
+
+static inline ublk_bpf_return_t __ublk_null_handle_io(const struct ublk_bpf_io *io, unsigned int _off)
+{
+ unsigned long off = -1, sects = -1;
+ const struct ublksrv_io_desc *iod;
+ int res;
+
+ iod = ublk_bpf_get_iod(io);
+ if (iod) {
+ res = iod->nr_sectors << 9;
+ off = iod->start_sector;
+ sects = iod->nr_sectors;
+ } else
+ res = -EINVAL;
+
+ BPF_DBG("ublk dev %u qid %u: handle io tag %u %lx-%d res %d",
+ ublk_bpf_get_dev_id(io),
+ ublk_bpf_get_queue_id(io),
+ ublk_bpf_get_io_tag(io),
+ off, sects, res);
+ ublk_bpf_complete_io(io, res);
+
+ return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0);
+}
+
+SEC("struct_ops/ublk_bpf_queue_io_cmd")
+ublk_bpf_return_t BPF_PROG(ublk_null_handle_io, struct ublk_bpf_io *io, unsigned int off)
+{
+ return __ublk_null_handle_io(io, off);
+}
+
+SEC("struct_ops/ublk_bpf_attach_dev")
+int BPF_PROG(ublk_null_attach_dev, int dev_id)
+{
+ return 0;
+}
+
+SEC("struct_ops/ublk_bpf_detach_dev")
+void BPF_PROG(ublk_null_detach_dev, int dev_id)
+{
+}
+
+SEC(".struct_ops.link")
+struct ublk_bpf_ops null_ublk_bpf_ops = {
+ .id = 0,
+ .queue_io_cmd = (void *)ublk_null_handle_io,
+ .attach_dev = (void *)ublk_null_attach_dev,
+ .detach_dev = (void *)ublk_null_detach_dev,
+};
+
+char LICENSE[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/ublk/test_common.sh b/tools/testing/selftests/ublk/test_common.sh
new file mode 100755
index 000000000000..466b82e77860
--- /dev/null
+++ b/tools/testing/selftests/ublk/test_common.sh
@@ -0,0 +1,72 @@
+#!/bin/bash
+
+_check_root() {
+ local ksft_skip=4
+
+ if [ $UID != 0 ]; then
+ echo please run this as root >&2
+ exit $ksft_skip
+ fi
+}
+
+_remove_ublk_devices() {
+ ${UBLK_PROG} del -a
+}
+
+_get_ublk_dev_state() {
+ ${UBLK_PROG} list -n "$1" | grep "state" | awk '{print $11}'
+}
+
+_get_ublk_daemon_pid() {
+ ${UBLK_PROG} list -n "$1" | grep "pid" | awk '{print $7}'
+}
+
+_prep_test() {
+ _check_root
+ export UBLK_PROG=$(pwd)/ublk_bpf
+ _remove_ublk_devices
+}
+
+_prep_bpf_test() {
+ _prep_test
+ _reg_bpf_prog $@
+}
+
+_show_result()
+{
+ if [ $2 -ne 0 ]; then
+ echo "$1 : [FAIL]"
+ else
+ echo "$1 : [PASS]"
+ fi
+}
+
+_cleanup_test() {
+ _remove_ublk_devices
+}
+
+_cleanup_bpf_test() {
+ _cleanup_test
+ _unreg_bpf_prog $@
+}
+
+_reg_bpf_prog() {
+ ${UBLK_PROG} reg -t $1 $2
+ if [ $? -ne 0 ]; then
+ echo "fail to register bpf prog $1 $2"
+ exit -1
+ fi
+}
+
+_unreg_bpf_prog() {
+ ${UBLK_PROG} unreg -t $1
+}
+
+_add_ublk_dev() {
+ ${UBLK_PROG} add $@
+ if [ $? -ne 0 ]; then
+ echo "fail to add ublk dev $@"
+ exit -1
+ fi
+ udevadm settle
+}
diff --git a/tools/testing/selftests/ublk/test_null_01.sh b/tools/testing/selftests/ublk/test_null_01.sh
new file mode 100755
index 000000000000..eecb4278e894
--- /dev/null
+++ b/tools/testing/selftests/ublk/test_null_01.sh
@@ -0,0 +1,19 @@
+#!/bin/bash
+
+. test_common.sh
+
+TID="null_01"
+ERR_CODE=0
+
+_prep_test
+
+# add single ublk null disk without bpf prog
+_add_ublk_dev -t null -n 0 --quiet
+
+# run fio over the disk
+fio --name=job1 --filename=/dev/ublkb0 --ioengine=libaio --rw=readwrite --iodepth=32 --size=256M > /dev/null 2>&1
+ERR_CODE=$?
+
+_cleanup_test
+
+_show_result $TID $ERR_CODE
diff --git a/tools/testing/selftests/ublk/test_null_02.sh b/tools/testing/selftests/ublk/test_null_02.sh
new file mode 100755
index 000000000000..eb0da89f3461
--- /dev/null
+++ b/tools/testing/selftests/ublk/test_null_02.sh
@@ -0,0 +1,23 @@
+#!/bin/bash
+
+. test_common.sh
+
+TID="null_02"
+ERR_CODE=0
+
+# prepare & register and pin bpf prog
+_prep_bpf_test "null" ublk_null.bpf.o
+
+# add two ublk null disks with the pinned bpf prog
+_add_ublk_dev -t null -n 0 --bpf_prog 0 --quiet
+_add_ublk_dev -t null -n 1 --bpf_prog 0 --quiet
+
+# run fio over the two disks
+fio --name=job1 --filename=/dev/ublkb0 --rw=readwrite --size=256M \
+ --name=job2 --filename=/dev/ublkb1 --rw=readwrite --size=256M > /dev/null 2>&1
+ERR_CODE=$?
+
+# cleanup & unregister and unpin the bpf prog
+_cleanup_bpf_test "null"
+
+_show_result $TID $ERR_CODE
diff --git a/tools/testing/selftests/ublk/ublk_bpf.c b/tools/testing/selftests/ublk/ublk_bpf.c
new file mode 100644
index 000000000000..2d923e42845d
--- /dev/null
+++ b/tools/testing/selftests/ublk/ublk_bpf.c
@@ -0,0 +1,1429 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Description: uring_cmd based ublk
+ */
+#include <unistd.h>
+#include <stdlib.h>
+#include <assert.h>
+#include <stdio.h>
+#include <stdarg.h>
+#include <string.h>
+#include <pthread.h>
+#include <getopt.h>
+#include <limits.h>
+#include <poll.h>
+#include <sys/syscall.h>
+#include <sys/mman.h>
+#include <sys/ioctl.h>
+#include <sys/inotify.h>
+#include <sys/wait.h>
+#include <liburing.h>
+#include <linux/ublk_cmd.h>
+
+#include <bpf/bpf.h>
+#include <bpf/btf.h>
+#include <bpf/libbpf.h>
+
+#define __maybe_unused __attribute__((unused))
+#define MAX_BACK_FILES 4
+#ifndef min
+#define min(a, b) ((a) < (b) ? (a) : (b))
+#endif
+#define UBLK_BPF_PIN_PATH "ublk"
+
+/****************** part 1: libublk ********************/
+
+#define CTRL_DEV "/dev/ublk-control"
+#define UBLKC_DEV "/dev/ublkc"
+#define UBLKB_DEV "/dev/ublkb"
+#define UBLK_CTRL_RING_DEPTH 32
+
+/* queue idle timeout */
+#define UBLKSRV_IO_IDLE_SECS 20
+
+#define UBLK_IO_MAX_BYTES 65536
+#define UBLK_MAX_QUEUES 4
+#define UBLK_QUEUE_DEPTH 128
+
+#define UBLK_DBG_DEV (1U << 0)
+#define UBLK_DBG_QUEUE (1U << 1)
+#define UBLK_DBG_IO_CMD (1U << 2)
+#define UBLK_DBG_IO (1U << 3)
+#define UBLK_DBG_CTRL_CMD (1U << 4)
+#define UBLK_LOG (1U << 5)
+
+struct ublk_dev;
+struct ublk_queue;
+
+struct dev_ctx {
+ char tgt_type[16];
+ unsigned long flags;
+ unsigned nr_hw_queues;
+ unsigned queue_depth;
+ int dev_id;
+ int nr_files;
+ char *files[MAX_BACK_FILES];
+ int bpf_prog_id;
+ unsigned int logging:1;
+ unsigned int all:1;
+};
+
+struct ublk_ctrl_cmd_data {
+ __u32 cmd_op;
+#define CTRL_CMD_HAS_DATA 1
+#define CTRL_CMD_HAS_BUF 2
+ __u32 flags;
+
+ __u64 data[2];
+ __u64 addr;
+ __u32 len;
+};
+
+struct ublk_io {
+ char *buf_addr;
+
+#define UBLKSRV_NEED_FETCH_RQ (1UL << 0)
+#define UBLKSRV_NEED_COMMIT_RQ_COMP (1UL << 1)
+#define UBLKSRV_IO_FREE (1UL << 2)
+ unsigned short flags;
+ unsigned short refs; /* used by target code only */
+
+ int result;
+};
+
+struct ublk_tgt_ops {
+ const char *name;
+ int (*init_tgt)(struct ublk_dev *);
+ void (*deinit_tgt)(struct ublk_dev *);
+
+ int (*queue_io)(struct ublk_queue *, int tag);
+ void (*tgt_io_done)(struct ublk_queue *,
+ int tag, const struct io_uring_cqe *);
+};
+
+struct ublk_tgt {
+ unsigned long dev_size;
+ unsigned int sq_depth;
+ unsigned int cq_depth;
+ const struct ublk_tgt_ops *ops;
+ struct ublk_params params;
+ char backing_file[1024 - 8 - sizeof(struct ublk_params)];
+};
+
+struct ublk_queue {
+ int q_id;
+ int q_depth;
+ unsigned int cmd_inflight;
+ unsigned int io_inflight;
+ struct ublk_dev *dev;
+ const struct ublk_tgt_ops *tgt_ops;
+ char *io_cmd_buf;
+ struct io_uring ring;
+ struct ublk_io ios[UBLK_QUEUE_DEPTH];
+#define UBLKSRV_QUEUE_STOPPING (1U << 0)
+#define UBLKSRV_QUEUE_IDLE (1U << 1)
+#define UBLKSRV_NO_BUF (1U << 2)
+ unsigned state;
+ pid_t tid;
+ pthread_t thread;
+};
+
+struct ublk_dev {
+ struct ublk_tgt tgt;
+ struct ublksrv_ctrl_dev_info dev_info;
+ struct ublk_queue q[UBLK_MAX_QUEUES];
+
+ int fds[2]; /* fds[0] points to /dev/ublkcN */
+ int nr_fds;
+ int ctrl_fd;
+ struct io_uring ring;
+
+ int bpf_prog_id;
+};
+
+#ifndef offsetof
+#define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER)
+#endif
+
+#ifndef container_of
+#define container_of(ptr, type, member) ({ \
+ unsigned long __mptr = (unsigned long)(ptr); \
+ ((type *)(__mptr - offsetof(type, member))); })
+#endif
+
+#define round_up(val, rnd) \
+ (((val) + ((rnd) - 1)) & ~((rnd) - 1))
+
+static unsigned int ublk_dbg_mask = UBLK_LOG;
+
+static const struct ublk_tgt_ops *ublk_find_tgt(const char *name);
+
+static inline int is_target_io(__u64 user_data)
+{
+ return (user_data & (1ULL << 63)) != 0;
+}
+
+static inline __u64 build_user_data(unsigned tag, unsigned op,
+ unsigned tgt_data, unsigned is_target_io)
+{
+ assert(!(tag >> 16) && !(op >> 8) && !(tgt_data >> 16));
+
+ return tag | (op << 16) | (tgt_data << 24) | (__u64)is_target_io << 63;
+}
+
+static inline unsigned int user_data_to_tag(__u64 user_data)
+{
+ return user_data & 0xffff;
+}
+
+static inline unsigned int user_data_to_op(__u64 user_data)
+{
+ return (user_data >> 16) & 0xff;
+}
+
+static void ublk_err(const char *fmt, ...)
+{
+ va_list ap;
+
+ va_start(ap, fmt);
+ vfprintf(stderr, fmt, ap);
+}
+
+static void ublk_log(const char *fmt, ...)
+{
+ if (ublk_dbg_mask & UBLK_LOG) {
+ va_list ap;
+
+ va_start(ap, fmt);
+ vfprintf(stdout, fmt, ap);
+ }
+}
+
+static void ublk_dbg(int level, const char *fmt, ...)
+{
+ if (level & ublk_dbg_mask) {
+ va_list ap;
+ va_start(ap, fmt);
+ vfprintf(stdout, fmt, ap);
+ }
+}
+
+static inline void *ublk_get_sqe_cmd(const struct io_uring_sqe *sqe)
+{
+ return (void *)&sqe->cmd;
+}
+
+static inline void ublk_mark_io_done(struct ublk_io *io, int res)
+{
+ io->flags |= (UBLKSRV_NEED_COMMIT_RQ_COMP | UBLKSRV_IO_FREE);
+ io->result = res;
+}
+
+static inline const struct ublksrv_io_desc *ublk_get_iod(
+ const struct ublk_queue *q, int tag)
+{
+ return (struct ublksrv_io_desc *)
+ &(q->io_cmd_buf[tag * sizeof(struct ublksrv_io_desc)]);
+}
+
+static inline void ublk_set_sqe_cmd_op(struct io_uring_sqe *sqe,
+ __u32 cmd_op)
+{
+ __u32 *addr = (__u32 *)&sqe->off;
+
+ addr[0] = cmd_op;
+ addr[1] = 0;
+}
+
+static inline int ublk_setup_ring(struct io_uring *r, int depth,
+ int cq_depth, unsigned flags)
+{
+ struct io_uring_params p;
+
+ memset(&p, 0, sizeof(p));
+ p.flags = flags | IORING_SETUP_CQSIZE;
+ p.cq_entries = cq_depth;
+
+ return io_uring_queue_init_params(depth, r, &p);
+}
+
+static void ublk_ctrl_init_cmd(struct ublk_dev *dev,
+ struct io_uring_sqe *sqe,
+ struct ublk_ctrl_cmd_data *data)
+{
+ struct ublksrv_ctrl_dev_info *info = &dev->dev_info;
+ struct ublksrv_ctrl_cmd *cmd = (struct ublksrv_ctrl_cmd *)ublk_get_sqe_cmd(sqe);
+
+ sqe->fd = dev->ctrl_fd;
+ sqe->opcode = IORING_OP_URING_CMD;
+ sqe->ioprio = 0;
+
+ if (data->flags & CTRL_CMD_HAS_BUF) {
+ cmd->addr = data->addr;
+ cmd->len = data->len;
+ }
+
+ if (data->flags & CTRL_CMD_HAS_DATA)
+ cmd->data[0] = data->data[0];
+
+ cmd->dev_id = info->dev_id;
+ cmd->queue_id = -1;
+
+ ublk_set_sqe_cmd_op(sqe, data->cmd_op);
+
+ io_uring_sqe_set_data(sqe, cmd);
+}
+
+static int __ublk_ctrl_cmd(struct ublk_dev *dev,
+ struct ublk_ctrl_cmd_data *data)
+{
+ struct io_uring_sqe *sqe;
+ struct io_uring_cqe *cqe;
+ int ret = -EINVAL;
+
+ sqe = io_uring_get_sqe(&dev->ring);
+ if (!sqe) {
+ ublk_err("%s: can't get sqe ret %d\n", __func__, ret);
+ return ret;
+ }
+
+ ublk_ctrl_init_cmd(dev, sqe, data);
+
+ ret = io_uring_submit(&dev->ring);
+ if (ret < 0) {
+ ublk_err("uring submit ret %d\n", ret);
+ return ret;
+ }
+
+ ret = io_uring_wait_cqe(&dev->ring, &cqe);
+ if (ret < 0) {
+ ublk_err("wait cqe: %s\n", strerror(-ret));
+ return ret;
+ }
+ io_uring_cqe_seen(&dev->ring, cqe);
+
+ return cqe->res;
+}
+
+static int ublk_ctrl_stop_dev(struct ublk_dev *dev)
+{
+ struct ublk_ctrl_cmd_data data = {
+ .cmd_op = UBLK_CMD_STOP_DEV,
+ };
+
+ return __ublk_ctrl_cmd(dev, &data);
+}
+
+static int ublk_ctrl_start_dev(struct ublk_dev *dev,
+ int daemon_pid)
+{
+ struct ublk_ctrl_cmd_data data = {
+ .cmd_op = UBLK_U_CMD_START_DEV,
+ .flags = CTRL_CMD_HAS_DATA,
+ };
+
+ dev->dev_info.ublksrv_pid = data.data[0] = daemon_pid;
+
+ return __ublk_ctrl_cmd(dev, &data);
+}
+
+static int ublk_ctrl_add_dev(struct ublk_dev *dev)
+{
+ struct ublk_ctrl_cmd_data data = {
+ .cmd_op = UBLK_U_CMD_ADD_DEV,
+ .flags = CTRL_CMD_HAS_BUF,
+ .addr = (__u64) (uintptr_t) &dev->dev_info,
+ .len = sizeof(struct ublksrv_ctrl_dev_info),
+ };
+
+ return __ublk_ctrl_cmd(dev, &data);
+}
+
+static int ublk_ctrl_del_dev(struct ublk_dev *dev)
+{
+ struct ublk_ctrl_cmd_data data = {
+ .cmd_op = UBLK_U_CMD_DEL_DEV,
+ .flags = 0,
+ };
+
+ return __ublk_ctrl_cmd(dev, &data);
+}
+
+static int ublk_ctrl_get_info(struct ublk_dev *dev)
+{
+ struct ublk_ctrl_cmd_data data = {
+ .cmd_op = UBLK_U_CMD_GET_DEV_INFO,
+ .flags = CTRL_CMD_HAS_BUF,
+ .addr = (__u64) (uintptr_t) &dev->dev_info,
+ .len = sizeof(struct ublksrv_ctrl_dev_info),
+ };
+
+ return __ublk_ctrl_cmd(dev, &data);
+}
+
+static int ublk_ctrl_set_params(struct ublk_dev *dev,
+ struct ublk_params *params)
+{
+ struct ublk_ctrl_cmd_data data = {
+ .cmd_op = UBLK_U_CMD_SET_PARAMS,
+ .flags = CTRL_CMD_HAS_BUF,
+ .addr = (__u64) (uintptr_t) params,
+ .len = sizeof(*params),
+ };
+ params->len = sizeof(*params);
+ return __ublk_ctrl_cmd(dev, &data);
+}
+
+static int ublk_ctrl_get_params(struct ublk_dev *dev,
+ struct ublk_params *params)
+{
+ struct ublk_ctrl_cmd_data data = {
+ .cmd_op = UBLK_CMD_GET_PARAMS,
+ .flags = CTRL_CMD_HAS_BUF,
+ .addr = (__u64)params,
+ .len = sizeof(*params),
+ };
+
+ params->len = sizeof(*params);
+
+ return __ublk_ctrl_cmd(dev, &data);
+}
+
+static int ublk_ctrl_get_features(struct ublk_dev *dev,
+ __u64 *features)
+{
+ struct ublk_ctrl_cmd_data data = {
+ .cmd_op = UBLK_U_CMD_GET_FEATURES,
+ .flags = CTRL_CMD_HAS_BUF,
+ .addr = (__u64) (uintptr_t) features,
+ .len = sizeof(*features),
+ };
+
+ return __ublk_ctrl_cmd(dev, &data);
+}
+
+static const char *ublk_dev_state_desc(struct ublk_dev *dev)
+{
+ switch (dev->dev_info.state) {
+ case UBLK_S_DEV_DEAD:
+ return "DEAD";
+ case UBLK_S_DEV_LIVE:
+ return "LIVE";
+ case UBLK_S_DEV_QUIESCED:
+ return "QUIESCED";
+ default:
+ return "UNKNOWN";
+ };
+}
+
+static void ublk_ctrl_dump(struct ublk_dev *dev, bool show_queue)
+{
+ struct ublksrv_ctrl_dev_info *info = &dev->dev_info;
+ int ret;
+ struct ublk_params p;
+
+ ret = ublk_ctrl_get_params(dev, &p);
+ if (ret < 0) {
+ ublk_err("failed to get params %m\n");
+ return;
+ }
+
+ ublk_log("dev id %d: nr_hw_queues %d queue_depth %d block size %d dev_capacity %lld\n",
+ info->dev_id,
+ info->nr_hw_queues, info->queue_depth,
+ 1 << p.basic.logical_bs_shift, p.basic.dev_sectors);
+ ublk_log("\tmax rq size %d daemon pid %d flags 0x%llx state %s\n",
+ info->max_io_buf_bytes,
+ info->ublksrv_pid, info->flags,
+ ublk_dev_state_desc(dev));
+ if (show_queue) {
+ int i;
+
+ for (i = 0; i < dev->dev_info.nr_hw_queues; i++)
+ ublk_log("\tqueue %d tid: %d\n", i, dev->q[i].tid);
+ }
+ fflush(stdout);
+}
+
+static void ublk_ctrl_deinit(struct ublk_dev *dev)
+{
+ close(dev->ctrl_fd);
+ free(dev);
+}
+
+static struct ublk_dev *ublk_ctrl_init(void)
+{
+ struct ublk_dev *dev = (struct ublk_dev *)calloc(1, sizeof(*dev));
+ struct ublksrv_ctrl_dev_info *info = &dev->dev_info;
+ int ret;
+
+ dev->ctrl_fd = open(CTRL_DEV, O_RDWR);
+ if (dev->ctrl_fd < 0) {
+ free(dev);
+ return NULL;
+ }
+
+ info->max_io_buf_bytes = UBLK_IO_MAX_BYTES;
+
+ ret = ublk_setup_ring(&dev->ring, UBLK_CTRL_RING_DEPTH,
+ UBLK_CTRL_RING_DEPTH, IORING_SETUP_SQE128);
+ if (ret < 0) {
+ ublk_err("queue_init: %s\n", strerror(-ret));
+ free(dev);
+ return NULL;
+ }
+ dev->nr_fds = 1;
+
+ return dev;
+}
+
+static int __ublk_queue_cmd_buf_sz(unsigned depth)
+{
+ int size = depth * sizeof(struct ublksrv_io_desc);
+ unsigned int page_sz = getpagesize();
+
+ return round_up(size, page_sz);
+}
+
+static int ublk_queue_max_cmd_buf_sz(void)
+{
+ return __ublk_queue_cmd_buf_sz(UBLK_MAX_QUEUE_DEPTH);
+}
+
+static int ublk_queue_cmd_buf_sz(struct ublk_queue *q)
+{
+ return __ublk_queue_cmd_buf_sz(q->q_depth);
+}
+
+static void ublk_queue_deinit(struct ublk_queue *q)
+{
+ int i;
+ int nr_ios = q->q_depth;
+
+ io_uring_unregister_ring_fd(&q->ring);
+
+ if (q->ring.ring_fd > 0) {
+ io_uring_unregister_files(&q->ring);
+ close(q->ring.ring_fd);
+ q->ring.ring_fd = -1;
+ }
+
+ if (q->io_cmd_buf)
+ munmap(q->io_cmd_buf, ublk_queue_cmd_buf_sz(q));
+
+ for (i = 0; i < nr_ios; i++)
+ free(q->ios[i].buf_addr);
+}
+
+static int ublk_queue_init(struct ublk_queue *q)
+{
+ struct ublk_dev *dev = q->dev;
+ int depth = dev->dev_info.queue_depth;
+ int i, ret = -1;
+ int cmd_buf_size, io_buf_size;
+ unsigned long off;
+ int ring_depth = dev->tgt.sq_depth, cq_depth = dev->tgt.cq_depth;
+
+ q->tgt_ops = dev->tgt.ops;
+ q->state = 0;
+ q->q_depth = depth;
+ q->cmd_inflight = 0;
+ q->tid = gettid();
+ if (dev->dev_info.flags & UBLK_F_BPF)
+ q->state |= UBLKSRV_NO_BUF;
+
+ cmd_buf_size = ublk_queue_cmd_buf_sz(q);
+ off = UBLKSRV_CMD_BUF_OFFSET + q->q_id * ublk_queue_max_cmd_buf_sz();
+ q->io_cmd_buf = (char *)mmap(0, cmd_buf_size, PROT_READ,
+ MAP_SHARED | MAP_POPULATE, dev->fds[0], off);
+ if (q->io_cmd_buf == MAP_FAILED) {
+ ublk_err("ublk dev %d queue %d map io_cmd_buf failed %m\n",
+ q->dev->dev_info.dev_id, q->q_id);
+ goto fail;
+ }
+
+ io_buf_size = dev->dev_info.max_io_buf_bytes;
+ for (i = 0; i < q->q_depth; i++) {
+ q->ios[i].buf_addr = NULL;
+ q->ios[i].flags = UBLKSRV_NEED_FETCH_RQ | UBLKSRV_IO_FREE;
+
+ if (q->state & UBLKSRV_NO_BUF)
+ continue;
+
+ if (posix_memalign((void **)&q->ios[i].buf_addr,
+ getpagesize(), io_buf_size)) {
+ ublk_err("ublk dev %d queue %d io %d posix_memalign failed %m\n",
+ dev->dev_info.dev_id, q->q_id, i);
+ goto fail;
+ }
+ }
+
+ ret = ublk_setup_ring(&q->ring, ring_depth, cq_depth,
+ IORING_SETUP_COOP_TASKRUN);
+ if (ret < 0) {
+ ublk_err("ublk dev %d queue %d setup io_uring failed %d\n",
+ q->dev->dev_info.dev_id, q->q_id, ret);
+ goto fail;
+ }
+
+ io_uring_register_ring_fd(&q->ring);
+
+ ret = io_uring_register_files(&q->ring, dev->fds, dev->nr_fds);
+ if (ret) {
+ ublk_err("ublk dev %d queue %d register files failed %d\n",
+ q->dev->dev_info.dev_id, q->q_id, ret);
+ goto fail;
+ }
+
+ return 0;
+ fail:
+ ublk_queue_deinit(q);
+ ublk_err("ublk dev %d queue %d failed\n",
+ dev->dev_info.dev_id, q->q_id);
+ return -ENOMEM;
+}
+
+static int ublk_dev_prep(struct ublk_dev *dev)
+{
+ int dev_id = dev->dev_info.dev_id;
+ char buf[64];
+ int ret = 0;
+
+ snprintf(buf, 64, "%s%d", UBLKC_DEV, dev_id);
+ dev->fds[0] = open(buf, O_RDWR);
+ if (dev->fds[0] < 0) {
+ ret = -EBADF;
+ ublk_err("can't open %s, ret %d\n", buf, dev->fds[0]);
+ goto fail;
+ }
+
+ if (dev->tgt.ops->init_tgt)
+ ret = dev->tgt.ops->init_tgt(dev);
+
+ return ret;
+fail:
+ close(dev->fds[0]);
+ return ret;
+}
+
+static void ublk_dev_unprep(struct ublk_dev *dev)
+{
+ if (dev->tgt.ops->deinit_tgt)
+ dev->tgt.ops->deinit_tgt(dev);
+ close(dev->fds[0]);
+}
+
+static int ublk_queue_io_cmd(struct ublk_queue *q,
+ struct ublk_io *io, unsigned tag)
+{
+ struct ublksrv_io_cmd *cmd;
+ struct io_uring_sqe *sqe;
+ unsigned int cmd_op = 0;
+ __u64 user_data;
+
+ /* only freed io can be issued */
+ if (!(io->flags & UBLKSRV_IO_FREE))
+ return 0;
+
+ /* we issue because we need either fetching or committing */
+ if (!(io->flags &
+ (UBLKSRV_NEED_FETCH_RQ | UBLKSRV_NEED_COMMIT_RQ_COMP)))
+ return 0;
+
+ if (io->flags & UBLKSRV_NEED_COMMIT_RQ_COMP)
+ cmd_op = UBLK_U_IO_COMMIT_AND_FETCH_REQ;
+ else if (io->flags & UBLKSRV_NEED_FETCH_RQ)
+ cmd_op = UBLK_U_IO_FETCH_REQ;
+
+ sqe = io_uring_get_sqe(&q->ring);
+ if (!sqe) {
+ ublk_err("%s: run out of sqe %d, tag %d\n",
+ __func__, q->q_id, tag);
+ return -1;
+ }
+
+ cmd = (struct ublksrv_io_cmd *)ublk_get_sqe_cmd(sqe);
+
+ if (cmd_op == UBLK_U_IO_COMMIT_AND_FETCH_REQ)
+ cmd->result = io->result;
+
+ /* These fields should be written once, never change */
+ ublk_set_sqe_cmd_op(sqe, cmd_op);
+ sqe->fd = 0; /* dev->fds[0] */
+ sqe->opcode = IORING_OP_URING_CMD;
+ sqe->flags = IOSQE_FIXED_FILE;
+ sqe->rw_flags = 0;
+ cmd->tag = tag;
+ cmd->q_id = q->q_id;
+ if (!(q->state & UBLKSRV_NO_BUF))
+ cmd->addr = (__u64) (uintptr_t) io->buf_addr;
+ else
+ cmd->addr = 0;
+
+ user_data = build_user_data(tag, _IOC_NR(cmd_op), 0, 0);
+ io_uring_sqe_set_data64(sqe, user_data);
+
+ io->flags = 0;
+
+ q->cmd_inflight += 1;
+
+ ublk_dbg(UBLK_DBG_IO_CMD, "%s: (qid %d tag %u cmd_op %u) iof %x stopping %d\n",
+ __func__, q->q_id, tag, cmd_op,
+ io->flags, !!(q->state & UBLKSRV_QUEUE_STOPPING));
+ return 1;
+}
+
+__maybe_unused static int ublk_complete_io(struct ublk_queue *q,
+ unsigned tag, int res)
+{
+ struct ublk_io *io = &q->ios[tag];
+
+ ublk_mark_io_done(io, res);
+
+ return ublk_queue_io_cmd(q, io, tag);
+}
+
+static void ublk_submit_fetch_commands(struct ublk_queue *q)
+{
+ int i = 0;
+
+ for (i = 0; i < q->q_depth; i++)
+ ublk_queue_io_cmd(q, &q->ios[i], i);
+}
+
+static int ublk_queue_is_idle(struct ublk_queue *q)
+{
+ return !io_uring_sq_ready(&q->ring) && !q->io_inflight;
+}
+
+static int ublk_queue_is_done(struct ublk_queue *q)
+{
+ return (q->state & UBLKSRV_QUEUE_STOPPING) && ublk_queue_is_idle(q);
+}
+
+static inline void ublksrv_handle_tgt_cqe(struct ublk_queue *q,
+ struct io_uring_cqe *cqe)
+{
+ unsigned tag = user_data_to_tag(cqe->user_data);
+
+ if (cqe->res < 0 && cqe->res != -EAGAIN)
+ ublk_err("%s: failed tgt io: res %d qid %u tag %u, cmd_op %u\n",
+ __func__, cqe->res, q->q_id,
+ user_data_to_tag(cqe->user_data),
+ user_data_to_op(cqe->user_data));
+
+ if (q->tgt_ops->tgt_io_done)
+ q->tgt_ops->tgt_io_done(q, tag, cqe);
+}
+
+static void ublk_handle_cqe(struct io_uring *r,
+ struct io_uring_cqe *cqe, void *data)
+{
+ struct ublk_queue *q = container_of(r, struct ublk_queue, ring);
+ unsigned tag = user_data_to_tag(cqe->user_data);
+ unsigned cmd_op = user_data_to_op(cqe->user_data);
+ int fetch = (cqe->res != UBLK_IO_RES_ABORT) &&
+ !(q->state & UBLKSRV_QUEUE_STOPPING);
+ struct ublk_io *io;
+
+ ublk_dbg(UBLK_DBG_IO_CMD, "%s: res %d (qid %d tag %u cmd_op %u target %d) stopping %d\n",
+ __func__, cqe->res, q->q_id, tag, cmd_op,
+ is_target_io(cqe->user_data),
+ (q->state & UBLKSRV_QUEUE_STOPPING));
+
+ /* Don't retrieve io in case of target io */
+ if (is_target_io(cqe->user_data)) {
+ ublksrv_handle_tgt_cqe(q, cqe);
+ return;
+ }
+
+ io = &q->ios[tag];
+ q->cmd_inflight--;
+
+ if (!fetch) {
+ q->state |= UBLKSRV_QUEUE_STOPPING;
+ io->flags &= ~UBLKSRV_NEED_FETCH_RQ;
+ }
+
+ if (cqe->res == UBLK_IO_RES_OK) {
+ assert(tag < q->q_depth);
+ if (q->tgt_ops->queue_io)
+ q->tgt_ops->queue_io(q, tag);
+ } else {
+ /*
+ * COMMIT_REQ will be completed immediately since no fetching
+ * piggyback is required.
+ *
+ * Marking IO_FREE only, then this io won't be issued since
+ * we only issue io with (UBLKSRV_IO_FREE | UBLKSRV_NEED_*)
+ */
+ io->flags = UBLKSRV_IO_FREE;
+ }
+}
+
+static int ublk_reap_events_uring(struct io_uring *r)
+{
+ struct io_uring_cqe *cqe;
+ unsigned head;
+ int count = 0;
+
+ io_uring_for_each_cqe(r, head, cqe) {
+ ublk_handle_cqe(r, cqe, NULL);
+ count += 1;
+ }
+ io_uring_cq_advance(r, count);
+
+ return count;
+}
+
+static int ublk_process_io(struct ublk_queue *q)
+{
+ int ret, reapped;
+
+ ublk_dbg(UBLK_DBG_QUEUE, "dev%d-q%d: to_submit %d inflight cmd %u stopping %d\n",
+ q->dev->dev_info.dev_id,
+ q->q_id, io_uring_sq_ready(&q->ring),
+ q->cmd_inflight,
+ (q->state & UBLKSRV_QUEUE_STOPPING));
+
+ if (ublk_queue_is_done(q))
+ return -ENODEV;
+
+ ret = io_uring_submit_and_wait(&q->ring, 1);
+ reapped = ublk_reap_events_uring(&q->ring);
+
+ ublk_dbg(UBLK_DBG_QUEUE, "submit result %d, reapped %d stop %d idle %d\n",
+ ret, reapped, (q->state & UBLKSRV_QUEUE_STOPPING),
+ (q->state & UBLKSRV_QUEUE_IDLE));
+
+ return reapped;
+}
+
+static void *ublk_io_handler_fn(void *data)
+{
+ struct ublk_queue *q = data;
+ int dev_id = q->dev->dev_info.dev_id;
+ int ret;
+
+ ret = ublk_queue_init(q);
+ if (ret) {
+ ublk_err("ublk dev %d queue %d init queue failed\n",
+ dev_id, q->q_id);
+ return NULL;
+ }
+ ublk_dbg(UBLK_DBG_QUEUE, "tid %d: ublk dev %d queue %d started\n",
+ q->tid, dev_id, q->q_id);
+
+ /* submit all io commands to ublk driver */
+ ublk_submit_fetch_commands(q);
+ do {
+ if (ublk_process_io(q) < 0)
+ break;
+ } while (1);
+
+ ublk_dbg(UBLK_DBG_QUEUE, "ublk dev %d queue %d exited\n", dev_id, q->q_id);
+ ublk_queue_deinit(q);
+ return NULL;
+}
+
+static void ublk_set_parameters(struct ublk_dev *dev)
+{
+ int ret;
+
+ ret = ublk_ctrl_set_params(dev, &dev->tgt.params);
+ if (ret)
+ ublk_err("dev %d set basic parameter failed %d\n",
+ dev->dev_info.dev_id, ret);
+}
+
+static int ublk_start_daemon(struct ublk_dev *dev)
+{
+ int ret, i;
+ void *thread_ret;
+ const struct ublksrv_ctrl_dev_info *dinfo = &dev->dev_info;
+
+ if (daemon(1, 1) < 0)
+ return -errno;
+
+ ublk_dbg(UBLK_DBG_DEV, "%s enter\n", __func__);
+
+ ret = ublk_dev_prep(dev);
+ if (ret)
+ return ret;
+
+ for (i = 0; i < dinfo->nr_hw_queues; i++) {
+ dev->q[i].dev = dev;
+ dev->q[i].q_id = i;
+ pthread_create(&dev->q[i].thread, NULL,
+ ublk_io_handler_fn,
+ &dev->q[i]);
+ }
+
+ /* everything is fine now, start us */
+ ublk_set_parameters(dev);
+ ret = ublk_ctrl_start_dev(dev, getpid());
+ if (ret < 0) {
+ ublk_err("%s: ublk_ctrl_start_dev failed: %d\n", __func__, ret);
+ goto fail;
+ }
+
+ ublk_ctrl_get_info(dev);
+ ublk_ctrl_dump(dev, true);
+
+ /* wait until we are terminated */
+ for (i = 0; i < dinfo->nr_hw_queues; i++)
+ pthread_join(dev->q[i].thread, &thread_ret);
+ fail:
+ ublk_dev_unprep(dev);
+ ublk_dbg(UBLK_DBG_DEV, "%s exit\n", __func__);
+
+ return ret;
+}
+
+static int wait_ublk_dev(char *dev_name, int evt_mask, unsigned timeout)
+{
+#define EV_SIZE (sizeof(struct inotify_event))
+#define EV_BUF_LEN (128 * (EV_SIZE + 16))
+ struct pollfd pfd;
+ int fd, wd;
+ int ret = -EINVAL;
+
+ fd = inotify_init();
+ if (fd < 0) {
+ ublk_dbg(UBLK_DBG_DEV, "%s: inotify init failed\n", __func__);
+ return fd;
+ }
+
+ wd = inotify_add_watch(fd, "/dev", evt_mask);
+ if (wd == -1) {
+ ublk_dbg(UBLK_DBG_DEV, "%s: add watch for /dev failed\n", __func__);
+ goto fail;
+ }
+
+ pfd.fd = fd;
+ pfd.events = POLL_IN;
+ while (1) {
+ int i = 0;
+ char buffer[EV_BUF_LEN];
+ ret = poll(&pfd, 1, 1000 * timeout);
+
+ if (ret == -1) {
+ ublk_err("%s: poll inotify failed: %d\n", __func__, ret);
+ goto rm_watch;
+ } else if (ret == 0) {
+ ublk_err("%s: poll inotify timeout\n", __func__);
+ ret = -ETIMEDOUT;
+ goto rm_watch;
+ }
+
+ ret = read(fd, buffer, EV_BUF_LEN);
+ if (ret < 0) {
+ ublk_err("%s: read inotify fd failed\n", __func__);
+ goto rm_watch;
+ }
+
+ while (i < ret) {
+ struct inotify_event *event = (struct inotify_event *)&buffer[i];
+
+ ublk_dbg(UBLK_DBG_DEV, "%s: inotify event %x %s\n",
+ __func__, event->mask, event->name);
+ if (event->mask & evt_mask) {
+ if (!strcmp(event->name, dev_name)) {
+ ret = 0;
+ goto rm_watch;
+ }
+ }
+ i += EV_SIZE + event->len;
+ }
+ }
+rm_watch:
+ inotify_rm_watch(fd, wd);
+fail:
+ close(fd);
+ return ret;
+}
+
+static int ublk_stop_io_daemon(const struct ublk_dev *dev)
+{
+ int daemon_pid = dev->dev_info.ublksrv_pid;
+ int dev_id = dev->dev_info.dev_id;
+ char ublkc[64];
+ int ret;
+
+ /* daemon may be dead already */
+ if (kill(daemon_pid, 0) < 0)
+ goto wait;
+
+ /*
+ * Wait until ublk char device is closed, when our daemon is shutdown
+ */
+ snprintf(ublkc, sizeof(ublkc), "%s%d", "ublkc", dev_id);
+ ret = wait_ublk_dev(ublkc, IN_CLOSE_WRITE, 3);
+ /* double check, since inotify may not be 100% reliable */
+ if (ret == -ETIMEDOUT)
+ /* the daemon doesn't exist now if kill(0) fails */
+ ret = kill(daemon_pid, 0) < 0;
+wait:
+ waitpid(daemon_pid, NULL, 0);
+ ublk_dbg(UBLK_DBG_DEV, "%s: pid %d dev_id %d ret %d\n",
+ __func__, daemon_pid, dev_id, ret);
+
+ return ret;
+}
+
+static int cmd_dev_add(struct dev_ctx *ctx)
+{
+ char *tgt_type = ctx->tgt_type;
+ unsigned depth = ctx->queue_depth;
+ unsigned nr_queues = ctx->nr_hw_queues;
+ __u64 features;
+ const struct ublk_tgt_ops *ops;
+ struct ublksrv_ctrl_dev_info *info;
+ struct ublk_dev *dev;
+ int dev_id = ctx->dev_id;
+ char ublkb[64];
+ int ret;
+
+ ops = ublk_find_tgt(tgt_type);
+ if (!ops) {
+ ublk_err("%s: no such tgt type, type %s\n",
+ __func__, tgt_type);
+ return -ENODEV;
+ }
+
+ if (nr_queues > UBLK_MAX_QUEUES || depth > UBLK_QUEUE_DEPTH) {
+ ublk_err("%s: invalid nr_queues or depth queues %u depth %u\n",
+ __func__, nr_queues, depth);
+ return -EINVAL;
+ }
+
+ dev = ublk_ctrl_init();
+ if (!dev) {
+ ublk_err("%s: can't alloc dev id %d, type %s\n",
+ __func__, dev_id, tgt_type);
+ return -ENOMEM;
+ }
+
+ /* fail if the kernel doesn't support get_features */
+ ret = ublk_ctrl_get_features(dev, &features);
+ if (ret < 0) {
+ ret = -EINVAL;
+ goto fail;
+ }
+
+ if (!(features & UBLK_F_CMD_IOCTL_ENCODE)) {
+ ret = -ENOTSUP;
+ goto fail;
+ }
+
+ info = &dev->dev_info;
+ info->dev_id = ctx->dev_id;
+ info->nr_hw_queues = nr_queues;
+ info->queue_depth = depth;
+ info->flags = ctx->flags;
+ dev->tgt.ops = ops;
+ dev->tgt.sq_depth = depth;
+ dev->tgt.cq_depth = depth;
+ dev->bpf_prog_id = ctx->bpf_prog_id;
+
+ ret = ublk_ctrl_add_dev(dev);
+ if (ret < 0) {
+ ublk_err("%s: can't add dev id %d, type %s ret %d\n",
+ __func__, dev_id, tgt_type, ret);
+ goto fail;
+ }
+
+ ret = -EINVAL;
+ switch (fork()) {
+ case -1:
+ goto fail;
+ case 0:
+ ublk_start_daemon(dev);
+ ublk_dbg(UBLK_DBG_DEV, "%s: daemon is started in children");
+ exit(EXIT_SUCCESS);
+ }
+
+ /*
+ * Wait until the ublk disk is added, which happens when our daemon
+ * has started successfully
+ */
+ snprintf(ublkb, sizeof(ublkb), "%s%u", "ublkb", dev->dev_info.dev_id);
+ ret = wait_ublk_dev(ublkb, IN_CREATE, 3);
+ if (ret < 0) {
+ ublk_err("%s: can't start daemon id %d, type %s\n",
+ __func__, dev_id, tgt_type);
+ ublk_ctrl_del_dev(dev);
+ } else {
+ ctx->dev_id = dev->dev_info.dev_id;
+ }
+ ublk_dbg(UBLK_DBG_DEV, "%s: start daemon id %d, type %s\n",
+ __func__, ctx->dev_id, tgt_type);
+fail:
+ ublk_ctrl_deinit(dev);
+ return ret;
+}
+
+static int __cmd_dev_del(struct dev_ctx *ctx)
+{
+ int number = ctx->dev_id;
+ struct ublk_dev *dev;
+ int ret;
+
+ dev = ublk_ctrl_init();
+ dev->dev_info.dev_id = number;
+
+ ret = ublk_ctrl_get_info(dev);
+ if (ret < 0)
+ goto fail;
+
+ ret = ublk_ctrl_stop_dev(dev);
+ if (ret < 0)
+ ublk_err("%s: stop dev %d failed ret %d\n", __func__, number, ret);
+
+ ret = ublk_stop_io_daemon(dev);
+ if (ret < 0)
+ ublk_err("%s: stop daemon id %d dev %d, ret %d\n",
+ __func__, dev->dev_info.ublksrv_pid, number, ret);
+ ublk_ctrl_del_dev(dev);
+fail:
+ if (ret >= 0)
+ ret = ublk_ctrl_get_info(dev);
+ ublk_ctrl_deinit(dev);
+
+ return (ret >= 0) ? 0 : ret;
+}
+
+static int cmd_dev_del(struct dev_ctx *ctx)
+{
+ int i;
+
+ if (ctx->dev_id >= 0 || !ctx->all)
+ return __cmd_dev_del(ctx);
+
+ for (i = 0; i < 255; i++) {
+ ctx->dev_id = i;
+ __cmd_dev_del(ctx);
+ }
+ return 0;
+}
+
+static int __cmd_dev_list(struct dev_ctx *ctx)
+{
+ struct ublk_dev *dev = ublk_ctrl_init();
+ int ret;
+
+ if (!dev)
+ return -ENODEV;
+
+ dev->dev_info.dev_id = ctx->dev_id;
+
+ ret = ublk_ctrl_get_info(dev);
+ if (ret < 0) {
+ if (ctx->logging)
+ ublk_err("%s: can't get dev info from %d: %d\n",
+ __func__, ctx->dev_id, ret);
+ } else {
+ ublk_ctrl_dump(dev, false);
+ }
+
+ ublk_ctrl_deinit(dev);
+
+ return ret;
+}
+
+static int cmd_dev_list(struct dev_ctx *ctx)
+{
+ int i;
+
+ if (ctx->dev_id >= 0 || !ctx->all)
+ return __cmd_dev_list(ctx);
+
+ ctx->logging = false;
+ for (i = 0; i < 255; i++) {
+ ctx->dev_id = i;
+ __cmd_dev_list(ctx);
+ }
+ return 0;
+}
+
+static int cmd_dev_unreg_bpf(struct dev_ctx *ctx)
+{
+ char path[PATH_MAX];
+ char cmd[PATH_MAX + 16];
+ struct stat st;
+
+ snprintf(path, PATH_MAX, "/sys/fs/bpf/%s/%s", UBLK_BPF_PIN_PATH, ctx->tgt_type);
+ if (stat(path, &st) != 0) {
+ ublk_err("bpf prog %s isn't registered on %s\n", ctx->tgt_type, path);
+ return -ENOENT;
+ }
+
+ sprintf(cmd, "rm -r %s", path);
+ if (system(cmd)) {
+ ublk_err("fail to run %s\n", cmd);
+ return -ENOENT;
+ }
+
+ return 0;
+}
+
+static int pathname_concat(char *buf, int buf_sz, const char *path,
+ const char *name)
+{
+ int len;
+
+ len = snprintf(buf, buf_sz, "%s/%s", path, name);
+ if (len < 0)
+ return -EINVAL;
+ if (len >= buf_sz)
+ return -ENAMETOOLONG;
+
+ return 0;
+}
+
+static int pin_map(struct bpf_map *map, const char *pindir,
+ const char *name)
+{
+ char pinfile[PATH_MAX];
+ int err;
+
+ err = pathname_concat(pinfile, sizeof(pinfile), pindir, name);
+ if (err)
+ return -1;
+
+ return bpf_map__pin(map, pinfile);
+}
+
+static int pin_link(struct bpf_link *link, const char *pindir,
+ const char *name)
+{
+ char pinfile[PATH_MAX];
+ int err;
+
+ err = pathname_concat(pinfile, sizeof(pinfile), pindir, name);
+ if (err)
+ return -1;
+
+ return bpf_link__pin(link, pinfile);
+}
+
+static int cmd_dev_reg_bpf(struct dev_ctx *ctx)
+{
+ LIBBPF_OPTS(bpf_object_open_opts, open_opts);
+ struct bpf_object *obj;
+ struct bpf_map *map;
+ char path[PATH_MAX];
+ struct stat st;
+
+ assert(ctx->nr_files == 1);
+
+ snprintf(path, PATH_MAX, "/sys/fs/bpf");
+ if (stat(path, &st) != 0) {
+ ublk_err("bpf fs isn't mounted on %s\n", path);
+ return -ENOENT;
+ }
+
+ snprintf(path, PATH_MAX, "/sys/fs/bpf/%s", UBLK_BPF_PIN_PATH);
+ if (stat(path, &st) != 0) {
+ if (mkdir(path, 0700) != 0) {
+ ublk_err("fail to create ublk bpf on %s\n", path);
+ return -ENOENT;
+ }
+ }
+
+ snprintf(path, PATH_MAX, "/sys/fs/bpf/%s/%s", UBLK_BPF_PIN_PATH, ctx->tgt_type);
+ if (stat(path, &st) == 0) {
+ ublk_err("fail to pin ublk bpf on %s\n", path);
+ return -EEXIST;
+ }
+
+ obj = bpf_object__open_file(ctx->files[0], &open_opts);
+ if (!obj)
+ return -1;
+
+ if (bpf_object__load(obj)) {
+ ublk_err("fail to load bpf obj from %s\n", ctx->files[0]);
+ bpf_object__close(obj);
+ return -1;
+ }
+
+ bpf_object__for_each_map(map, obj) {
+ struct bpf_link *link;
+
+ if (bpf_map__type(map) != BPF_MAP_TYPE_STRUCT_OPS) {
+ if (!bpf_map__is_internal(map))
+ pin_map(map, path, bpf_map__name(map));
+ continue;
+ }
+
+ link = bpf_map__attach_struct_ops(map);
+ if (!link) {
+ ublk_err("can't register struct_ops %s: %s",
+ bpf_map__name(map), strerror(errno));
+ continue;
+ }
+ pin_link(link, path, bpf_map__name(map));
+
+ bpf_link__disconnect(link);
+ bpf_link__destroy(link);
+ }
+
+ bpf_object__close(obj);
+ return 0;
+}
+
+static int cmd_dev_help(char *exe)
+{
+ printf("%s add -t [null] [-q nr_queues] [-d depth] [-n dev_id] [--bpf_prog ublk_prog_id] [backfile1] [backfile2] ...\n", exe);
+ printf("\t default: nr_queues=2(max 4), depth=128(max 128), dev_id=-1(auto allocation)\n");
+ printf("%s del [-n dev_id] -a \n", exe);
+ printf("\t -a delete all devices -n delete specified device\n");
+ printf("%s list [-n dev_id] -a \n", exe);
+ printf("\t -a list all devices, -n list specified device, default -a \n");
+ printf("%s reg -t [null] bpf_prog_obj_path \n", exe);
+ printf("%s unreg -t [null]\n", exe);
+ return 0;
+}
+
+/****************** part 2: target implementation ********************/
+
+static int ublk_null_tgt_init(struct ublk_dev *dev)
+{
+ const struct ublksrv_ctrl_dev_info *info = &dev->dev_info;
+ unsigned long dev_size = 250UL << 30;
+ bool use_bpf = info->flags & UBLK_F_BPF;
+
+ dev->tgt.dev_size = dev_size;
+ dev->tgt.params = (struct ublk_params) {
+ .types = UBLK_PARAM_TYPE_BASIC |
+ (use_bpf ? UBLK_PARAM_TYPE_BPF : 0),
+ .basic = {
+ .logical_bs_shift = 9,
+ .physical_bs_shift = 12,
+ .io_opt_shift = 12,
+ .io_min_shift = 9,
+ .max_sectors = info->max_io_buf_bytes >> 9,
+ .dev_sectors = dev_size >> 9,
+ },
+ .bpf = {
+ .flags = UBLK_BPF_HAS_OPS_ID,
+ .ops_id = dev->bpf_prog_id,
+ },
+ };
+
+ return 0;
+}
+
+static int ublk_null_queue_io(struct ublk_queue *q, int tag)
+{
+ const struct ublksrv_io_desc *iod = ublk_get_iod(q, tag);
+
+ /* won't be called for UBLK_F_BPF */
+ assert(!(q->dev->dev_info.flags & UBLK_F_BPF));
+
+ ublk_complete_io(q, tag, iod->nr_sectors << 9);
+
+ return 0;
+}
+
+static const struct ublk_tgt_ops tgt_ops_list[] = {
+ {
+ .name = "null",
+ .init_tgt = ublk_null_tgt_init,
+ .queue_io = ublk_null_queue_io,
+ },
+};
+
+static const struct ublk_tgt_ops *ublk_find_tgt(const char *name)
+{
+ const struct ublk_tgt_ops *ops;
+ int i;
+
+ if (name == NULL)
+ return NULL;
+
+ for (i = 0; i < sizeof(tgt_ops_list) / sizeof(tgt_ops_list[0]); i++)
+ if (strcmp(tgt_ops_list[i].name, name) == 0)
+ return &tgt_ops_list[i];
+ return NULL;
+}
+
+int main(int argc, char *argv[])
+{
+ static const struct option longopts[] = {
+ { "all", 0, NULL, 'a' },
+ { "type", 1, NULL, 't' },
+ { "number", 1, NULL, 'n' },
+ { "queues", 1, NULL, 'q' },
+ { "depth", 1, NULL, 'd' },
+ { "debug_mask", 1, NULL, 0 },
+ { "quiet", 0, NULL, 0 },
+ { "bpf_prog", 1, NULL, 0 },
+ { 0, 0, 0, 0 }
+ };
+ int option_idx, opt;
+ const char *cmd = argv[1];
+ struct dev_ctx ctx = {
+ .queue_depth = 128,
+ .nr_hw_queues = 2,
+ .dev_id = -1,
+ .bpf_prog_id = -1,
+ };
+ int ret = -EINVAL, i;
+
+ if (argc == 1)
+ return ret;
+
+ optind = 2;
+ while ((opt = getopt_long(argc, argv, "t:n:d:q:a",
+ longopts, &option_idx)) != -1) {
+ switch (opt) {
+ case 'a':
+ ctx.all = 1;
+ break;
+ case 'n':
+ ctx.dev_id = strtol(optarg, NULL, 10);
+ break;
+ case 't':
+ strncpy(ctx.tgt_type, optarg,
+ sizeof(ctx.tgt_type) - 1);
+ break;
+ case 'q':
+ ctx.nr_hw_queues = strtol(optarg, NULL, 10);
+ break;
+ case 'd':
+ ctx.queue_depth = strtol(optarg, NULL, 10);
+ break;
+ case 0:
+ if (!strcmp(longopts[option_idx].name, "debug_mask"))
+ ublk_dbg_mask = strtol(optarg, NULL, 16);
+ if (!strcmp(longopts[option_idx].name, "quiet"))
+ ublk_dbg_mask = 0;
+ if (!strcmp(longopts[option_idx].name, "bpf_prog")) {
+ ctx.bpf_prog_id = strtol(optarg, NULL, 10);
+ ctx.flags |= UBLK_F_BPF;
+ }
+ break;
+ }
+ }
+
+ i = optind;
+ while (i < argc && ctx.nr_files < MAX_BACK_FILES) {
+ ctx.files[ctx.nr_files++] = argv[i++];
+ }
+
+ if (!strcmp(cmd, "add"))
+ ret = cmd_dev_add(&ctx);
+ else if (!strcmp(cmd, "del"))
+ ret = cmd_dev_del(&ctx);
+ else if (!strcmp(cmd, "list")) {
+ ctx.all = 1;
+ ret = cmd_dev_list(&ctx);
+ } else if (!strcmp(cmd, "reg"))
+ ret = cmd_dev_reg_bpf(&ctx);
+ else if (!strcmp(cmd, "unreg"))
+ ret = cmd_dev_unreg_bpf(&ctx);
+ else if (!strcmp(cmd, "help"))
+ ret = cmd_dev_help(argv[0]);
+ else
+ cmd_dev_help(argv[0]);
+
+ return ret;
+}
--
2.47.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC PATCH 13/22] selftests: ublk: add tests for covering io split
2025-01-07 12:03 [RFC PATCH 00/22] ublk: support bpf Ming Lei
` (11 preceding siblings ...)
2025-01-07 12:04 ` [RFC PATCH 12/22] selftests: ublk: add tests for the ublk-bpf initial implementation Ming Lei
@ 2025-01-07 12:04 ` Ming Lei
2025-01-07 12:04 ` [RFC PATCH 14/22] selftests: ublk: add tests for covering redirecting to userspace Ming Lei
` (8 subsequent siblings)
21 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-01-07 12:04 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
Ming Lei
One io command can be queued in a split way; add a test case covering it:
- split the io command into two sub-ios if the io size is bigger than 512 bytes
- the 1st sub-io is 512 bytes, and the 2nd sub-io covers the remaining bytes
The whole io command is completed only after both sub-ios are queued; see
the illustrative model below for how the returned offset drives re-dispatch.
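The return value from the struct_ops prog encodes an action plus a byte count,
and the driver is expected to invoke the prog again with the advanced offset as
long as UBLK_BPF_IO_CONTINUE is returned. The standalone C model below only
illustrates that assumed dispatch contract; the constant names and the loop are
stand-ins, not the driver implementation:

/* Illustrative model only: names below are stand-ins, not the driver code */
#include <stdio.h>

enum { UBLK_BPF_IO_QUEUED, UBLK_BPF_IO_CONTINUE };

struct ret_val { int action; unsigned int bytes; };

/* mirrors the prog's decision: first sub-io is 512 bytes, second the rest */
static struct ret_val queue_io_cmd(unsigned int off, unsigned int io_bytes)
{
	if (off < 512 && io_bytes > 512)
		return (struct ret_val){ UBLK_BPF_IO_CONTINUE, 512 };
	return (struct ret_val){ UBLK_BPF_IO_QUEUED, 0 };
}

int main(void)
{
	unsigned int off = 0, io_bytes = 4096;
	struct ret_val r;

	/* assumed dispatch loop: re-invoke the prog with the advanced offset */
	do {
		r = queue_io_cmd(off, io_bytes);
		printf("called with off=%u -> action=%d advance=%u\n",
		       off, r.action, r.bytes);
		off += r.bytes;
	} while (r.action == UBLK_BPF_IO_CONTINUE);

	return 0;
}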
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
tools/testing/selftests/ublk/Makefile | 1 +
.../testing/selftests/ublk/progs/ublk_null.c | 46 +++++++++++++++++++
tools/testing/selftests/ublk/test_null_03.sh | 21 +++++++++
3 files changed, 68 insertions(+)
create mode 100755 tools/testing/selftests/ublk/test_null_03.sh
diff --git a/tools/testing/selftests/ublk/Makefile b/tools/testing/selftests/ublk/Makefile
index a95f317211e7..5a940bae9cbb 100644
--- a/tools/testing/selftests/ublk/Makefile
+++ b/tools/testing/selftests/ublk/Makefile
@@ -21,6 +21,7 @@ endif
TEST_PROGS := test_null_01.sh
TEST_PROGS += test_null_02.sh
+TEST_PROGS += test_null_03.sh
# Order correspond to 'make run_tests' order
TEST_GEN_PROGS_EXTENDED = ublk_bpf
diff --git a/tools/testing/selftests/ublk/progs/ublk_null.c b/tools/testing/selftests/ublk/progs/ublk_null.c
index 3225b52dcd24..523bf8ff3ef8 100644
--- a/tools/testing/selftests/ublk/progs/ublk_null.c
+++ b/tools/testing/selftests/ublk/progs/ublk_null.c
@@ -11,6 +11,40 @@
/* libbpf v1.4.5 is required for struct_ops to work */
+static inline ublk_bpf_return_t __ublk_null_handle_io_split(const struct ublk_bpf_io *io, unsigned int _off)
+{
+ unsigned long off = -1, sects = -1;
+ const struct ublksrv_io_desc *iod;
+ int res;
+
+ iod = ublk_bpf_get_iod(io);
+ if (iod) {
+ res = iod->nr_sectors << 9;
+ off = iod->start_sector;
+ sects = iod->nr_sectors;
+ } else
+ res = -EINVAL;
+
+ BPF_DBG("ublk dev %u qid %u: handle io tag %u %lx-%d res %d",
+ ublk_bpf_get_dev_id(io),
+ ublk_bpf_get_queue_id(io),
+ ublk_bpf_get_io_tag(io),
+ off, sects, res);
+ if (res < 0) {
+ ublk_bpf_complete_io(io, res);
+ return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0);
+ }
+
+ /* split this io into one 512-byte sub-io and the remainder */
+ if (_off < 512 && res > 512)
+ return ublk_bpf_return_val(UBLK_BPF_IO_CONTINUE, 512);
+
+ /* complete the whole io command after the 2nd sub-io is queued */
+ ublk_bpf_complete_io(io, res);
+ return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0);
+}
+
+
static inline ublk_bpf_return_t __ublk_null_handle_io(const struct ublk_bpf_io *io, unsigned int _off)
{
unsigned long off = -1, sects = -1;
@@ -60,4 +94,16 @@ struct ublk_bpf_ops null_ublk_bpf_ops = {
.detach_dev = (void *)ublk_null_detach_dev,
};
+SEC("struct_ops/ublk_bpf_queue_io_cmd")
+ublk_bpf_return_t BPF_PROG(ublk_null_handle_io_split, struct ublk_bpf_io *io, unsigned int off)
+{
+ return __ublk_null_handle_io_split(io, off);
+}
+
+SEC(".struct_ops.link")
+struct ublk_bpf_ops null_ublk_bpf_ops_split = {
+ .id = 1,
+ .queue_io_cmd = (void *)ublk_null_handle_io_split,
+};
+
char LICENSE[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/ublk/test_null_03.sh b/tools/testing/selftests/ublk/test_null_03.sh
new file mode 100755
index 000000000000..c0b3a4d941c9
--- /dev/null
+++ b/tools/testing/selftests/ublk/test_null_03.sh
@@ -0,0 +1,21 @@
+#!/bin/bash
+
+. test_common.sh
+
+TID="null_03"
+ERR_CODE=0
+
+# prepare and register & pin bpf prog
+_prep_bpf_test "null" ublk_null.bpf.o
+
+# add one ublk null disk with the pinned bpf prog
+_add_ublk_dev -t null -n 0 --bpf_prog 1 --quiet
+
+# run fio over the ublk disk
+fio --name=job1 --filename=/dev/ublkb0 --ioengine=libaio --rw=readwrite --iodepth=32 --size=256M > /dev/null 2>&1
+ERR_CODE=$?
+
+# clean and unregister & unpin the bpf prog
+_cleanup_bpf_test "null"
+
+_show_result $TID $ERR_CODE
--
2.47.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC PATCH 14/22] selftests: ublk: add tests for covering redirecting to userspace
2025-01-07 12:03 [RFC PATCH 00/22] ublk: support bpf Ming Lei
` (12 preceding siblings ...)
2025-01-07 12:04 ` [RFC PATCH 13/22] selftests: ublk: add tests for covering io split Ming Lei
@ 2025-01-07 12:04 ` Ming Lei
2025-01-07 12:04 ` [RFC PATCH 15/22] ublk: bpf: add bpf aio kfunc Ming Lei
` (7 subsequent siblings)
21 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-01-07 12:04 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
Ming Lei
Reuse ublk-null for testing UBLK_BPF_IO_REDIRECT:
- queue & complete ios with an odd tag number in the bpf prog
- redirect ios with an even tag number, and let userspace handle their
queueing & completion
- also select some of those ios, return -EAGAIN from userspace while
marking them as ready for the bpf prog to handle, so they are finally
completed by the bpf prog on the second pass
This covers the code path for UBLK_BPF_IO_REDIRECT.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
tools/testing/selftests/ublk/Makefile | 1 +
.../selftests/ublk/progs/ublk_bpf_kfunc.h | 10 +++
.../testing/selftests/ublk/progs/ublk_null.c | 68 +++++++++++++++++++
tools/testing/selftests/ublk/test_null_04.sh | 21 ++++++
tools/testing/selftests/ublk/ublk_bpf.c | 39 ++++++++++-
5 files changed, 136 insertions(+), 3 deletions(-)
create mode 100755 tools/testing/selftests/ublk/test_null_04.sh
diff --git a/tools/testing/selftests/ublk/Makefile b/tools/testing/selftests/ublk/Makefile
index 5a940bae9cbb..38903f05d99d 100644
--- a/tools/testing/selftests/ublk/Makefile
+++ b/tools/testing/selftests/ublk/Makefile
@@ -22,6 +22,7 @@ endif
TEST_PROGS := test_null_01.sh
TEST_PROGS += test_null_02.sh
TEST_PROGS += test_null_03.sh
+TEST_PROGS += test_null_04.sh
# Order correspond to 'make run_tests' order
TEST_GEN_PROGS_EXTENDED = ublk_bpf
diff --git a/tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h b/tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h
index acab490d933c..1db8870b57d6 100644
--- a/tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h
+++ b/tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h
@@ -20,4 +20,14 @@ extern void ublk_bpf_complete_io(const struct ublk_bpf_io *io, int res) __ksym;
extern int ublk_bpf_get_dev_id(const struct ublk_bpf_io *io) __ksym;
extern int ublk_bpf_get_queue_id(const struct ublk_bpf_io *io) __ksym;
extern int ublk_bpf_get_io_tag(const struct ublk_bpf_io *io) __ksym;
+
+static inline unsigned long long build_io_key(const struct ublk_bpf_io *io)
+{
+ unsigned long long dev_id = (unsigned short)ublk_bpf_get_dev_id(io);
+ unsigned long long q_id = (unsigned short)ublk_bpf_get_queue_id(io);
+ unsigned long long tag = ublk_bpf_get_io_tag(io);
+
+ return (dev_id << 32) | (q_id << 16) | tag;
+}
+
#endif
diff --git a/tools/testing/selftests/ublk/progs/ublk_null.c b/tools/testing/selftests/ublk/progs/ublk_null.c
index 523bf8ff3ef8..cebdc8a2a214 100644
--- a/tools/testing/selftests/ublk/progs/ublk_null.c
+++ b/tools/testing/selftests/ublk/progs/ublk_null.c
@@ -9,6 +9,14 @@
//#define DEBUG
#include "ublk_bpf.h"
+/* todo: make it writable payload of ublk_bpf_io */
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(max_entries, 10240);
+ __type(key, unsigned long long); /* dev_id + q_id + tag */
+ __type(value, int);
+} io_map SEC(".maps");
+
/* libbpf v1.4.5 is required for struct_ops to work */
static inline ublk_bpf_return_t __ublk_null_handle_io_split(const struct ublk_bpf_io *io, unsigned int _off)
@@ -44,6 +52,54 @@ static inline ublk_bpf_return_t __ublk_null_handle_io_split(const struct ublk_bp
return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0);
}
+static inline ublk_bpf_return_t __ublk_null_handle_io_redirect(const struct ublk_bpf_io *io, unsigned int _off)
+{
+ unsigned int tag = ublk_bpf_get_io_tag(io);
+ unsigned long off = -1, sects = -1;
+ const struct ublksrv_io_desc *iod;
+ int res;
+
+ iod = ublk_bpf_get_iod(io);
+ if (iod) {
+ res = iod->nr_sectors << 9;
+ off = iod->start_sector;
+ sects = iod->nr_sectors;
+ } else
+ res = -EINVAL;
+
+ BPF_DBG("ublk dev %u qid %u: handle io tag %u %lx-%d res %d",
+ ublk_bpf_get_dev_id(io),
+ ublk_bpf_get_queue_id(io),
+ ublk_bpf_get_io_tag(io),
+ off, sects, res);
+ if (res < 0) {
+ ublk_bpf_complete_io(io, res);
+ return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0);
+ }
+
+ if (tag & 0x1) {
+ /* complete ios with an odd tag directly in the bpf prog */
+ ublk_bpf_complete_io(io, res);
+ return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0);
+ } else {
+ unsigned long long key = build_io_key(io);
+ int *pv;
+
+ /* stored value means if it is ready to complete IO */
+ pv = bpf_map_lookup_elem(&io_map, &key);
+ if (pv && *pv) {
+ ublk_bpf_complete_io(io, res);
+ return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0);
+ } else {
+ int v = 0;
+ res = bpf_map_update_elem(&io_map, &key, &v, BPF_ANY);
+ if (res)
+ bpf_printk("update io map element failed %d key %llx\n", res, key);
+ return ublk_bpf_return_val(UBLK_BPF_IO_REDIRECT, 0);
+ }
+ }
+}
+
static inline ublk_bpf_return_t __ublk_null_handle_io(const struct ublk_bpf_io *io, unsigned int _off)
{
@@ -106,4 +162,16 @@ struct ublk_bpf_ops null_ublk_bpf_ops_split = {
.queue_io_cmd = (void *)ublk_null_handle_io_split,
};
+SEC("struct_ops/ublk_bpf_queue_io_cmd")
+ublk_bpf_return_t BPF_PROG(ublk_null_handle_io_redirect, struct ublk_bpf_io *io, unsigned int off)
+{
+ return __ublk_null_handle_io_redirect(io, off);
+}
+
+SEC(".struct_ops.link")
+struct ublk_bpf_ops null_ublk_bpf_ops_redirect = {
+ .id = 2,
+ .queue_io_cmd = (void *)ublk_null_handle_io_redirect,
+};
+
char LICENSE[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/ublk/test_null_04.sh b/tools/testing/selftests/ublk/test_null_04.sh
new file mode 100755
index 000000000000..f175e2ddb5cd
--- /dev/null
+++ b/tools/testing/selftests/ublk/test_null_04.sh
@@ -0,0 +1,21 @@
+#!/bin/bash
+
+. test_common.sh
+
+TID="null_04"
+ERR_CODE=0
+
+# prepare and register & pin bpf prog
+_prep_bpf_test "null" ublk_null.bpf.o
+
+# add one ublk null disk with the pinned bpf prog
+_add_ublk_dev -t null -n 0 --bpf_prog 2 --quiet
+
+# run fio over the ublk disk
+fio --name=job1 --filename=/dev/ublkb0 --ioengine=libaio --rw=readwrite --iodepth=32 --size=256M > /dev/null 2>&1
+ERR_CODE=$?
+
+# clean and unregister & unpin the bpf prog
+_cleanup_bpf_test "null"
+
+_show_result $TID $ERR_CODE
diff --git a/tools/testing/selftests/ublk/ublk_bpf.c b/tools/testing/selftests/ublk/ublk_bpf.c
index 2d923e42845d..e2c2e92268e1 100644
--- a/tools/testing/selftests/ublk/ublk_bpf.c
+++ b/tools/testing/selftests/ublk/ublk_bpf.c
@@ -1283,6 +1283,16 @@ static int cmd_dev_help(char *exe)
}
/****************** part 2: target implementation ********************/
+//extern int bpf_map_update_elem(int fd, const void *key, const void *value,
+// __u64 flags);
+
+static inline unsigned long long build_io_key(struct ublk_queue *q, int tag)
+{
+ unsigned long long dev_id = (unsigned short)q->dev->dev_info.dev_id;
+ unsigned long long q_id = (unsigned short)q->q_id;
+
+ return (dev_id << 32) | (q_id << 16) | tag;
+}
static int ublk_null_tgt_init(struct ublk_dev *dev)
{
@@ -1314,12 +1324,35 @@ static int ublk_null_tgt_init(struct ublk_dev *dev)
static int ublk_null_queue_io(struct ublk_queue *q, int tag)
{
const struct ublksrv_io_desc *iod = ublk_get_iod(q, tag);
+ bool bpf = q->dev->dev_info.flags & UBLK_F_BPF;
- /* won't be called for UBLK_F_BPF */
- assert(!(q->dev->dev_info.flags & UBLK_F_BPF));
+ /* either !UBLK_F_BPF or UBLK_F_BPF with redirect */
+ assert(!bpf || (bpf && !(tag & 0x1)));
- ublk_complete_io(q, tag, iod->nr_sectors << 9);
+ if (bpf && (tag % 4)) {
+ unsigned long long key = build_io_key(q, tag);
+ int map_fd;
+ int err;
+ int val = 1;
+
+ map_fd = bpf_obj_get("/sys/fs/bpf/ublk/null/io_map");
+ if (map_fd < 0) {
+ ublk_err("Error finding BPF map fd from pinned path\n");
+ goto exit;
+ }
+
+ /* make this io ready for bpf prog to handle */
+ err = bpf_map_update_elem(map_fd, &key, &val, BPF_ANY);
+ if (err) {
+ ublk_err("Error updating map element: %d\n", errno);
+ goto exit;
+ }
+ ublk_complete_io(q, tag, -EAGAIN);
+ return 0;
+ }
+exit:
+ ublk_complete_io(q, tag, iod->nr_sectors << 9);
return 0;
}
--
2.47.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC PATCH 15/22] ublk: bpf: add bpf aio kfunc
2025-01-07 12:03 [RFC PATCH 00/22] ublk: support bpf Ming Lei
` (13 preceding siblings ...)
2025-01-07 12:04 ` [RFC PATCH 14/22] selftests: ublk: add tests for covering redirecting to userspace Ming Lei
@ 2025-01-07 12:04 ` Ming Lei
2025-01-07 12:04 ` [RFC PATCH 16/22] ublk: bpf: add bpf aio struct_ops Ming Lei
` (6 subsequent siblings)
21 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-01-07 12:04 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
Ming Lei
Define bpf aio kfuncs for bpf progs to submit AIO. So far this starts with
filesystem IO only; in the future it may be extended to network IO.
Only a bvec buffer is covered for doing FS IO over this buffer, but it
is easy to cover UBUF as well thanks to the iov_iter abstraction.
With bpf aio, not only is the user-kernel context switch avoided, the
user-kernel buffer copy is saved too. It is very similar to loop's direct
IO implementation.
These kfuncs can be used by other subsystems and should eventually belong
in lib/, but let's start from ublk first. Once they become mature or gain
more use cases, they can be moved to lib/.
Also define the bpf struct_ops 'bpf_aio_complete_ops', which needs to be
implemented by the caller for completing bpf aio via a bpf prog; this is
done in the following patches. A rough usage sketch of the kfuncs is shown
below.
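As a usage sketch (not part of this patch), a struct_ops prog would allocate a
bpf_aio and submit it against an open fd. The extern declarations below merely
approximate the kfunc signatures added here, and the sketch assumes the struct
bpf_aio type and BPF_AIO_* constants are visible to the prog (e.g. via
vmlinux.h); preparing aio->buf and aio->ops is left to consumer kfuncs
introduced later in the series:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

/* sketch only: declarations approximate the kfuncs added in this patch */
extern struct bpf_aio *bpf_aio_alloc(unsigned int op, unsigned int aio_flags) __ksym;
extern void bpf_aio_release(struct bpf_aio *aio) __ksym;
extern int bpf_aio_submit(struct bpf_aio *aio, int fd, loff_t pos,
			  unsigned bytes, unsigned io_flags) __ksym;

/* submit one BPF_AIO_OP_FS_READ of `bytes` at `pos` against file `fd` */
static inline int submit_one_read(int fd, loff_t pos, unsigned bytes)
{
	struct bpf_aio *aio = bpf_aio_alloc(BPF_AIO_OP_FS_READ, 0);
	int ret;

	if (!aio)
		return -ENOMEM;

	/* aio->ops and aio->buf must be prepared via consumer kfuncs here */
	ret = bpf_aio_submit(aio, fd, pos, bytes, 0);
	if (ret)
		bpf_aio_release(aio);
	return ret;
}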
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/block/ublk/Makefile | 2 +-
drivers/block/ublk/bpf.c | 40 +++++-
drivers/block/ublk/bpf.h | 1 +
drivers/block/ublk/bpf_aio.c | 251 +++++++++++++++++++++++++++++++++++
drivers/block/ublk/bpf_aio.h | 66 +++++++++
5 files changed, 358 insertions(+), 2 deletions(-)
create mode 100644 drivers/block/ublk/bpf_aio.c
create mode 100644 drivers/block/ublk/bpf_aio.h
diff --git a/drivers/block/ublk/Makefile b/drivers/block/ublk/Makefile
index f843a9005cdb..7094607c040d 100644
--- a/drivers/block/ublk/Makefile
+++ b/drivers/block/ublk/Makefile
@@ -5,6 +5,6 @@ ccflags-y += -I$(src)
ublk_drv-$(CONFIG_BLK_DEV_UBLK) := main.o
ifeq ($(CONFIG_UBLK_BPF), y)
-ublk_drv-$(CONFIG_BLK_DEV_UBLK) += bpf_ops.o bpf.o
+ublk_drv-$(CONFIG_BLK_DEV_UBLK) += bpf_ops.o bpf.o bpf_aio.o
endif
obj-$(CONFIG_BLK_DEV_UBLK) += ublk_drv.o
diff --git a/drivers/block/ublk/bpf.c b/drivers/block/ublk/bpf.c
index ef1546a7ccda..d5880d61abe5 100644
--- a/drivers/block/ublk/bpf.c
+++ b/drivers/block/ublk/bpf.c
@@ -155,8 +155,23 @@ BTF_ID_FLAGS(func, ublk_bpf_get_iod, KF_TRUSTED_ARGS | KF_RET_NULL)
BTF_ID_FLAGS(func, ublk_bpf_get_io_tag, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, ublk_bpf_get_queue_id, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, ublk_bpf_get_dev_id, KF_TRUSTED_ARGS)
+
+/* bpf aio kfunc */
+BTF_ID_FLAGS(func, bpf_aio_alloc, KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_aio_alloc_sleepable, KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_aio_release)
+BTF_ID_FLAGS(func, bpf_aio_submit)
BTF_KFUNCS_END(ublk_bpf_kfunc_ids)
+__bpf_kfunc void bpf_aio_release_dtor(void *aio)
+{
+ bpf_aio_release(aio);
+}
+CFI_NOSEAL(bpf_aio_release_dtor);
+BTF_ID_LIST(bpf_aio_dtor_ids)
+BTF_ID(struct, bpf_aio)
+BTF_ID(func, bpf_aio_release_dtor)
+
static const struct btf_kfunc_id_set ublk_bpf_kfunc_set = {
.owner = THIS_MODULE,
.set = &ublk_bpf_kfunc_ids,
@@ -164,6 +179,12 @@ static const struct btf_kfunc_id_set ublk_bpf_kfunc_set = {
int __init ublk_bpf_init(void)
{
+ const struct btf_id_dtor_kfunc aio_dtors[] = {
+ {
+ .btf_id = bpf_aio_dtor_ids[0],
+ .kfunc_btf_id = bpf_aio_dtor_ids[1]
+ },
+ };
int err;
err = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS,
@@ -172,5 +193,22 @@ int __init ublk_bpf_init(void)
pr_warn("error while setting UBLK BPF tracing kfuncs: %d", err);
return err;
}
- return ublk_bpf_struct_ops_init();
+
+ err = ublk_bpf_struct_ops_init();
+ if (err) {
+ pr_warn("error while initializing ublk bpf struct_ops: %d", err);
+ return err;
+ }
+
+ err = register_btf_id_dtor_kfuncs(aio_dtors, ARRAY_SIZE(aio_dtors),
+ THIS_MODULE);
+ if (err) {
+ pr_warn("error while registering aio destructor: %d", err);
+ return err;
+ }
+
+ err = bpf_aio_init();
+ if (err)
+ pr_warn("error while initializing bpf aio kfunc: %d", err);
+ return err;
}
diff --git a/drivers/block/ublk/bpf.h b/drivers/block/ublk/bpf.h
index 4e178cbecb74..0ab25743ae7d 100644
--- a/drivers/block/ublk/bpf.h
+++ b/drivers/block/ublk/bpf.h
@@ -3,6 +3,7 @@
#define UBLK_INT_BPF_HEADER
#include "bpf_reg.h"
+#include "bpf_aio.h"
typedef unsigned long ublk_bpf_return_t;
typedef ublk_bpf_return_t (*queue_io_cmd_t)(struct ublk_bpf_io *io, unsigned int);
diff --git a/drivers/block/ublk/bpf_aio.c b/drivers/block/ublk/bpf_aio.c
new file mode 100644
index 000000000000..65013fe8054f
--- /dev/null
+++ b/drivers/block/ublk/bpf_aio.c
@@ -0,0 +1,251 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Red Hat */
+
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/bpf.h>
+#include <linux/bpf_mem_alloc.h>
+#include <linux/btf.h>
+#include <linux/btf_ids.h>
+#include <linux/filter.h>
+
+#include "bpf_aio.h"
+
+static int __bpf_aio_submit(struct bpf_aio *aio);
+
+static struct kmem_cache *bpf_aio_cachep;
+static struct kmem_cache *bpf_aio_work_cachep;
+static struct workqueue_struct *bpf_aio_wq;
+
+static inline bool bpf_aio_is_rw(int op)
+{
+ return op == BPF_AIO_OP_FS_READ || op == BPF_AIO_OP_FS_WRITE;
+}
+
+/* check if it is short read */
+static bool bpf_aio_is_short_read(const struct bpf_aio *aio, long ret)
+{
+ return ret >= 0 && ret < aio->bytes &&
+ bpf_aio_get_op(aio) == BPF_AIO_OP_FS_READ;
+}
+
+/* zero the remaining bytes starting from `off` to the end */
+static void bpf_aio_zero_remained(const struct bpf_aio *aio, long off)
+{
+ struct iov_iter iter;
+
+ iov_iter_bvec(&iter, ITER_DEST, aio->buf.bvec, aio->buf.nr_bvec, aio->bytes);
+ iter.iov_offset = aio->buf.bvec_off;
+
+ iov_iter_advance(&iter, off);
+ iov_iter_zero(aio->bytes - off, &iter);
+}
+
+static void bpf_aio_do_completion(struct bpf_aio *aio)
+{
+ if (aio->iocb.ki_filp)
+ fput(aio->iocb.ki_filp);
+ if (aio->work)
+ kmem_cache_free(bpf_aio_work_cachep, aio->work);
+}
+
+/* ->ki_complete callback */
+static void bpf_aio_complete(struct kiocb *iocb, long ret)
+{
+ struct bpf_aio *aio = container_of(iocb, struct bpf_aio, iocb);
+
+ if (unlikely(ret == -EAGAIN)) {
+ aio->opf |= BPF_AIO_FORCE_WQ;
+ ret = __bpf_aio_submit(aio);
+ if (!ret)
+ return;
+ }
+
+ /* zero the remaining bytes in case of a short read */
+ if (bpf_aio_is_short_read(aio, ret))
+ bpf_aio_zero_remained(aio, ret);
+
+ bpf_aio_do_completion(aio);
+ aio->ops->bpf_aio_complete_cb(aio, ret);
+}
+
+static void bpf_aio_prep_rw(struct bpf_aio *aio, unsigned int rw,
+ struct iov_iter *iter)
+{
+ iov_iter_bvec(iter, rw, aio->buf.bvec, aio->buf.nr_bvec, aio->bytes);
+ iter->iov_offset = aio->buf.bvec_off;
+
+ if (unlikely(aio->opf & BPF_AIO_FORCE_WQ)) {
+ aio->iocb.ki_flags &= ~IOCB_NOWAIT;
+ aio->iocb.ki_complete = NULL;
+ } else {
+ aio->iocb.ki_flags |= IOCB_NOWAIT;
+ aio->iocb.ki_complete = bpf_aio_complete;
+ }
+}
+
+static int bpf_aio_do_submit(struct bpf_aio *aio)
+{
+ int op = bpf_aio_get_op(aio);
+ struct iov_iter iter;
+ struct file *file = aio->iocb.ki_filp;
+ int ret;
+
+ switch (op) {
+ case BPF_AIO_OP_FS_READ:
+ bpf_aio_prep_rw(aio, ITER_DEST, &iter);
+ if (file->f_op->read_iter)
+ ret = file->f_op->read_iter(&aio->iocb, &iter);
+ else
+ ret = -EOPNOTSUPP;
+ break;
+ case BPF_AIO_OP_FS_WRITE:
+ bpf_aio_prep_rw(aio, ITER_SOURCE, &iter);
+ if (file->f_op->write_iter)
+ ret = file->f_op->write_iter(&aio->iocb, &iter);
+ else
+ ret = -EOPNOTSUPP;
+ break;
+ case BPF_AIO_OP_FS_FSYNC:
+ ret = vfs_fsync_range(aio->iocb.ki_filp, aio->iocb.ki_pos,
+ aio->iocb.ki_pos + aio->bytes - 1, 0);
+ if (unlikely(ret && ret != -EINVAL))
+ ret = -EIO;
+ break;
+ case BPF_AIO_OP_FS_FALLOCATE:
+ ret = vfs_fallocate(aio->iocb.ki_filp, aio->iocb.ki_flags,
+ aio->iocb.ki_pos, aio->bytes);
+ break;
+ default:
+ ret = -EINVAL;
+ }
+
+ if (ret == -EIOCBQUEUED) {
+ ret = 0;
+ } else if (ret != -EAGAIN) {
+ bpf_aio_complete(&aio->iocb, ret);
+ ret = 0;
+ }
+
+ return ret;
+}
+
+static void bpf_aio_submit_work(struct work_struct *work)
+{
+ struct bpf_aio_work *aio_work = container_of(work, struct bpf_aio_work, work);
+
+ bpf_aio_do_submit(aio_work->aio);
+}
+
+static int __bpf_aio_submit(struct bpf_aio *aio)
+{
+ struct work_struct *work;
+
+do_submit:
+ if (likely(!(aio->opf & BPF_AIO_FORCE_WQ))) {
+ int ret = bpf_aio_do_submit(aio);
+
+ /* retry via workqueue in case of -EAGAIN */
+ if (ret != -EAGAIN)
+ return ret;
+ aio->opf |= BPF_AIO_FORCE_WQ;
+ }
+
+ if (!aio->work) {
+ bool in_irq = in_interrupt();
+ gfp_t gfpflags = in_irq ? GFP_ATOMIC : GFP_NOIO;
+
+ aio->work = kmem_cache_alloc(bpf_aio_work_cachep, gfpflags);
+ if (unlikely(!aio->work)) {
+ if (in_irq)
+ return -ENOMEM;
+ aio->opf &= ~BPF_AIO_FORCE_WQ;
+ goto do_submit;
+ }
+ }
+
+ aio->work->aio = aio;
+ work = &aio->work->work;
+ INIT_WORK(work, bpf_aio_submit_work);
+ queue_work(bpf_aio_wq, work);
+
+ return 0;
+}
+
+static struct bpf_aio *__bpf_aio_alloc(gfp_t gfpflags, unsigned op,
+ enum bpf_aio_flag aio_flags)
+{
+ struct bpf_aio *aio;
+
+ if (op >= BPF_AIO_OP_LAST)
+ return NULL;
+
+ if (aio_flags & BPF_AIO_OP_MASK)
+ return NULL;
+
+ aio = kmem_cache_alloc(bpf_aio_cachep, gfpflags);
+ if (!aio)
+ return NULL;
+
+ memset(aio, 0, sizeof(*aio));
+ aio->opf = op | (unsigned int)aio_flags;
+ return aio;
+}
+
+__bpf_kfunc struct bpf_aio *bpf_aio_alloc(unsigned int op, enum bpf_aio_flag aio_flags)
+{
+ return __bpf_aio_alloc(GFP_ATOMIC, op, aio_flags);
+}
+
+__bpf_kfunc struct bpf_aio *bpf_aio_alloc_sleepable(unsigned int op, enum bpf_aio_flag aio_flags)
+{
+ return __bpf_aio_alloc(GFP_NOIO, op, aio_flags);
+}
+
+__bpf_kfunc void bpf_aio_release(struct bpf_aio *aio)
+{
+ kmem_cache_free(bpf_aio_cachep, aio);
+}
+
+/* Submit AIO from bpf prog */
+__bpf_kfunc int bpf_aio_submit(struct bpf_aio *aio, int fd, loff_t pos,
+ unsigned bytes, unsigned io_flags)
+{
+ struct file *file;
+
+ if (!aio->ops)
+ return -EINVAL;
+
+ file = fget(fd);
+ if (!file)
+ return -EINVAL;
+
+ /* we could be called from io completion handler */
+ if (in_interrupt())
+ aio->opf |= BPF_AIO_FORCE_WQ;
+
+ aio->iocb.ki_pos = pos;
+ aio->iocb.ki_filp = file;
+ aio->iocb.ki_flags = io_flags;
+ aio->bytes = bytes;
+ if (bpf_aio_is_rw(bpf_aio_get_op(aio))) {
+ if (file->f_flags & O_DIRECT)
+ aio->iocb.ki_flags |= IOCB_DIRECT;
+ else
+ aio->opf |= BPF_AIO_FORCE_WQ;
+ aio->iocb.ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
+ } else {
+ aio->opf |= BPF_AIO_FORCE_WQ;
+ }
+
+ return __bpf_aio_submit(aio);
+}
+
+int __init bpf_aio_init(void)
+{
+ bpf_aio_cachep = KMEM_CACHE(bpf_aio, SLAB_PANIC);
+ bpf_aio_work_cachep = KMEM_CACHE(bpf_aio_work, SLAB_PANIC);
+ bpf_aio_wq = alloc_workqueue("bpf_aio", WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
+
+ return 0;
+}
diff --git a/drivers/block/ublk/bpf_aio.h b/drivers/block/ublk/bpf_aio.h
new file mode 100644
index 000000000000..625737965c90
--- /dev/null
+++ b/drivers/block/ublk/bpf_aio.h
@@ -0,0 +1,66 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Copyright (c) 2024 Red Hat */
+#ifndef UBLK_BPF_AIO_HEADER
+#define UBLK_BPF_AIO_HEADER
+
+#define BPF_AIO_OP_BITS 8
+#define BPF_AIO_OP_MASK ((1 << BPF_AIO_OP_BITS) - 1)
+
+enum bpf_aio_op {
+ BPF_AIO_OP_FS_READ = 0,
+ BPF_AIO_OP_FS_WRITE,
+ BPF_AIO_OP_FS_FSYNC,
+ BPF_AIO_OP_FS_FALLOCATE,
+ BPF_AIO_OP_LAST,
+};
+
+enum bpf_aio_flag_bits {
+ /* force to submit io from wq */
+ __BPF_AIO_FORCE_WQ = BPF_AIO_OP_BITS,
+ __BPF_AIO_NR_BITS, /* stops here */
+};
+
+enum bpf_aio_flag {
+ BPF_AIO_FORCE_WQ = (1 << __BPF_AIO_FORCE_WQ),
+};
+
+struct bpf_aio_work {
+ struct bpf_aio *aio;
+ struct work_struct work;
+};
+
+/* todo: support ubuf & iovec in future */
+struct bpf_aio_buf {
+ unsigned int bvec_off;
+ int nr_bvec;
+ const struct bio_vec *bvec;
+};
+
+struct bpf_aio {
+ unsigned int opf;
+ unsigned int bytes;
+ struct bpf_aio_buf buf;
+ struct bpf_aio_work *work;
+ const struct bpf_aio_complete_ops *ops;
+ struct kiocb iocb;
+};
+
+typedef void (*bpf_aio_complete_t)(struct bpf_aio *io, long ret);
+
+struct bpf_aio_complete_ops {
+ unsigned int id;
+ bpf_aio_complete_t bpf_aio_complete_cb;
+};
+
+static inline unsigned int bpf_aio_get_op(const struct bpf_aio *aio)
+{
+ return aio->opf & BPF_AIO_OP_MASK;
+}
+
+int bpf_aio_init(void);
+struct bpf_aio *bpf_aio_alloc(unsigned int op, enum bpf_aio_flag aio_flags);
+struct bpf_aio *bpf_aio_alloc_sleepable(unsigned int op, enum bpf_aio_flag aio_flags);
+void bpf_aio_release(struct bpf_aio *aio);
+int bpf_aio_submit(struct bpf_aio *aio, int fd, loff_t pos, unsigned bytes,
+ unsigned io_flags);
+#endif
--
2.47.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC PATCH 16/22] ublk: bpf: add bpf aio struct_ops
2025-01-07 12:03 [RFC PATCH 00/22] ublk: support bpf Ming Lei
` (14 preceding siblings ...)
2025-01-07 12:04 ` [RFC PATCH 15/22] ublk: bpf: add bpf aio kfunc Ming Lei
@ 2025-01-07 12:04 ` Ming Lei
2025-01-07 12:04 ` [RFC PATCH 17/22] ublk: bpf: attach bpf aio prog to ublk device Ming Lei
` (5 subsequent siblings)
21 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-01-07 12:04 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
Ming Lei
Add the bpf aio struct_ops, so that application code can provide the bpf
aio completion callback in a struct_ops prog; with this in place, bpf aio
can be supported. A minimal struct_ops skeleton is sketched below.
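A minimal skeleton of such a struct_ops prog, modeled on the ublk struct_ops
selftests earlier in the series, might look like the sketch below; the program
name and the .id value are arbitrary examples, and the callback body is a
placeholder:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

SEC("struct_ops/bpf_aio_complete_cb")
void BPF_PROG(my_aio_done, struct bpf_aio *aio, long ret)
{
	/* placeholder: handle the completed aio, e.g. finish the ublk io */
}

SEC(".struct_ops.link")
struct bpf_aio_complete_ops my_aio_ops = {
	.id = 16,
	.bpf_aio_complete_cb = (void *)my_aio_done,
};

char LICENSE[] SEC("license") = "GPL";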
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/block/ublk/Makefile | 2 +-
drivers/block/ublk/bpf_aio.c | 7 ++
drivers/block/ublk/bpf_aio.h | 12 +++
drivers/block/ublk/bpf_aio_ops.c | 152 +++++++++++++++++++++++++++++++
4 files changed, 172 insertions(+), 1 deletion(-)
create mode 100644 drivers/block/ublk/bpf_aio_ops.c
diff --git a/drivers/block/ublk/Makefile b/drivers/block/ublk/Makefile
index 7094607c040d..a47f65eb97f8 100644
--- a/drivers/block/ublk/Makefile
+++ b/drivers/block/ublk/Makefile
@@ -5,6 +5,6 @@ ccflags-y += -I$(src)
ublk_drv-$(CONFIG_BLK_DEV_UBLK) := main.o
ifeq ($(CONFIG_UBLK_BPF), y)
-ublk_drv-$(CONFIG_BLK_DEV_UBLK) += bpf_ops.o bpf.o bpf_aio.o
+ublk_drv-$(CONFIG_BLK_DEV_UBLK) += bpf_ops.o bpf.o bpf_aio.o bpf_aio_ops.o
endif
obj-$(CONFIG_BLK_DEV_UBLK) += ublk_drv.o
diff --git a/drivers/block/ublk/bpf_aio.c b/drivers/block/ublk/bpf_aio.c
index 65013fe8054f..6e93f28f389b 100644
--- a/drivers/block/ublk/bpf_aio.c
+++ b/drivers/block/ublk/bpf_aio.c
@@ -243,9 +243,16 @@ __bpf_kfunc int bpf_aio_submit(struct bpf_aio *aio, int fd, loff_t pos,
int __init bpf_aio_init(void)
{
+ int err;
+
bpf_aio_cachep = KMEM_CACHE(bpf_aio, SLAB_PANIC);
bpf_aio_work_cachep = KMEM_CACHE(bpf_aio_work, SLAB_PANIC);
bpf_aio_wq = alloc_workqueue("bpf_aio", WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
+ err = bpf_aio_struct_ops_init();
+ if (err) {
+ pr_warn("error while initializing bpf aio struct_ops: %d", err);
+ return err;
+ }
return 0;
}
diff --git a/drivers/block/ublk/bpf_aio.h b/drivers/block/ublk/bpf_aio.h
index 625737965c90..07fcd43fd2ac 100644
--- a/drivers/block/ublk/bpf_aio.h
+++ b/drivers/block/ublk/bpf_aio.h
@@ -3,6 +3,8 @@
#ifndef UBLK_BPF_AIO_HEADER
#define UBLK_BPF_AIO_HEADER
+#include "bpf_reg.h"
+
#define BPF_AIO_OP_BITS 8
#define BPF_AIO_OP_MASK ((1 << BPF_AIO_OP_BITS) - 1)
@@ -47,9 +49,18 @@ struct bpf_aio {
typedef void (*bpf_aio_complete_t)(struct bpf_aio *io, long ret);
+/**
+ * struct bpf_aio_complete_ops - A BPF struct_ops of callbacks allowing to
+ * complete `bpf_aio` submitted by `bpf_aio_submit()`
+ * @id: id used by the bpf aio consumer, defined globally
+ * @bpf_aio_complete_cb: callback for completing submitted `bpf_aio`
+ * @provider: holding all consumers of this struct_ops prog, used by
+ * kernel only
+ */
struct bpf_aio_complete_ops {
unsigned int id;
bpf_aio_complete_t bpf_aio_complete_cb;
+ struct bpf_prog_provider provider;
};
static inline unsigned int bpf_aio_get_op(const struct bpf_aio *aio)
@@ -58,6 +69,7 @@ static inline unsigned int bpf_aio_get_op(const struct bpf_aio *aio)
}
int bpf_aio_init(void);
+int bpf_aio_struct_ops_init(void);
struct bpf_aio *bpf_aio_alloc(unsigned int op, enum bpf_aio_flag aio_flags);
struct bpf_aio *bpf_aio_alloc_sleepable(unsigned int op, enum bpf_aio_flag aio_flags);
void bpf_aio_release(struct bpf_aio *aio);
diff --git a/drivers/block/ublk/bpf_aio_ops.c b/drivers/block/ublk/bpf_aio_ops.c
new file mode 100644
index 000000000000..12757f634dbd
--- /dev/null
+++ b/drivers/block/ublk/bpf_aio_ops.c
@@ -0,0 +1,152 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Red Hat */
+
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/bpf_verifier.h>
+#include <linux/bpf.h>
+#include <linux/btf.h>
+#include <linux/btf_ids.h>
+#include <linux/filter.h>
+#include <linux/xarray.h>
+
+#include "bpf_aio.h"
+
+static DEFINE_XARRAY(bpf_aio_all_ops);
+static DEFINE_MUTEX(bpf_aio_ops_lock);
+
+static bool bpf_aio_ops_is_valid_access(int off, int size,
+ enum bpf_access_type type, const struct bpf_prog *prog,
+ struct bpf_insn_access_aux *info)
+{
+ return bpf_tracing_btf_ctx_access(off, size, type, prog, info);
+}
+
+static int bpf_aio_ops_btf_struct_access(struct bpf_verifier_log *log,
+ const struct bpf_reg_state *reg,
+ int off, int size)
+{
+ /* bpf_aio prog can change nothing */
+ if (size > 0)
+ return -EACCES;
+
+ return NOT_INIT;
+}
+
+static const struct bpf_verifier_ops bpf_aio_verifier_ops = {
+ .get_func_proto = bpf_base_func_proto,
+ .is_valid_access = bpf_aio_ops_is_valid_access,
+ .btf_struct_access = bpf_aio_ops_btf_struct_access,
+};
+
+static int bpf_aio_ops_init(struct btf *btf)
+{
+ return 0;
+}
+
+static int bpf_aio_ops_check_member(const struct btf_type *t,
+ const struct btf_member *member,
+ const struct bpf_prog *prog)
+{
+ if (prog->sleepable)
+ return -EINVAL;
+ return 0;
+}
+
+static int bpf_aio_ops_init_member(const struct btf_type *t,
+ const struct btf_member *member,
+ void *kdata, const void *udata)
+{
+ const struct bpf_aio_complete_ops *uops;
+ struct bpf_aio_complete_ops *kops;
+ u32 moff;
+
+ uops = (const struct bpf_aio_complete_ops *)udata;
+ kops = (struct bpf_aio_complete_ops*)kdata;
+
+ moff = __btf_member_bit_offset(t, member) / 8;
+
+ switch (moff) {
+ case offsetof(struct bpf_aio_complete_ops, id):
+ /* For the 'id' field, this function has to copy it and return 1 to
+ * indicate that the data has been handled by the struct_ops
+ * type, or the verifier will reject the map if the value of
+ * those fields is not zero.
+ */
+ kops->id = uops->id;
+ return 1;
+ }
+ return 0;
+}
+
+static int bpf_aio_reg(void *kdata, struct bpf_link *link)
+{
+ struct bpf_aio_complete_ops *ops = kdata;
+ struct bpf_aio_complete_ops *curr;
+ int ret = -EBUSY;
+
+ mutex_lock(&bpf_aio_ops_lock);
+ if (!xa_load(&bpf_aio_all_ops, ops->id)) {
+ curr = kmalloc(sizeof(*curr), GFP_KERNEL);
+ if (curr) {
+ *curr = *ops;
+ bpf_prog_provider_init(&curr->provider);
+ ret = xa_err(xa_store(&bpf_aio_all_ops, ops->id,
+ curr, GFP_KERNEL));
+ } else {
+ ret = -ENOMEM;
+ }
+ }
+ mutex_unlock(&bpf_aio_ops_lock);
+
+ return ret;
+}
+
+static void bpf_aio_unreg(void *kdata, struct bpf_link *link)
+{
+ struct bpf_aio_complete_ops *ops = kdata;
+ struct bpf_prog_consumer *consumer, *tmp;
+ struct bpf_aio_complete_ops *curr;
+ LIST_HEAD(consumer_list);
+
+ mutex_lock(&bpf_aio_ops_lock);
+ curr = xa_erase(&bpf_aio_all_ops, ops->id);
+ if (curr)
+ list_splice_init(&curr->provider.list, &consumer_list);
+ mutex_unlock(&bpf_aio_ops_lock);
+
+ list_for_each_entry_safe(consumer, tmp, &consumer_list, node)
+ bpf_prog_consumer_detach(consumer, true);
+ kfree(curr);
+}
+
+static void bpf_aio_cb(struct bpf_aio *io, long ret)
+{
+}
+
+static struct bpf_aio_complete_ops __bpf_aio_ops = {
+ .bpf_aio_complete_cb = bpf_aio_cb,
+};
+
+static struct bpf_struct_ops bpf_aio_ops = {
+ .verifier_ops = &bpf_aio_verifier_ops,
+ .init = bpf_aio_ops_init,
+ .check_member = bpf_aio_ops_check_member,
+ .init_member = bpf_aio_ops_init_member,
+ .reg = bpf_aio_reg,
+ .unreg = bpf_aio_unreg,
+ .name = "bpf_aio_complete_ops",
+ .cfi_stubs = &__bpf_aio_ops,
+ .owner = THIS_MODULE,
+};
+
+int __init bpf_aio_struct_ops_init(void)
+{
+ int err;
+
+ err = register_bpf_struct_ops(&bpf_aio_ops, bpf_aio_complete_ops);
+ if (err)
+ pr_warn("error while registering bpf aio struct ops: %d", err);
+
+ return err;
+}
--
2.47.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC PATCH 17/22] ublk: bpf: attach bpf aio prog to ublk device
2025-01-07 12:03 [RFC PATCH 00/22] ublk: support bpf Ming Lei
` (15 preceding siblings ...)
2025-01-07 12:04 ` [RFC PATCH 16/22] ublk: bpf: add bpf aio struct_ops Ming Lei
@ 2025-01-07 12:04 ` Ming Lei
2025-01-07 12:04 ` [RFC PATCH 18/22] ublk: bpf: add several ublk bpf aio kfuncs Ming Lei
` (4 subsequent siblings)
21 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-01-07 12:04 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
Ming Lei
Attach the bpf aio program to the ublk device before adding the ublk disk,
and detach it after the disk is removed. When the bpf aio prog is
unregistered, all devices detach from the prog automatically.
The ublk device needs to provide the bpf aio struct_ops ID for attaching the
specific prog, and each ublk device attaches to only a single bpf prog, so
that the attached bpf aio prog can be used to submit bpf aio for handling
ublk IO.
Given that the bpf aio prog is attached to the ublk device, the ublk bpf
side has to provide a kfunc to assign the 'struct bpf_aio_complete_ops *'
to the 'struct bpf_aio' instance. The userspace parameter setup is sketched
below.
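On the userspace side this boils down to filling the new ublk_param_bpf fields
before issuing SET_PARAMS; a hedged sketch, where the ops_id/aio_ops_id values
are arbitrary examples that must match the ids of the pinned struct_ops maps:

#include <linux/ublk_cmd.h>

/* sketch: fill the bpf parameters before issuing SET_PARAMS */
static struct ublk_params p = {
	.types = UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_BPF,
	/* .basic = { ... } filled as usual */
	.bpf = {
		.flags      = UBLK_BPF_HAS_OPS_ID | UBLK_BPF_HAS_AIO_OPS_ID,
		.ops_id     = 1,	/* id of the ublk_bpf_ops struct_ops */
		.aio_ops_id = 16,	/* id of the bpf_aio_complete_ops struct_ops */
	},
};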
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/block/ublk/bpf.c | 81 +++++++++++++++++++++++++++++++-
drivers/block/ublk/bpf_aio.c | 4 ++
drivers/block/ublk/bpf_aio.h | 4 ++
drivers/block/ublk/bpf_aio_ops.c | 22 +++++++++
drivers/block/ublk/ublk.h | 10 ++++
include/uapi/linux/ublk_cmd.h | 4 +-
6 files changed, 123 insertions(+), 2 deletions(-)
diff --git a/drivers/block/ublk/bpf.c b/drivers/block/ublk/bpf.c
index d5880d61abe5..921bbbcf4d9e 100644
--- a/drivers/block/ublk/bpf.c
+++ b/drivers/block/ublk/bpf.c
@@ -19,6 +19,79 @@ static int ublk_set_bpf_ops(struct ublk_device *ub,
return 0;
}
+static int ublk_set_bpf_aio_op(struct ublk_device *ub,
+ struct bpf_aio_complete_ops *ops)
+{
+ int i;
+
+ for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
+ if (ops && ublk_get_queue(ub, i)->bpf_aio_ops) {
+ ublk_set_bpf_aio_op(ub, NULL);
+ return -EBUSY;
+ }
+ ublk_get_queue(ub, i)->bpf_aio_ops = ops;
+ }
+ return 0;
+}
+
+static int ublk_bpf_aio_prog_attach_cb(struct bpf_prog_consumer *consumer,
+ struct bpf_prog_provider *provider)
+{
+ struct ublk_device *ub = container_of(consumer, struct ublk_device,
+ aio_prog);
+ struct bpf_aio_complete_ops *ops = container_of(provider,
+ struct bpf_aio_complete_ops, provider);
+ int ret = -ENODEV;
+
+ if (ublk_get_device(ub)) {
+ ret = ublk_set_bpf_aio_op(ub, ops);
+ if (ret)
+ ublk_put_device(ub);
+ }
+
+ return ret;
+}
+
+static void ublk_bpf_aio_prog_detach_cb(struct bpf_prog_consumer *consumer,
+ bool unreg)
+{
+ struct ublk_device *ub = container_of(consumer, struct ublk_device,
+ aio_prog);
+
+ if (unreg) {
+ blk_mq_freeze_queue(ub->ub_disk->queue);
+ ublk_set_bpf_aio_op(ub, NULL);
+ blk_mq_unfreeze_queue(ub->ub_disk->queue);
+ } else {
+ ublk_set_bpf_aio_op(ub, NULL);
+ }
+ ublk_put_device(ub);
+}
+
+static const struct bpf_prog_consumer_ops ublk_aio_prog_consumer_ops = {
+ .attach_fn = ublk_bpf_aio_prog_attach_cb,
+ .detach_fn = ublk_bpf_aio_prog_detach_cb,
+};
+
+static int ublk_bpf_aio_attach(struct ublk_device *ub)
+{
+ if (!ublk_dev_support_bpf_aio(ub))
+ return 0;
+
+ ub->aio_prog.prog_id = ub->params.bpf.aio_ops_id;
+ ub->aio_prog.ops = &ublk_aio_prog_consumer_ops;
+
+ return bpf_aio_prog_attach(&ub->aio_prog);
+}
+
+static void ublk_bpf_aio_detach(struct ublk_device *ub)
+{
+ if (!ublk_dev_support_bpf_aio(ub))
+ return;
+ bpf_aio_prog_detach(&ub->aio_prog);
+}
+
+
static int ublk_bpf_prog_attach_cb(struct bpf_prog_consumer *consumer,
struct bpf_prog_provider *provider)
{
@@ -76,19 +149,25 @@ static const struct bpf_prog_consumer_ops ublk_prog_consumer_ops = {
int ublk_bpf_attach(struct ublk_device *ub)
{
+ int ret;
+
if (!ublk_dev_support_bpf(ub))
return 0;
ub->prog.prog_id = ub->params.bpf.ops_id;
ub->prog.ops = &ublk_prog_consumer_ops;
- return ublk_bpf_prog_attach(&ub->prog);
+ ret = ublk_bpf_prog_attach(&ub->prog);
+ if (ret)
+ return ret;
+ return ublk_bpf_aio_attach(ub);
}
void ublk_bpf_detach(struct ublk_device *ub)
{
if (!ublk_dev_support_bpf(ub))
return;
+ ublk_bpf_aio_detach(ub);
ublk_bpf_prog_detach(&ub->prog);
}
diff --git a/drivers/block/ublk/bpf_aio.c b/drivers/block/ublk/bpf_aio.c
index 6e93f28f389b..da050be4b710 100644
--- a/drivers/block/ublk/bpf_aio.c
+++ b/drivers/block/ublk/bpf_aio.c
@@ -213,6 +213,10 @@ __bpf_kfunc int bpf_aio_submit(struct bpf_aio *aio, int fd, loff_t pos,
{
struct file *file;
+ /*
+ * ->ops has to be assigned by a kfunc of the consumer subsystem because
+ * bpf prog lifetime is aligned with the consumer subsystem
+ */
if (!aio->ops)
return -EINVAL;
diff --git a/drivers/block/ublk/bpf_aio.h b/drivers/block/ublk/bpf_aio.h
index 07fcd43fd2ac..d144c5e20dcb 100644
--- a/drivers/block/ublk/bpf_aio.h
+++ b/drivers/block/ublk/bpf_aio.h
@@ -75,4 +75,8 @@ struct bpf_aio *bpf_aio_alloc_sleepable(unsigned int op, enum bpf_aio_flag aio_f
void bpf_aio_release(struct bpf_aio *aio);
int bpf_aio_submit(struct bpf_aio *aio, int fd, loff_t pos, unsigned bytes,
unsigned io_flags);
+
+int bpf_aio_prog_attach(struct bpf_prog_consumer *consumer);
+void bpf_aio_prog_detach(struct bpf_prog_consumer *consumer);
+
#endif
diff --git a/drivers/block/ublk/bpf_aio_ops.c b/drivers/block/ublk/bpf_aio_ops.c
index 12757f634dbd..04ad45fd24e6 100644
--- a/drivers/block/ublk/bpf_aio_ops.c
+++ b/drivers/block/ublk/bpf_aio_ops.c
@@ -120,6 +120,28 @@ static void bpf_aio_unreg(void *kdata, struct bpf_link *link)
kfree(curr);
}
+int bpf_aio_prog_attach(struct bpf_prog_consumer *consumer)
+{
+ unsigned id = consumer->prog_id;
+ struct bpf_aio_complete_ops *ops;
+ int ret = -EINVAL;
+
+ mutex_lock(&bpf_aio_ops_lock);
+ ops = xa_load(&bpf_aio_all_ops, id);
+ if (ops && ops->id == id)
+ ret = bpf_prog_consumer_attach(consumer, &ops->provider);
+ mutex_unlock(&bpf_aio_ops_lock);
+
+ return ret;
+}
+
+void bpf_aio_prog_detach(struct bpf_prog_consumer *consumer)
+{
+ mutex_lock(&bpf_aio_ops_lock);
+ bpf_prog_consumer_detach(consumer, false);
+ mutex_unlock(&bpf_aio_ops_lock);
+}
+
static void bpf_aio_cb(struct bpf_aio *io, long ret)
{
}
diff --git a/drivers/block/ublk/ublk.h b/drivers/block/ublk/ublk.h
index 8343e70bd723..2c33f6a94bf2 100644
--- a/drivers/block/ublk/ublk.h
+++ b/drivers/block/ublk/ublk.h
@@ -126,6 +126,7 @@ struct ublk_queue {
#ifdef CONFIG_UBLK_BPF
struct ublk_bpf_ops *bpf_ops;
+ struct bpf_aio_complete_ops *bpf_aio_ops;
#endif
unsigned short force_abort:1;
@@ -159,6 +160,7 @@ struct ublk_device {
#ifdef CONFIG_UBLK_BPF
struct bpf_prog_consumer prog;
+ struct bpf_prog_consumer aio_prog;
#endif
struct mutex mutex;
@@ -203,6 +205,14 @@ static inline bool ublk_dev_support_bpf(const struct ublk_device *ub)
return ub->dev_info.flags & UBLK_F_BPF;
}
+static inline bool ublk_dev_support_bpf_aio(const struct ublk_device *ub)
+{
+ if (!ublk_dev_support_bpf(ub))
+ return false;
+
+ return ub->params.bpf.flags & UBLK_BPF_HAS_AIO_OPS_ID;
+}
+
struct ublk_device *ublk_get_device(struct ublk_device *ub);
struct ublk_device *ublk_get_device_from_id(int idx);
void ublk_put_device(struct ublk_device *ub);
diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
index 27cf14e65cbc..ed6df4d61e89 100644
--- a/include/uapi/linux/ublk_cmd.h
+++ b/include/uapi/linux/ublk_cmd.h
@@ -406,9 +406,11 @@ struct ublk_param_zoned {
struct ublk_param_bpf {
#define UBLK_BPF_HAS_OPS_ID (1 << 0)
+#define UBLK_BPF_HAS_AIO_OPS_ID (1 << 1)
__u8 flags;
__u8 ops_id;
- __u8 reserved[6];
+ __u16 aio_ops_id;
+ __u8 reserved[4];
};
struct ublk_params {
--
2.47.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC PATCH 18/22] ublk: bpf: add several ublk bpf aio kfuncs
2025-01-07 12:03 [RFC PATCH 00/22] ublk: support bpf Ming Lei
` (16 preceding siblings ...)
2025-01-07 12:04 ` [RFC PATCH 17/22] ublk: bpf: attach bpf aio prog to ublk device Ming Lei
@ 2025-01-07 12:04 ` Ming Lei
2025-01-07 12:04 ` [RFC PATCH 19/22] ublk: bpf: wire bpf aio with ublk io handling Ming Lei
` (3 subsequent siblings)
21 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-01-07 12:04 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
Ming Lei
Add ublk bpf aio kfuncs for bpf progs to:
- prepare the buffer
- assign the bpf aio struct_ops
- submit bpf aios for handling ublk io commands
- deal with ublk io and bpf aio lifetimes, and make sure that the
ublk io won't be completed until all of its bpf aios are completed
A usage sketch covering these steps follows below.
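Putting the pieces together, a ublk bpf prog might use these kfuncs roughly as
in the sketch below. The extern declarations only approximate the kfuncs added
here, the bpf_aio_*() kfuncs are the ones from the earlier bpf aio patch, and
error handling plus the exact ownership of the aio object are simplified:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* sketch only: declarations approximate the kfuncs involved */
extern struct bpf_aio *bpf_aio_alloc(unsigned int op, unsigned int aio_flags) __ksym;
extern void bpf_aio_release(struct bpf_aio *aio) __ksym;
extern int bpf_aio_submit(struct bpf_aio *aio, int fd, loff_t pos,
			  unsigned bytes, unsigned io_flags) __ksym;
extern int ublk_bpf_attach_and_prep_aio(const struct ublk_bpf_io *io,
		unsigned off, unsigned bytes, struct bpf_aio *aio) __ksym;
extern void ublk_bpf_dettach_and_complete_aio(struct bpf_aio *aio) __ksym;

static inline int queue_one_aio(struct ublk_bpf_io *io, int backing_fd,
				loff_t pos, unsigned bytes)
{
	struct bpf_aio *aio = bpf_aio_alloc(BPF_AIO_OP_FS_READ, 0);
	int ret;

	if (!aio)
		return -ENOMEM;

	/* bumps the ublk io reference and assigns the queue's bpf_aio_complete_ops */
	ret = ublk_bpf_attach_and_prep_aio(io, 0, bytes, aio);
	if (ret) {
		bpf_aio_release(aio);
		return ret;
	}

	return bpf_aio_submit(aio, backing_fd, pos, bytes, 0);
}

/* in the attached bpf_aio_complete_ops callback, drop the io reference */
SEC("struct_ops/bpf_aio_complete_cb")
void BPF_PROG(ublk_aio_done, struct bpf_aio *aio, long ret)
{
	ublk_bpf_dettach_and_complete_aio(aio);
	/* assumes the prog owns the aio here and releases it after detaching */
	bpf_aio_release(aio);
}

char LICENSE[] SEC("license") = "GPL";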
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/block/ublk/bpf.c | 77 ++++++++++++++++++++++++++++++++++++
drivers/block/ublk/bpf_aio.c | 6 ++-
drivers/block/ublk/bpf_aio.h | 38 +++++++++++++++++-
drivers/block/ublk/ublk.h | 2 +
4 files changed, 121 insertions(+), 2 deletions(-)
diff --git a/drivers/block/ublk/bpf.c b/drivers/block/ublk/bpf.c
index 921bbbcf4d9e..c0babf6d5868 100644
--- a/drivers/block/ublk/bpf.c
+++ b/drivers/block/ublk/bpf.c
@@ -228,6 +228,77 @@ ublk_bpf_complete_io(struct ublk_bpf_io *io, int res)
ublk_bpf_complete_io_cmd(io, res);
}
+/*
+ * Called before submitting one bpf aio in prog, and this ublk IO's
+ * reference is increased.
+ *
+ * Grab reference of `io` for this `aio`, and the reference will be dropped
+ * by ublk_bpf_dettach_and_complete_aio()
+ */
+__bpf_kfunc int
+ublk_bpf_attach_and_prep_aio(const struct ublk_bpf_io *_io, unsigned off,
+ unsigned bytes, struct bpf_aio *aio)
+{
+ struct ublk_bpf_io *io = (struct ublk_bpf_io *)_io;
+ const struct request *req;
+ const struct ublk_rq_data *data;
+ const struct ublk_bpf_io *bpf_io;
+
+ if (!io || !aio)
+ return -EINVAL;
+
+ req = ublk_bpf_get_req(io);
+ if (!req)
+ return -EINVAL;
+
+ if (off + bytes > blk_rq_bytes(req))
+ return -EINVAL;
+
+ if (req->mq_hctx) {
+ const struct ublk_queue *ubq = req->mq_hctx->driver_data;
+
+ bpf_aio_assign_cb(aio, ubq->bpf_aio_ops);
+ }
+
+ data = blk_mq_rq_to_pdu((struct request *)req);
+ bpf_io = &data->bpf_data;
+ bpf_aio_assign_buf(aio, &bpf_io->buf, off, bytes);
+
+ refcount_inc(&io->ref);
+ aio->private_data = (void *)io;
+
+ return 0;
+}
+
+/*
+ * Called after this attached aio is completed. The associated ublk IO's
+ * reference is decreased, and once the reference drops to zero, the
+ * ublk IO is completed.
+ */
+__bpf_kfunc void
+ublk_bpf_dettach_and_complete_aio(struct bpf_aio *aio)
+{
+ struct ublk_bpf_io *io = aio->private_data;
+
+ if (io) {
+ ublk_bpf_io_dec_ref(io);
+ aio->private_data = NULL;
+ }
+}
+
+__bpf_kfunc struct ublk_bpf_io *ublk_bpf_acquire_io_from_aio(struct bpf_aio *aio)
+{
+ return aio->private_data;
+}
+
+__bpf_kfunc void ublk_bpf_release_io_from_aio(struct ublk_bpf_io *io)
+{
+}
+
+
BTF_KFUNCS_START(ublk_bpf_kfunc_ids)
BTF_ID_FLAGS(func, ublk_bpf_complete_io, KF_TRUSTED_ARGS)
BTF_ID_FLAGS(func, ublk_bpf_get_iod, KF_TRUSTED_ARGS | KF_RET_NULL)
@@ -240,6 +311,12 @@ BTF_ID_FLAGS(func, bpf_aio_alloc, KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_aio_alloc_sleepable, KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_aio_release)
BTF_ID_FLAGS(func, bpf_aio_submit)
+
+/* ublk bpf aio kfuncs */
+BTF_ID_FLAGS(func, ublk_bpf_attach_and_prep_aio)
+BTF_ID_FLAGS(func, ublk_bpf_dettach_and_complete_aio)
+BTF_ID_FLAGS(func, ublk_bpf_acquire_io_from_aio, KF_ACQUIRE)
+BTF_ID_FLAGS(func, ublk_bpf_release_io_from_aio, KF_RELEASE)
BTF_KFUNCS_END(ublk_bpf_kfunc_ids)
__bpf_kfunc void bpf_aio_release_dtor(void *aio)
diff --git a/drivers/block/ublk/bpf_aio.c b/drivers/block/ublk/bpf_aio.c
index da050be4b710..06a6cc8f38b1 100644
--- a/drivers/block/ublk/bpf_aio.c
+++ b/drivers/block/ublk/bpf_aio.c
@@ -211,6 +211,7 @@ __bpf_kfunc void bpf_aio_release(struct bpf_aio *aio)
__bpf_kfunc int bpf_aio_submit(struct bpf_aio *aio, int fd, loff_t pos,
unsigned bytes, unsigned io_flags)
{
+ unsigned op = bpf_aio_get_op(aio);
struct file *file;
/*
@@ -220,6 +221,9 @@ __bpf_kfunc int bpf_aio_submit(struct bpf_aio *aio, int fd, loff_t pos,
if (!aio->ops)
return -EINVAL;
+ if (unlikely((bytes > aio->buf_size) && bpf_aio_is_rw(op)))
+ return -EINVAL;
+
file = fget(fd);
if (!file)
return -EINVAL;
@@ -232,7 +236,7 @@ __bpf_kfunc int bpf_aio_submit(struct bpf_aio *aio, int fd, loff_t pos,
aio->iocb.ki_filp = file;
aio->iocb.ki_flags = io_flags;
aio->bytes = bytes;
- if (bpf_aio_is_rw(bpf_aio_get_op(aio))) {
+ if (bpf_aio_is_rw(op)) {
if (file->f_flags & O_DIRECT)
aio->iocb.ki_flags |= IOCB_DIRECT;
else
diff --git a/drivers/block/ublk/bpf_aio.h b/drivers/block/ublk/bpf_aio.h
index d144c5e20dcb..0683139f5354 100644
--- a/drivers/block/ublk/bpf_aio.h
+++ b/drivers/block/ublk/bpf_aio.h
@@ -40,11 +40,15 @@ struct bpf_aio_buf {
struct bpf_aio {
unsigned int opf;
- unsigned int bytes;
+ union {
+ unsigned int bytes;
+ unsigned int buf_size;
+ };
struct bpf_aio_buf buf;
struct bpf_aio_work *work;
const struct bpf_aio_complete_ops *ops;
struct kiocb iocb;
+ void *private_data;
};
typedef void (*bpf_aio_complete_t)(struct bpf_aio *io, long ret);
@@ -68,6 +72,38 @@ static inline unsigned int bpf_aio_get_op(const struct bpf_aio *aio)
return aio->opf & BPF_AIO_OP_MASK;
}
+/* Must be called from kfunc defined in consumer subsystem */
+static inline void bpf_aio_assign_cb(struct bpf_aio *aio,
+ const struct bpf_aio_complete_ops *ops)
+{
+ aio->ops = ops;
+}
+
+/*
+ * Skip `skip` bytes and assign the advanced source buffer for `aio`, so
+ * we can cover this part of source buffer by this `aio`
+ */
+static inline void bpf_aio_assign_buf(struct bpf_aio *aio,
+ const struct bpf_aio_buf *src, unsigned skip,
+ unsigned bytes)
+{
+ const struct bio_vec *bvec, *end;
+ struct bpf_aio_buf *abuf = &aio->buf;
+
+ skip += src->bvec_off;
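+ /* advance to the bvec holding the first byte to cover; the remainder becomes the new intra-bvec offset */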
+ for (bvec = src->bvec, end = bvec + src->nr_bvec; bvec < end; bvec++) {
+ if (likely(skip < bvec->bv_len))
+ break;
+ skip -= bvec->bv_len;
+ }
+
+ aio->buf_size = bytes;
+ abuf->bvec_off = skip;
+ abuf->nr_bvec = src->nr_bvec - (bvec - src->bvec);
+ abuf->bvec = bvec;
+}
+
+
int bpf_aio_init(void);
int bpf_aio_struct_ops_init(void);
struct bpf_aio *bpf_aio_alloc(unsigned int op, enum bpf_aio_flag aio_flags);
diff --git a/drivers/block/ublk/ublk.h b/drivers/block/ublk/ublk.h
index 2c33f6a94bf2..4bd04512c894 100644
--- a/drivers/block/ublk/ublk.h
+++ b/drivers/block/ublk/ublk.h
@@ -8,6 +8,7 @@
#include <uapi/linux/ublk_cmd.h>
#include "bpf_reg.h"
+#include "bpf_aio.h"
#define UBLK_MINORS (1U << MINORBITS)
@@ -47,6 +48,7 @@ struct ublk_bpf_io {
unsigned long flags;
refcount_t ref;
int res;
+ struct bpf_aio_buf buf;
};
struct ublk_rq_data {
--
2.47.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC PATCH 19/22] ublk: bpf: wire bpf aio with ublk io handling
2025-01-07 12:03 [RFC PATCH 00/22] ublk: support bpf Ming Lei
` (17 preceding siblings ...)
2025-01-07 12:04 ` [RFC PATCH 18/22] ublk: bpf: add several ublk bpf aio kfuncs Ming Lei
@ 2025-01-07 12:04 ` Ming Lei
2025-01-07 12:04 ` [RFC PATCH 20/22] selftests: add tests for ublk bpf aio Ming Lei
` (2 subsequent siblings)
21 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-01-07 12:04 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
Ming Lei
Add ublk_bpf_aio_prep_io_buf() and call it before running the ublk bpf prog,
wiring everything together.
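Roughly, the flow added below is (condensed view, not the literal code):

	if (!test_bit(UBLK_BPF_IO_PREP, &bpf_io->flags)) {
		ublk_bpf_prep_io(bpf_io, iod);
		/* expose the request pages to bpf aio as a bvec array */
		if (ublk_support_bpf_aio(ubq))
			ublk_bpf_aio_prep_io_buf(req);
	}
	/* then the attached bpf prog runs via ublk_run_bpf_handler() */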
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
drivers/block/ublk/bpf.h | 13 +++++++++
drivers/block/ublk/bpf_ops.c | 51 +++++++++++++++++++++++++++++++++++-
drivers/block/ublk/main.c | 5 ----
drivers/block/ublk/ublk.h | 6 +++++
4 files changed, 69 insertions(+), 6 deletions(-)
diff --git a/drivers/block/ublk/bpf.h b/drivers/block/ublk/bpf.h
index 0ab25743ae7d..a3d238bc707d 100644
--- a/drivers/block/ublk/bpf.h
+++ b/drivers/block/ublk/bpf.h
@@ -99,6 +99,9 @@ static inline void ublk_bpf_io_dec_ref(struct ublk_bpf_io *io)
ubq->bpf_ops->release_io_cmd(io);
}
+ if (test_bit(UBLK_BPF_BVEC_ALLOCATED, &io->flags))
+ kvfree(io->buf.bvec);
+
if (test_bit(UBLK_BPF_IO_COMPLETED, &io->flags)) {
smp_rmb();
__clear_bit(UBLK_BPF_IO_PREP, &io->flags);
@@ -158,6 +161,11 @@ static inline queue_io_cmd_t ublk_get_bpf_any_io_cb(struct ublk_queue *ubq)
return ublk_get_bpf_io_cb_daemon(ubq);
}
+static inline bool ublk_support_bpf_aio(const struct ublk_queue *ubq)
+{
+ return ublk_support_bpf(ubq) && ubq->bpf_aio_ops;
+}
+
int ublk_bpf_init(void);
int ublk_bpf_struct_ops_init(void);
int ublk_bpf_prog_attach(struct bpf_prog_consumer *consumer);
@@ -190,6 +198,11 @@ static inline queue_io_cmd_t ublk_get_bpf_any_io_cb(struct ublk_queue *ubq)
return NULL;
}
+static inline bool ublk_support_bpf_aio(const struct ublk_queue *ubq)
+{
+ return false;
+}
+
static inline int ublk_bpf_init(void)
{
return 0;
diff --git a/drivers/block/ublk/bpf_ops.c b/drivers/block/ublk/bpf_ops.c
index 05d8d415b30d..7085eab5e99b 100644
--- a/drivers/block/ublk/bpf_ops.c
+++ b/drivers/block/ublk/bpf_ops.c
@@ -155,6 +155,49 @@ void ublk_bpf_prog_detach(struct bpf_prog_consumer *consumer)
mutex_unlock(&ublk_bpf_ops_lock);
}
+static int ublk_bpf_aio_prep_io_buf(const struct request *req)
+{
+ struct ublk_rq_data *data = blk_mq_rq_to_pdu((struct request *)req);
+ struct ublk_bpf_io *io = &data->bpf_data;
+ struct req_iterator rq_iter;
+ struct bio_vec *bvec;
+ struct bio_vec bv;
+ unsigned offset;
+
+ io->buf.bvec = NULL;
+ io->buf.nr_bvec = 0;
+
+ if (!ublk_rq_has_data(req))
+ return 0;
+
+ rq_for_each_bvec(bv, req, rq_iter)
+ io->buf.nr_bvec++;
+
+ if (!io->buf.nr_bvec)
+ return 0;
+
+ if (req->bio != req->biotail) {
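+ /* multi-bio request: collect all bvecs into one allocated array */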
+ int idx = 0;
+
+ bvec = kvmalloc_array(io->buf.nr_bvec, sizeof(struct bio_vec),
+ GFP_NOIO);
+ if (!bvec)
+ return -ENOMEM;
+
+ offset = 0;
+ rq_for_each_bvec(bv, req, rq_iter)
+ bvec[idx++] = bv;
+ __set_bit(UBLK_BPF_BVEC_ALLOCATED, &io->flags);
+ } else {
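+ /* single-bio request: reference the bio's bvec array directly */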
+ struct bio *bio = req->bio;
+
+ offset = bio->bi_iter.bi_bvec_done;
+ bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
+ }
+ io->buf.bvec = bvec;
+ io->buf.bvec_off = offset;
+ return 0;
+}
static void ublk_bpf_prep_io(struct ublk_bpf_io *io,
const struct ublksrv_io_desc *iod)
@@ -180,8 +223,14 @@ bool ublk_run_bpf_handler(struct ublk_queue *ubq, struct request *req,
bool res = true;
int err;
- if (!test_bit(UBLK_BPF_IO_PREP, &bpf_io->flags))
+ if (!test_bit(UBLK_BPF_IO_PREP, &bpf_io->flags)) {
ublk_bpf_prep_io(bpf_io, iod);
+ if (ublk_support_bpf_aio(ubq)) {
+ err = ublk_bpf_aio_prep_io_buf(req);
+ if (err)
+ goto fail;
+ }
+ }
do {
enum ublk_bpf_disposition rc;
diff --git a/drivers/block/ublk/main.c b/drivers/block/ublk/main.c
index 3c2ed9bf924d..1974ebd33ce0 100644
--- a/drivers/block/ublk/main.c
+++ b/drivers/block/ublk/main.c
@@ -512,11 +512,6 @@ void ublk_put_device(struct ublk_device *ub)
put_device(&ub->cdev_dev);
}
-static inline bool ublk_rq_has_data(const struct request *rq)
-{
- return bio_has_data(rq->bio);
-}
-
static inline char *ublk_queue_cmd_buf(struct ublk_device *ub, int q_id)
{
return ublk_get_queue(ub, q_id)->io_cmd_buf;
diff --git a/drivers/block/ublk/ublk.h b/drivers/block/ublk/ublk.h
index 4bd04512c894..00b09589d95c 100644
--- a/drivers/block/ublk/ublk.h
+++ b/drivers/block/ublk/ublk.h
@@ -41,6 +41,7 @@
enum {
UBLK_BPF_IO_PREP = 0,
UBLK_BPF_IO_COMPLETED = 1,
+ UBLK_BPF_BVEC_ALLOCATED = 2,
};
struct ublk_bpf_io {
@@ -215,6 +216,11 @@ static inline bool ublk_dev_support_bpf_aio(const struct ublk_device *ub)
return ub->params.bpf.flags & UBLK_BPF_HAS_AIO_OPS_ID;
}
+static inline bool ublk_rq_has_data(const struct request *rq)
+{
+ return bio_has_data(rq->bio);
+}
+
struct ublk_device *ublk_get_device(struct ublk_device *ub);
struct ublk_device *ublk_get_device_from_id(int idx);
void ublk_put_device(struct ublk_device *ub);
--
2.47.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC PATCH 20/22] selftests: add tests for ublk bpf aio
2025-01-07 12:03 [RFC PATCH 00/22] ublk: support bpf Ming Lei
` (18 preceding siblings ...)
2025-01-07 12:04 ` [RFC PATCH 19/22] ublk: bpf: wire bpf aio with ublk io handling Ming Lei
@ 2025-01-07 12:04 ` Ming Lei
2025-01-07 12:04 ` [RFC PATCH 21/22] selftests: add tests for covering both bpf aio and split Ming Lei
2025-01-07 12:04 ` [RFC PATCH 22/22] ublk: document ublk-bpf & bpf-aio Ming Lei
21 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-01-07 12:04 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
Ming Lei
Create a ublk loop target which uses bpf aio to submit & complete FS
IO, then run write, read and verify on the ublk loop disk to make sure
ublk bpf aio works as expected.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
tools/testing/selftests/ublk/Makefile | 3 +
.../selftests/ublk/progs/ublk_bpf_kfunc.h | 11 ++
.../testing/selftests/ublk/progs/ublk_loop.c | 166 ++++++++++++++++++
tools/testing/selftests/ublk/test_common.sh | 47 +++++
tools/testing/selftests/ublk/test_loop_01.sh | 33 ++++
tools/testing/selftests/ublk/test_loop_02.sh | 24 +++
tools/testing/selftests/ublk/ublk_bpf.c | 141 ++++++++++++++-
7 files changed, 419 insertions(+), 6 deletions(-)
create mode 100644 tools/testing/selftests/ublk/progs/ublk_loop.c
create mode 100755 tools/testing/selftests/ublk/test_loop_01.sh
create mode 100755 tools/testing/selftests/ublk/test_loop_02.sh
diff --git a/tools/testing/selftests/ublk/Makefile b/tools/testing/selftests/ublk/Makefile
index 38903f05d99d..2540ae7a75a3 100644
--- a/tools/testing/selftests/ublk/Makefile
+++ b/tools/testing/selftests/ublk/Makefile
@@ -24,6 +24,9 @@ TEST_PROGS += test_null_02.sh
TEST_PROGS += test_null_03.sh
TEST_PROGS += test_null_04.sh
+TEST_PROGS += test_loop_01.sh
+TEST_PROGS += test_loop_02.sh
+
# Order correspond to 'make run_tests' order
TEST_GEN_PROGS_EXTENDED = ublk_bpf
diff --git a/tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h b/tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h
index 1db8870b57d6..9fb134e40d49 100644
--- a/tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h
+++ b/tools/testing/selftests/ublk/progs/ublk_bpf_kfunc.h
@@ -21,6 +21,17 @@ extern int ublk_bpf_get_dev_id(const struct ublk_bpf_io *io) __ksym;
extern int ublk_bpf_get_queue_id(const struct ublk_bpf_io *io) __ksym;
extern int ublk_bpf_get_io_tag(const struct ublk_bpf_io *io) __ksym;
+extern void ublk_bpf_dettach_and_complete_aio(struct bpf_aio *aio) __ksym;
+extern int ublk_bpf_attach_and_prep_aio(const struct ublk_bpf_io *_io, unsigned off, unsigned bytes, struct bpf_aio *aio) __ksym;
+extern struct ublk_bpf_io *ublk_bpf_acquire_io_from_aio(struct bpf_aio *aio) __ksym;
+extern void ublk_bpf_release_io_from_aio(struct ublk_bpf_io *io) __ksym;
+
+extern struct bpf_aio *bpf_aio_alloc(unsigned int op, enum bpf_aio_flag flags) __ksym;
+extern struct bpf_aio *bpf_aio_alloc_sleepable(unsigned int op, enum bpf_aio_flag flags) __ksym;
+extern void bpf_aio_release(struct bpf_aio *aio) __ksym;
+extern int bpf_aio_submit(struct bpf_aio *aio, int fd, loff_t pos,
+ unsigned bytes, unsigned io_flags) __ksym;
+
static inline unsigned long long build_io_key(const struct ublk_bpf_io *io)
{
unsigned long long dev_id = (unsigned short)ublk_bpf_get_dev_id(io);
diff --git a/tools/testing/selftests/ublk/progs/ublk_loop.c b/tools/testing/selftests/ublk/progs/ublk_loop.c
new file mode 100644
index 000000000000..952caf7b7399
--- /dev/null
+++ b/tools/testing/selftests/ublk/progs/ublk_loop.c
@@ -0,0 +1,166 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <linux/const.h>
+#include <linux/errno.h>
+#include <linux/falloc.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+//#define DEBUG
+#include "ublk_bpf.h"
+
+/* libbpf v1.4.5 is required for struct_ops to work */
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(max_entries, 128);
+ __type(key, unsigned int); /* dev id */
+ __type(value, int); /* backing file fd */
+} fd_map SEC(".maps");
+
+static inline void ublk_loop_comp_and_release_aio(struct bpf_aio *aio, int ret)
+{
+ struct ublk_bpf_io *io = ublk_bpf_acquire_io_from_aio(aio);
+
+ ublk_bpf_complete_io(io, ret);
+ ublk_bpf_release_io_from_aio(io);
+
+ ublk_bpf_dettach_and_complete_aio(aio);
+ bpf_aio_release(aio);
+}
+
+SEC("struct_ops/bpf_aio_complete_cb")
+void BPF_PROG(ublk_loop_comp_cb, struct bpf_aio *aio, long ret)
+{
+ BPF_DBG("aio result %d, back_file %s pos %llx", ret,
+ aio->iocb.ki_filp->f_path.dentry->d_name.name,
+ aio->iocb.ki_pos);
+ ublk_loop_comp_and_release_aio(aio, ret);
+}
+
+SEC(".struct_ops.link")
+struct bpf_aio_complete_ops loop_ublk_bpf_aio_ops = {
+ .id = 16,
+ .bpf_aio_complete_cb = (void *)ublk_loop_comp_cb,
+};
+
+static inline int ublk_loop_submit_backing_io(const struct ublk_bpf_io *io,
+ const struct ublksrv_io_desc *iod, int backing_fd)
+{
+ unsigned int op_flags = 0;
+ struct bpf_aio *aio;
+ int res = -EINVAL;
+ int op;
+
+ /* translate ublk opcode into backing file's */
+ switch (iod->op_flags & 0xff) {
+ case 0 /*UBLK_IO_OP_READ*/:
+ op = BPF_AIO_OP_FS_READ;
+ break;
+ case 1 /*UBLK_IO_OP_WRITE*/:
+ op = BPF_AIO_OP_FS_WRITE;
+ break;
+ case 2 /*UBLK_IO_OP_FLUSH*/:
+ op = BPF_AIO_OP_FS_FSYNC;
+ break;
+ case 3 /*UBLK_IO_OP_DISCARD*/:
+ op = BPF_AIO_OP_FS_FALLOCATE;
+ op_flags = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
+ break;
+ case 4 /*UBLK_IO_OP_WRITE_SAME*/:
+ op = BPF_AIO_OP_FS_FALLOCATE;
+ op_flags = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
+ break;
+ case 5 /*UBLK_IO_OP_WRITE_ZEROES*/:
+ op = BPF_AIO_OP_FS_FALLOCATE;
+ op_flags = FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ res = -ENOMEM;
+ aio = bpf_aio_alloc(op, 0);
+ if (!aio)
+ goto fail;
+
+ /* attach aio into the specified range of this io command */
+ res = ublk_bpf_attach_and_prep_aio(io, 0, iod->nr_sectors << 9, aio);
+ if (res < 0) {
+ bpf_printk("bpf aio attaching failed %d\n", res);
+ goto fail;
+ }
+
+ /* submit this aio onto the backing file */
+ res = bpf_aio_submit(aio, backing_fd, iod->start_sector << 9,
+ iod->nr_sectors << 9, op_flags);
+ if (res < 0) {
+ bpf_printk("aio submit failed %d\n", res);
+ ublk_loop_comp_and_release_aio(aio, res);
+ }
+ return 0;
+fail:
+ return res;
+}
+
+static inline ublk_bpf_return_t __ublk_loop_handle_io_cmd(const struct ublk_bpf_io *io, unsigned int off)
+{
+ const struct ublksrv_io_desc *iod;
+ int res = -EINVAL;
+ int fd_key = ublk_bpf_get_dev_id(io);
+ int *fd;
+ ublk_bpf_return_t ret = ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0);
+
+ iod = ublk_bpf_get_iod(io);
+ if (!iod) {
+ ublk_bpf_complete_io(io, res);
+ return ret;
+ }
+
+ BPF_DBG("ublk dev %u qid %u: handle io cmd tag %u op %u %lx-%d off %u",
+ ublk_bpf_get_dev_id(io),
+ ublk_bpf_get_queue_id(io),
+ ublk_bpf_get_io_tag(io),
+ iod->op_flags & 0xff,
+ iod->start_sector << 9,
+ iod->nr_sectors << 9, off);
+
+ /* retrieve backing file descriptor */
+ fd = bpf_map_lookup_elem(&fd_map, &fd_key);
+ if (!fd) {
+ bpf_printk("can't get FD from %d\n", fd_key);
+ return ret;
+ }
+
+ /* handle this io command by submitting IOs on backing file */
+ res = ublk_loop_submit_backing_io(io, iod, *fd);
+
+ /* the io cmd can't be completed until this reference is dropped */
+ if (res < 0)
+ ublk_bpf_complete_io(io, res);
+
+ return ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0);
+}
+
+SEC("struct_ops/ublk_bpf_release_io_cmd")
+void BPF_PROG(ublk_loop_release_io_cmd, struct ublk_bpf_io *io)
+{
+ BPF_DBG("%s: released io command %d", __func__, io->res);
+}
+
+SEC("struct_ops.s/ublk_bpf_queue_io_cmd_daemon")
+ublk_bpf_return_t BPF_PROG(ublk_loop_handle_io_cmd, struct ublk_bpf_io *io, unsigned int off)
+{
+ return __ublk_loop_handle_io_cmd(io, off);
+}
+
+SEC(".struct_ops.link")
+struct ublk_bpf_ops loop_ublk_bpf_ops = {
+ .id = 16,
+ .queue_io_cmd_daemon = (void *)ublk_loop_handle_io_cmd,
+ .release_io_cmd = (void *)ublk_loop_release_io_cmd,
+};
+
+char LICENSE[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/ublk/test_common.sh b/tools/testing/selftests/ublk/test_common.sh
index 466b82e77860..4727a6ec9734 100755
--- a/tools/testing/selftests/ublk/test_common.sh
+++ b/tools/testing/selftests/ublk/test_common.sh
@@ -70,3 +70,50 @@ _add_ublk_dev() {
fi
udevadm settle
}
+
+_create_backfile() {
+ local my_size=$1
+ local my_file=`mktemp ublk_bpf_${my_size}_XXXXX`
+
+ truncate -s ${my_size} ${my_file}
+ echo $my_file
+}
+
+_remove_backfile() {
+ local file=$1
+
+ [ -f "$file" ] && rm -f $file
+}
+
+_create_tmp_dir() {
+ local my_file=`mktemp -d ublk_bpf_dir_XXXXX`
+
+ echo $my_file
+}
+
+_remove_tmp_dir() {
+ local dir=$1
+
+ [ -d "$dir" ] && rmdir $dir
+}
+
+_mkfs_mount_test()
+{
+ local dev=$1
+ local err_code=0
+ local mnt_dir=`_create_tmp_dir`
+
+ mkfs.ext4 -F $dev > /dev/null 2>&1
+ err_code=$?
+ if [ $err_code -ne 0 ]; then
+ return $err_code
+ fi
+
+ mount -t ext4 $dev $mnt_dir > /dev/null 2>&1
+ err_code=$?
+ if [ $err_code -ne 0 ]; then
+ _remove_tmp_dir $mnt_dir
+ return $err_code
+ fi
+
+ umount $dev
+ err_code=$?
+ _remove_tmp_dir $mnt_dir
+ if [ $err_code -ne 0 ]; then
+ return $err_code
+ fi
+}
diff --git a/tools/testing/selftests/ublk/test_loop_01.sh b/tools/testing/selftests/ublk/test_loop_01.sh
new file mode 100755
index 000000000000..10c73ec0a01a
--- /dev/null
+++ b/tools/testing/selftests/ublk/test_loop_01.sh
@@ -0,0 +1,33 @@
+#!/bin/bash
+
+. test_common.sh
+
+TID="loop_01"
+ERR_CODE=0
+
+# prepare & register and pin bpf prog
+_prep_bpf_test "loop" ublk_loop.bpf.o
+
+backfile_0=`_create_backfile 256M`
+
+# add one ublk loop disk with the pinned bpf prog
+_add_ublk_dev -t loop -n 0 --bpf_prog 16 --bpf_aio_prog 16 --quiet $backfile_0
+
+# run fio over the ublk disk
+fio --name=write_and_verify \
+ --filename=/dev/ublkb0 \
+ --ioengine=libaio --iodepth=4 \
+ --rw=write \
+ --size=256M \
+ --direct=1 \
+ --verify=crc32c \
+ --do_verify=1 \
+ --bs=4k > /dev/null 2>&1
+ERR_CODE=$?
+
+# cleanup & unregister and unpin the bpf prog
+_cleanup_bpf_test "loop"
+
+_remove_backfile $backfile_0
+
+_show_result $TID $ERR_CODE
diff --git a/tools/testing/selftests/ublk/test_loop_02.sh b/tools/testing/selftests/ublk/test_loop_02.sh
new file mode 100755
index 000000000000..05c3a863f517
--- /dev/null
+++ b/tools/testing/selftests/ublk/test_loop_02.sh
@@ -0,0 +1,24 @@
+#!/bin/bash
+
+. test_common.sh
+
+TID="loop_02"
+ERR_CODE=0
+
+# prepare & register and pin bpf prog
+_prep_bpf_test "loop" ublk_loop.bpf.o
+
+backfile_0=`_create_backfile 256M`
+
+# add one ublk loop disk with the pinned bpf prog
+_add_ublk_dev -t loop -n 0 --bpf_prog 16 --bpf_aio_prog 16 --quiet $backfile_0
+
+_mkfs_mount_test /dev/ublkb0
+ERR_CODE=$?
+
+# cleanup & unregister and unpin the bpf prog
+_cleanup_bpf_test "loop"
+
+_remove_backfile $backfile_0
+
+_show_result $TID $ERR_CODE
diff --git a/tools/testing/selftests/ublk/ublk_bpf.c b/tools/testing/selftests/ublk/ublk_bpf.c
index e2c2e92268e1..c24d5e18a1b1 100644
--- a/tools/testing/selftests/ublk/ublk_bpf.c
+++ b/tools/testing/selftests/ublk/ublk_bpf.c
@@ -64,6 +64,7 @@ struct dev_ctx {
int nr_files;
char *files[MAX_BACK_FILES];
int bpf_prog_id;
+ int bpf_aio_prog_id;
unsigned int logging:1;
unsigned int all:1;
};
@@ -107,7 +108,10 @@ struct ublk_tgt {
unsigned int cq_depth;
const struct ublk_tgt_ops *ops;
struct ublk_params params;
- char backing_file[1024 - 8 - sizeof(struct ublk_params)];
+
+ int nr_backing_files;
+ unsigned long backing_file_size[MAX_BACK_FILES];
+ char backing_file[MAX_BACK_FILES][PATH_MAX];
};
struct ublk_queue {
@@ -133,12 +137,13 @@ struct ublk_dev {
struct ublksrv_ctrl_dev_info dev_info;
struct ublk_queue q[UBLK_MAX_QUEUES];
- int fds[2]; /* fds[0] points to /dev/ublkcN */
+ int fds[MAX_BACK_FILES + 1]; /* fds[0] points to /dev/ublkcN */
int nr_fds;
int ctrl_fd;
struct io_uring ring;
int bpf_prog_id;
+ int bpf_aio_prog_id;
};
#ifndef offsetof
@@ -983,7 +988,7 @@ static int cmd_dev_add(struct dev_ctx *ctx)
struct ublk_dev *dev;
int dev_id = ctx->dev_id;
char ublkb[64];
- int ret;
+ int ret, i;
ops = ublk_find_tgt(tgt_type);
if (!ops) {
@@ -1022,6 +1027,13 @@ static int cmd_dev_add(struct dev_ctx *ctx)
dev->tgt.sq_depth = depth;
dev->tgt.cq_depth = depth;
dev->bpf_prog_id = ctx->bpf_prog_id;
+ dev->bpf_aio_prog_id = ctx->bpf_aio_prog_id;
+ for (i = 0; i < MAX_BACK_FILES; i++) {
+ if (ctx->files[i]) {
+ strcpy(dev->tgt.backing_file[i], ctx->files[i]);
+ dev->tgt.nr_backing_files++;
+ }
+ }
ret = ublk_ctrl_add_dev(dev);
if (ret < 0) {
@@ -1271,14 +1283,14 @@ static int cmd_dev_reg_bpf(struct dev_ctx *ctx)
static int cmd_dev_help(char *exe)
{
- printf("%s add -t [null] [-q nr_queues] [-d depth] [-n dev_id] [--bpf_prog ublk_prog_id] [backfile1] [backfile2] ...\n", exe);
+ printf("%s add -t [null|loop] [-q nr_queues] [-d depth] [-n dev_id] [--bpf_prog ublk_prog_id] [--bpf_aio_prog ublk_aio_prog_id] [backfile1] [backfile2] ...\n", exe);
printf("\t default: nr_queues=2(max 4), depth=128(max 128), dev_id=-1(auto allocation)\n");
printf("%s del [-n dev_id] -a \n", exe);
printf("\t -a delete all devices -n delete specified device\n");
printf("%s list [-n dev_id] -a \n", exe);
printf("\t -a list all devices, -n list specified device, default -a \n");
- printf("%s reg -t [null] bpf_prog_obj_path \n", exe);
- printf("%s unreg -t [null]\n", exe);
+ printf("%s reg -t [null|loop] bpf_prog_obj_path \n", exe);
+ printf("%s unreg -t [null|loop]\n", exe);
return 0;
}
@@ -1356,12 +1368,125 @@ static int ublk_null_queue_io(struct ublk_queue *q, int tag)
return 0;
}
+static void backing_file_tgt_deinit(struct ublk_dev *dev)
+{
+ int i;
+
+ for (i = 1; i < dev->nr_fds; i++) {
+ fsync(dev->fds[i]);
+ close(dev->fds[i]);
+ }
+}
+
+static int backing_file_tgt_init(struct ublk_dev *dev)
+{
+ int fd, i;
+
+ assert(dev->nr_fds == 1);
+
+ for (i = 0; i < dev->tgt.nr_backing_files; i++) {
+ char *file = dev->tgt.backing_file[i];
+ unsigned long bytes;
+ struct stat st;
+
+ ublk_dbg(UBLK_DBG_DEV, "%s: file %d: %s\n", __func__, i, file);
+
+ fd = open(file, O_RDWR | O_DIRECT);
+ if (fd < 0) {
+ ublk_err("%s: backing file %s can't be opened: %s\n",
+ __func__, file, strerror(errno));
+ return -EBADF;
+ }
+
+ if (fstat(fd, &st) < 0) {
+ close(fd);
+ return -EBADF;
+ }
+
+ if (S_ISREG(st.st_mode))
+ bytes = st.st_size;
+ else if (S_ISBLK(st.st_mode)) {
+ if (ioctl(fd, BLKGETSIZE64, &bytes) != 0)
+ return -1;
+ } else {
+ return -EINVAL;
+ }
+
+ dev->tgt.backing_file_size[i] = bytes;
+ dev->fds[dev->nr_fds] = fd;
+ dev->nr_fds += 1;
+ }
+
+ return 0;
+}
+
+static int loop_bpf_setup_fd(unsigned dev_id, int fd)
+{
+ int map_fd;
+ int err;
+
+ map_fd = bpf_obj_get("/sys/fs/bpf/ublk/loop/fd_map");
+ if (map_fd < 0) {
+ ublk_err("Error getting map file descriptor from pinned map\n");
+ return -EINVAL;
+ }
+
+ err = bpf_map_update_elem(map_fd, &dev_id, &fd, BPF_ANY);
+ if (err) {
+ ublk_err("Error updating map element: %d\n", errno);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int ublk_loop_tgt_init(struct ublk_dev *dev)
+{
+ unsigned long long bytes;
+ int ret;
+ struct ublk_params p = {
+ .types = UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_BPF,
+ .basic = {
+ .logical_bs_shift = 9,
+ .physical_bs_shift = 12,
+ .io_opt_shift = 12,
+ .io_min_shift = 9,
+ .max_sectors = dev->dev_info.max_io_buf_bytes >> 9,
+ },
+ .bpf = {
+ .flags = UBLK_BPF_HAS_OPS_ID | UBLK_BPF_HAS_AIO_OPS_ID,
+ .ops_id = dev->bpf_prog_id,
+ .aio_ops_id = dev->bpf_aio_prog_id,
+ },
+ };
+
+ assert(dev->tgt.nr_backing_files == 1);
+ ret = backing_file_tgt_init(dev);
+ if (ret)
+ return ret;
+
+ assert(loop_bpf_setup_fd(dev->dev_info.dev_id, dev->fds[1]) == 0);
+
+ bytes = dev->tgt.backing_file_size[0];
+ dev->tgt.dev_size = bytes;
+ p.basic.dev_sectors = bytes >> 9;
+ dev->tgt.params = p;
+
+ return 0;
+}
+
+
static const struct ublk_tgt_ops tgt_ops_list[] = {
{
.name = "null",
.init_tgt = ublk_null_tgt_init,
.queue_io = ublk_null_queue_io,
},
+ {
+ .name = "loop",
+ .init_tgt = ublk_loop_tgt_init,
+ .deinit_tgt = backing_file_tgt_deinit,
+ },
};
static const struct ublk_tgt_ops *ublk_find_tgt(const char *name)
@@ -1389,6 +1514,7 @@ int main(int argc, char *argv[])
{ "debug_mask", 1, NULL, 0 },
{ "quiet", 0, NULL, 0 },
{ "bpf_prog", 1, NULL, 0 },
+ { "bpf_aio_prog", 1, NULL, 0 },
{ 0, 0, 0, 0 }
};
int option_idx, opt;
@@ -1398,6 +1524,7 @@ int main(int argc, char *argv[])
.nr_hw_queues = 2,
.dev_id = -1,
.bpf_prog_id = -1,
+ .bpf_aio_prog_id = -1,
};
int ret = -EINVAL, i;
@@ -1433,6 +1560,8 @@ int main(int argc, char *argv[])
ctx.bpf_prog_id = strtol(optarg, NULL, 10);
ctx.flags |= UBLK_F_BPF;
}
+ if (!strcmp(longopts[option_idx].name, "bpf_aio_prog"))
+ ctx.bpf_aio_prog_id = strtol(optarg, NULL, 10);
break;
}
}
--
2.47.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC PATCH 21/22] selftests: add tests for covering both bpf aio and split
2025-01-07 12:03 [RFC PATCH 00/22] ublk: support bpf Ming Lei
` (19 preceding siblings ...)
2025-01-07 12:04 ` [RFC PATCH 20/22] selftests: add tests for ublk bpf aio Ming Lei
@ 2025-01-07 12:04 ` Ming Lei
2025-01-07 12:04 ` [RFC PATCH 22/22] ublk: document ublk-bpf & bpf-aio Ming Lei
21 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-01-07 12:04 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
Ming Lei
Add a ublk-stripe target to cover both the bpf aio and IO split features.
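Roughly, each invocation of the queue callback in the prog below handles one
chunk and reports the handled bytes back to the driver (simplified from the
ublk_stripe.c added by this patch):

	stripe_bytes = calculate_stripe_off_bytes(stripe, iod, off, &stripe_off);
	backfile_fd = calculate_backfile_off_bytes(stripe, stripe_off, stripe_bytes,
						   &backfile_off, &backfile_bytes);
	ublk_stripe_submit_backing_io(io, backfile_fd, backfile_off,
				      backfile_bytes, off);

	/* report the covered bytes; the driver calls back with `off` advanced */
	return ublk_bpf_return_val(UBLK_BPF_IO_CONTINUE, stripe_bytes);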
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
tools/testing/selftests/ublk/Makefile | 3 +
.../selftests/ublk/progs/ublk_stripe.c | 319 ++++++++++++++++++
.../testing/selftests/ublk/test_stripe_01.sh | 35 ++
.../testing/selftests/ublk/test_stripe_02.sh | 26 ++
tools/testing/selftests/ublk/ublk_bpf.c | 88 ++++-
5 files changed, 468 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/ublk/progs/ublk_stripe.c
create mode 100755 tools/testing/selftests/ublk/test_stripe_01.sh
create mode 100755 tools/testing/selftests/ublk/test_stripe_02.sh
diff --git a/tools/testing/selftests/ublk/Makefile b/tools/testing/selftests/ublk/Makefile
index 2540ae7a75a3..7c30c5728694 100644
--- a/tools/testing/selftests/ublk/Makefile
+++ b/tools/testing/selftests/ublk/Makefile
@@ -27,6 +27,9 @@ TEST_PROGS += test_null_04.sh
TEST_PROGS += test_loop_01.sh
TEST_PROGS += test_loop_02.sh
+TEST_PROGS += test_stripe_01.sh
+TEST_PROGS += test_stripe_02.sh
+
# Order correspond to 'make run_tests' order
TEST_GEN_PROGS_EXTENDED = ublk_bpf
diff --git a/tools/testing/selftests/ublk/progs/ublk_stripe.c b/tools/testing/selftests/ublk/progs/ublk_stripe.c
new file mode 100644
index 000000000000..98a59239047c
--- /dev/null
+++ b/tools/testing/selftests/ublk/progs/ublk_stripe.c
@@ -0,0 +1,319 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+#include <linux/const.h>
+#include <linux/errno.h>
+#include <linux/falloc.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+//#define DEBUG
+#include "ublk_bpf.h"
+
+/* libbpf v1.4.5 is required for struct_ops to work */
+
+struct ublk_stripe {
+#define MAX_BACKFILES 4
+ unsigned char chunk_shift;
+ unsigned char nr_backfiles;
+ int fds[MAX_BACKFILES];
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(max_entries, 128);
+ __type(key, unsigned int); /* dev id */
+ __type(value, struct ublk_stripe); /* stripe setting */
+} stripe_map SEC(".maps");
+
+/* todo: make it writable payload of ublk_bpf_io */
+struct ublk_io_payload {
+ unsigned int ref;
+ int res;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(max_entries, 10240);
+ __type(key, unsigned long long); /* dev_id + q_id + tag */
+ __type(value, struct ublk_io_payload); /* io payload */
+} io_map SEC(".maps");
+
+static inline void dec_stripe_io_ref(const struct ublk_bpf_io *io, struct ublk_io_payload *pv, int ret)
+{
+ if (!pv)
+ return;
+
+ if (pv->res >= 0)
+ pv->res = ret;
+
+ if (!__sync_sub_and_fetch(&pv->ref, 1)) {
+ unsigned rw = (io->iod->op_flags & 0xff);
+
+ if (pv->res >= 0 && (rw <= 1))
+ pv->res = io->iod->nr_sectors << 9;
+ ublk_bpf_complete_io(io, pv->res);
+ }
+}
+
+static inline void ublk_stripe_comp_and_release_aio(struct bpf_aio *aio, int ret)
+{
+ struct ublk_bpf_io *io = ublk_bpf_acquire_io_from_aio(aio);
+ struct ublk_io_payload *pv = NULL;
+ unsigned long long io_key;
+
+ if (!io)
+ return;
+
+ io_key = build_io_key(io);
+ pv = bpf_map_lookup_elem(&io_map, &io_key);
+
+ /* drop reference for each underlying aio */
+ dec_stripe_io_ref(io, pv, ret);
+ ublk_bpf_release_io_from_aio(io);
+
+ ublk_bpf_dettach_and_complete_aio(aio);
+ bpf_aio_release(aio);
+}
+
+SEC("struct_ops/bpf_aio_complete_cb")
+void BPF_PROG(ublk_stripe_comp_cb, struct bpf_aio *aio, long ret)
+{
+ BPF_DBG("aio result %d, back_file %s pos %llx", ret,
+ aio->iocb.ki_filp->f_path.dentry->d_name.name,
+ aio->iocb.ki_pos);
+ ublk_stripe_comp_and_release_aio(aio, ret);
+}
+
+SEC(".struct_ops.link")
+struct bpf_aio_complete_ops stripe_ublk_bpf_aio_ops = {
+ .id = 32,
+ .bpf_aio_complete_cb = (void *)ublk_stripe_comp_cb,
+};
+
+static inline int ublk_stripe_submit_backing_io(const struct ublk_bpf_io *io,
+ int backfile_fd, unsigned long backfile_off,
+ unsigned int backfile_bytes,
+ unsigned int buf_off)
+{
+ const struct ublksrv_io_desc *iod = io->iod;
+ unsigned int op_flags = 0;
+ struct bpf_aio *aio;
+ int res = -EINVAL;
+ int op;
+
+ /* translate ublk opcode into backing file's */
+ switch (iod->op_flags & 0xff) {
+ case 0 /*UBLK_IO_OP_READ*/:
+ op = BPF_AIO_OP_FS_READ;
+ break;
+ case 1 /*UBLK_IO_OP_WRITE*/:
+ op = BPF_AIO_OP_FS_WRITE;
+ break;
+ case 2 /*UBLK_IO_OP_FLUSH*/:
+ op = BPF_AIO_OP_FS_FSYNC;
+ break;
+ case 3 /*UBLK_IO_OP_DISCARD*/:
+ op = BPF_AIO_OP_FS_FALLOCATE;
+ op_flags = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
+ break;
+ case 4 /*UBLK_IO_OP_WRITE_SAME*/:
+ op = BPF_AIO_OP_FS_FALLOCATE;
+ op_flags = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
+ break;
+ case 5 /*UBLK_IO_OP_WRITE_ZEROES*/:
+ op = BPF_AIO_OP_FS_FALLOCATE;
+ op_flags = FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ res = -ENOMEM;
+ aio = bpf_aio_alloc(op, 0);
+ if (!aio)
+ goto fail;
+
+ /* attach aio into the specified range of this io command */
+ res = ublk_bpf_attach_and_prep_aio(io, buf_off, backfile_bytes, aio);
+ if (res < 0) {
+ bpf_printk("bpf aio attaching failed %d\n", res);
+ goto fail;
+ }
+
+ /* submit this aio onto the backing file */
+ res = bpf_aio_submit(aio, backfile_fd, backfile_off, backfile_bytes, op_flags);
+ if (res < 0) {
+ bpf_printk("aio submit failed %d\n", res);
+ ublk_stripe_comp_and_release_aio(aio, res);
+ }
+ return 0;
+fail:
+ return res;
+}
+
+static int calculate_backfile_off_bytes(const struct ublk_stripe *stripe,
+ unsigned long stripe_off, unsigned int stripe_bytes,
+ unsigned long *backfile_off,
+ unsigned int *backfile_bytes)
+{
+ unsigned long chunk_size = 1U << stripe->chunk_shift;
+ unsigned int nr_bf = stripe->nr_backfiles;
+ unsigned long unit_chunk_size = nr_bf << stripe->chunk_shift;
+ unsigned long start_off = stripe_off & ~(chunk_size - 1);
+ unsigned long unit_start_off = stripe_off & ~(unit_chunk_size - 1);
+ unsigned int idx = (start_off - unit_start_off) >> stripe->chunk_shift;
+
+ *backfile_bytes = stripe_bytes;
+ *backfile_off = (unit_start_off / nr_bf) + (idx << stripe->chunk_shift) + (stripe_off - start_off);
+
+ return stripe->fds[idx % MAX_BACKFILES];
+}
+
+static unsigned int calculate_stripe_off_bytes(const struct ublk_stripe *stripe,
+ const struct ublksrv_io_desc *iod, unsigned int this_off,
+ unsigned long *stripe_off)
+{
+ unsigned long off, next_off;
+ unsigned long chunk_size = 1UL << stripe->chunk_shift;
+ unsigned int max_size = (iod->nr_sectors << 9) - this_off;
+
+ off = (iod->start_sector << 9) + this_off;
+ next_off = (off & ~(chunk_size - 1)) + chunk_size;
+
+ *stripe_off = off;
+
+ if (max_size < next_off - off)
+ return max_size;
+ return next_off - off;
+}
+
+static inline ublk_bpf_return_t __ublk_stripe_handle_io_cmd(const struct ublk_bpf_io *io, unsigned int off)
+{
+ ublk_bpf_return_t ret = ublk_bpf_return_val(UBLK_BPF_IO_QUEUED, 0);
+ unsigned long stripe_off, backfile_off;
+ unsigned int stripe_bytes, backfile_bytes;
+ int dev_id = ublk_bpf_get_dev_id(io);
+ const struct ublksrv_io_desc *iod;
+ const struct ublk_stripe *stripe;
+ int res = -EINVAL;
+ int backfile_fd;
+ unsigned long long io_key = build_io_key(io);
+ struct ublk_io_payload pl = {
+ .ref = 2,
+ .res = 0,
+ };
+ struct ublk_io_payload *pv = NULL;
+
+ iod = ublk_bpf_get_iod(io);
+ if (!iod) {
+ ublk_bpf_complete_io(io, res);
+ return ret;
+ }
+
+ BPF_DBG("ublk dev %u qid %u: handle io cmd tag %u op %u %lx-%d off %u",
+ ublk_bpf_get_dev_id(io),
+ ublk_bpf_get_queue_id(io),
+ ublk_bpf_get_io_tag(io),
+ iod->op_flags & 0xff,
+ iod->start_sector << 9,
+ iod->nr_sectors << 9, off);
+
+ /* retrieve backing file descriptor */
+ stripe = bpf_map_lookup_elem(&stripe_map, &dev_id);
+ if (!stripe) {
+ bpf_printk("can't get FD from %d\n", dev_id);
+ return ret;
+ }
+
+ /* todo: build as big chunk as possible for each underlying files/disks */
+ stripe_bytes = calculate_stripe_off_bytes(stripe, iod, off, &stripe_off);
+ backfile_fd = calculate_backfile_off_bytes(stripe, stripe_off, stripe_bytes,
+ &backfile_off, &backfile_bytes);
+ BPF_DBG("\t <chunk_shift %u files %u> stripe(%lx %lu) backfile(%d %lx %lu)",
+ stripe->chunk_shift, stripe->nr_backfiles,
+ stripe_off, stripe_bytes,
+ backfile_fd, backfile_off, backfile_bytes);
+
+ if (!stripe_bytes) {
+ bpf_printk("submit bpf aio failed %d\n", res);
+ res = -EINVAL;
+ goto exit;
+ }
+
+ /* grab one submission reference, and one extra for the whole batch */
+ if (!off) {
+ res = bpf_map_update_elem(&io_map, &io_key, &pl, BPF_ANY);
+ if (res) {
+ bpf_printk("update io map element failed %d key %llx\n", res, io_key);
+ goto exit;
+ }
+ } else {
+ pv = bpf_map_lookup_elem(&io_map, &io_key);
+ if (pv)
+ __sync_fetch_and_add(&pv->ref, 1);
+ }
+
+ /* handle this io command by submitting IOs on backing file */
+ res = ublk_stripe_submit_backing_io(io, backfile_fd, backfile_off, backfile_bytes, off);
+
+exit:
+ /* the io cmd can't be completed until this reference is dropped */
+ if (res < 0) {
+ bpf_printk("submit bpf aio failed %d\n", res);
+ ublk_bpf_complete_io(io, res);
+ return ret;
+ }
+
+ /* drop the extra reference for the whole batch */
+ if (off + stripe_bytes == iod->nr_sectors << 9) {
+ if (!pv)
+ pv = bpf_map_lookup_elem(&io_map, &io_key);
+ dec_stripe_io_ref(io, pv, pv ? pv->res : 0);
+ }
+
+ return ublk_bpf_return_val(UBLK_BPF_IO_CONTINUE, stripe_bytes);
+}
+
+SEC("struct_ops/ublk_bpf_release_io_cmd")
+void BPF_PROG(ublk_stripe_release_io_cmd, struct ublk_bpf_io *io)
+{
+ BPF_DBG("%s: complete io command %d", __func__, io->res);
+}
+
+SEC("struct_ops.s/ublk_bpf_queue_io_cmd_daemon")
+ublk_bpf_return_t BPF_PROG(ublk_stripe_handle_io_cmd, struct ublk_bpf_io *io, unsigned int off)
+{
+ return __ublk_stripe_handle_io_cmd(io, off);
+}
+
+SEC("struct_ops/ublk_bpf_attach_dev")
+int BPF_PROG(ublk_stripe_attach_dev, int dev_id)
+{
+ const struct ublk_stripe *stripe;
+
+ /* retrieve backing file descriptor */
+ stripe = bpf_map_lookup_elem(&stripe_map, &dev_id);
+ if (!stripe) {
+ bpf_printk("can't get FD from %d\n", dev_id);
+ return -EINVAL;
+ }
+
+ if (stripe->nr_backfiles >= MAX_BACKFILES)
+ return -EINVAL;
+
+ if (stripe->chunk_shift < 12)
+ return -EINVAL;
+
+ return 0;
+}
+
+SEC(".struct_ops.link")
+struct ublk_bpf_ops stripe_ublk_bpf_ops = {
+ .id = 32,
+ .attach_dev = (void *)ublk_stripe_attach_dev,
+ .queue_io_cmd_daemon = (void *)ublk_stripe_handle_io_cmd,
+ .release_io_cmd = (void *)ublk_stripe_release_io_cmd,
+};
+
+char LICENSE[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/ublk/test_stripe_01.sh b/tools/testing/selftests/ublk/test_stripe_01.sh
new file mode 100755
index 000000000000..3c21f7db495a
--- /dev/null
+++ b/tools/testing/selftests/ublk/test_stripe_01.sh
@@ -0,0 +1,35 @@
+#!/bin/bash
+
+. test_common.sh
+
+TID="stripe_01"
+ERR_CODE=0
+
+# prepare & register and pin bpf prog
+_prep_bpf_test "stripe" ublk_stripe.bpf.o
+
+backfile_0=`_create_backfile 256M`
+backfile_1=`_create_backfile 256M`
+
+# add one ublk stripe disk over the two backing files with the pinned bpf progs
+_add_ublk_dev -t stripe -n 0 --bpf_prog 32 --bpf_aio_prog 32 --quiet $backfile_0 $backfile_1
+
+# run fio over the ublk disk
+fio --name=write_and_verify \
+ --filename=/dev/ublkb0 \
+ --ioengine=libaio --iodepth=4 \
+ --rw=write \
+ --size=256M \
+ --direct=1 \
+ --verify=crc32c \
+ --do_verify=1 \
+ --bs=4k > /dev/null 2>&1
+ERR_CODE=$?
+
+# cleanup & unregister and unpin the bpf prog
+_cleanup_bpf_test "stripe"
+
+_remove_backfile $backfile_0
+_remove_backfile $backfile_1
+
+_show_result $TID $ERR_CODE
diff --git a/tools/testing/selftests/ublk/test_stripe_02.sh b/tools/testing/selftests/ublk/test_stripe_02.sh
new file mode 100755
index 000000000000..fdbb81dc53d8
--- /dev/null
+++ b/tools/testing/selftests/ublk/test_stripe_02.sh
@@ -0,0 +1,26 @@
+#!/bin/bash
+
+. test_common.sh
+
+TID="stripe_02"
+ERR_CODE=0
+
+# prepare & register and pin bpf prog
+_prep_bpf_test "stripe" ublk_stripe.bpf.o
+
+backfile_0=`_create_backfile 256M`
+backfile_1=`_create_backfile 256M`
+
+# add one ublk stripe disk over the two backing files with the pinned bpf progs
+_add_ublk_dev -t stripe -n 0 --bpf_prog 32 --bpf_aio_prog 32 --quiet $backfile_0 $backfile_1
+
+_mkfs_mount_test /dev/ublkb0
+ERR_CODE=$?
+
+# cleanup & unregister and unpin the bpf prog
+_cleanup_bpf_test "stripe"
+
+_remove_backfile $backfile_0
+_remove_backfile $backfile_1
+
+_show_result $TID $ERR_CODE
diff --git a/tools/testing/selftests/ublk/ublk_bpf.c b/tools/testing/selftests/ublk/ublk_bpf.c
index c24d5e18a1b1..85b2b4a09e05 100644
--- a/tools/testing/selftests/ublk/ublk_bpf.c
+++ b/tools/testing/selftests/ublk/ublk_bpf.c
@@ -1283,14 +1283,14 @@ static int cmd_dev_reg_bpf(struct dev_ctx *ctx)
static int cmd_dev_help(char *exe)
{
- printf("%s add -t [null|loop] [-q nr_queues] [-d depth] [-n dev_id] [--bpf_prog ublk_prog_id] [--bpf_aio_prog ublk_aio_prog_id] [backfile1] [backfile2] ...\n", exe);
+ printf("%s add -t [null|loop|stripe] [-q nr_queues] [-d depth] [-n dev_id] [--bpf_prog ublk_prog_id] [--bpf_aio_prog ublk_aio_prog_id] [backfile1] [backfile2] ...\n", exe);
printf("\t default: nr_queues=2(max 4), depth=128(max 128), dev_id=-1(auto allocation)\n");
printf("%s del [-n dev_id] -a \n", exe);
printf("\t -a delete all devices -n delete specified device\n");
printf("%s list [-n dev_id] -a \n", exe);
printf("\t -a list all devices, -n list specified device, default -a \n");
- printf("%s reg -t [null|loop] bpf_prog_obj_path \n", exe);
- printf("%s unreg -t [null|loop]\n", exe);
+ printf("%s reg -t [null|loop|stripe] bpf_prog_obj_path \n", exe);
+ printf("%s unreg -t [null|loop|stripe]\n", exe);
return 0;
}
@@ -1475,6 +1475,83 @@ static int ublk_loop_tgt_init(struct ublk_dev *dev)
return 0;
}
+struct ublk_stripe_params {
+ unsigned char chunk_shift;
+ unsigned char nr_backfiles;
+ int fds[MAX_BACK_FILES];
+};
+
+static int stripe_bpf_setup_parameters(struct ublk_dev *dev, unsigned int chunk_shift)
+{
+ int dev_id = dev->dev_info.dev_id;
+ struct ublk_stripe_params stripe = {
+ .chunk_shift = chunk_shift,
+ .nr_backfiles = dev->nr_fds - 1,
+ };
+ int map_fd;
+ int err, i;
+
+ for (i = 0; i < stripe.nr_backfiles; i++)
+ stripe.fds[i] = dev->fds[i + 1];
+
+ map_fd = bpf_obj_get("/sys/fs/bpf/ublk/stripe/stripe_map");
+ if (map_fd < 0) {
+ ublk_err("Error getting map file descriptor\n");
+ return -EINVAL;
+ }
+
+ err = bpf_map_update_elem(map_fd, &dev_id, &stripe, BPF_ANY);
+ if (err) {
+ ublk_err("Error updating map element: %d\n", errno);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int ublk_stripe_tgt_init(struct ublk_dev *dev)
+{
+ unsigned long long bytes = 0;
+ unsigned chunk_shift = 12;
+ int ret, i;
+ struct ublk_params p = {
+ .types = UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_BPF,
+ .basic = {
+ .logical_bs_shift = 9,
+ .physical_bs_shift = 12,
+ .io_opt_shift = 12,
+ .io_min_shift = 9,
+ .max_sectors = dev->dev_info.max_io_buf_bytes >> 9,
+ },
+ .bpf = {
+ .flags = UBLK_BPF_HAS_OPS_ID | UBLK_BPF_HAS_AIO_OPS_ID,
+ .ops_id = dev->bpf_prog_id,
+ .aio_ops_id = dev->bpf_aio_prog_id,
+ },
+ };
+
+ ret = backing_file_tgt_init(dev);
+ if (ret)
+ return ret;
+
+ assert(stripe_bpf_setup_parameters(dev, chunk_shift) == 0);
+
+ for (i = 0; i < dev->nr_fds - 1; i++) {
+ unsigned long size = dev->tgt.backing_file_size[i];
+
+ if (size != dev->tgt.backing_file_size[0])
+ return -EINVAL;
+ if (size & ((1 << chunk_shift) - 1))
+ return -EINVAL;
+ bytes += size;
+ }
+
+ dev->tgt.dev_size = bytes;
+ p.basic.dev_sectors = bytes >> 9;
+ dev->tgt.params = p;
+
+ return 0;
+}
static const struct ublk_tgt_ops tgt_ops_list[] = {
{
@@ -1487,6 +1564,11 @@ static const struct ublk_tgt_ops tgt_ops_list[] = {
.init_tgt = ublk_loop_tgt_init,
.deinit_tgt = backing_file_tgt_deinit,
},
+ {
+ .name = "stripe",
+ .init_tgt = ublk_stripe_tgt_init,
+ .deinit_tgt = backing_file_tgt_deinit,
+ },
};
static const struct ublk_tgt_ops *ublk_find_tgt(const char *name)
--
2.47.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC PATCH 22/22] ublk: document ublk-bpf & bpf-aio
2025-01-07 12:03 [RFC PATCH 00/22] ublk: support bpf Ming Lei
` (20 preceding siblings ...)
2025-01-07 12:04 ` [RFC PATCH 21/22] selftests: add tests for covering both bpf aio and split Ming Lei
@ 2025-01-07 12:04 ` Ming Lei
21 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2025-01-07 12:04 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
Ming Lei
Document ublk-bpf motivation and implementation.
Document bpf-aio implementation.
Document ublk-bpf selftests.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
Documentation/block/ublk.rst | 170 +++++++++++++++++++++++++++++++++++
1 file changed, 170 insertions(+)
diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
index 51665a3e6a50..bf7a3df48036 100644
--- a/Documentation/block/ublk.rst
+++ b/Documentation/block/ublk.rst
@@ -309,6 +309,176 @@ with specified IO tag in the command data:
``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy
the server buffer (pages) read to the IO request pages.
+
+UBLK-BPF support
+================
+
+Motivation
+----------
+
+- support stacking ublk
+
+ There are many 3rd-party volume managers, and a ublk device may be built over
+ another ublk device to simplify their implementation; however, multiple
+ userspace-kernel context switches for handling one single IO can't be
+ accepted from a performance point of view.
+
+ ublk-bpf can avoid the user-kernel context switch in most of the fast IO
+ path, so ublk over ublk becomes possible.
+
+- complicated virtual block device
+
+ Many complicated virtual block devices have an admin & meta code path and a
+ normal IO fast path; meta & admin IO handling is usually complicated, so it
+ can be moved to the ublk server to relieve the development burden, while the
+ IO fast path can be kept in kernel space for the sake of high performance.
+
+ bpf provides rich map types, which help a lot for communication between
+ userspace and prog, or between progs.
+
+ One typical example is qcow2, whose meta IO handling can be kept in the
+ ublk server while the fast IO path is moved to the bpf prog. An efficient
+ bpf map can be looked up first to see if the virtual LBA -> host LBA mapping
+ is already populated; if yes, handle the IO with ublk-bpf directly, otherwise
+ forward it to the ublk server to populate the mapping first.
+
+- some simple high performance virtual devices
+
+ For targets such as null & loop, the whole implementation can be moved to
+ the bpf prog completely.
+
+- provides a chance to get performance similar to a kernel driver
+
+ One round of kernel/user context switch is avoided, and one extra copy of
+ the IO data is saved.
+
+bpf aio
+-------
+
+bpf aio exports kfuncs for bpf progs to submit & complete IO in an async way.
+The IO completion handler is provided by the bpf aio user, and is defined in
+the bpf prog (such as the ublk bpf prog) as the `struct bpf_aio_complete_ops`
+bpf struct_ops.
+
+bpf aio is designed as a generic interface which in theory can be used by any
+bpf prog, and it may be moved to `lib/` in the future if the interface becomes
+mature and stable enough.
+
+- bpf_aio_alloc()
+
+ Allocate one bpf aio instance of `struct bpf_aio`
+
+- bpf_aio_release()
+
+ Free one bpf aio instance of `struct bpf_aio`
+
+- bpf_aio_submit()
+
+ Submit one bpf aio instance of `struct bpf_aio` in an async way.
+
+- `struct bpf_aio_complete_ops`
+
+ Defines the bpf aio completion callback, implemented as a bpf struct_ops;
+ it is called when the submitted bpf aio is completed.
+
+
+ublk bpf implementation
+-----------------------
+
+Export `struct ublk_bpf_ops` as a bpf struct_ops, so that ublk IO commands
+can be queued or handled in the callbacks defined in the ublk bpf struct_ops;
+see the whole logic in `ublk_run_bpf_handler`:
+
+- `UBLK_BPF_IO_QUEUED`
+
+ If ->queue_io_cmd() or ->queue_io_cmd_daemon() returns `UBLK_BPF_IO_QUEUED`,
+ this IO command has been queued by the bpf prog, so it won't be forwarded to
+ the ublk server.
+
+- `UBLK_BPF_IO_REDIRECT`
+
+ If ->queue_io_cmd() or ->queue_io_cmd_daemon() returns `UBLK_BPF_IO_REDIRECT`,
+ this IO command will be forwarded to the ublk server.
+
+- `UBLK_BPF_IO_CONTINUE`
+
+ If ->queue_io_cmd() or ->queue_io_cmd_daemon() returns `UBLK_BPF_IO_CONTINUE`,
+ only part of this IO command has been queued, and `ublk_bpf_return_t` carries
+ how many bytes were queued, so the ublk driver will keep calling the callback
+ to queue the remaining bytes of this IO command. This is helpful for
+ implementing stacking devices, since it allows an IO command to be split.
+
+ublk bpf provides kfuncs for the ublk bpf prog to queue and handle ublk IO
+commands (a minimal call-flow sketch follows this list):
+
+- ublk_bpf_complete_io()
+
+ Complete this ublk IO command
+
+- ublk_bpf_get_io_tag()
+
+ Get tag of this ublk IO command
+
+- ublk_bpf_get_queue_id()
+
+ Get queue id of this ublk IO command
+
+- ublk_bpf_get_dev_id()
+
+ Get device id of this ublk IO command
+
+- ublk_bpf_attach_and_prep_aio()
+
+ Attach & prepare a bpf aio for this ublk IO command: the bpf aio buffer is
+ prepared and the aio's completion callback is set up, so the user prog gets
+ notified when the bpf aio is completed.
+
+- ublk_bpf_dettach_and_complete_aio()
+
+ Detach the bpf aio from its IO command; it is usually called from the bpf
+ aio's completion callback.
+
+- ublk_bpf_acquire_io_from_aio()
+
+ Acquire the ublk IO command from the aio; one typical use is to call
+ ublk_bpf_complete_io() to complete the ublk IO command.
+
+- ublk_bpf_release_io_from_aio()
+
+ Release the ublk IO command acquired from `ublk_bpf_acquire_io_from_aio`.
+
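+A minimal call-flow sketch for a ublk bpf prog using bpf aio, mirroring the
+loop selftest (error handling, local declarations and the map lookup of the
+backing file fd are omitted)::
+
+    /* in ->queue_io_cmd_daemon(): cover the whole IO command with one aio */
+    iod = ublk_bpf_get_iod(io);
+    aio = bpf_aio_alloc(BPF_AIO_OP_FS_READ, 0);
+    ublk_bpf_attach_and_prep_aio(io, 0, iod->nr_sectors << 9, aio);
+    bpf_aio_submit(aio, backing_fd, iod->start_sector << 9,
+                   iod->nr_sectors << 9, 0);
+
+    /* in ->bpf_aio_complete_cb(): complete the ublk IO and free the aio */
+    io = ublk_bpf_acquire_io_from_aio(aio);
+    ublk_bpf_complete_io(io, ret);
+    ublk_bpf_release_io_from_aio(io);
+    ublk_bpf_dettach_and_complete_aio(aio);
+    bpf_aio_release(aio);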
+
+Test
+----
+
+- Build kernel & install kernel headers & reboot & test
+
+ enable CONFIG_BLK_DEV_UBLK & CONFIG_UBLK_BPF
+
+ make
+
+ make headers_install INSTALL_HDR_PATH=/usr
+
+ reboot
+
+ make -C tools/testing/selftests TARGETS=ublk run_tests
+
+The ublk selftests implement null, loop and stripe targets for covering all
+bpf features:
+
+- complete bpf IO handling
+
+- complete ublk server IO handling
+
+- mixed bpf prog and ublk server IO handling
+
+- bpf aio for loop & stripe
+
+- IO split via `UBLK_BPF_IO_CONTINUE` for implementing ublk-stripe
+
+Write & read with verify, and mkfs.ext4 & mount & umount, are run in the
+selftests.
+
+
Future development
==================
--
2.47.0
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [RFC PATCH 08/22] ublk: bpf: add bpf struct_ops
2025-01-07 12:03 ` [RFC PATCH 08/22] ublk: bpf: add bpf struct_ops Ming Lei
@ 2025-01-10 1:43 ` Alexei Starovoitov
2025-01-13 4:08 ` Ming Lei
0 siblings, 1 reply; 28+ messages in thread
From: Alexei Starovoitov @ 2025-01-10 1:43 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, linux-block, bpf, Alexei Starovoitov,
Martin KaFai Lau, Yonghong Song
On Tue, Jan 7, 2025 at 4:08 AM Ming Lei <tom.leiming@gmail.com> wrote:
> +
> +/* Return true if io cmd is queued, otherwise forward it to userspace */
> +bool ublk_run_bpf_handler(struct ublk_queue *ubq, struct request *req,
> + queue_io_cmd_t cb)
> +{
> + ublk_bpf_return_t ret;
> + struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
> + struct ublksrv_io_desc *iod = ublk_get_iod(ubq, req->tag);
> + struct ublk_bpf_io *bpf_io = &data->bpf_data;
> + const unsigned long total = iod->nr_sectors << 9;
> + unsigned int done = 0;
> + bool res = true;
> + int err;
> +
> + if (!test_bit(UBLK_BPF_IO_PREP, &bpf_io->flags))
> + ublk_bpf_prep_io(bpf_io, iod);
> +
> + do {
> + enum ublk_bpf_disposition rc;
> + unsigned int bytes;
> +
> + ret = cb(bpf_io, done);
High level observation...
I suspect forcing all struct_ops callbacks to have only these
two arguments and packing args into ublk_bpf_io
will be limiting in the long term.
And this part of api would need to be redesigned,
but since it's not an uapi... not a big deal.
> + rc = ublk_bpf_get_disposition(ret);
> +
> + if (rc == UBLK_BPF_IO_QUEUED)
> + goto exit;
> +
> + if (rc == UBLK_BPF_IO_REDIRECT)
> + break;
Same point about return value processing...
Each struct_ops callback could have had its own meaning
of retvals.
I suspect it would have been more flexible and more powerful
this way.
Other than that bpf plumbing looks good.
There is an issue with leaking allocated memory in bpf_aio_alloc kfunc
(it probably should be KF_ACQUIRE)
and a few other things, but before doing any in depth review
from bpf pov I'd like to hear what block folks think.
Motivation looks useful,
but the claim of performance gains without performance numbers
is a leap of faith.
> +
> + if (unlikely(rc != UBLK_BPF_IO_CONTINUE)) {
> + printk_ratelimited(KERN_ERR "%s: unknown rc code %d\n",
> + __func__, rc);
> + err = -EINVAL;
> + goto fail;
> + }
> +
> + bytes = ublk_bpf_get_return_bytes(ret);
> + if (unlikely((bytes & 511) || !bytes)) {
> + err = -EREMOTEIO;
> + goto fail;
> + } else if (unlikely(bytes > total - done)) {
> + err = -ENOSPC;
> + goto fail;
> + } else {
> + done += bytes;
> + }
> + } while (done < total);
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH 08/22] ublk: bpf: add bpf struct_ops
2025-01-10 1:43 ` Alexei Starovoitov
@ 2025-01-13 4:08 ` Ming Lei
2025-01-13 21:30 ` Alexei Starovoitov
0 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2025-01-13 4:08 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Jens Axboe, linux-block, bpf, Alexei Starovoitov,
Martin KaFai Lau, Yonghong Song
Hello Alexei,
Thanks for your comments!
On Thu, Jan 09, 2025 at 05:43:12PM -0800, Alexei Starovoitov wrote:
> On Tue, Jan 7, 2025 at 4:08 AM Ming Lei <tom.leiming@gmail.com> wrote:
> > +
> > +/* Return true if io cmd is queued, otherwise forward it to userspace */
> > +bool ublk_run_bpf_handler(struct ublk_queue *ubq, struct request *req,
> > + queue_io_cmd_t cb)
> > +{
> > + ublk_bpf_return_t ret;
> > + struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
> > + struct ublksrv_io_desc *iod = ublk_get_iod(ubq, req->tag);
> > + struct ublk_bpf_io *bpf_io = &data->bpf_data;
> > + const unsigned long total = iod->nr_sectors << 9;
> > + unsigned int done = 0;
> > + bool res = true;
> > + int err;
> > +
> > + if (!test_bit(UBLK_BPF_IO_PREP, &bpf_io->flags))
> > + ublk_bpf_prep_io(bpf_io, iod);
> > +
> > + do {
> > + enum ublk_bpf_disposition rc;
> > + unsigned int bytes;
> > +
> > + ret = cb(bpf_io, done);
>
> High level observation...
> I suspect forcing all sturct_ops callbacks to have only these
> two arguments and packing args into ublk_bpf_io
> will be limiting in the long term.
There are three callbacks defined, and only the two with the same type for
queuing io commands are covered in this function.
But yes, the callback type belongs to the API, which should be designed
carefully, and I will think about it further.
>
> And this part of api would need to be redesigned,
> but since it's not an uapi... not a big deal.
>
> > + rc = ublk_bpf_get_disposition(ret);
> > +
> > + if (rc == UBLK_BPF_IO_QUEUED)
> > + goto exit;
> > +
> > + if (rc == UBLK_BPF_IO_REDIRECT)
> > + break;
>
> Same point about return value processing...
> Each struct_ops callback could have had its own meaning
> of retvals.
> I suspect it would have been more flexible and more powerful
> this way.
Yeah, I agree, just the 3rd callback of release_io_cmd_t isn't covered
in this function.
>
> Other than that bpf plumbing looks good.
>
> There is an issue with leaking allocated memory in bpf_aio_alloc kfunc
> (it probably should be KF_ACQUIRE)
It is one problem which troubles me too:
- another callback of struct_ops/bpf_aio_complete_cb is guaranteed to be
called after the 'struct bpf_aio' instance is submitted via kfunc
bpf_aio_submit(), and it is supposed to be freed from
struct_ops/bpf_aio_complete_cb
- but the following verifier failure is triggered if bpf_aio_alloc and
bpf_aio_release are marked as KF_ACQUIRE & KF_RELEASE.
```
libbpf: prog 'ublk_loop_comp_cb': -- BEGIN PROG LOAD LOG --
Global function ublk_loop_comp_cb() doesn't return scalar. Only those are supported.
```
Here the 'struct bpf_aio' instance isn't stored in a map, and it is provided
via the struct_ops callback (bpf_aio_complete_cb). I'd appreciate it if you
could share any idea about how to let KF_ACQUIRE/KF_RELEASE cover this usage.
> and a few other things, but before doing any in depth review
> from bpf pov I'd like to hear what block folks think.
Me too, look forward to comments from our block guys.
>
> Motivation looks useful,
> but the claim of performance gains without performance numbers
> is a leap of faith.
Follows some data:
1) ublk-null vs. ublk-null with bpf
- 1.97M IOPS vs. 3.7M IOPS
- setup ublk-null
cd tools/testing/selftests/ublk
./ublk_bpf add -t null -q 2
- setup ublk-null with bpf
cd tools/testing/selftests/ublk
./ublk_bpf reg -t null ./ublk_null.bpf.o
./ublk_bpf add -t null -q 2 --bpf_prog 0
- run `fio/t/io_uring -p 0 /dev/ublkb0`
2) ublk-loop
The built-in `ublk_bpf` utility only supports bpf io handling, and compared
with ublksrv the improvement isn't so big yet, still ~10%. One reason is
that bpf aio has just been started and isn't optimized. In theory:
- it saves one kernel-user context switch
- it saves one user-kernel IO buffer copy
- it has a much smaller io handling code footprint than userspace io handling
The improvement is expected to be bigger, especially for big chunk size
IO workloads.
Thanks,
Ming
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH 08/22] ublk: bpf: add bpf struct_ops
2025-01-13 4:08 ` Ming Lei
@ 2025-01-13 21:30 ` Alexei Starovoitov
2025-01-15 11:58 ` Ming Lei
0 siblings, 1 reply; 28+ messages in thread
From: Alexei Starovoitov @ 2025-01-13 21:30 UTC (permalink / raw)
To: Ming Lei, Martin KaFai Lau, Amery Hung
Cc: Jens Axboe, linux-block, bpf, Alexei Starovoitov,
Martin KaFai Lau, Yonghong Song
On Sun, Jan 12, 2025 at 8:08 PM Ming Lei <tom.leiming@gmail.com> wrote:
>
> Hello Alexei,
>
> Thanks for your comments!
>
> On Thu, Jan 09, 2025 at 05:43:12PM -0800, Alexei Starovoitov wrote:
> > On Tue, Jan 7, 2025 at 4:08 AM Ming Lei <tom.leiming@gmail.com> wrote:
> > > +
> > > +/* Return true if io cmd is queued, otherwise forward it to userspace */
> > > +bool ublk_run_bpf_handler(struct ublk_queue *ubq, struct request *req,
> > > + queue_io_cmd_t cb)
> > > +{
> > > + ublk_bpf_return_t ret;
> > > + struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
> > > + struct ublksrv_io_desc *iod = ublk_get_iod(ubq, req->tag);
> > > + struct ublk_bpf_io *bpf_io = &data->bpf_data;
> > > + const unsigned long total = iod->nr_sectors << 9;
> > > + unsigned int done = 0;
> > > + bool res = true;
> > > + int err;
> > > +
> > > + if (!test_bit(UBLK_BPF_IO_PREP, &bpf_io->flags))
> > > + ublk_bpf_prep_io(bpf_io, iod);
> > > +
> > > + do {
> > > + enum ublk_bpf_disposition rc;
> > > + unsigned int bytes;
> > > +
> > > + ret = cb(bpf_io, done);
> >
> > High level observation...
> > I suspect forcing all sturct_ops callbacks to have only these
> > two arguments and packing args into ublk_bpf_io
> > will be limiting in the long term.
>
> There are three callbacks defined, and only the two with same type for
> queuing io commands are covered in this function.
>
> But yes, callback type belongs to API, which should be designed
> carefully, and I will think about further.
>
> >
> > And this part of api would need to be redesigned,
> > but since it's not an uapi... not a big deal.
> >
> > > + rc = ublk_bpf_get_disposition(ret);
> > > +
> > > + if (rc == UBLK_BPF_IO_QUEUED)
> > > + goto exit;
> > > +
> > > + if (rc == UBLK_BPF_IO_REDIRECT)
> > > + break;
> >
> > Same point about return value processing...
> > Each struct_ops callback could have had its own meaning
> > of retvals.
> > I suspect it would have been more flexible and more powerful
> > this way.
>
> Yeah, I agree, just the 3rd callback of release_io_cmd_t isn't covered
> in this function.
>
> >
> > Other than that bpf plumbing looks good.
> >
> > There is an issue with leaking allocated memory in bpf_aio_alloc kfunc
> > (it probably should be KF_ACQUIRE)
>
> It is one problem which troubles me too:
>
> - another callback of struct_ops/bpf_aio_complete_cb is guaranteed to be
> called after the 'struct bpf_aio' instance is submitted via kfunc
> bpf_aio_submit(), and it is supposed to be freed from
> struct_ops/bpf_aio_complete_cb
>
> - but the following verifier failure is triggered if bpf_aio_alloc and
> bpf_aio_release are marked as KF_ACQUIRE & KF_RELEASE.
>
> ```
> libbpf: prog 'ublk_loop_comp_cb': -- BEGIN PROG LOAD LOG --
> Global function ublk_loop_comp_cb() doesn't return scalar. Only those are supported.
> ```
That's odd.
Adding KF_ACQ/REL to bpf_aio_alloc/release kfuncs shouldn't affect
verification of ublk_loop_comp_cb() prog. It's fine for it to stay 'void'
return.
You probably made it a global function and that was the reason for this
verifier error. Global funcs have to return scalar for now.
We can relax this restriction if necessary.
>
> Here 'struct bpf_aio' instance isn't stored in map, and it is provided
> from struct_ops callback(bpf_aio_complete_cb), I appreciate you may share
> any idea about how to let KF_ACQUIRE/KF_RELEASE cover the usage here.
This is so that:
ublk_loop_comp_cb ->
ublk_loop_comp_and_release_aio ->
bpf_aio_release
would properly recognize that ref to aio is dropped?
Currently the verifier doesn't support that,
but there is work in progress to add this feature:
https://lore.kernel.org/bpf/20241220195619.2022866-2-amery.hung@gmail.com/
then in cfi_stubs annotate the aio argument in bpf_aio_complete_cb()
as "struct bpf_aio *aio__ref".
Then the verifier will recognize that the callback argument
comes refcounted and the prog has to call a KF_RELEASE kfunc on it.
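Something along these lines (a hypothetical sketch of the kernel-side stub,
assuming that series lands with the __ref argument-suffix convention; not
something the current verifier accepts):

	/* hypothetical: the __ref suffix marks the arg as a refcounted ptr
	 * the prog must release via a KF_RELEASE kfunc
	 */
	static void bpf_aio_complete_cb_stub(struct bpf_aio *aio__ref, long ret)
	{
	}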
>
> > and a few other things, but before doing any in depth review
> > from bpf pov I'd like to hear what block folks think.
>
> Me too, look forward to comments from our block guys.
>
> >
> > Motivation looks useful,
> > but the claim of performance gains without performance numbers
> > is a leap of faith.
>
> Here is some data:
>
> 1) ublk-null without bpf vs. ublk-null with bpf
>
> - 1.97M IOPS vs. 3.7M IOPS
>
> - setup ublk-null
>
> cd tools/testing/selftests/ublk
> ./ublk_bpf add -t null -q 2
>
> - setup ublk-null with bpf
>
> cd tools/testing/selftests/ublk
> ./ublk_bpf reg -t null ./ublk_null.bpf.o
> ./ublk_bpf add -t null -q 2 --bpf_prog 0
>
> - run `fio/t/io_uring -p 0 /dev/ublkb0`
>
> 2) ublk-loop
>
> The built-in `ublk_bpf` utility only supports bpf io handling, and compared
> with ublksrv the improvement isn't that big yet, still ~10%. One reason
> is that bpf aio has just been started and isn't optimized yet; in theory it:
>
> - saves one kernel-user context switch
> - saves one user-kernel IO buffer copy
> - has a much smaller io handling code footprint than userspace io handling
>
> The improvement is expected to be bigger, especially for large chunk size
> IO workloads.
>
>
> Thanks,
> Ming
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH 08/22] ublk: bpf: add bpf struct_ops
2025-01-13 21:30 ` Alexei Starovoitov
@ 2025-01-15 11:58 ` Ming Lei
2025-01-15 20:11 ` Amery Hung
0 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2025-01-15 11:58 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Martin KaFai Lau, Amery Hung, Jens Axboe, linux-block, bpf,
Alexei Starovoitov, Martin KaFai Lau, Yonghong Song
Hello Alexei,
On Mon, Jan 13, 2025 at 01:30:45PM -0800, Alexei Starovoitov wrote:
> On Sun, Jan 12, 2025 at 8:08 PM Ming Lei <tom.leiming@gmail.com> wrote:
> >
> > Hello Alexei,
> >
> > Thanks for your comments!
> >
> > On Thu, Jan 09, 2025 at 05:43:12PM -0800, Alexei Starovoitov wrote:
> > > On Tue, Jan 7, 2025 at 4:08 AM Ming Lei <tom.leiming@gmail.com> wrote:
> > > > +
> > > > +/* Return true if io cmd is queued, otherwise forward it to userspace */
> > > > +bool ublk_run_bpf_handler(struct ublk_queue *ubq, struct request *req,
> > > > + queue_io_cmd_t cb)
> > > > +{
> > > > + ublk_bpf_return_t ret;
> > > > + struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
> > > > + struct ublksrv_io_desc *iod = ublk_get_iod(ubq, req->tag);
> > > > + struct ublk_bpf_io *bpf_io = &data->bpf_data;
> > > > + const unsigned long total = iod->nr_sectors << 9;
> > > > + unsigned int done = 0;
> > > > + bool res = true;
> > > > + int err;
> > > > +
> > > > + if (!test_bit(UBLK_BPF_IO_PREP, &bpf_io->flags))
> > > > + ublk_bpf_prep_io(bpf_io, iod);
> > > > +
> > > > + do {
> > > > + enum ublk_bpf_disposition rc;
> > > > + unsigned int bytes;
> > > > +
> > > > + ret = cb(bpf_io, done);
> > >
> > > High level observation...
> > > I suspect forcing all struct_ops callbacks to have only these
> > > two arguments and packing args into ublk_bpf_io
> > > will be limiting in the long term.
> >
> > There are three callbacks defined, and only the two with the same type for
> > queuing io commands are covered in this function.
> >
> > But yes, the callback type belongs to the API, which should be designed
> > carefully, and I will think about it further.
> >
> > >
> > > And this part of api would need to be redesigned,
> > > but since it's not an uapi... not a big deal.
> > >
> > > > + rc = ublk_bpf_get_disposition(ret);
> > > > +
> > > > + if (rc == UBLK_BPF_IO_QUEUED)
> > > > + goto exit;
> > > > +
> > > > + if (rc == UBLK_BPF_IO_REDIRECT)
> > > > + break;
> > >
> > > Same point about return value processing...
> > > Each struct_ops callback could have had its own meaning
> > > of retvals.
> > > I suspect it would have been more flexible and more powerful
> > > this way.
> >
> > Yeah, I agree; it's just that the 3rd callback, release_io_cmd_t, isn't covered
> > in this function.
> >
> > >
> > > Other than that bpf plumbing looks good.
> > >
> > > There is an issue with leaking allocated memory in bpf_aio_alloc kfunc
> > > (it probably should be KF_ACQUIRE)
> >
> > It is one problem which troubles me too:
> >
> > - another callback of struct_ops/bpf_aio_complete_cb is guaranteed to be
> > called after the 'struct bpf_aio' instance is submitted via kfunc
> > bpf_aio_submit(), and it is supposed to be freed from
> > struct_ops/bpf_aio_complete_cb
> >
> > - but the following verifier failure is triggered if bpf_aio_alloc and
> > bpf_aio_release are marked as KF_ACQUIRE & KF_RELEASE.
> >
> > ```
> > libbpf: prog 'ublk_loop_comp_cb': -- BEGIN PROG LOAD LOG --
> > Global function ublk_loop_comp_cb() doesn't return scalar. Only those are supported.
> > ```
>
> That's odd.
> Adding KF_ACQ/REL to bpf_aio_alloc/release kfuncs shouldn't affect
> verification of the ublk_loop_comp_cb() prog. It's fine for it to keep a 'void'
> return.
> You probably made it a global function, and that was the reason for this
> verifier error. Global funcs have to return a scalar for now.
> We can relax this restriction if necessary.
It looks like marking ublk_loop_comp_cb() as static doesn't work:
[root@ktest-40 ublk]# make
CLNG-BPF ublk_loop.bpf.o
GEN-SKEL ublk_loop.skel.h
libbpf: relocation against STT_SECTION in non-exec section is not supported!
Error: failed to link '/root/git/linux/tools/testing/selftests/ublk/ublk_loop.bpf.o': Invalid argument (22)
But it seems not a big deal because we can change its return type to 'int'.
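For illustration, a minimal sketch of that workaround, following the ublk_loop.c lines visible in the verifier log further down; the SEC() name and the assumption that the kfunc extern declarations come from the selftest's kfunc header are mine, not taken from the patchset:

```
/* sketch only: keep the completion callback global, but return a scalar
 * so the "doesn't return scalar" check is satisfied */
SEC("struct_ops")
int BPF_PROG(ublk_loop_comp_cb, struct bpf_aio *aio, long ret)
{
	struct ublk_bpf_io *io = ublk_bpf_acquire_io_from_aio(aio);

	/* complete the ublk io with the aio result */
	ublk_bpf_complete_io(io, ret);
	ublk_bpf_release_io_from_aio(io);

	/* detach and free the aio itself */
	ublk_bpf_dettach_and_complete_aio(aio);
	bpf_aio_release(aio);
	return 0;
}
```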
>
> >
> > Here the 'struct bpf_aio' instance isn't stored in a map, and it is provided
> > from the struct_ops callback (bpf_aio_complete_cb); I'd appreciate it if you
> > could share any idea about how to let KF_ACQUIRE/KF_RELEASE cover this usage.
>
> This is so that:
>
> ublk_loop_comp_cb ->
> ublk_loop_comp_and_release_aio ->
> bpf_aio_release
>
> would properly recognize that ref to aio is dropped?
>
> Currently the verifier doesn't support that,
> but there is work in progress to add this feature:
>
> https://lore.kernel.org/bpf/20241220195619.2022866-2-amery.hung@gmail.com/
>
> then in cfi_stubs annotate the aio argument of bpf_aio_complete_cb()
> as "struct bpf_aio *aio__ref"
>
> Then the verifier will recognize that the callback argument
> comes refcounted and that the prog has to call a KF_RELEASE kfunc on it.
This looks like a very nice feature, thanks for sharing it!
I tried to apply the above patch and patch 3 on the -next tree, and passed 'aio__ref' to the
callback cfi_stubs, but it still failed:
[root@ktest-40 ublk]# ./test_loop_01.sh
libbpf: prog 'ublk_loop_comp_cb': BPF program load failed: -EINVAL
libbpf: prog 'ublk_loop_comp_cb': -- BEGIN PROG LOAD LOG --
0: R1=ctx() R10=fp0
; int BPF_PROG(ublk_loop_comp_cb, struct bpf_aio *aio, long ret) @ ublk_loop.c:34
0: (79) r7 = *(u64 *)(r1 +8) ; R1=ctx() R7_w=scalar()
1: (79) r6 = *(u64 *)(r1 +0)
func 'bpf_aio_complete_cb' arg0 has btf_id 37354 type STRUCT 'bpf_aio'
2: R1=ctx() R6_w=trusted_ptr_bpf_aio()
; struct ublk_bpf_io *io = ublk_bpf_acquire_io_from_aio(aio); @ ublk_loop.c:24
2: (bf) r1 = r6 ; R1_w=trusted_ptr_bpf_aio() R6_w=trusted_ptr_bpf_aio()
3: (85) call ublk_bpf_acquire_io_from_aio#43231 ; R0_w=ptr_ublk_bpf_io(ref_obj_id=1) refs=1
4: (bf) r8 = r0 ; R0_w=ptr_ublk_bpf_io(ref_obj_id=1) R8_w=ptr_ublk_bpf_io(ref_obj_id=1) refs=1
; ublk_bpf_complete_io(io, ret); @ ublk_loop.c:26
5: (bf) r1 = r8 ; R1_w=ptr_ublk_bpf_io(ref_obj_id=1) R8_w=ptr_ublk_bpf_io(ref_obj_id=1) refs=1
6: (bc) w2 = w7 ; R2_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R7_w=scalar() refs=1
7: (85) call ublk_bpf_complete_io#43241 ; refs=1
; ublk_bpf_release_io_from_aio(io); @ ublk_loop.c:27
8: (bf) r1 = r8 ; R1_w=ptr_ublk_bpf_io(ref_obj_id=1) R8=ptr_ublk_bpf_io(ref_obj_id=1) refs=1
9: (85) call ublk_bpf_release_io_from_aio#43257 ;
; ublk_bpf_dettach_and_complete_aio(aio); @ ublk_loop.c:29
10: (bf) r1 = r6 ; R1_w=trusted_ptr_bpf_aio() R6=trusted_ptr_bpf_aio()
11: (85) call ublk_bpf_dettach_and_complete_aio#43245 ;
; bpf_aio_release(aio); @ ublk_loop.c:30
12: (bf) r1 = r6 ; R1_w=trusted_ptr_bpf_aio() R6=trusted_ptr_bpf_aio()
13: (85) call bpf_aio_release#95841
release kernel function bpf_aio_release expects refcounted PTR_TO_BTF_ID
processed 14 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
-- END PROG LOAD LOG --
libbpf: prog 'ublk_loop_comp_cb': failed to load: -EINVAL
libbpf: failed to load object 'ublk_loop.bpf.o'
fail to load bpf obj from ublk_loop.bpf.o
fail to register bpf prog loop ublk_loop.bpf.o
Thanks,
Ming
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH 08/22] ublk: bpf: add bpf struct_ops
2025-01-15 11:58 ` Ming Lei
@ 2025-01-15 20:11 ` Amery Hung
0 siblings, 0 replies; 28+ messages in thread
From: Amery Hung @ 2025-01-15 20:11 UTC (permalink / raw)
To: Ming Lei
Cc: Alexei Starovoitov, Martin KaFai Lau, Jens Axboe, linux-block,
bpf, Alexei Starovoitov, Martin KaFai Lau, Yonghong Song
On Wed, Jan 15, 2025 at 3:58 AM Ming Lei <tom.leiming@gmail.com> wrote:
>
> Hello Alexei,
>
> On Mon, Jan 13, 2025 at 01:30:45PM -0800, Alexei Starovoitov wrote:
> > On Sun, Jan 12, 2025 at 8:08 PM Ming Lei <tom.leiming@gmail.com> wrote:
> > >
> > > Hello Alexei,
> > >
> > > Thanks for your comments!
> > >
> > > On Thu, Jan 09, 2025 at 05:43:12PM -0800, Alexei Starovoitov wrote:
> > > > On Tue, Jan 7, 2025 at 4:08 AM Ming Lei <tom.leiming@gmail.com> wrote:
> > > > > +
> > > > > +/* Return true if io cmd is queued, otherwise forward it to userspace */
> > > > > +bool ublk_run_bpf_handler(struct ublk_queue *ubq, struct request *req,
> > > > > + queue_io_cmd_t cb)
> > > > > +{
> > > > > + ublk_bpf_return_t ret;
> > > > > + struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
> > > > > + struct ublksrv_io_desc *iod = ublk_get_iod(ubq, req->tag);
> > > > > + struct ublk_bpf_io *bpf_io = &data->bpf_data;
> > > > > + const unsigned long total = iod->nr_sectors << 9;
> > > > > + unsigned int done = 0;
> > > > > + bool res = true;
> > > > > + int err;
> > > > > +
> > > > > + if (!test_bit(UBLK_BPF_IO_PREP, &bpf_io->flags))
> > > > > + ublk_bpf_prep_io(bpf_io, iod);
> > > > > +
> > > > > + do {
> > > > > + enum ublk_bpf_disposition rc;
> > > > > + unsigned int bytes;
> > > > > +
> > > > > + ret = cb(bpf_io, done);
> > > >
> > > > High level observation...
> > > > I suspect forcing all struct_ops callbacks to have only these
> > > > two arguments and packing args into ublk_bpf_io
> > > > will be limiting in the long term.
> > >
> > > There are three callbacks defined, and only the two with the same type for
> > > queuing io commands are covered in this function.
> > >
> > > But yes, the callback type belongs to the API, which should be designed
> > > carefully, and I will think about it further.
> > >
> > > >
> > > > And this part of api would need to be redesigned,
> > > > but since it's not an uapi... not a big deal.
> > > >
> > > > > + rc = ublk_bpf_get_disposition(ret);
> > > > > +
> > > > > + if (rc == UBLK_BPF_IO_QUEUED)
> > > > > + goto exit;
> > > > > +
> > > > > + if (rc == UBLK_BPF_IO_REDIRECT)
> > > > > + break;
> > > >
> > > > Same point about return value processing...
> > > > Each struct_ops callback could have had its own meaning
> > > > of retvals.
> > > > I suspect it would have been more flexible and more powerful
> > > > this way.
> > >
> > > Yeah, I agree; it's just that the 3rd callback, release_io_cmd_t, isn't covered
> > > in this function.
> > >
> > > >
> > > > Other than that bpf plumbing looks good.
> > > >
> > > > There is an issue with leaking allocated memory in bpf_aio_alloc kfunc
> > > > (it probably should be KF_ACQUIRE)
> > >
> > > It is one problem which troubles me too:
> > >
> > > - another callback of struct_ops/bpf_aio_complete_cb is guaranteed to be
> > > called after the 'struct bpf_aio' instance is submitted via kfunc
> > > bpf_aio_submit(), and it is supposed to be freed from
> > > struct_ops/bpf_aio_complete_cb
> > >
> > > - but the following verifier failure is triggered if bpf_aio_alloc and
> > > bpf_aio_release are marked as KF_ACQUIRE & KF_RELEASE.
> > >
> > > ```
> > > libbpf: prog 'ublk_loop_comp_cb': -- BEGIN PROG LOAD LOG --
> > > Global function ublk_loop_comp_cb() doesn't return scalar. Only those are supported.
> > > ```
> >
> > That's odd.
> > Adding KF_ACQ/REL to bpf_aio_alloc/release kfuncs shouldn't affect
> > verification of the ublk_loop_comp_cb() prog. It's fine for it to keep a 'void'
> > return.
> > You probably made it a global function, and that was the reason for this
> > verifier error. Global funcs have to return a scalar for now.
> > We can relax this restriction if necessary.
>
> It looks like marking ublk_loop_comp_cb() as static doesn't work:
>
> [root@ktest-40 ublk]# make
> CLNG-BPF ublk_loop.bpf.o
> GEN-SKEL ublk_loop.skel.h
> libbpf: relocation against STT_SECTION in non-exec section is not supported!
> Error: failed to link '/root/git/linux/tools/testing/selftests/ublk/ublk_loop.bpf.o': Invalid argument (22)
>
> But it seems not a big deal because we can change its return type to 'int'.
>
> >
> > >
> > > Here the 'struct bpf_aio' instance isn't stored in a map, and it is provided
> > > from the struct_ops callback (bpf_aio_complete_cb); I'd appreciate it if you
> > > could share any idea about how to let KF_ACQUIRE/KF_RELEASE cover this usage.
> >
> > This is so that:
> >
> > ublk_loop_comp_cb ->
> > ublk_loop_comp_and_release_aio ->
> > bpf_aio_release
> >
> > would properly recognize that ref to aio is dropped?
> >
> > Currently the verifier doesn't support that,
> > but there is work in progress to add this feature:
> >
> > https://lore.kernel.org/bpf/20241220195619.2022866-2-amery.hung@gmail.com/
> >
> > then in cfi_stubs annotate the aio argument of bpf_aio_complete_cb()
> > as "struct bpf_aio *aio__ref"
> >
> > Then the verifier will recognize that the callback argument
> > comes refcounted and that the prog has to call a KF_RELEASE kfunc on it.
>
> This looks like a very nice feature, thanks for sharing it!
>
> I tried to apply the above patch and patch 3 on the -next tree, and passed 'aio__ref' to the
> callback cfi_stubs, but it still failed:
>
> [root@ktest-40 ublk]# ./test_loop_01.sh
> libbpf: prog 'ublk_loop_comp_cb': BPF program load failed: -EINVAL
> libbpf: prog 'ublk_loop_comp_cb': -- BEGIN PROG LOAD LOG --
> 0: R1=ctx() R10=fp0
> ; int BPF_PROG(ublk_loop_comp_cb, struct bpf_aio *aio, long ret) @ ublk_loop.c:34
> 0: (79) r7 = *(u64 *)(r1 +8) ; R1=ctx() R7_w=scalar()
> 1: (79) r6 = *(u64 *)(r1 +0)
> func 'bpf_aio_complete_cb' arg0 has btf_id 37354 type STRUCT 'bpf_aio'
> 2: R1=ctx() R6_w=trusted_ptr_bpf_aio()
> ; struct ublk_bpf_io *io = ublk_bpf_acquire_io_from_aio(aio); @ ublk_loop.c:24
> 2: (bf) r1 = r6 ; R1_w=trusted_ptr_bpf_aio() R6_w=trusted_ptr_bpf_aio()
> 3: (85) call ublk_bpf_acquire_io_from_aio#43231 ; R0_w=ptr_ublk_bpf_io(ref_obj_id=1) refs=1
> 4: (bf) r8 = r0 ; R0_w=ptr_ublk_bpf_io(ref_obj_id=1) R8_w=ptr_ublk_bpf_io(ref_obj_id=1) refs=1
> ; ublk_bpf_complete_io(io, ret); @ ublk_loop.c:26
> 5: (bf) r1 = r8 ; R1_w=ptr_ublk_bpf_io(ref_obj_id=1) R8_w=ptr_ublk_bpf_io(ref_obj_id=1) refs=1
> 6: (bc) w2 = w7 ; R2_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R7_w=scalar() refs=1
> 7: (85) call ublk_bpf_complete_io#43241 ; refs=1
> ; ublk_bpf_release_io_from_aio(io); @ ublk_loop.c:27
> 8: (bf) r1 = r8 ; R1_w=ptr_ublk_bpf_io(ref_obj_id=1) R8=ptr_ublk_bpf_io(ref_obj_id=1) refs=1
> 9: (85) call ublk_bpf_release_io_from_aio#43257 ;
> ; ublk_bpf_dettach_and_complete_aio(aio); @ ublk_loop.c:29
> 10: (bf) r1 = r6 ; R1_w=trusted_ptr_bpf_aio() R6=trusted_ptr_bpf_aio()
> 11: (85) call ublk_bpf_dettach_and_complete_aio#43245 ;
> ; bpf_aio_release(aio); @ ublk_loop.c:30
> 12: (bf) r1 = r6 ; R1_w=trusted_ptr_bpf_aio() R6=trusted_ptr_bpf_aio()
> 13: (85) call bpf_aio_release#95841
> release kernel function bpf_aio_release expects refcounted PTR_TO_BTF_ID
> processed 14 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
> -- END PROG LOAD LOG --
> libbpf: prog 'ublk_loop_comp_cb': failed to load: -EINVAL
> libbpf: failed to load object 'ublk_loop.bpf.o'
> fail to load bpf obj from ublk_loop.bpf.o
> fail to register bpf prog loop ublk_loop.bpf.o
>
Hi Ming,
Your stub function signature does not look quite right.
It should be <struct_ops_name>__<op_name>, hence:
static void bpf_aio_complete_ops__bpf_aio_complete_cb(struct bpf_aio *io__ref, long ret)
For more detail, look at find_stub_func_proto().
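For illustration, a sketch of the kernel-side stub this implies; the struct_ops and member names follow the discussion above, while the stubs-instance wiring is an assumption rather than the patchset's actual code:

```
/* stub named <struct_ops_name>__<op_name>; the __ref suffix tells the
 * verifier that the aio argument is passed refcounted, so the prog has
 * to release it via a KF_RELEASE kfunc */
static void bpf_aio_complete_ops__bpf_aio_complete_cb(struct bpf_aio *io__ref,
						      long ret)
{
}

/* instance pointed to by the .cfi_stubs member of the corresponding
 * struct bpf_struct_ops definition (assumed wiring) */
static struct bpf_aio_complete_ops bpf_aio_complete_cfi_stubs = {
	.bpf_aio_complete_cb = bpf_aio_complete_ops__bpf_aio_complete_cb,
};
```

find_stub_func_proto() resolves the stub by that generated name, which is why the naming has to match exactly.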
Thanks,
Amery
>
>
> Thanks,
> Ming
^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads: [~2025-01-15 20:11 UTC | newest]
Thread overview: 28+ messages
2025-01-07 12:03 [RFC PATCH 00/22] ublk: support bpf Ming Lei
2025-01-07 12:03 ` [RFC PATCH 01/22] ublk: remove two unused fields from 'struct ublk_queue' Ming Lei
2025-01-07 12:03 ` [RFC PATCH 02/22] ublk: convert several bool type fields into bitfield of `ublk_queue` Ming Lei
2025-01-07 12:03 ` [RFC PATCH 03/22] ublk: add helper of ublk_need_map_io() Ming Lei
2025-01-07 12:03 ` [RFC PATCH 04/22] ublk: move ublk into one standalone directory Ming Lei
2025-01-07 12:03 ` [RFC PATCH 05/22] ublk: move private definitions into private header Ming Lei
2025-01-07 12:03 ` [RFC PATCH 06/22] ublk: move several helpers to " Ming Lei
2025-01-07 12:03 ` [RFC PATCH 07/22] ublk: bpf: add bpf prog attach helpers Ming Lei
2025-01-07 12:03 ` [RFC PATCH 08/22] ublk: bpf: add bpf struct_ops Ming Lei
2025-01-10 1:43 ` Alexei Starovoitov
2025-01-13 4:08 ` Ming Lei
2025-01-13 21:30 ` Alexei Starovoitov
2025-01-15 11:58 ` Ming Lei
2025-01-15 20:11 ` Amery Hung
2025-01-07 12:04 ` [RFC PATCH 09/22] ublk: bpf: attach bpf prog to ublk device Ming Lei
2025-01-07 12:04 ` [RFC PATCH 10/22] ublk: bpf: add kfunc for ublk bpf prog Ming Lei
2025-01-07 12:04 ` [RFC PATCH 11/22] ublk: bpf: enable ublk-bpf Ming Lei
2025-01-07 12:04 ` [RFC PATCH 12/22] selftests: ublk: add tests for the ublk-bpf initial implementation Ming Lei
2025-01-07 12:04 ` [RFC PATCH 13/22] selftests: ublk: add tests for covering io split Ming Lei
2025-01-07 12:04 ` [RFC PATCH 14/22] selftests: ublk: add tests for covering redirecting to userspace Ming Lei
2025-01-07 12:04 ` [RFC PATCH 15/22] ublk: bpf: add bpf aio kfunc Ming Lei
2025-01-07 12:04 ` [RFC PATCH 16/22] ublk: bpf: add bpf aio struct_ops Ming Lei
2025-01-07 12:04 ` [RFC PATCH 17/22] ublk: bpf: attach bpf aio prog to ublk device Ming Lei
2025-01-07 12:04 ` [RFC PATCH 18/22] ublk: bpf: add several ublk bpf aio kfuncs Ming Lei
2025-01-07 12:04 ` [RFC PATCH 19/22] ublk: bpf: wire bpf aio with ublk io handling Ming Lei
2025-01-07 12:04 ` [RFC PATCH 20/22] selftests: add tests for ublk bpf aio Ming Lei
2025-01-07 12:04 ` [RFC PATCH 21/22] selftests: add tests for covering both bpf aio and split Ming Lei
2025-01-07 12:04 ` [RFC PATCH 22/22] ublk: document ublk-bpf & bpf-aio Ming Lei