From: Ming Lei <tom.leiming@gmail.com>
To: Jens Axboe <axboe@kernel.dk>, linux-block@vger.kernel.org
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Yonghong Song <yonghong.song@linux.dev>,
Ming Lei <tom.leiming@gmail.com>
Subject: [RFC PATCH 22/22] ublk: document ublk-bpf & bpf-aio
Date: Tue, 7 Jan 2025 20:04:13 +0800
Message-ID: <20250107120417.1237392-23-tom.leiming@gmail.com>
In-Reply-To: <20250107120417.1237392-1-tom.leiming@gmail.com>
Document ublk-bpf motivation and implementation.
Document bpf-aio implementation.
Document ublk-bpf selftests.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
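Note for reviewers (not part of the commit): the three return codes
documented in this patch can be read as a small dispatch loop. The sketch
below is only a user-space model in plain C, not the driver code. The enum
values, `struct model_io`, `struct model_result`, `stripe_queue_io()` and
`model_run_handler()` are hypothetical names invented for illustration; the
real `ublk_bpf_return_t` encoding and `ublk_run_bpf_handler` in this series
may well differ.

```c
#include <assert.h>

/* Hypothetical stand-ins for the documented return codes. */
enum model_ret { MODEL_IO_QUEUED, MODEL_IO_REDIRECT, MODEL_IO_CONTINUE };

/* Assumed shape of the handler result: a code plus the bytes queued. */
struct model_result {
	enum model_ret ret;
	unsigned int bytes;
};

/* Assumed view of one IO command: byte offset, length, bytes queued so far. */
struct model_io {
	unsigned long long off;
	unsigned int len;
	unsigned int done;
};

#define MODEL_CHUNK 4096u

/*
 * Hypothetical stripe-like handler: each call queues at most the bytes up
 * to the next chunk boundary, and asks to be called again if any remain.
 */
static struct model_result stripe_queue_io(const struct model_io *io)
{
	unsigned long long pos = io->off + io->done;
	unsigned int left = io->len - io->done;
	unsigned int room = MODEL_CHUNK - (unsigned int)(pos % MODEL_CHUNK);
	struct model_result r;

	r.bytes = left < room ? left : room;
	r.ret = (r.bytes == left) ? MODEL_IO_QUEUED : MODEL_IO_CONTINUE;
	return r;
}

/*
 * Model of the dispatch loop: returns the number of handler calls, or -1
 * when the IO has to be forwarded to the ublk server.
 */
static int model_run_handler(struct model_io *io,
		struct model_result (*handler)(const struct model_io *))
{
	int calls = 0;

	for (;;) {
		struct model_result r = handler(io);

		calls++;
		if (r.ret == MODEL_IO_REDIRECT)
			return -1;	/* the server handles this IO */

		io->done += r.bytes;
		assert(io->done <= io->len);
		if (r.ret == MODEL_IO_QUEUED)
			return calls;	/* fully queued by the prog */
		/* MODEL_IO_CONTINUE: call again for the remaining bytes */
	}
}
```

In this model, an IO at offset 1024 with length 10240 and 4096-byte chunks
results in three handler calls, queuing 3072, 4096 and 3072 bytes.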
Documentation/block/ublk.rst | 170 +++++++++++++++++++++++++++++++++++
1 file changed, 170 insertions(+)
diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
index 51665a3e6a50..bf7a3df48036 100644
--- a/Documentation/block/ublk.rst
+++ b/Documentation/block/ublk.rst
@@ -309,6 +309,176 @@ with specified IO tag in the command data:
``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy
the server buffer (pages) read to the IO request pages.
+
+UBLK-BPF support
+================
+
+Motivation
+----------
+
+- support stacking ublk
+
+ There are many third-party volume managers, and a ublk device may be built
+ on top of another ublk device to simplify their implementation. However,
+ taking multiple userspace-kernel context switches to handle a single IO is
+ unacceptable from a performance point of view.
+
+ ublk-bpf can avoid the user-kernel context switch in most of the IO fast
+ path, so ublk over ublk becomes feasible.
+
+- complicated virtual block device
+
+ Many complicated virtual block devices have an admin & meta code path and a
+ normal IO fast path. Meta & admin IO handling is usually complicated, so it
+ can be moved to the ublk server to relieve the development burden, while the
+ IO fast path is kept in kernel space for the sake of high performance.
+
+ BPF provides rich map types, which help a lot with communication between
+ userspace and a prog, or between one prog and another.
+
+ One typical example is qcow2, whose meta IO handling can be kept in the
+ ublk server while the IO fast path is moved to a bpf prog. An efficient bpf
+ map is looked up first to see whether the virtual LBA to host LBA mapping is
+ present. If yes, the IO is handled by ublk-bpf directly; otherwise it is
+ forwarded to the ublk server, which populates the mapping first.
+
+- some simple high performance virtual devices
+
+ For devices such as null & loop, the whole implementation can be moved into
+ a bpf prog.
+
+- provide a chance to reach performance similar to an in-kernel driver
+
+ One round of kernel/user context switch is avoided, and one extra IO data
+ copy is saved.
+
+bpf aio
+-------
+
+bpf aio exports kfuncs for a bpf prog to submit & complete IO asynchronously.
+The IO completion handler is provided by the bpf aio user and is itself
+defined in a bpf prog (such as the ublk bpf prog) as a
+`struct bpf_aio_complete_ops` bpf struct_ops.
+
+bpf aio is designed as a generic interface that can, in theory, be used by
+any bpf prog, and it may be moved to `/lib/` in the future if the interface
+becomes mature and stable enough.
+
+- bpf_aio_alloc()
+
+ Allocate one bpf aio instance of `struct bpf_aio`
+
+- bpf_aio_release()
+
+ Free one bpf aio instance of `struct bpf_aio`
+
+- bpf_aio_submit()
+
+ Submit one bpf aio instance of `struct bpf_aio` asynchronously.
+
+- `struct bpf_aio_complete_ops`
+
+ Defines the bpf aio completion callback, implemented as bpf struct_ops;
+ it is called when the submitted bpf aio completes.
+
+
+ublk bpf implementation
+-----------------------
+
+`struct ublk_bpf_ops` is exported as bpf struct_ops, so that a ublk IO
+command can be queued or handled in the callbacks defined in the ublk bpf
+struct_ops; see `ublk_run_bpf_handler` for the whole logic:
+
+- `UBLK_BPF_IO_QUEUED`
+
+ If ->queue_io_cmd() or ->queue_io_cmd_daemon() returns `UBLK_BPF_IO_QUEUED`,
+ this IO command has been queued by the bpf prog, so it won't be forwarded to
+ the ublk server.
+
+- `UBLK_BPF_IO_REDIRECT`
+
+ If ->queue_io_cmd() or ->queue_io_cmd_daemon() returns `UBLK_BPF_IO_REDIRECT`,
+ this IO command will be forwarded to the ublk server.
+
+- `UBLK_BPF_IO_CONTINUE`
+
+ If ->queue_io_cmd() or ->queue_io_cmd_daemon() returns `UBLK_BPF_IO_CONTINUE`,
+ only part of this IO command has been queued, and `ublk_bpf_return_t`
+ carries the number of bytes queued. The ublk driver then calls the callback
+ again to queue the remaining bytes of this IO command. This is helpful for
+ implementing stacking devices, because it allows an IO command to be split.
+
+ublk bpf provides kfuncs for a ublk bpf prog to queue and handle ublk IO commands:
+
+- ublk_bpf_complete_io()
+
+ Complete this ublk IO command
+
+- ublk_bpf_get_io_tag()
+
+ Get tag of this ublk IO command
+
+- ublk_bpf_get_queue_id()
+
+ Get queue id of this ublk IO command
+
+- ublk_bpf_get_dev_id()
+
+ Get device id of this ublk IO command
+
+- ublk_bpf_attach_and_prep_aio()
+
+ Attach & prepare a bpf aio for this ublk IO command: the bpf aio buffer is
+ prepared and the aio's completion callback is set up, so the user prog is
+ notified when the bpf aio completes.
+
+- ublk_bpf_dettach_and_complete_aio()
+
+ Detach the bpf aio from this IO command; it is usually called from the bpf
+ aio's completion callback.
+
+- ublk_bpf_acquire_io_from_aio()
+
+ Acquire the ublk IO command from the aio; one typical use is to call
+ ublk_bpf_complete_io() to complete the ublk IO command.
+
+- ublk_bpf_release_io_from_aio()
+
+ Release the ublk IO command acquired by `ublk_bpf_acquire_io_from_aio`
+
+
+Test
+----
+
+- Build kernel & install kernel headers & reboot & test
+
+ enable CONFIG_BLK_DEV_UBLK & CONFIG_UBLK_BPF
+
+ make
+
+ make headers_install INSTALL_HDR_PATH=/usr
+
+ reboot
+
+ make -C tools/testing/selftests TARGETS=ublk run_tests
+
+The ublk selftests implement null, loop and stripe targets to cover all
+bpf features:
+
+- complete bpf IO handling
+
+- complete ublk server IO handling
+
+- mixed bpf prog and ublk server IO handling
+
+- bpf aio for loop & stripe
+
+- IO split via `UBLK_BPF_IO_CONTINUE` for implementing ublk-stripe
+
+Write & read verification, as well as mkfs.ext4 & mount & umount, is run in
+the selftests.
+
+
Future development
==================
--
2.47.0