From: Ming Lei <ming.lei@redhat.com>
To: Stefan Metzmacher <metze@samba.org>
Cc: Jens Axboe <axboe@kernel.dk>,
	io-uring@vger.kernel.org,
	Caleb Sander Mateos <csander@purestorage.com>,
	Akilesh Kailash <akailash@google.com>,
	bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>
Subject: Re: [PATCH 3/5] io_uring: bpf: extend io_uring with bpf struct_ops
Date: Fri, 14 Nov 2025 11:00:30 +0800
Message-ID: <aRabTk29_v6p92mY@fedora>
In-Reply-To: <05a37623-c78c-4a86-a9f3-c78ce133fa66@samba.org>

On Thu, Nov 13, 2025 at 12:19:33PM +0100, Stefan Metzmacher wrote:
> On 13.11.25 11:59, Ming Lei wrote:
> > On Thu, Nov 13, 2025 at 11:32:56AM +0100, Stefan Metzmacher wrote:
> > > Hi Ming,
> > > 
> > > > io_uring can be extended with bpf struct_ops in the following ways:
> > > > 
> > > > 1) add new io_uring operations from the application
> > > > - one typical use case is operating on a device zero-copy buffer,
> > > > which belongs to the kernel and is either not visible to userspace or
> > > > too expensive to export to it, such as copying data from this buffer
> > > > to userspace, decompressing data into the zero-copy buffer in the
> > > > Android case[1][2], or checksumming/decrypting.
> > > > 
> > > > [1] https://lpc.events/event/18/contributions/1710/attachments/1440/3070/LPC2024_ublk_zero_copy.pdf
> > > > 
> > > > 2) extend the 64-byte SQE, since a bpf map can be used to store IO
> > > >      data conveniently
> > > > 
> > > > 3) communicate within an IO chain, since a bpf map can be shared among
> > > > IOs: when one bpf IO completes, data can be written to a chain-wide
> > > > bpf map, and the following bpf IO can retrieve the data from that map;
> > > > this is more flexible than io_uring's built-in buffers
> > > > 
> > > > 4) pretty handy for injecting errors for test purposes
> > > > 
> > > > bpf struct_ops is a very handy way to attach a bpf prog to the
> > > > kernel, and this patch simply wires the existing io_uring operation
> > > > callbacks to the added uring bpf struct_ops, so an application can
> > > > define its own uring bpf operations.
> > > 
> > > This sounds useful to me.
> > > 
> > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > ---
> > > >    include/uapi/linux/io_uring.h |   9 ++
> > > >    io_uring/bpf.c                | 271 +++++++++++++++++++++++++++++++++-
> > > >    io_uring/io_uring.c           |   1 +
> > > >    io_uring/io_uring.h           |   3 +-
> > > >    io_uring/uring_bpf.h          |  30 ++++
> > > >    5 files changed, 311 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> > > > index b8c49813b4e5..94d2050131ac 100644
> > > > --- a/include/uapi/linux/io_uring.h
> > > > +++ b/include/uapi/linux/io_uring.h
> > > > @@ -74,6 +74,7 @@ struct io_uring_sqe {
> > > >    		__u32		install_fd_flags;
> > > >    		__u32		nop_flags;
> > > >    		__u32		pipe_flags;
> > > > +		__u32		bpf_op_flags;
> > > >    	};
> > > >    	__u64	user_data;	/* data to be passed back at completion time */
> > > >    	/* pack this to avoid bogus arm OABI complaints */
> > > > @@ -427,6 +428,13 @@ enum io_uring_op {
> > > >    #define IORING_RECVSEND_BUNDLE		(1U << 4)
> > > >    #define IORING_SEND_VECTORIZED		(1U << 5)
> > > > +/*
> > > > + * sqe->bpf_op_flags		the top 8 bits store the bpf op;
> > > > + *				the other 24 bits are for the bpf prog
> > > > + */
> > > > +#define IORING_BPF_OP_BITS	(8)
> > > > +#define IORING_BPF_OP_SHIFT	(24)
> > > > +
> > > >    /*
> > > >     * cqe.res for IORING_CQE_F_NOTIF if
> > > >     * IORING_SEND_ZC_REPORT_USAGE was requested
> > > > @@ -631,6 +639,7 @@ struct io_uring_params {
> > > >    #define IORING_FEAT_MIN_TIMEOUT		(1U << 15)
> > > >    #define IORING_FEAT_RW_ATTR		(1U << 16)
> > > >    #define IORING_FEAT_NO_IOWAIT		(1U << 17)
> > > > +#define IORING_FEAT_BPF			(1U << 18)
> > > >    /*
> > > >     * io_uring_register(2) opcodes and arguments
> > > > diff --git a/io_uring/bpf.c b/io_uring/bpf.c
> > > > index bb1e37d1e804..8227be6d5a10 100644
> > > > --- a/io_uring/bpf.c
> > > > +++ b/io_uring/bpf.c
> > > > @@ -4,28 +4,95 @@
> > > >    #include <linux/kernel.h>
> > > >    #include <linux/errno.h>
> > > >    #include <uapi/linux/io_uring.h>
> > > > +#include <linux/init.h>
> > > > +#include <linux/types.h>
> > > > +#include <linux/bpf_verifier.h>
> > > > +#include <linux/bpf.h>
> > > > +#include <linux/btf.h>
> > > > +#include <linux/btf_ids.h>
> > > > +#include <linux/filter.h>
> > > >    #include "io_uring.h"
> > > >    #include "uring_bpf.h"
> > > > +#define MAX_BPF_OPS_COUNT	(1 << IORING_BPF_OP_BITS)
> > > > +
> > > >    static DEFINE_MUTEX(uring_bpf_ctx_lock);
> > > >    static LIST_HEAD(uring_bpf_ctx_list);
> > > > +DEFINE_STATIC_SRCU(uring_bpf_srcu);
> > > > +static struct uring_bpf_ops bpf_ops[MAX_BPF_OPS_COUNT];
> > > 
> > > This indicates to me that the whole system, with all applications in
> > > all namespaces, needs to coordinate in order to use these 256 ops?
> > 
> > So far only 62 in-tree io_uring operations are defined, so I feel 256
> > should be enough.
> > 
> > > I think in order to have something useful, this should be per
> > > struct io_ring_ctx and each application should be able to load
> > > its own bpf programs.
> > 
> > The per-ctx requirement looks reasonable, and it shouldn't be hard to
> > support.
> > 
> > > 
> > > Something that uses bpf_prog_get_type() based on a bpf_fd
> > > like SIOCKCMATTACH in net/kcm/kcmsock.c.
> > 
> > I considered a per-ctx prog before; one drawback is that the prog can't
> > be shared among io_ring_ctx instances, which could waste memory. In my
> > ublk case, there can be many devices sharing the same bpf prog.
> 
> Can't the ublk instances coordinate and use the same bpf_fd?
> New instances could request it via a unix socket and SCM_RIGHTS
> from a long-running loader process. On the other hand, do they
> really want to share?

struct_ops is typically registered once and used everywhere, as in the
sched_ext and socket examples.

This patch follows that usage, so every io_uring application can access
the registered ops just like the in-kernel operations.
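
Roughly, the intended flow looks like the sketch below. This is
illustrative only: the callback name, its signature, and the field name
are placeholders I made up, not the actual members of struct
uring_bpf_ops; only IORING_OP_BPF and IORING_BPF_OP_SHIFT come from
this series.

	/* BPF side: define one uring bpf op via struct_ops (sketch) */
	#include <vmlinux.h>
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_tracing.h>

	SEC("struct_ops/prep")
	int BPF_PROG(my_prep)			/* hypothetical callback */
	{
		return 0;
	}

	SEC(".struct_ops.link")
	struct uring_bpf_ops my_ops = {
		.prep = (void *)my_prep,	/* hypothetical field name */
	};

	char LICENSE[] SEC("license") = "GPL";

On the userspace side, the top 8 bits of sqe->bpf_op_flags select the
registered op, per the uapi comment in this patch:

	/* userspace: issue an SQE that targets the bpf op (sketch) */
	sqe->opcode = IORING_OP_BPF;
	sqe->bpf_op_flags = bpf_op_index << IORING_BPF_OP_SHIFT;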

I can understand the requirement for per-io-ring-ctx struct_ops, which
wouldn't cause conflicts among different applications.
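
For reference, the SIOCKCMATTACH path you pointed at does roughly the
following (simplified from net/kcm/kcmsock.c; shown only to illustrate
the fd-based attach pattern):

	/* roughly what SIOCKCMATTACH does, simplified */
	prog = bpf_prog_get_type(info.bpf_fd, BPF_PROG_TYPE_SOCKET_FILTER);
	if (IS_ERR(prog))
		return PTR_ERR(prog);
	err = kcm_attach(sock, csock, prog);
	if (err)
		bpf_prog_put(prog);

A per-ctx interface could take a bpf_fd in a similar way, though
struct_ops is registered as a map/link rather than attached as a single
prog, so the plumbing would differ.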

For example, with ublk/raid5 there may be 100 such devices, each created
in a dedicated process with its own io_uring, so 100 copies of the same
struct_ops prog would be registered in memory. And once the struct_ops
prog is registered per-io-ring-ctx, it may not be shareable via
`bpf_fd`, IMO.

> 
> I don't know much about bpf in detail, so I'm wondering, in your
> example from
> https://github.com/ming1/liburing/commit/625b69ddde15ad80e078c684ba166f49c1174fa4
> 
> would memory_map be global across the whole system, or would each
> loaded instance of the program have its own instance of memory_map?
 
The bpf map is global.

By default, each loaded prog owns its map, but the map can be exported
to other processes by pinning it.
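
With libbpf, pinning and reuse is just a couple of calls (a minimal
sketch: `skel` and the pin path are made up here, and `memory_map`
follows your example):

	#include <bpf/libbpf.h>
	#include <bpf/bpf.h>

	/* loader process: pin the map so other processes can reuse it */
	bpf_map__pin(skel->maps.memory_map, "/sys/fs/bpf/memory_map");

	/* another process: reopen the pinned map by path */
	int map_fd = bpf_obj_get("/sys/fs/bpf/memory_map");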

This is easy to verify by writing test code under tools/testing/selftests/.

But I am not a bpf expert...

Thanks,
Ming


