From: Ming Lei <ming.lei@redhat.com>
To: Stefan Metzmacher <metze@samba.org>
Cc: Jens Axboe <axboe@kernel.dk>,
	io-uring@vger.kernel.org,
	Caleb Sander Mateos <csander@purestorage.com>,
	Akilesh Kailash <akailash@google.com>,
	bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>
Subject: Re: [PATCH 3/5] io_uring: bpf: extend io_uring with bpf struct_ops
Date: Thu, 13 Nov 2025 18:59:57 +0800
Message-ID: <aRW6LfJi63X7wbPm@fedora>
In-Reply-To: <94f94f0e-7086-4f44-a658-9cb3b5496faf@samba.org>

On Thu, Nov 13, 2025 at 11:32:56AM +0100, Stefan Metzmacher wrote:
> Hi Ming,
> 
> > io_uring can be extended with bpf struct_ops in the following ways:
> > 
> > 1) add new io_uring operation from application
> > - one typical use case is for operating device zero-copy buffer, which
> > belongs to kernel, and not visible or too expensive to export to
> > userspace, such as supporting copy data from this buffer to userspace,
> > decompressing data to zero-copy buffer in Android case[1][2], or
> > checksum/decrypting.
> > 
> > [1] https://lpc.events/event/18/contributions/1710/attachments/1440/3070/LPC2024_ublk_zero_copy.pdf
> > 
> > 2) extend 64 byte SQE, since bpf map can be used to store IO data
> >     conveniently
> > 
> > 3) communicate in IO chain, since bpf map can be shared among IOs,
> > when one bpf IO is completed, data can be written to IO chain wide
> > bpf map, then the following bpf IO can retrieve the data from this bpf
> > map, this way is more flexible than io_uring built-in buffer
> > 
> > 4) pretty handy to inject error for test purpose
> > 
> > bpf struct_ops is one very handy way to attach bpf prog with kernel, and
> > this patch simply wires existed io_uring operation callbacks with added
> > uring bpf struct_ops, so application can define its own uring bpf
> > operations.
> 
> This sounds useful to me.
> 
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >   include/uapi/linux/io_uring.h |   9 ++
> >   io_uring/bpf.c                | 271 +++++++++++++++++++++++++++++++++-
> >   io_uring/io_uring.c           |   1 +
> >   io_uring/io_uring.h           |   3 +-
> >   io_uring/uring_bpf.h          |  30 ++++
> >   5 files changed, 311 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> > index b8c49813b4e5..94d2050131ac 100644
> > --- a/include/uapi/linux/io_uring.h
> > +++ b/include/uapi/linux/io_uring.h
> > @@ -74,6 +74,7 @@ struct io_uring_sqe {
> >   		__u32		install_fd_flags;
> >   		__u32		nop_flags;
> >   		__u32		pipe_flags;
> > +		__u32		bpf_op_flags;
> >   	};
> >   	__u64	user_data;	/* data to be passed back at completion time */
> >   	/* pack this to avoid bogus arm OABI complaints */
> > @@ -427,6 +428,13 @@ enum io_uring_op {
> >   #define IORING_RECVSEND_BUNDLE		(1U << 4)
> >   #define IORING_SEND_VECTORIZED		(1U << 5)
> > +/*
> > + * sqe->bpf_op_flags		top 8bits is for storing bpf op
> > + *				The other 24bits are used for bpf prog
> > + */
> > +#define IORING_BPF_OP_BITS	(8)
> > +#define IORING_BPF_OP_SHIFT	(24)
> > +
> >   /*
> >    * cqe.res for IORING_CQE_F_NOTIF if
> >    * IORING_SEND_ZC_REPORT_USAGE was requested
> > @@ -631,6 +639,7 @@ struct io_uring_params {
> >   #define IORING_FEAT_MIN_TIMEOUT		(1U << 15)
> >   #define IORING_FEAT_RW_ATTR		(1U << 16)
> >   #define IORING_FEAT_NO_IOWAIT		(1U << 17)
> > +#define IORING_FEAT_BPF			(1U << 18)
> >   /*
> >    * io_uring_register(2) opcodes and arguments
> > diff --git a/io_uring/bpf.c b/io_uring/bpf.c
> > index bb1e37d1e804..8227be6d5a10 100644
> > --- a/io_uring/bpf.c
> > +++ b/io_uring/bpf.c
> > @@ -4,28 +4,95 @@
> >   #include <linux/kernel.h>
> >   #include <linux/errno.h>
> >   #include <uapi/linux/io_uring.h>
> > +#include <linux/init.h>
> > +#include <linux/types.h>
> > +#include <linux/bpf_verifier.h>
> > +#include <linux/bpf.h>
> > +#include <linux/btf.h>
> > +#include <linux/btf_ids.h>
> > +#include <linux/filter.h>
> >   #include "io_uring.h"
> >   #include "uring_bpf.h"
> > +#define MAX_BPF_OPS_COUNT	(1 << IORING_BPF_OP_BITS)
> > +
> >   static DEFINE_MUTEX(uring_bpf_ctx_lock);
> >   static LIST_HEAD(uring_bpf_ctx_list);
> > +DEFINE_STATIC_SRCU(uring_bpf_srcu);
> > +static struct uring_bpf_ops bpf_ops[MAX_BPF_OPS_COUNT];
> 
> This indicates to me that the whole system with all applications in all namespaces
> need to coordinate in order to use these 256 ops?

So far there are only 62 in-tree io_uring operations defined, so I feel 256
should be enough.
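
For context, the 256 limit follows from the 8-bit op field at the top of
sqe->bpf_op_flags. Below is a minimal userspace sketch of that encoding. Only
the two IORING_BPF_OP_* values are taken from the quoted hunk; the mask macros
and the bpf_op_flags_*() helper names are made up for illustration and are not
part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the quoted uapi hunk. */
#define IORING_BPF_OP_BITS	(8)
#define IORING_BPF_OP_SHIFT	(24)

/* Derived here for illustration only; these names are not in the patch. */
#define BPF_OP_MAX	((1U << IORING_BPF_OP_BITS) - 1)	/* 255 */
#define BPF_FLAGS_MASK	((1U << IORING_BPF_OP_SHIFT) - 1)	/* low 24 bits */

/* The op id and the per-prog bits together fill the 32-bit field. */
static_assert(IORING_BPF_OP_BITS + IORING_BPF_OP_SHIFT == 32,
	      "op id and prog bits must add up to 32");

/* Hypothetical helper: pack an op id and per-prog bits into one u32. */
static inline uint32_t bpf_op_flags_pack(uint32_t op, uint32_t prog_bits)
{
	return ((op & BPF_OP_MAX) << IORING_BPF_OP_SHIFT) |
	       (prog_bits & BPF_FLAGS_MASK);
}

/* Hypothetical helper: the top 8 bits select one of 256 op slots. */
static inline uint32_t bpf_op_flags_op(uint32_t v)
{
	return v >> IORING_BPF_OP_SHIFT;
}

/* Hypothetical helper: the low 24 bits are left to the bpf prog. */
static inline uint32_t bpf_op_flags_prog(uint32_t v)
{
	return v & BPF_FLAGS_MASK;
}
```

With this layout, an application would store something like
bpf_op_flags_pack(3, 0x10) in sqe->bpf_op_flags to pick op slot 3 and pass
0x10 through to the prog.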

> 
> I think in order to have something useful, this should be per
> struct io_ring_ctx and each application should be able to load
> its own bpf programs.

The per-ctx requirement looks reasonable, and it shouldn't be hard to
support.

> 
> Something that uses bpf_prog_get_type() based on a bpf_fd
> like SIOCKCMATTACH in net/kcm/kcmsock.c.

I considered per-ctx progs before. One drawback is that a prog can't be shared
among io_ring_ctx instances, which could waste memory. In my ublk case, there can
be lots of devices sharing the same bpf prog.
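
The sharing can be sketched roughly as below. This is a simplified userspace
model of the global bpf_ops[] table from the quoted hunk, not the patch code:
struct toy_bpf_ops, struct toy_ctx and the toy_*() functions are hypothetical
stand-ins, and locking/SRCU is left out entirely.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_BPF_OPS_COUNT	(1 << 8)

/* Hypothetical, much simplified stand-in for struct uring_bpf_ops. */
struct toy_bpf_ops {
	int id;
	int (*issue)(int arg);
};

/* One table for the whole system, modelled on the quoted bpf_ops[]. */
static struct toy_bpf_ops toy_ops[MAX_BPF_OPS_COUNT];

/* Register once; a slot with a non-NULL ->issue counts as taken. */
static int toy_register(const struct toy_bpf_ops *ops)
{
	if (ops->id < 0 || ops->id >= MAX_BPF_OPS_COUNT || !ops->issue)
		return -EINVAL;
	if (toy_ops[ops->id].issue)
		return -EBUSY;
	toy_ops[ops->id] = *ops;
	return 0;
}

/* Hypothetical ring context: it carries no prog state of its own. */
struct toy_ctx {
	int ring_id;
};

/* Any context resolves the same slot, so the prog exists only once. */
static int toy_issue(const struct toy_ctx *ctx, int id, int arg)
{
	(void)ctx;
	if (id < 0 || id >= MAX_BPF_OPS_COUNT || !toy_ops[id].issue)
		return -ENOENT;
	return toy_ops[id].issue(arg);
}

/* Example op body used to exercise the table. */
static int toy_double(int arg)
{
	return arg * 2;
}
```

In this model, registering toy_double under id 7 once lets every toy_ctx
(think one per ublk device) issue op 7 without holding its own copy. The cost
is the one Stefan points out: all users have to agree on who owns which id.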



thanks,
Ming

