Re: [RFC v2 5/5] io_uring/bpf: add basic kfunc helpers

From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Pavel Begunkov <asml.silence@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>,
	io-uring@vger.kernel.org,
	 Martin KaFai Lau <martin.lau@linux.dev>,
	bpf <bpf@vger.kernel.org>,  LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC v2 5/5] io_uring/bpf: add basic kfunc helpers
Date: Fri, 13 Jun 2025 12:51:30 -0700
Message-ID: <CAADnVQKu6Q1ePFuxxSLNsm-xggZbUEmWb_Y=4zeU54aAt5o6HA@mail.gmail.com>
In-Reply-To: <415993ef-0238-4fc0-a2e5-acb938ec2b10@gmail.com>

On Fri, Jun 13, 2025 at 9:11 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 6/13/25 01:25, Alexei Starovoitov wrote:
> > On Thu, Jun 12, 2025 at 6:25 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
> ...
> >>>> +BTF_ID_FLAGS(func, bpf_io_uring_extract_next_cqe, KF_RET_NULL);
> >>>> +BTF_KFUNCS_END(io_uring_kfunc_set)
> >>>
> >>> This is not safe in general.
> >>> The verifier doesn't enforce argument safety here.
> >>> At a minimum you need to add the KF_TRUSTED_ARGS flag to all kfuncs.
> >>> And once you do that you'll see that the verifier
> >>> doesn't recognize the cqe returned from bpf_io_uring_get_cqe*()
> >>> as trusted.
> >>
> >> Thanks, will add it. If I read it right, without the flag the
> >> program can, for example, create a struct io_ring_ctx on stack,
> >> fill it with nonsense and pass to kfuncs. Is that right?
> >
> > No. The verifier will only allow a pointer to struct io_ring_ctx
> > to be passed, but it may not be fully trusted.
> >
> > The verifier has 3 types of pointers to kernel structures:
> > 1. ptr_to_btf_id
> > 2. ptr_to_btf_id | trusted
> > 3. ptr_to_btf_id | untrusted
> >
> > 1st was added long ago for tracing and gradually got adopted
> > for non-tracing needs, but it has a foot gun, since
> > all pointer walks keep ptr_to_btf_id type.
> > It's fine in some cases to follow pointers, but not in all.
> > Hence 2nd variant was added and there
> > foo->bar dereference needs to be explicitly allowed
> > instead of allowed by default like for 1st kind.
> >
> > All loads through 1 and 3 are implemented as probe_read_kernel,
> > while loads from 2 are direct loads.
> >
> > So kfuncs without KF_TRUSTED_ARGS with struct io_ring_ctx *ctx
> > argument are likely fine and safe, since it's impossible
> > to get this io_ring_ctx pointer by dereferencing some other pointer.
> > But better to tighten safety from the start.
> > We recommend KF_TRUSTED_ARGS for all kfuncs and
> > eventually it will be the default.
>
> Sure, I'll add it, thanks for the explanation
>
> ...
> >> diff --git a/io_uring/bpf.c b/io_uring/bpf.c
> >> index 9494e4289605..400a06a74b5d 100644
> >> --- a/io_uring/bpf.c
> >> +++ b/io_uring/bpf.c
> >> @@ -2,6 +2,7 @@
> >>    #include <linux/bpf_verifier.h>
> >>
> >>    #include "io_uring.h"
> >> +#include "memmap.h"
> >>    #include "bpf.h"
> >>    #include "register.h"
> >>
> >> @@ -72,6 +73,14 @@ struct io_uring_cqe *bpf_io_uring_extract_next_cqe(struct io_ring_ctx *ctx)
> >>          return cqe;
> >>    }
> >>
> >> +__bpf_kfunc
> >> +void *bpf_io_uring_get_region(struct io_ring_ctx *ctx, u64 size__retsz)
> >> +{
> >> +       if (size__retsz > ((u64)ctx->ring_region.nr_pages << PAGE_SHIFT))
> >> +               return NULL;
> >> +       return io_region_get_ptr(&ctx->ring_region);
> >> +}
> >
> > and bpf prog should be able to read/write anything in
> > [ctx->ring_region->ptr, ..ptr + size] region ?
>
> Right, and it's already rw mmap'ed into the user space.
>
> > Populating (creating) dynptr is probably better.
> > See bpf_dynptr_from*()
> >
> > but what is the lifetime of that memory ?
>
> It's valid within a single run of the callback but shouldn't cross
> into another invocation. Specifically, it's protected by the lock,
> but that can be tuned. Does that match with what PTR_TO_MEM expects?

Yes. PTR_TO_MEM lasts for the duration of the prog.

> I can add refcounting for longer term pinning, maybe to store it
> as a bpf map or whatever is the right way, but I'd rather avoid
> anything expensive in the kfunc as that'll likely be called on
> every program run.

Yeah, let's not add any refcounting.

It sounds like you want something similar to
__bpf_kfunc __u8 *
hid_bpf_get_data(struct hid_bpf_ctx *ctx, unsigned int offset,
		 const size_t rdwr_buf_size)

We have a special hack for it already in the verifier.
The argument needs to be called rdwr_buf_size;
then it will be used to establish the range of PTR_TO_MEM.
It has to be a run-time constant.
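
To make the contract concrete, here is a rough userspace model of
"the size argument bounds the returned memory". It is only a sketch,
not kernel code: struct region, MODEL_PAGE_SHIFT and region_get() are
made-up names that mirror the shape of the bpf_io_uring_get_region()
hunk quoted above, and the verifier side (turning the size into the
PTR_TO_MEM range) is not modelled at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page shift for the model; 4 KiB pages are an assumption. */
#define MODEL_PAGE_SHIFT 12

/* Hypothetical stand-in for the kernel-side region, not a real io_uring type. */
struct region {
	uint8_t *ptr;
	size_t nr_pages;
};

/*
 * The caller asks for `size` bytes. Return NULL unless
 * [ptr, ptr + size) fits inside the region; otherwise return the
 * base pointer, which the caller may then access up to `size` bytes.
 */
static void *region_get(struct region *r, uint64_t size)
{
	if (size > ((uint64_t)r->nr_pages << MODEL_PAGE_SHIFT))
		return NULL;
	return r->ptr;
}
```

With a two-page region, asking for 8192 bytes succeeds and asking for
8193 returns NULL, so the NULL check is the only run-time cost.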

What you're proposing with "__retsz" is a cleaner version of the same.
But consider bpf_dynptr_from_io_uring(struct io_ring_ctx *ctx):
it can create a dynamically sized region, and the prog can
later use bpf_dynptr_slice_rdwr() to get a writeable chunk of it.
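
For comparison, a similarly rough userspace model of the dynptr flavour.
Again this is a sketch under assumptions: struct model_dynptr and
model_slice_rdwr() are invented for illustration, and the real
bpf_dynptr_slice_rdwr() additionally takes an optional bounce buffer,
which is left out here. The point is only that the total size is known
at run time and every slice is bounds-checked against it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a dynptr: base pointer plus a run-time size. */
struct model_dynptr {
	uint8_t *data;
	uint32_t size;
};

/*
 * Return a writable window [offset, offset + len) into the region,
 * or NULL if the window does not fit. The check is written so that
 * offset + len cannot overflow.
 */
static void *model_slice_rdwr(const struct model_dynptr *p,
			      uint32_t offset, uint32_t len)
{
	if (offset > p->size || len > p->size - offset)
		return NULL;
	return p->data + offset;
}
```

The trade-off against the constant-size variant is one bounds check
per slice instead of one per region lookup.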

I feel that the __retsz approach may actually be a better fit in the end,
if you're ok with a constant arg.
