From: Ming Lei <tom.leiming@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, bpf <bpf@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Yonghong Song <yonghong.song@linux.dev>
Subject: Re: [RFC PATCH 08/22] ublk: bpf: add bpf struct_ops
Date: Mon, 13 Jan 2025 12:08:14 +0800
Message-ID: <Z4SRrrXeoZ2MwH96@fedora>
In-Reply-To: <CAADnVQLGw07CNpi7=XHJRgBL2ku7Q23nfah07pBc45G+xeTKxw@mail.gmail.com>

Hello Alexei,

Thanks for your comments!

On Thu, Jan 09, 2025 at 05:43:12PM -0800, Alexei Starovoitov wrote:
> On Tue, Jan 7, 2025 at 4:08 AM Ming Lei <tom.leiming@gmail.com> wrote:
> > +
> > +/* Return true if io cmd is queued, otherwise forward it to userspace */
> > +bool ublk_run_bpf_handler(struct ublk_queue *ubq, struct request *req,
> > +                         queue_io_cmd_t cb)
> > +{
> > +       ublk_bpf_return_t ret;
> > +       struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
> > +       struct ublksrv_io_desc *iod = ublk_get_iod(ubq, req->tag);
> > +       struct ublk_bpf_io *bpf_io = &data->bpf_data;
> > +       const unsigned long total = iod->nr_sectors << 9;
> > +       unsigned int done = 0;
> > +       bool res = true;
> > +       int err;
> > +
> > +       if (!test_bit(UBLK_BPF_IO_PREP, &bpf_io->flags))
> > +               ublk_bpf_prep_io(bpf_io, iod);
> > +
> > +       do {
> > +               enum ublk_bpf_disposition rc;
> > +               unsigned int bytes;
> > +
> > +               ret = cb(bpf_io, done);
> 
> High level observation...
> I suspect forcing all struct_ops callbacks to have only these
> two arguments and packing args into ublk_bpf_io
> will be limiting in the long term.

There are three callbacks defined, and only the two with the same type
for queuing io commands are covered in this function.

But yes, the callback type is part of the API, which should be designed
carefully, and I will think about it further.
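
For reference, the shared queueing callback type, reconstructed from the
quoted hunk above (a sketch for discussion, not copied from the patch):

```c
/* Both io-queueing callbacks share this shape: the per-request bpf io
 * plus the number of bytes already handled, returning a packed
 * ublk_bpf_return_t that encodes the disposition.
 */
typedef ublk_bpf_return_t (*queue_io_cmd_t)(struct ublk_bpf_io *io,
					    unsigned int done);
```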

> 
> And this part of api would need to be redesigned,
> but since it's not an uapi... not a big deal.
> 
> > +               rc = ublk_bpf_get_disposition(ret);
> > +
> > +               if (rc == UBLK_BPF_IO_QUEUED)
> > +                       goto exit;
> > +
> > +               if (rc == UBLK_BPF_IO_REDIRECT)
> > +                       break;
> 
> Same point about return value processing...
> Each struct_ops callback could have had its own meaning
> of retvals.
> I suspect it would have been more flexible and more powerful
> this way.

Yeah, I agree; it's just that the 3rd callback, release_io_cmd_t, isn't
covered in this function.
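
As a strawman, per-callback retvals could look like the sketch below;
every name in it is hypothetical, nothing here exists in the series:

```c
/* Hypothetical per-callback return codes, along the lines suggested. */
enum ublk_bpf_queue_ret {
	UBLK_BPF_QUEUE_DONE,		/* io fully handled by the prog */
	UBLK_BPF_QUEUE_REDIRECT,	/* forward the io cmd to userspace */
	UBLK_BPF_QUEUE_PARTIAL,		/* made progress, call back again */
};

enum ublk_bpf_release_ret {
	UBLK_BPF_RELEASE_DONE,		/* resources dropped synchronously */
	UBLK_BPF_RELEASE_DEFER,		/* release completes asynchronously */
};
```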

> 
> Other than that bpf plumbing looks good.
> 
> There is an issue with leaking allocated memory in bpf_aio_alloc kfunc
> (it probably should be KF_ACQUIRE)

This is one problem that troubles me too:

- another struct_ops callback, bpf_aio_complete_cb, is guaranteed to be
called after the 'struct bpf_aio' instance is submitted via the kfunc
bpf_aio_submit(), and the instance is supposed to be freed from
bpf_aio_complete_cb

- but the following verifier failure is triggered if bpf_aio_alloc and
bpf_aio_release are marked as KF_ACQUIRE & KF_RELEASE.

```
libbpf: prog 'ublk_loop_comp_cb': -- BEGIN PROG LOAD LOG --
Global function ublk_loop_comp_cb() doesn't return scalar. Only those are supported.
```

Here the 'struct bpf_aio' instance isn't stored in a map; it is passed
in as an argument of the struct_ops callback (bpf_aio_complete_cb). I'd
appreciate any ideas you can share on how to make KF_ACQUIRE/KF_RELEASE
cover this usage.
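
To make the ownership flow concrete, here is a rough sketch; only the
kfunc names bpf_aio_alloc/bpf_aio_submit/bpf_aio_release come from this
series, while the prototypes and the surrounding code are illustrative:

```c
#include <bpf/bpf_helpers.h>

struct bpf_aio;

/* illustrative prototypes for the kfuncs in question */
extern struct bpf_aio *bpf_aio_alloc(void) __ksym;	 /* would be KF_ACQUIRE */
extern int bpf_aio_submit(struct bpf_aio *aio) __ksym;
extern void bpf_aio_release(struct bpf_aio *aio) __ksym; /* would be KF_RELEASE */

/* io-queueing struct_ops prog: the acquired reference escapes this
 * prog at submit time, with no matching release here ...
 */
static int queue_io_cmd(void)
{
	struct bpf_aio *aio = bpf_aio_alloc();

	if (!aio)
		return -1;
	return bpf_aio_submit(aio);
}

/* ... and it is only dropped later, from a different struct_ops prog,
 * which is the pattern that plain KF_ACQUIRE/KF_RELEASE tracking
 * rejects.
 */
static int comp_cb(struct bpf_aio *aio, long retval)
{
	bpf_aio_release(aio);
	return 0;
}
```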

> and a few other things, but before doing any in depth review
> from bpf pov I'd like to hear what block folks think.

Me too, and I look forward to comments from the block folks.

> 
> Motivation looks useful,
> but the claim of performance gains without performance numbers
> is a leap of faith.

Here are some numbers:

1) ublk-null without bpf vs. ublk-null with bpf

- 1.97M IOPS vs. 3.7M IOPS  

- setup ublk-null

	cd tools/testing/selftests/ublk
	./ublk_bpf add -t null -q 2

- setup ublk-null with bpf

	cd tools/testing/selftests/ublk
	./ublk_bpf reg -t null ./ublk_null.bpf.o
	./ublk_bpf add -t null -q 2 --bpf_prog 0

- run `fio/t/io_uring -p 0 /dev/ublkb0`

2) ublk-loop

The built-in `ublk_bpf` utility only supports bpf io handling. Compared
with ublksrv the improvement isn't big yet, still around 10%. One reason
is that the bpf aio support has only just started and isn't optimized.
In theory, bpf io handling:

- saves one kernel-user context switch
- saves one user-kernel IO buffer copy
- has a much smaller io handling code footprint than userspace io
  handling

The improvement is expected to be especially large for workloads with
big IO chunk sizes.


Thanks,
Ming
