From: Ming Lei <ming.lei@redhat.com>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org,
	Uday Shankar <ushankar@purestorage.com>,
	Stefani Seibold <stefani@seibold.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH V4 14/27] ublk: add UBLK_U_IO_FETCH_IO_CMDS for batch I/O processing
Date: Tue, 2 Dec 2025 16:14:35 +0800
Message-ID: <aS6f68KVuyRxZitY@fedora>
In-Reply-To: <CADUfDZomo+Jz5oiQkU99+RZhxDqAjdt8B1tg_gj-O7thzqVbhw@mail.gmail.com>

On Mon, Dec 01, 2025 at 05:39:29PM -0800, Caleb Sander Mateos wrote:
> On Mon, Dec 1, 2025 at 5:27 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Mon, Dec 01, 2025 at 09:51:59AM -0800, Caleb Sander Mateos wrote:
> > > On Mon, Dec 1, 2025 at 1:42 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > >
> > > > On Sun, Nov 30, 2025 at 09:55:47PM -0800, Caleb Sander Mateos wrote:
> > > > > On Thu, Nov 20, 2025 at 6:00 PM Ming Lei <ming.lei@redhat.com> wrote:
> > > > > >
> > > > > > Add UBLK_U_IO_FETCH_IO_CMDS command to enable efficient batch processing
> > > > > > of I/O requests. This multishot uring_cmd allows the ublk server to fetch
> > > > > > multiple I/O commands in a single operation, significantly reducing
> > > > > > submission overhead compared to individual FETCH_REQ* commands.
> > > > > >
> > > > > > Key Design Features:
> > > > > >
> > > > > > 1. Multishot Operation: One UBLK_U_IO_FETCH_IO_CMDS can fetch many I/O
> > > > > >    commands, with the batch size limited by the provided buffer length.
> > > > > >
> > > > > > 2. Dynamic Load Balancing: Multiple fetch commands can be submitted
> > > > > >    simultaneously, but only one is active at any time. This enables
> > > > > >    efficient load distribution across multiple server task contexts.
> > > > > >
> > > > > > 3. Implicit State Management: The implementation uses three key variables
> > > > > >    to track state:
> > > > > >    - evts_fifo: Queue of request tags awaiting processing
> > > > > >    - fcmd_head: List of available fetch commands
> > > > > >    - active_fcmd: Currently active fetch command (NULL = none active)
> > > > > >
> > > > > >    States are derived implicitly:
> > > > > >    - IDLE: No fetch commands available
> > > > > >    - READY: Fetch commands available, none active
> > > > > >    - ACTIVE: One fetch command processing events
> > > > > >
> > > > > > 4. Lockless Reader Optimization: The active fetch command can read from
> > > > > >    evts_fifo without locking (single reader guarantee), while writers
> > > > > >    (ublk_queue_rq/ublk_queue_rqs) use evts_lock protection. The memory
> > > > > > >    barrier pairing plays a key role in the single lockless reader
> > > > > > >    optimization.
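As a rough illustration of the implicit state management in item 3, here
is a minimal userspace sketch of how IDLE/READY/ACTIVE could be derived
from the two pointers. The struct and function names (fcmd_sketch,
queue_sketch, batch_state) are hypothetical stand-ins, not the driver's
definitions, and locking is left out entirely.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct ublk_batch_fcmd (list linkage only). */
struct fcmd_sketch {
	struct fcmd_sketch *next;
};

/* Hypothetical stand-in for the batch fields of struct ublk_queue. */
struct queue_sketch {
	struct fcmd_sketch *fcmd_list;   /* plays the role of fcmd_head */
	struct fcmd_sketch *active_fcmd; /* NULL = none active */
};

enum batch_state { BATCH_IDLE, BATCH_READY, BATCH_ACTIVE };

/*
 * Derive the state from the variables instead of storing it:
 *   ACTIVE: active_fcmd is set
 *   READY:  fetch commands are queued, none is active
 *   IDLE:   no fetch commands are queued, none is active
 */
static enum batch_state batch_state(const struct queue_sketch *q)
{
	if (q->active_fcmd)
		return BATCH_ACTIVE;
	return q->fcmd_list ? BATCH_READY : BATCH_IDLE;
}
```

In the patch itself these transitions happen under evts_lock; the sketch
only shows the derivation.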
> > > > > >
> > > > > > Implementation Details:
> > > > > >
> > > > > > - ublk_queue_rq() and ublk_queue_rqs() save request tags to evts_fifo
> > > > > > - __ublk_pick_active_fcmd() selects an available fetch command when
> > > > > >   events arrive and no command is currently active
> > > > >
> > > > > What is __ublk_pick_active_fcmd()? I don't see a function with that name.
> > > >
> > > > It was renamed to __ublk_acquire_fcmd(), and its counterpart is
> > > > __ublk_release_fcmd().
> > >
> > > Okay, update the commit message then?
> > >
> > > >
> > > > >
> > > > > > - ublk_batch_dispatch() moves tags from evts_fifo to the fetch command's
> > > > > >   buffer and posts completion via io_uring_mshot_cmd_post_cqe()
> > > > > > - State transitions are coordinated via evts_lock to maintain consistency
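To make the "batch size limited by the provided buffer length" point
concrete, here is a minimal userspace sketch of draining tags from a fifo
into a byte-sized buffer. tag_fifo_sketch and drain_tags() are
hypothetical names, the fixed 64-entry ring is an assumption made for
illustration, and the real code uses kfifo plus
io_uring_mshot_cmd_post_cqe() rather than anything shown here.

```c
#include <stddef.h>

#define SKETCH_FIFO_SIZE 64 /* assumed capacity, power of two */

/* Hypothetical stand-in for evts_fifo: a ring of 16-bit request tags. */
struct tag_fifo_sketch {
	unsigned short tags[SKETCH_FIFO_SIZE];
	unsigned int in;  /* advanced by writers */
	unsigned int out; /* advanced by the single reader */
};

/*
 * Copy as many pending tags as fit into buf (buf_len is in bytes).
 * Returns the number of tags copied; the remainder stays queued for
 * the next fetch.
 */
static size_t drain_tags(struct tag_fifo_sketch *f,
			 unsigned short *buf, size_t buf_len)
{
	size_t max = buf_len / sizeof(buf[0]);
	size_t n = 0;

	while (n < max && f->out != f->in) {
		buf[n++] = f->tags[f->out % SKETCH_FIFO_SIZE];
		f->out++;
	}
	return n;
}
```

With five queued tags and room for three, the first call returns three
tags and a second call returns the remaining two.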
> > > > > >
> > > > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > > > ---
> > > > > >  drivers/block/ublk_drv.c      | 412 +++++++++++++++++++++++++++++++---
> > > > > >  include/uapi/linux/ublk_cmd.h |   7 +
> > > > > >  2 files changed, 388 insertions(+), 31 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > > > > index cc9c92d97349..2e5e392c939e 100644
> > > > > > --- a/drivers/block/ublk_drv.c
> > > > > > +++ b/drivers/block/ublk_drv.c
> > > > > > @@ -93,6 +93,7 @@
> > > > > >
> > > > > >  /* ublk batch fetch uring_cmd */
> > > > > >  struct ublk_batch_fcmd {
> > > > > > +       struct list_head node;
> > > > > >         struct io_uring_cmd *cmd;
> > > > > >         unsigned short buf_group;
> > > > > >  };
> > > > > > @@ -117,7 +118,10 @@ struct ublk_uring_cmd_pdu {
> > > > > >          */
> > > > > >         struct ublk_queue *ubq;
> > > > > >
> > > > > > -       u16 tag;
> > > > > > +       union {
> > > > > > +               u16 tag;
> > > > > > +               struct ublk_batch_fcmd *fcmd; /* batch io only */
> > > > > > +       };
> > > > > >  };
> > > > > >
> > > > > >  struct ublk_batch_io_data {
> > > > > > @@ -229,18 +233,36 @@ struct ublk_queue {
> > > > > >         struct ublk_device *dev;
> > > > > >
> > > > > >         /*
> > > > > > -        * Inflight ublk request tag is saved in this fifo
> > > > > > +        * Batch I/O State Management:
> > > > > > +        *
> > > > > > +        * The batch I/O system uses implicit state management based on the
> > > > > > +        * combination of three key variables below.
> > > > > > +        *
> > > > > > +        * - IDLE: list_empty(&fcmd_head) && !active_fcmd
> > > > > > +        *   No fetch commands available, events queue in evts_fifo
> > > > > > +        *
> > > > > > +        * - READY: !list_empty(&fcmd_head) && !active_fcmd
> > > > > > +        *   Fetch commands available but none processing events
> > > > > >          *
> > > > > > -        * There are multiple writer from ublk_queue_rq() or ublk_queue_rqs(),
> > > > > > -        * so lock is required for storing request tag to fifo
> > > > > > +        * - ACTIVE: active_fcmd
> > > > > > +        *   One fetch command actively processing events from evts_fifo
> > > > > >          *
> > > > > > -        * Make sure just one reader for fetching request from task work
> > > > > > -        * function to ublk server, so no need to grab the lock in reader
> > > > > > -        * side.
> > > > > > +        * Key Invariants:
> > > > > > +        * - At most one active_fcmd at any time (single reader)
> > > > > > +        * - active_fcmd is always from fcmd_head list when non-NULL
> > > > > > +        * - evts_fifo can be read locklessly by the single active reader
> > > > > > +        * - All state transitions require evts_lock protection
> > > > > > +        * - Multiple writers to evts_fifo require lock protection
> > > > > >          */
> > > > > >         struct {
> > > > > >                 DECLARE_KFIFO_PTR(evts_fifo, unsigned short);
> > > > > >                 spinlock_t evts_lock;
> > > > > > +
> > > > > > +               /* List of fetch commands available to process events */
> > > > > > +               struct list_head fcmd_head;
> > > > > > +
> > > > > > +               /* Currently active fetch command (NULL = none active) */
> > > > > > +               struct ublk_batch_fcmd  *active_fcmd;
> > > > > >         }____cacheline_aligned_in_smp;
> > > > > >
> > > > > >         struct ublk_io ios[] __counted_by(q_depth);
> > > > > > @@ -292,12 +314,20 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq);
> > > > > >  static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub,
> > > > > >                 u16 q_id, u16 tag, struct ublk_io *io, size_t offset);
> > > > > >  static inline unsigned int ublk_req_build_flags(struct request *req);
> > > > > > +static void ublk_batch_dispatch(struct ublk_queue *ubq,
> > > > > > +                               struct ublk_batch_io_data *data,
> > > > > > +                               struct ublk_batch_fcmd *fcmd);
> > > > > >
> > > > > >  static inline bool ublk_dev_support_batch_io(const struct ublk_device *ub)
> > > > > >  {
> > > > > >         return false;
> > > > > >  }
> > > > > >
> > > > > > +static inline bool ublk_support_batch_io(const struct ublk_queue *ubq)
> > > > > > +{
> > > > > > +       return false;
> > > > > > +}
> > > > > > +
> > > > > >  static inline void ublk_io_lock(struct ublk_io *io)
> > > > > >  {
> > > > > >         spin_lock(&io->lock);
> > > > > > @@ -624,13 +654,45 @@ static wait_queue_head_t ublk_idr_wq;     /* wait until one idr is freed */
> > > > > >
> > > > > >  static DEFINE_MUTEX(ublk_ctl_mutex);
> > > > > >
> > > > > > +static struct ublk_batch_fcmd *
> > > > > > +ublk_batch_alloc_fcmd(struct io_uring_cmd *cmd)
> > > > > > +{
> > > > > > +       struct ublk_batch_fcmd *fcmd = kzalloc(sizeof(*fcmd), GFP_NOIO);
> > > > >
> > > > > An allocation in the I/O path seems unfortunate. Is there not room to
> > > > > store the struct ublk_batch_fcmd in the io_uring_cmd pdu?
> > > >
> > > > It is allocated once for one mshot request, which covers many IOs.
> > > >
> > > > It can't be held in the uring_cmd pdu, but the allocation can be
> > > > optimized in the future. Not a big deal at the enablement stage.
> > >
> > > Okay, seems fine to optimize it in the future.
> > >
> > > >
> > > > > > +
> > > > > > +       if (fcmd) {
> > > > > > +               fcmd->cmd = cmd;
> > > > > > +               fcmd->buf_group = READ_ONCE(cmd->sqe->buf_index);
> > > > >
> > > > > Is it necessary to sample this here just to pass it back to the
> > > > > io_uring layer? Wouldn't the io_uring layer already have access to it
> > > > > in struct io_kiocb's buf_index field?
> > > >
> > > > ->buf_group is used by io_uring_cmd_buffer_select(), and this way also
> > > > follows ->buf_index uses in both io_uring/net.c and io_uring/rw.c.
> > > >
> > > >
> > > > io_ring_buffer_select(), so we can't reuse req->buf_index here.
> > >
> > > But io_uring/net.c and io_uring/rw.c both retrieve the buf_group value
> > > from req->buf_index instead of the SQE, for example:
> > > if (req->flags & REQ_F_BUFFER_SELECT)
> > >         sr->buf_group = req->buf_index;
> > >
> > > Seems like it would make sense to do the same for
> > > UBLK_U_IO_FETCH_IO_CMDS. That also saves one pointer dereference here.
> >
> > IMO we shouldn't encourage drivers to access `io_kiocb`; cmd->sqe,
> > however, is exposed to drivers explicitly.
> 
> Right, but we can add a helper in include/linux/io_uring/cmd.h to
> encapsulate accessing the io_kiocb field.

OK, but I'd suggest doing that as a follow-up optimization, to avoid a
cross-tree change.
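
For illustration only, such a helper might look roughly like the sketch
below. It is a userspace mock: io_kiocb_mock, io_uring_cmd_mock and
io_uring_cmd_buf_group_mock() are hypothetical names with simplified
layouts, not the real io_uring structures or an existing API.

```c
#include <stddef.h>

/* Mock of the command object a driver is handed (simplified). */
struct io_uring_cmd_mock {
	const void *sqe;
};

/* Mock of the io_uring request that embeds the command (simplified). */
struct io_kiocb_mock {
	unsigned short buf_index;
	struct io_uring_cmd_mock cmd;
};

/*
 * Hypothetical helper: return the buffer group recorded in the owning
 * request, so the driver never touches the request type directly and
 * does not need to re-read the SQE.
 */
static inline unsigned short
io_uring_cmd_buf_group_mock(const struct io_uring_cmd_mock *cmd)
{
	const struct io_kiocb_mock *req = (const struct io_kiocb_mock *)
		((const char *)cmd - offsetof(struct io_kiocb_mock, cmd));

	return req->buf_index;
}
```

The driver side would then call the helper instead of reading
cmd->sqe->buf_index, assuming the io_uring core has already copied the
value into the request.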


Thanks,
Ming

