public inbox for linux-block@vger.kernel.org
From: Ming Lei <ming.lei@redhat.com>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Jens Axboe <axboe@kernel.dk>,
	"Mohamed Abuelfotoh, Hazem" <abuehaze@amazon.com>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: Re: [PATCH] block: use plug request list tail for one-shot backmerge attempt
Date: Tue, 17 Jun 2025 10:36:06 +0800	[thread overview]
Message-ID: <aFDUlnKVHLJC6VuE@fedora> (raw)
In-Reply-To: <CADUfDZqie84nJeBVJn94UvYqNhY73n41L+tbXOnXdMBqesLDWA@mail.gmail.com>

On Mon, Jun 16, 2025 at 09:01:54AM -0700, Caleb Sander Mateos wrote:
> On Mon, Jun 16, 2025 at 6:11 AM Christoph Hellwig <hch@infradead.org> wrote:
> >
> > On Thu, Jun 12, 2025 at 06:28:47AM -0600, Jens Axboe wrote:
> > > But ideally we'd have that, and just a plain doubly linked list on the
> > > queue/dispatch side. Which makes the list handling there much easier to
> > > follow, as per your patch.
> >
> > Quick hack from the weekend.  This also never deletes the requests from
> > the submission list for the queue_rqs case, so depending on the workload
> > it should touch either the same number of cache lines as the existing
> > version, or fewer.  Only very lightly tested, and ublk is broken and
> > doesn't even compile, as it is running out of space in the io_uring pdu.
> > I'll need help from someone who knows ublk for that.
> >
> > ---
> > From 07e283303c63fcb694e828380a24ad51f225a228 Mon Sep 17 00:00:00 2001
> > From: Christoph Hellwig <hch@lst.de>
> > Date: Fri, 13 Jun 2025 09:48:40 +0200
> > Subject: block: always use a list_head for requests
> >
> > Use standard doubly linked lists for the remaining lists of queued up
> > requests. This removes a lot of hairy list manipulation code and allows
> > easy reverse walking of the lists, which is used in
> > blk_attempt_plug_merge to improve the merging, and in blk_add_rq_to_plug
> > to look at the correct request.
> >
> > XXX: ublk is broken right now, because there is no space in the io_uring
> > pdu for the list backpointer.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  block/blk-core.c              |  2 +-
> >  block/blk-merge.c             | 11 +---
> >  block/blk-mq.c                | 97 ++++++++++++++---------------------
> >  drivers/block/null_blk/main.c | 16 +++---
> >  drivers/block/ublk_drv.c      | 43 +++++++---------
> >  drivers/block/virtio_blk.c    | 31 +++++------
> >  drivers/nvme/host/pci.c       | 32 ++++++------
> >  include/linux/blk-mq.h        |  2 +-
> >  include/linux/blkdev.h        |  2 +-
> >  9 files changed, 103 insertions(+), 133 deletions(-)
> >
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index b862c66018f2..29aad939a1e3 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -1127,7 +1127,7 @@ void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
> >                 return;
> >
> >         plug->cur_ktime = 0;
> > -       rq_list_init(&plug->mq_list);
> > +       INIT_LIST_HEAD(&plug->mq_list);
> >         rq_list_init(&plug->cached_rqs);
> >         plug->nr_ios = min_t(unsigned short, nr_ios, BLK_MAX_REQUEST_COUNT);
> >         plug->rq_count = 0;
> > diff --git a/block/blk-merge.c b/block/blk-merge.c
> > index 70d704615be5..223941e9ec08 100644
> > --- a/block/blk-merge.c
> > +++ b/block/blk-merge.c
> > @@ -995,17 +995,10 @@ bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
> >         struct blk_plug *plug = current->plug;
> >         struct request *rq;
> >
> > -       if (!plug || rq_list_empty(&plug->mq_list))
> > +       if (!plug)
> >                 return false;
> >
> > -       rq = plug->mq_list.tail;
> > -       if (rq->q == q)
> > -               return blk_attempt_bio_merge(q, rq, bio, nr_segs, false) ==
> > -                       BIO_MERGE_OK;
> > -       else if (!plug->multiple_queues)
> > -               return false;
> > -
> > -       rq_list_for_each(&plug->mq_list, rq) {
> > +       list_for_each_entry_reverse(rq, &plug->mq_list, queuelist) {
> >                 if (rq->q != q)
> >                         continue;
> >                 if (blk_attempt_bio_merge(q, rq, bio, nr_segs, false) ==
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 4806b867e37d..6d56471d4346 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -1378,7 +1378,8 @@ static inline unsigned short blk_plug_max_rq_count(struct blk_plug *plug)
> >
> >  static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq)
> >  {
> > -       struct request *last = rq_list_peek(&plug->mq_list);
> > +       struct request *last =
> > +               list_last_entry(&plug->mq_list, struct request, queuelist);
> >
> >         if (!plug->rq_count) {
> >                 trace_block_plug(rq->q);
> > @@ -1398,7 +1399,7 @@ static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq)
> >          */
> >         if (!plug->has_elevator && (rq->rq_flags & RQF_SCHED_TAGS))
> >                 plug->has_elevator = true;
> > -       rq_list_add_tail(&plug->mq_list, rq);
> > +       list_add_tail(&rq->queuelist, &plug->mq_list);
> >         plug->rq_count++;
> >  }
> >
> > @@ -2780,15 +2781,15 @@ static blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last)
> >         return __blk_mq_issue_directly(hctx, rq, last);
> >  }
> >
> > -static void blk_mq_issue_direct(struct rq_list *rqs)
> > +static void blk_mq_issue_direct(struct list_head *rqs)
> >  {
> >         struct blk_mq_hw_ctx *hctx = NULL;
> > -       struct request *rq;
> > +       struct request *rq, *n;
> >         int queued = 0;
> >         blk_status_t ret = BLK_STS_OK;
> >
> > -       while ((rq = rq_list_pop(rqs))) {
> > -               bool last = rq_list_empty(rqs);
> > +       list_for_each_entry_safe(rq, n, rqs, queuelist) {
> > +               list_del_init(&rq->queuelist);
> >
> >                 if (hctx != rq->mq_hctx) {
> >                         if (hctx) {
> > @@ -2798,7 +2799,7 @@ static void blk_mq_issue_direct(struct rq_list *rqs)
> >                         hctx = rq->mq_hctx;
> >                 }
> >
> > -               ret = blk_mq_request_issue_directly(rq, last);
> > +               ret = blk_mq_request_issue_directly(rq, list_empty(rqs));
> >                 switch (ret) {
> >                 case BLK_STS_OK:
> >                         queued++;
> > @@ -2819,45 +2820,18 @@ static void blk_mq_issue_direct(struct rq_list *rqs)
> >                 blk_mq_commit_rqs(hctx, queued, false);
> >  }
> >
> > -static void __blk_mq_flush_list(struct request_queue *q, struct rq_list *rqs)
> > +static void __blk_mq_flush_list(struct request_queue *q, struct list_head *rqs)
> >  {
> >         if (blk_queue_quiesced(q))
> >                 return;
> >         q->mq_ops->queue_rqs(rqs);
> >  }
> >
> > -static unsigned blk_mq_extract_queue_requests(struct rq_list *rqs,
> > -                                             struct rq_list *queue_rqs)
> > -{
> > -       struct request *rq = rq_list_pop(rqs);
> > -       struct request_queue *this_q = rq->q;
> > -       struct request **prev = &rqs->head;
> > -       struct rq_list matched_rqs = {};
> > -       struct request *last = NULL;
> > -       unsigned depth = 1;
> > -
> > -       rq_list_add_tail(&matched_rqs, rq);
> > -       while ((rq = *prev)) {
> > -               if (rq->q == this_q) {
> > -                       /* move rq from rqs to matched_rqs */
> > -                       *prev = rq->rq_next;
> > -                       rq_list_add_tail(&matched_rqs, rq);
> > -                       depth++;
> > -               } else {
> > -                       /* leave rq in rqs */
> > -                       prev = &rq->rq_next;
> > -                       last = rq;
> > -               }
> > -       }
> > -
> > -       rqs->tail = last;
> > -       *queue_rqs = matched_rqs;
> > -       return depth;
> > -}
> > -
> > -static void blk_mq_dispatch_queue_requests(struct rq_list *rqs, unsigned depth)
> > +static void blk_mq_dispatch_queue_requests(struct list_head *rqs,
> > +                                          unsigned depth)
> >  {
> > -       struct request_queue *q = rq_list_peek(rqs)->q;
> > +       struct request *rq = list_first_entry(rqs, struct request, queuelist);
> > +       struct request_queue *q = rq->q;
> >
> >         trace_block_unplug(q, depth, true);
> >
> > @@ -2869,39 +2843,35 @@ static void blk_mq_dispatch_queue_requests(struct rq_list *rqs, unsigned depth)
> >          */
> >         if (q->mq_ops->queue_rqs) {
> >                 blk_mq_run_dispatch_ops(q, __blk_mq_flush_list(q, rqs));
> > -               if (rq_list_empty(rqs))
> > +               if (list_empty(rqs))
> >                         return;
> >         }
> >
> >         blk_mq_run_dispatch_ops(q, blk_mq_issue_direct(rqs));
> >  }
> >
> > -static void blk_mq_dispatch_list(struct rq_list *rqs, bool from_sched)
> > +static void blk_mq_dispatch_list(struct list_head *rqs, bool from_sched)
> >  {
> >         struct blk_mq_hw_ctx *this_hctx = NULL;
> >         struct blk_mq_ctx *this_ctx = NULL;
> > -       struct rq_list requeue_list = {};
> > +       LIST_HEAD(list);
> > +       struct request *rq, *n;
> >         unsigned int depth = 0;
> >         bool is_passthrough = false;
> > -       LIST_HEAD(list);
> > -
> > -       do {
> > -               struct request *rq = rq_list_pop(rqs);
> >
> > +       list_for_each_entry_safe(rq, n, rqs, queuelist) {
> >                 if (!this_hctx) {
> >                         this_hctx = rq->mq_hctx;
> >                         this_ctx = rq->mq_ctx;
> >                         is_passthrough = blk_rq_is_passthrough(rq);
> >                 } else if (this_hctx != rq->mq_hctx || this_ctx != rq->mq_ctx ||
> >                            is_passthrough != blk_rq_is_passthrough(rq)) {
> > -                       rq_list_add_tail(&requeue_list, rq);
> >                         continue;
> >                 }
> > -               list_add_tail(&rq->queuelist, &list);
> > +               list_move_tail(&rq->queuelist, &list);
> >                 depth++;
> > -       } while (!rq_list_empty(rqs));
> > +       }
> >
> > -       *rqs = requeue_list;
> >         trace_block_unplug(this_hctx->queue, depth, !from_sched);
> >
> >         percpu_ref_get(&this_hctx->queue->q_usage_counter);
> > @@ -2921,17 +2891,27 @@ static void blk_mq_dispatch_list(struct rq_list *rqs, bool from_sched)
> >         percpu_ref_put(&this_hctx->queue->q_usage_counter);
> >  }
> >
> > -static void blk_mq_dispatch_multiple_queue_requests(struct rq_list *rqs)
> > +static void blk_mq_dispatch_multiple_queue_requests(struct list_head *rqs)
> >  {
> >         do {
> > -               struct rq_list queue_rqs;
> > -               unsigned depth;
> > +               struct request_queue *this_q = NULL;
> > +               struct request *rq, *n;
> > +               LIST_HEAD(queue_rqs);
> > +               unsigned depth = 0;
> > +
> > +               list_for_each_entry_safe(rq, n, rqs, queuelist) {
> > +                       if (!this_q)
> > +                               this_q = rq->q;
> > +                       if (this_q == rq->q) {
> > +                               list_move_tail(&rq->queuelist, &queue_rqs);
> > +                               depth++;
> > +                       }
> > +               }
> >
> > -               depth = blk_mq_extract_queue_requests(rqs, &queue_rqs);
> >                 blk_mq_dispatch_queue_requests(&queue_rqs, depth);
> > -               while (!rq_list_empty(&queue_rqs))
> > +               while (!list_empty(&queue_rqs))
> >                         blk_mq_dispatch_list(&queue_rqs, false);
> > -       } while (!rq_list_empty(rqs));
> > +       } while (!list_empty(rqs));
> >  }
> >
> >  void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
> > @@ -2955,15 +2935,14 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
> >                         blk_mq_dispatch_multiple_queue_requests(&plug->mq_list);
> >                         return;
> >                 }
> > -
> >                 blk_mq_dispatch_queue_requests(&plug->mq_list, depth);
> > -               if (rq_list_empty(&plug->mq_list))
> > +               if (list_empty(&plug->mq_list))
> >                         return;
> >         }
> >
> >         do {
> >                 blk_mq_dispatch_list(&plug->mq_list, from_schedule);
> > -       } while (!rq_list_empty(&plug->mq_list));
> > +       } while (!list_empty(&plug->mq_list));
> >  }
> >
> >  static void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
> > diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
> > index aa163ae9b2aa..ce3ac928122f 100644
> > --- a/drivers/block/null_blk/main.c
> > +++ b/drivers/block/null_blk/main.c
> > @@ -1694,22 +1694,22 @@ static blk_status_t null_queue_rq(struct blk_mq_hw_ctx *hctx,
> >         return BLK_STS_OK;
> >  }
> >
> > -static void null_queue_rqs(struct rq_list *rqlist)
> > +static void null_queue_rqs(struct list_head *rqlist)
> >  {
> > -       struct rq_list requeue_list = {};
> >         struct blk_mq_queue_data bd = { };
> > +       LIST_HEAD(requeue_list);
> > +       struct request *rq, *n;
> >         blk_status_t ret;
> >
> > -       do {
> > -               struct request *rq = rq_list_pop(rqlist);
> > -
> > +       list_for_each_entry_safe(rq, n, rqlist, queuelist) {
> >                 bd.rq = rq;
> >                 ret = null_queue_rq(rq->mq_hctx, &bd);
> >                 if (ret != BLK_STS_OK)
> > -                       rq_list_add_tail(&requeue_list, rq);
> > -       } while (!rq_list_empty(rqlist));
> > +                       list_move_tail(&rq->queuelist, &requeue_list);
> > +       }
> >
> > -       *rqlist = requeue_list;
> > +       INIT_LIST_HEAD(rqlist);
> > +       list_splice(&requeue_list, rqlist);
> >  }
> >
> >  static void null_init_queue(struct nullb *nullb, struct nullb_queue *nq)
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index c637ea010d34..4d5b88ca7b1b 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -101,7 +101,7 @@ struct ublk_uring_cmd_pdu {
> >          */
> >         union {
> >                 struct request *req;
> > -               struct request *req_list;
> > +               struct list_head req_list;
> >         };
> >
> >         /*
> > @@ -1325,24 +1325,18 @@ static void ublk_cmd_list_tw_cb(struct io_uring_cmd *cmd,
> >                 unsigned int issue_flags)
> >  {
> >         struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
> > -       struct request *rq = pdu->req_list;
> > -       struct request *next;
> > +       struct request *rq, *n;
> >
> > -       do {
> > -               next = rq->rq_next;
> > -               rq->rq_next = NULL;
> > +       list_for_each_entry_safe(rq, n, &pdu->req_list, queuelist)
> >                 ublk_dispatch_req(rq->mq_hctx->driver_data, rq, issue_flags);
> > -               rq = next;
> > -       } while (rq);
> >  }
> >
> > -static void ublk_queue_cmd_list(struct ublk_io *io, struct rq_list *l)
> > +static void ublk_queue_cmd_list(struct ublk_io *io, struct list_head *rqlist)
> >  {
> >         struct io_uring_cmd *cmd = io->cmd;
> >         struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
> >
> > -       pdu->req_list = rq_list_peek(l);
> > -       rq_list_init(l);
> > +       list_splice(&pdu->req_list, rqlist);
> >         io_uring_cmd_complete_in_task(cmd, ublk_cmd_list_tw_cb);
> 
> ublk_cmd_list_tw_cb() doesn't need a doubly-linked list. It should be
> fine to continue storing just the first struct request * of the list
> in struct ublk_uring_cmd_pdu. That would avoid growing
> ublk_uring_cmd_pdu past the 32-byte limit.

Agree.

ublk needn't reorder requests, so a doubly-linked list is useless here.


Thanks,
Ming


Thread overview:
2025-06-11 14:53 [PATCH] block: use plug request list tail for one-shot backmerge attempt Jens Axboe
2025-06-11 16:55 ` Mohamed Abuelfotoh, Hazem
2025-06-11 17:53   ` Jens Axboe
2025-06-12  5:22     ` Christoph Hellwig
2025-06-12  5:23       ` Christoph Hellwig
2025-06-12 11:49       ` Jens Axboe
2025-06-12 11:56         ` Christoph Hellwig
2025-06-12 12:21           ` Jens Axboe
2025-06-12 12:23             ` Christoph Hellwig
2025-06-12 12:28               ` Jens Axboe
2025-06-16 13:11                 ` Christoph Hellwig
2025-06-16 16:01                   ` Caleb Sander Mateos
2025-06-17  2:36                     ` Ming Lei [this message]
2025-06-17  4:39                       ` Christoph Hellwig
2025-06-18  6:04                   ` Hannes Reinecke
2025-06-12 12:27     ` Mohamed Abuelfotoh, Hazem
2025-06-24 10:45     ` Mohamed Abuelfotoh, Hazem
