public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org,
	Uday Shankar <ushankar@purestorage.com>
Subject: Re: [PATCH V3 2/5] ublk: implement NUMA-aware memory allocation
Date: Thu, 30 Oct 2025 06:53:44 +0800	[thread overview]
Message-ID: <aQKa-Jjg-Gr2DeXe@fedora> (raw)
In-Reply-To: <CADUfDZpBhBO8cppa2phmhkaCSJW1Yzk1aLzoF4zH3Cgu+D9Pcg@mail.gmail.com>

On Wed, Oct 29, 2025 at 09:00:12AM -0700, Caleb Sander Mateos wrote:
> On Tue, Oct 28, 2025 at 8:11 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > Implement NUMA-friendly memory allocation for ublk driver to improve
> > performance on multi-socket systems.
> >
> > This commit includes the following changes:
> >
> > 1. Convert struct ublk_device to use a flexible array member for the
> >    queues field instead of a separate pointer array allocation. This
> >    eliminates one level of indirection and simplifies memory management.
> >    The queues array is now allocated as part of struct ublk_device using
> >    struct_size().
> 
> Technically it ends up being the same number of indirections as
> before, since changing queues from a single allocation to an array of
> separate allocations adds another indirection.

I think it is fine, because the pre-condition is NUMA aware allocation for
ublk_queue.

> 
> >
> > 2. Rename __queues to queues, dropping the __ prefix since the field is
> >    now accessed directly throughout the codebase rather than only through
> >    the ublk_get_queue() helper.
> >
> > 3. Remove the queue_size field from struct ublk_device as it is no longer
> >    needed.
> >
> > 4. Move queue allocation and deallocation into ublk_init_queue() and
> >    ublk_deinit_queue() respectively, improving encapsulation. This
> >    simplifies ublk_init_queues() and ublk_deinit_queues() to just
> >    iterate and call the per-queue functions.
> >
> > 5. Add ublk_get_queue_numa_node() helper function to determine the
> >    appropriate NUMA node for a queue by finding the first CPU mapped
> >    to that queue via tag_set.map[HCTX_TYPE_DEFAULT].mq_map[] and
> >    converting it to a NUMA node using cpu_to_node(). This function is
> >    called internally by ublk_init_queue() to determine the allocation
> >    node.
> >
> > 6. Allocate each queue structure on its local NUMA node using
> >    kvzalloc_node() in ublk_init_queue().
> >
> > 7. Allocate the I/O command buffer on the same NUMA node using
> >    alloc_pages_node().
> >
> > This reduces memory access latency on multi-socket NUMA systems by
> > ensuring each queue's data structures are local to the CPUs that
> > access them.
> >
> > Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  drivers/block/ublk_drv.c | 84 +++++++++++++++++++++++++---------------
> >  1 file changed, 53 insertions(+), 31 deletions(-)
> >
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index 2569566bf5e6..ed77b4527b33 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -209,9 +209,6 @@ struct ublk_queue {
> >  struct ublk_device {
> >         struct gendisk          *ub_disk;
> >
> > -       char    *__queues;
> > -
> > -       unsigned int    queue_size;
> >         struct ublksrv_ctrl_dev_info    dev_info;
> >
> >         struct blk_mq_tag_set   tag_set;
> > @@ -239,6 +236,8 @@ struct ublk_device {
> >         bool canceling;
> >         pid_t   ublksrv_tgid;
> >         struct delayed_work     exit_work;
> > +
> > +       struct ublk_queue       *queues[] __counted_by(dev_info.nr_hw_queues);
> >  };
> >
> >  /* header of ublk_params */
> > @@ -781,7 +780,7 @@ static noinline void ublk_put_device(struct ublk_device *ub)
> >  static inline struct ublk_queue *ublk_get_queue(struct ublk_device *dev,
> >                 int qid)
> >  {
> > -       return (struct ublk_queue *)&(dev->__queues[qid * dev->queue_size]);
> > +       return dev->queues[qid];
> >  }
> >
> >  static inline bool ublk_rq_has_data(const struct request *rq)
> > @@ -2662,9 +2661,13 @@ static const struct file_operations ublk_ch_fops = {
> >
> >  static void ublk_deinit_queue(struct ublk_device *ub, int q_id)
> >  {
> > -       int size = ublk_queue_cmd_buf_size(ub);
> > -       struct ublk_queue *ubq = ublk_get_queue(ub, q_id);
> > -       int i;
> > +       struct ublk_queue *ubq = ub->queues[q_id];
> > +       int size, i;
> > +
> > +       if (!ubq)
> > +               return;
> > +
> > +       size = ublk_queue_cmd_buf_size(ub);
> >
> >         for (i = 0; i < ubq->q_depth; i++) {
> >                 struct ublk_io *io = &ubq->ios[i];
> > @@ -2676,57 +2679,76 @@ static void ublk_deinit_queue(struct ublk_device *ub, int q_id)
> >
> >         if (ubq->io_cmd_buf)
> >                 free_pages((unsigned long)ubq->io_cmd_buf, get_order(size));
> > +
> > +       kvfree(ubq);
> > +       ub->queues[q_id] = NULL;
> > +}
> > +
> > +static int ublk_get_queue_numa_node(struct ublk_device *ub, int q_id)
> > +{
> > +       unsigned int cpu;
> > +
> > +       /* Find first CPU mapped to this queue */
> > +       for_each_possible_cpu(cpu) {
> > +               if (ub->tag_set.map[HCTX_TYPE_DEFAULT].mq_map[cpu] == q_id)
> > +                       return cpu_to_node(cpu);
> > +       }
> 
> I think you could avoid this quadratic lookup by using blk_mq_hw_ctx's
> numa_node field. The initialization code would probably have to move
> to ublk_init_hctx() in order to have access to the blk_mq_hw_ctx. But
> may not be worth the effort just to save some time at ublk creation
> time. What you have seems fine.

It isn't doable and not necessary.

disk/hw queues are created & initialized when handling UBLK_CMD_START_DEV, but the
backed ublk_queue need to be allocated when handling UBLK_CMD_ADD_DEV, which happens
before dealing with UBLK_CMD_START_DEV.


Thanks,
Ming


  reply	other threads:[~2025-10-29 22:53 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-29  3:10 [PATCH V3 0/5] ublk: NUMA-aware memory allocation Ming Lei
2025-10-29  3:10 ` [PATCH V3 1/5] ublk: reorder tag_set initialization before queue allocation Ming Lei
2025-10-29  3:10 ` [PATCH V3 2/5] ublk: implement NUMA-aware memory allocation Ming Lei
2025-10-29 16:00   ` Caleb Sander Mateos
2025-10-29 22:53     ` Ming Lei [this message]
2025-10-30  4:04   ` kernel test robot
2025-10-30  8:00   ` kernel test robot
2025-10-30 14:07     ` Caleb Sander Mateos
2025-10-30 17:56       ` Nathan Chancellor
2025-10-31  3:28         ` Ming Lei
2025-10-29  3:10 ` [PATCH V3 3/5] ublk: use struct_size() for allocation Ming Lei
2025-10-29 16:00   ` Caleb Sander Mateos
2025-10-29  3:10 ` [PATCH V3 4/5] selftests: ublk: set CPU affinity before thread initialization Ming Lei
2025-10-29  3:10 ` [PATCH V3 5/5] selftests: ublk: make ublk_thread thread-local variable Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aQKa-Jjg-Gr2DeXe@fedora \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=csander@purestorage.com \
    --cc=linux-block@vger.kernel.org \
    --cc=ushankar@purestorage.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox