All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org,
	Uday Shankar <ushankar@purestorage.com>
Subject: Re: [PATCH V3 2/5] ublk: implement NUMA-aware memory allocation
Date: Thu, 30 Oct 2025 06:53:44 +0800	[thread overview]
Message-ID: <aQKa-Jjg-Gr2DeXe@fedora> (raw)
In-Reply-To: <CADUfDZpBhBO8cppa2phmhkaCSJW1Yzk1aLzoF4zH3Cgu+D9Pcg@mail.gmail.com>

On Wed, Oct 29, 2025 at 09:00:12AM -0700, Caleb Sander Mateos wrote:
> On Tue, Oct 28, 2025 at 8:11 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > Implement NUMA-friendly memory allocation for ublk driver to improve
> > performance on multi-socket systems.
> >
> > This commit includes the following changes:
> >
> > 1. Convert struct ublk_device to use a flexible array member for the
> >    queues field instead of a separate pointer array allocation. This
> >    eliminates one level of indirection and simplifies memory management.
> >    The queues array is now allocated as part of struct ublk_device using
> >    struct_size().
> 
> Technically it ends up being the same number of indirections as
> before, since changing queues from a single allocation to an array of
> separate allocations adds another indirection.

I think it is fine, because the pre-condition is NUMA aware allocation for
ublk_queue.

> 
> >
> > 2. Rename __queues to queues, dropping the __ prefix since the field is
> >    now accessed directly throughout the codebase rather than only through
> >    the ublk_get_queue() helper.
> >
> > 3. Remove the queue_size field from struct ublk_device as it is no longer
> >    needed.
> >
> > 4. Move queue allocation and deallocation into ublk_init_queue() and
> >    ublk_deinit_queue() respectively, improving encapsulation. This
> >    simplifies ublk_init_queues() and ublk_deinit_queues() to just
> >    iterate and call the per-queue functions.
> >
> > 5. Add ublk_get_queue_numa_node() helper function to determine the
> >    appropriate NUMA node for a queue by finding the first CPU mapped
> >    to that queue via tag_set.map[HCTX_TYPE_DEFAULT].mq_map[] and
> >    converting it to a NUMA node using cpu_to_node(). This function is
> >    called internally by ublk_init_queue() to determine the allocation
> >    node.
> >
> > 6. Allocate each queue structure on its local NUMA node using
> >    kvzalloc_node() in ublk_init_queue().
> >
> > 7. Allocate the I/O command buffer on the same NUMA node using
> >    alloc_pages_node().
> >
> > This reduces memory access latency on multi-socket NUMA systems by
> > ensuring each queue's data structures are local to the CPUs that
> > access them.
> >
> > Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  drivers/block/ublk_drv.c | 84 +++++++++++++++++++++++++---------------
> >  1 file changed, 53 insertions(+), 31 deletions(-)
> >
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index 2569566bf5e6..ed77b4527b33 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -209,9 +209,6 @@ struct ublk_queue {
> >  struct ublk_device {
> >         struct gendisk          *ub_disk;
> >
> > -       char    *__queues;
> > -
> > -       unsigned int    queue_size;
> >         struct ublksrv_ctrl_dev_info    dev_info;
> >
> >         struct blk_mq_tag_set   tag_set;
> > @@ -239,6 +236,8 @@ struct ublk_device {
> >         bool canceling;
> >         pid_t   ublksrv_tgid;
> >         struct delayed_work     exit_work;
> > +
> > +       struct ublk_queue       *queues[] __counted_by(dev_info.nr_hw_queues);
> >  };
> >
> >  /* header of ublk_params */
> > @@ -781,7 +780,7 @@ static noinline void ublk_put_device(struct ublk_device *ub)
> >  static inline struct ublk_queue *ublk_get_queue(struct ublk_device *dev,
> >                 int qid)
> >  {
> > -       return (struct ublk_queue *)&(dev->__queues[qid * dev->queue_size]);
> > +       return dev->queues[qid];
> >  }
> >
> >  static inline bool ublk_rq_has_data(const struct request *rq)
> > @@ -2662,9 +2661,13 @@ static const struct file_operations ublk_ch_fops = {
> >
> >  static void ublk_deinit_queue(struct ublk_device *ub, int q_id)
> >  {
> > -       int size = ublk_queue_cmd_buf_size(ub);
> > -       struct ublk_queue *ubq = ublk_get_queue(ub, q_id);
> > -       int i;
> > +       struct ublk_queue *ubq = ub->queues[q_id];
> > +       int size, i;
> > +
> > +       if (!ubq)
> > +               return;
> > +
> > +       size = ublk_queue_cmd_buf_size(ub);
> >
> >         for (i = 0; i < ubq->q_depth; i++) {
> >                 struct ublk_io *io = &ubq->ios[i];
> > @@ -2676,57 +2679,76 @@ static void ublk_deinit_queue(struct ublk_device *ub, int q_id)
> >
> >         if (ubq->io_cmd_buf)
> >                 free_pages((unsigned long)ubq->io_cmd_buf, get_order(size));
> > +
> > +       kvfree(ubq);
> > +       ub->queues[q_id] = NULL;
> > +}
> > +
> > +static int ublk_get_queue_numa_node(struct ublk_device *ub, int q_id)
> > +{
> > +       unsigned int cpu;
> > +
> > +       /* Find first CPU mapped to this queue */
> > +       for_each_possible_cpu(cpu) {
> > +               if (ub->tag_set.map[HCTX_TYPE_DEFAULT].mq_map[cpu] == q_id)
> > +                       return cpu_to_node(cpu);
> > +       }
> 
> I think you could avoid this quadratic lookup by using blk_mq_hw_ctx's
> numa_node field. The initialization code would probably have to move
> to ublk_init_hctx() in order to have access to the blk_mq_hw_ctx. But
> may not be worth the effort just to save some time at ublk creation
> time. What you have seems fine.

It isn't doable and not necessary.

disk/hw queues are created & initialized when handling UBLK_CMD_START_DEV, but the
backed ublk_queue need to be allocated when handling UBLK_CMD_ADD_DEV, which happens
before dealing with UBLK_CMD_START_DEV.


Thanks,
Ming


  reply	other threads:[~2025-10-29 22:53 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-29  3:10 [PATCH V3 0/5] ublk: NUMA-aware memory allocation Ming Lei
2025-10-29  3:10 ` [PATCH V3 1/5] ublk: reorder tag_set initialization before queue allocation Ming Lei
2025-10-29  3:10 ` [PATCH V3 2/5] ublk: implement NUMA-aware memory allocation Ming Lei
2025-10-29 16:00   ` Caleb Sander Mateos
2025-10-29 22:53     ` Ming Lei [this message]
2025-10-30  4:04   ` kernel test robot
2025-10-30  8:00   ` kernel test robot
2025-10-30 14:07     ` Caleb Sander Mateos
2025-10-30 17:56       ` Nathan Chancellor
2025-10-31  3:28         ` Ming Lei
2025-10-29  3:10 ` [PATCH V3 3/5] ublk: use struct_size() for allocation Ming Lei
2025-10-29 16:00   ` Caleb Sander Mateos
2025-10-29  3:10 ` [PATCH V3 4/5] selftests: ublk: set CPU affinity before thread initialization Ming Lei
2025-10-29  3:10 ` [PATCH V3 5/5] selftests: ublk: make ublk_thread thread-local variable Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aQKa-Jjg-Gr2DeXe@fedora \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=csander@purestorage.com \
    --cc=linux-block@vger.kernel.org \
    --cc=ushankar@purestorage.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.