From: Ming Lei <ming.lei@redhat.com>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org,
Uday Shankar <ushankar@purestorage.com>
Subject: Re: [PATCH V3 2/5] ublk: implement NUMA-aware memory allocation
Date: Thu, 30 Oct 2025 06:53:44 +0800 [thread overview]
Message-ID: <aQKa-Jjg-Gr2DeXe@fedora> (raw)
In-Reply-To: <CADUfDZpBhBO8cppa2phmhkaCSJW1Yzk1aLzoF4zH3Cgu+D9Pcg@mail.gmail.com>
On Wed, Oct 29, 2025 at 09:00:12AM -0700, Caleb Sander Mateos wrote:
> On Tue, Oct 28, 2025 at 8:11 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > Implement NUMA-friendly memory allocation for ublk driver to improve
> > performance on multi-socket systems.
> >
> > This commit includes the following changes:
> >
> > 1. Convert struct ublk_device to use a flexible array member for the
> > queues field instead of a separate pointer array allocation. This
> > eliminates one level of indirection and simplifies memory management.
> > The queues array is now allocated as part of struct ublk_device using
> > struct_size().
>
> Technically it ends up being the same number of indirections as
> before, since changing queues from a single allocation to an array of
> separate allocations adds another indirection.
I think it is fine, because the pre-condition is NUMA aware allocation for
ublk_queue.
>
> >
> > 2. Rename __queues to queues, dropping the __ prefix since the field is
> > now accessed directly throughout the codebase rather than only through
> > the ublk_get_queue() helper.
> >
> > 3. Remove the queue_size field from struct ublk_device as it is no longer
> > needed.
> >
> > 4. Move queue allocation and deallocation into ublk_init_queue() and
> > ublk_deinit_queue() respectively, improving encapsulation. This
> > simplifies ublk_init_queues() and ublk_deinit_queues() to just
> > iterate and call the per-queue functions.
> >
> > 5. Add ublk_get_queue_numa_node() helper function to determine the
> > appropriate NUMA node for a queue by finding the first CPU mapped
> > to that queue via tag_set.map[HCTX_TYPE_DEFAULT].mq_map[] and
> > converting it to a NUMA node using cpu_to_node(). This function is
> > called internally by ublk_init_queue() to determine the allocation
> > node.
> >
> > 6. Allocate each queue structure on its local NUMA node using
> > kvzalloc_node() in ublk_init_queue().
> >
> > 7. Allocate the I/O command buffer on the same NUMA node using
> > alloc_pages_node().
> >
> > This reduces memory access latency on multi-socket NUMA systems by
> > ensuring each queue's data structures are local to the CPUs that
> > access them.
> >
> > Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> > drivers/block/ublk_drv.c | 84 +++++++++++++++++++++++++---------------
> > 1 file changed, 53 insertions(+), 31 deletions(-)
> >
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index 2569566bf5e6..ed77b4527b33 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -209,9 +209,6 @@ struct ublk_queue {
> > struct ublk_device {
> > struct gendisk *ub_disk;
> >
> > - char *__queues;
> > -
> > - unsigned int queue_size;
> > struct ublksrv_ctrl_dev_info dev_info;
> >
> > struct blk_mq_tag_set tag_set;
> > @@ -239,6 +236,8 @@ struct ublk_device {
> > bool canceling;
> > pid_t ublksrv_tgid;
> > struct delayed_work exit_work;
> > +
> > + struct ublk_queue *queues[] __counted_by(dev_info.nr_hw_queues);
> > };
> >
> > /* header of ublk_params */
> > @@ -781,7 +780,7 @@ static noinline void ublk_put_device(struct ublk_device *ub)
> > static inline struct ublk_queue *ublk_get_queue(struct ublk_device *dev,
> > int qid)
> > {
> > - return (struct ublk_queue *)&(dev->__queues[qid * dev->queue_size]);
> > + return dev->queues[qid];
> > }
> >
> > static inline bool ublk_rq_has_data(const struct request *rq)
> > @@ -2662,9 +2661,13 @@ static const struct file_operations ublk_ch_fops = {
> >
> > static void ublk_deinit_queue(struct ublk_device *ub, int q_id)
> > {
> > - int size = ublk_queue_cmd_buf_size(ub);
> > - struct ublk_queue *ubq = ublk_get_queue(ub, q_id);
> > - int i;
> > + struct ublk_queue *ubq = ub->queues[q_id];
> > + int size, i;
> > +
> > + if (!ubq)
> > + return;
> > +
> > + size = ublk_queue_cmd_buf_size(ub);
> >
> > for (i = 0; i < ubq->q_depth; i++) {
> > struct ublk_io *io = &ubq->ios[i];
> > @@ -2676,57 +2679,76 @@ static void ublk_deinit_queue(struct ublk_device *ub, int q_id)
> >
> > if (ubq->io_cmd_buf)
> > free_pages((unsigned long)ubq->io_cmd_buf, get_order(size));
> > +
> > + kvfree(ubq);
> > + ub->queues[q_id] = NULL;
> > +}
> > +
> > +static int ublk_get_queue_numa_node(struct ublk_device *ub, int q_id)
> > +{
> > + unsigned int cpu;
> > +
> > + /* Find first CPU mapped to this queue */
> > + for_each_possible_cpu(cpu) {
> > + if (ub->tag_set.map[HCTX_TYPE_DEFAULT].mq_map[cpu] == q_id)
> > + return cpu_to_node(cpu);
> > + }
>
> I think you could avoid this quadratic lookup by using blk_mq_hw_ctx's
> numa_node field. The initialization code would probably have to move
> to ublk_init_hctx() in order to have access to the blk_mq_hw_ctx. But
> may not be worth the effort just to save some time at ublk creation
> time. What you have seems fine.
It isn't doable and not necessary.
disk/hw queues are created & initialized when handling UBLK_CMD_START_DEV, but the
backed ublk_queue need to be allocated when handling UBLK_CMD_ADD_DEV, which happens
before dealing with UBLK_CMD_START_DEV.
Thanks,
Ming
next prev parent reply other threads:[~2025-10-29 22:53 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-29 3:10 [PATCH V3 0/5] ublk: NUMA-aware memory allocation Ming Lei
2025-10-29 3:10 ` [PATCH V3 1/5] ublk: reorder tag_set initialization before queue allocation Ming Lei
2025-10-29 3:10 ` [PATCH V3 2/5] ublk: implement NUMA-aware memory allocation Ming Lei
2025-10-29 16:00 ` Caleb Sander Mateos
2025-10-29 22:53 ` Ming Lei [this message]
2025-10-30 4:04 ` kernel test robot
2025-10-30 8:00 ` kernel test robot
2025-10-30 14:07 ` Caleb Sander Mateos
2025-10-30 17:56 ` Nathan Chancellor
2025-10-31 3:28 ` Ming Lei
2025-10-29 3:10 ` [PATCH V3 3/5] ublk: use struct_size() for allocation Ming Lei
2025-10-29 16:00 ` Caleb Sander Mateos
2025-10-29 3:10 ` [PATCH V3 4/5] selftests: ublk: set CPU affinity before thread initialization Ming Lei
2025-10-29 3:10 ` [PATCH V3 5/5] selftests: ublk: make ublk_thread thread-local variable Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aQKa-Jjg-Gr2DeXe@fedora \
--to=ming.lei@redhat.com \
--cc=axboe@kernel.dk \
--cc=csander@purestorage.com \
--cc=linux-block@vger.kernel.org \
--cc=ushankar@purestorage.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox