* [PATCH v3 0/2] nvme-rdma: parallelize I/O queue setup
@ 2026-06-25 21:27 Surabhi Gogte
2026-06-25 21:27 ` [PATCH v3 1/2] nvme-rdma: refactor nvme_rdma_alloc_queue() to take a queue pointer Surabhi Gogte
2026-06-25 21:27 ` [PATCH v3 2/2] nvme-rdma: parallelize I/O queue allocation and startup Surabhi Gogte
0 siblings, 2 replies; 5+ messages in thread
From: Surabhi Gogte @ 2026-06-25 21:27 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
Cc: linux-nvme, linux-kernel, mkhalfella, randyj, adailey,
Surabhi Gogte
This patch series parallelizes nvme-rdma connection and reconnection by
setting up I/O queues in parallel instead of sequentially. Allocation and
startup of each queue are combined into a single per-queue async work
item, so per-queue connection latency overlaps across all queues. This
matters most on high-core-count hosts with many I/O queues, where serial
setup dominates connect time.
Patch 1 is a preparatory refactor: nvme_rdma_alloc_queue() takes a queue
pointer so allocation and startup can be folded into a single async
worker.
Patch 2 contains the implementation for async setup of I/O queues.
Testing on a 64-core host with 64 I/O queues shows nvme-rdma connection
time reduced from ~1.4s to 416ms.
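For readers unfamiliar with the pattern, the fan-out/join structure described above can be sketched in userspace. This is only an illustration under stated assumptions, not the driver code: POSIX threads stand in for the kernel async API, a C11 atomic_int stands in for the WRITE_ONCE()/READ_ONCE() error slot, and every demo_* name (including the fail knob) is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a queue; not the driver's struct. */
struct demo_queue {
	int fail;	/* test knob: force this queue's setup to fail */
	int live;	/* set once setup succeeded */
};

/* One context per work item: the queue to set up and a shared error slot. */
struct demo_setup_ctx {
	struct demo_queue *queue;
	atomic_int *err;
};

/* Worker: "allocate and start" one queue, report failure via the shared slot. */
static void *demo_setup_queue(void *data)
{
	struct demo_setup_ctx *ctx = data;

	if (ctx->queue->fail) {
		atomic_store(ctx->err, -ECONNREFUSED);
		return NULL;
	}
	ctx->queue->live = 1;
	return NULL;
}

/* Set up queues [first, last) in parallel, then wait for all of them. */
int demo_setup_io_queues(struct demo_queue *queues, int first, int last)
{
	int nr = last - first, started = 0, i, ret;
	struct demo_setup_ctx *ctxs;
	pthread_t *threads;
	atomic_int err = 0;

	assert(queues != NULL);
	if (nr <= 0)
		return 0;

	ctxs = calloc(nr, sizeof(*ctxs));
	threads = calloc(nr, sizeof(*threads));
	if (!ctxs || !threads) {
		free(ctxs);
		free(threads);
		return -ENOMEM;
	}

	for (i = 0; i < nr; i++) {
		ctxs[i].queue = &queues[first + i];
		ctxs[i].err = &err;
		if (pthread_create(&threads[i], NULL, demo_setup_queue,
				   &ctxs[i])) {
			atomic_store(&err, -EAGAIN);
			break;
		}
		started++;
	}

	/* Join point: no worker touches ctxs after this loop. */
	for (i = 0; i < started; i++)
		pthread_join(threads[i], NULL);
	free(ctxs);
	free(threads);

	ret = atomic_load(&err);
	if (ret) {
		/* Unwind every queue in the range, whichever worker failed. */
		for (i = first; i < last; i++)
			queues[i].live = 0;
	}
	return ret;
}
```

Because workers finish in no particular order, the error path in this sketch walks the whole range rather than counting back from a failing index. If several workers fail, the shared slot keeps whichever error was stored last.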
Signed-off-by: Surabhi Gogte <sgogte@purestorage.com>
---
Changes from v2->v3:
- Split the series into two patches: extract the nvme_rdma_alloc_queue()
refactor into a separate preparatory patch.
- Replace the atomic error flag in struct nvme_rdma_ctrl with a per-work
nvme_rdma_setup_ctx { queue, err } struct.
- Fix formatting: line-overflow indentation and nesting.
Changes from v1->v2:
- Remove separate workqueue and use the async API instead.
Previous versions:
v1: https://lore.kernel.org/all/20260529001354.1003640-1-sgogte@purestorage.com/
v2: https://lore.kernel.org/all/20260604195321.2232838-1-sgogte@purestorage.com/
Surabhi Gogte (2):
nvme-rdma: refactor nvme_rdma_alloc_queue() to take a queue pointer
nvme-rdma: parallelize I/O queue allocation and startup
drivers/nvme/host/rdma.c | 135 ++++++++++++++++++++++++---------------
1 file changed, 82 insertions(+), 53 deletions(-)
--
2.54.0
* [PATCH v3 1/2] nvme-rdma: refactor nvme_rdma_alloc_queue() to take a queue pointer
2026-06-25 21:27 [PATCH v3 0/2] nvme-rdma: parallelize I/O queue setup Surabhi Gogte
@ 2026-06-25 21:27 ` Surabhi Gogte
2026-06-26 7:34 ` Christoph Hellwig
2026-06-25 21:27 ` [PATCH v3 2/2] nvme-rdma: parallelize I/O queue allocation and startup Surabhi Gogte
1 sibling, 1 reply; 5+ messages in thread
From: Surabhi Gogte @ 2026-06-25 21:27 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
Cc: linux-nvme, linux-kernel, mkhalfella, randyj, adailey,
Surabhi Gogte
Callers are responsible for initializing queue->ctrl and queue->queue_size
before calling nvme_rdma_alloc_queue(), which now derives ctrl and idx
from the queue pointer directly. This removes redundant assignments inside
the function and simplifies the interface.
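The new calling convention can be illustrated with a small standalone sketch. It assumes, as the driver does, that queues live in one contiguous array owned by the controller, so the index falls out of pointer arithmetic. The sketch_* names are hypothetical and only mirror the shape of the real structures.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_ctrl;

/* Hypothetical queue: only the two fields the caller must pre-initialize. */
struct sketch_queue {
	struct sketch_ctrl *ctrl;	/* back-pointer to the owning controller */
	size_t queue_size;
};

/* Hypothetical controller owning a contiguous queue array. */
struct sketch_ctrl {
	struct sketch_queue *queues;
};

/* Derive the index from the queue pointer; needs queue->ctrl to be set. */
int sketch_queue_idx(const struct sketch_queue *queue)
{
	assert(queue->ctrl != NULL);
	return (int)(queue - queue->ctrl->queues);
}

/* What each caller now does before handing the queue to the allocator. */
void sketch_prepare_queue(struct sketch_ctrl *ctrl, int idx, size_t queue_size)
{
	ctrl->queues[idx].ctrl = ctrl;
	ctrl->queues[idx].queue_size = queue_size;
}
```

The allocator then needs only the queue pointer, which is what lets a later per-queue worker take a single argument.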
Signed-off-by: Surabhi Gogte <sgogte@purestorage.com>
---
drivers/nvme/host/rdma.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 6909e3542794..6b0b0a3dea62 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -566,16 +566,14 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
return ret;
}
-static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
- int idx, size_t queue_size)
+static int nvme_rdma_alloc_queue(struct nvme_rdma_queue *queue)
{
- struct nvme_rdma_queue *queue;
+ struct nvme_rdma_ctrl *ctrl = queue->ctrl;
+ int idx = nvme_rdma_queue_idx(queue);
struct sockaddr *src_addr = NULL;
int ret;
- queue = &ctrl->queues[idx];
mutex_init(&queue->queue_lock);
- queue->ctrl = ctrl;
if (idx && ctrl->ctrl.max_integrity_segments)
queue->pi_support = true;
else
@@ -587,8 +585,6 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
else
queue->cmnd_capsule_len = sizeof(struct nvme_command);
- queue->queue_size = queue_size;
-
queue->cm_id = rdma_create_id(&init_net, nvme_rdma_cm_handler, queue,
RDMA_PS_TCP, IB_QPT_RC);
if (IS_ERR(queue->cm_id)) {
@@ -736,8 +732,9 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
nvmf_set_io_queues(opts, nr_io_queues, ctrl->io_queues);
for (i = 1; i < ctrl->ctrl.queue_count; i++) {
- ret = nvme_rdma_alloc_queue(ctrl, i,
- ctrl->ctrl.sqsize + 1);
+ ctrl->queues[i].ctrl = ctrl;
+ ctrl->queues[i].queue_size = ctrl->ctrl.sqsize + 1;
+ ret = nvme_rdma_alloc_queue(&ctrl->queues[i]);
if (ret)
goto out_free_queues;
}
@@ -783,7 +780,9 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
bool pi_capable = false;
int error;
- error = nvme_rdma_alloc_queue(ctrl, 0, NVME_AQ_DEPTH);
+ ctrl->queues[0].ctrl = ctrl;
+ ctrl->queues[0].queue_size = NVME_AQ_DEPTH;
+ error = nvme_rdma_alloc_queue(&ctrl->queues[0]);
if (error)
return error;
--
2.54.0
* [PATCH v3 2/2] nvme-rdma: parallelize I/O queue allocation and startup
2026-06-25 21:27 [PATCH v3 0/2] nvme-rdma: parallelize I/O queue setup Surabhi Gogte
2026-06-25 21:27 ` [PATCH v3 1/2] nvme-rdma: refactor nvme_rdma_alloc_queue() to take a queue pointer Surabhi Gogte
@ 2026-06-25 21:27 ` Surabhi Gogte
2026-06-26 7:40 ` Christoph Hellwig
1 sibling, 1 reply; 5+ messages in thread
From: Surabhi Gogte @ 2026-06-25 21:27 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
Cc: linux-nvme, linux-kernel, mkhalfella, randyj, adailey,
Surabhi Gogte
Refactor nvme-rdma I/O queue setup to use the async API, combining
allocation and startup into a single parallel operation per queue. This
reduces connection and reconnection setup time when there are delays in
establishing connections, which is especially important for
high-core-count hosts.
Key changes:
- Use the async API to set up I/O queues in parallel.
- Add nvme_rdma_setup_ctx for propagating errors from async workers.
- Remove nvme_rdma_alloc_io_queues() and nvme_rdma_start_io_queues();
their logic is folded into nvme_rdma_setup_io_queues() and
nvme_rdma_configure_io_queues().
- Move queue count negotiation (nvme_set_queue_count,
nvmf_set_io_queues) from the removed nvme_rdma_alloc_io_queues()
into nvme_rdma_configure_io_queues().
Testing on a 64-core host with 64 I/O queues shows
nvme-rdma connection time reduced from ~1.4s to 416ms.
Signed-off-by: Surabhi Gogte <sgogte@purestorage.com>
---
drivers/nvme/host/rdma.c | 124 ++++++++++++++++++++++++---------------
1 file changed, 77 insertions(+), 47 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 6b0b0a3dea62..45d0ef8c4dd3 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -16,6 +16,7 @@
#include <linux/types.h>
#include <linux/list.h>
#include <linux/mutex.h>
+#include <linux/async.h>
#include <linux/scatterlist.h>
#include <linux/nvme.h>
#include <linux/unaligned.h>
@@ -100,6 +101,11 @@ struct nvme_rdma_queue {
struct mutex queue_lock;
};
+struct nvme_rdma_setup_ctx {
+ struct nvme_rdma_queue *queue;
+ int *err;
+};
+
struct nvme_rdma_ctrl {
/* read only in the hot path */
struct nvme_rdma_queue *queues;
@@ -690,60 +696,68 @@ static int nvme_rdma_start_queue(struct nvme_rdma_ctrl *ctrl, int idx)
return ret;
}
-static int nvme_rdma_start_io_queues(struct nvme_rdma_ctrl *ctrl,
- int first, int last)
+static void nvme_rdma_setup_queue_async(void *data, async_cookie_t cookie)
{
- int i, ret = 0;
+ struct nvme_rdma_setup_ctx *ctx = data;
+ struct nvme_rdma_queue *queue;
+ int ret;
- for (i = first; i < last; i++) {
- ret = nvme_rdma_start_queue(ctrl, i);
- if (ret)
- goto out_stop_queues;
- }
+ queue = ctx->queue;
+ ret = nvme_rdma_alloc_queue(queue);
+ if (ret)
+ goto out_err;
- return 0;
+ ret = nvme_rdma_start_queue(queue->ctrl, nvme_rdma_queue_idx(queue));
+ if (ret)
+ goto out_err;
-out_stop_queues:
- for (i--; i >= first; i--)
- nvme_rdma_stop_queue(&ctrl->queues[i]);
- return ret;
+ return;
+out_err:
+ WRITE_ONCE(*ctx->err, ret);
}
-static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
+static int nvme_rdma_setup_io_queues(struct nvme_rdma_ctrl *ctrl, unsigned int first,
+ unsigned int last, size_t queue_size)
{
- struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
- unsigned int nr_io_queues;
- int i, ret;
-
- nr_io_queues = nvmf_nr_io_queues(opts);
- ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
- if (ret)
- return ret;
+ ASYNC_DOMAIN_EXCLUSIVE(queue_domain);
+ struct nvme_rdma_setup_ctx *ctxs;
+ int nr_queues = last - first;
+ int err = 0, i, ret;
- if (nr_io_queues == 0) {
- dev_err(ctrl->ctrl.device,
- "unable to set any I/O queues\n");
+ ctxs = kmalloc_array(nr_queues, sizeof(*ctxs), GFP_KERNEL);
+ if (!ctxs)
return -ENOMEM;
- }
- ctrl->ctrl.queue_count = nr_io_queues + 1;
- dev_info(ctrl->ctrl.device,
- "creating %d I/O queues.\n", nr_io_queues);
-
- nvmf_set_io_queues(opts, nr_io_queues, ctrl->io_queues);
- for (i = 1; i < ctrl->ctrl.queue_count; i++) {
- ctrl->queues[i].ctrl = ctrl;
- ctrl->queues[i].queue_size = ctrl->ctrl.sqsize + 1;
- ret = nvme_rdma_alloc_queue(&ctrl->queues[i]);
- if (ret)
- goto out_free_queues;
+ for (i = 0; i < nr_queues; i++) {
+ struct nvme_rdma_queue *queue = &ctrl->queues[first + i];
+
+ queue->ctrl = ctrl;
+ queue->queue_size = queue_size;
+
+ ctxs[i].queue = queue;
+ ctxs[i].err = &err;
+ async_schedule_domain(nvme_rdma_setup_queue_async, &ctxs[i],
+ &queue_domain);
}
- return 0;
+ async_synchronize_full_domain(&queue_domain);
+ kfree(ctxs);
+ ret = READ_ONCE(err);
+ if (ret)
+ goto out_free_queues;
+
+ return 0;
out_free_queues:
- for (i--; i >= 1; i--)
- nvme_rdma_free_queue(&ctrl->queues[i]);
+ for (i = 0; i < nr_queues; i++) {
+ struct nvme_rdma_queue *queue =
+ &ctrl->queues[first + i];
+
+ if (test_bit(NVME_RDMA_Q_LIVE, &queue->flags))
+ nvme_rdma_stop_queue(queue);
+ if (test_bit(NVME_RDMA_Q_ALLOCATED, &queue->flags))
+ nvme_rdma_free_queue(queue);
+ }
return ret;
}
@@ -862,12 +876,23 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
{
+ unsigned int nr_io_queues;
int ret, nr_queues;
- ret = nvme_rdma_alloc_io_queues(ctrl);
+ nr_io_queues = nvmf_nr_io_queues(ctrl->ctrl.opts);
+ ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
if (ret)
return ret;
+ if (nr_io_queues == 0) {
+ dev_err(ctrl->ctrl.device, "unable to set any I/O queues\n");
+ return -ENOMEM;
+ }
+
+ ctrl->ctrl.queue_count = nr_io_queues + 1;
+ dev_info(ctrl->ctrl.device, "creating %d I/O queues.\n", nr_io_queues);
+ nvmf_set_io_queues(ctrl->ctrl.opts, nr_io_queues, ctrl->io_queues);
+
if (new) {
ret = nvme_rdma_alloc_tag_set(&ctrl->ctrl);
if (ret)
@@ -880,7 +905,9 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
* queue number might have changed.
*/
nr_queues = min(ctrl->tag_set.nr_hw_queues + 1, ctrl->ctrl.queue_count);
- ret = nvme_rdma_start_io_queues(ctrl, 1, nr_queues);
+ ret = nvme_rdma_setup_io_queues(ctrl, 1, nr_queues,
+ ctrl->ctrl.sqsize + 1);
+
if (ret)
goto out_cleanup_tagset;
@@ -904,12 +931,15 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
/*
* If the number of queues has increased (reconnect case)
- * start all new queues now.
+ * setup all new queues now.
*/
- ret = nvme_rdma_start_io_queues(ctrl, nr_queues,
- ctrl->tag_set.nr_hw_queues + 1);
- if (ret)
- goto out_wait_freeze_timed_out;
+ if (ctrl->tag_set.nr_hw_queues + 1 > nr_queues) {
+ ret = nvme_rdma_setup_io_queues(ctrl, nr_queues,
+ ctrl->tag_set.nr_hw_queues + 1,
+ ctrl->ctrl.sqsize + 1);
+ if (ret)
+ goto out_wait_freeze_timed_out;
+ }
return 0;
--
2.54.0
* Re: [PATCH v3 1/2] nvme-rdma: refactor nvme_rdma_alloc_queue() to take a queue pointer
2026-06-25 21:27 ` [PATCH v3 1/2] nvme-rdma: refactor nvme_rdma_alloc_queue() to take a queue pointer Surabhi Gogte
@ 2026-06-26 7:34 ` Christoph Hellwig
0 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2026-06-26 7:34 UTC (permalink / raw)
To: Surabhi Gogte
Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
linux-nvme, linux-kernel, mkhalfella, randyj, adailey
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH v3 2/2] nvme-rdma: parallelize I/O queue allocation and startup
2026-06-25 21:27 ` [PATCH v3 2/2] nvme-rdma: parallelize I/O queue allocation and startup Surabhi Gogte
@ 2026-06-26 7:40 ` Christoph Hellwig
0 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2026-06-26 7:40 UTC (permalink / raw)
To: Surabhi Gogte
Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
linux-nvme, linux-kernel, mkhalfella, randyj, adailey
On Thu, Jun 25, 2026 at 03:27:22PM -0600, Surabhi Gogte wrote:
> -static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
> +static int nvme_rdma_setup_io_queues(struct nvme_rdma_ctrl *ctrl, unsigned int first,
Overly long line.
> + unsigned int last, size_t queue_size)
> {
> + ASYNC_DOMAIN_EXCLUSIVE(queue_domain);
> + struct nvme_rdma_setup_ctx *ctxs;
> + int nr_queues = last - first;
> + int err = 0, i, ret;
>
> + ctxs = kmalloc_array(nr_queues, sizeof(*ctxs), GFP_KERNEL);
This should use kmalloc_objs in the brave new world.
> + if (!ctxs)
> return -ENOMEM;
>
> + for (i = 0; i < nr_queues; i++) {
> + struct nvme_rdma_queue *queue = &ctrl->queues[first + i];
> +
> + queue->ctrl = ctrl;
> + queue->queue_size = queue_size;
> +
> + ctxs[i].queue = queue;
> + ctxs[i].err = &err;
> + async_schedule_domain(nvme_rdma_setup_queue_async, &ctxs[i],
> + &queue_domain);
> }
>
> + async_synchronize_full_domain(&queue_domain);
> + kfree(ctxs);
It would be nice if the async domain had a way to do the error propagation.
Well, that would be a nice follow-on if you're interested.
Talking about follow-ons: do you plan to do a similar change
to nvme-tcp? It would be great to keep the setup path for
both in sync.