* [PATCH v3 1/4] nvmet-rdma: +1 to *queue_size from hsqsize/hrqsize
2016-08-16 19:56 [PATCH v3 0/4] sqsize fixes Jay Freyensee
@ 2016-08-16 19:56 ` Jay Freyensee
2016-08-17 0:47 ` Christoph Hellwig
2016-08-16 19:56 ` [PATCH v3 2/4] fabrics: define admin sqsize min default, per spec Jay Freyensee
` (2 subsequent siblings)
3 siblings, 1 reply; 10+ messages in thread
From: Jay Freyensee @ 2016-08-16 19:56 UTC (permalink / raw)
The host will be sending sqsize as a 0-based value,
so the target needs to be adjusted as well.
Signed-off-by: Jay Freyensee <james_p_freyensee at linux.intel.com>
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
---
drivers/nvme/target/rdma.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index e06d504..68b7b04 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1004,11 +1004,11 @@ nvmet_rdma_parse_cm_connect_req(struct rdma_conn_param *conn,
queue->host_qid = le16_to_cpu(req->qid);
/*
- * req->hsqsize corresponds to our recv queue size
- * req->hrqsize corresponds to our send queue size
+ * req->hsqsize corresponds to our recv queue size plus 1
+ * req->hrqsize corresponds to our send queue size plus 1
*/
- queue->recv_queue_size = le16_to_cpu(req->hsqsize);
- queue->send_queue_size = le16_to_cpu(req->hrqsize);
+ queue->recv_queue_size = le16_to_cpu(req->hsqsize) + 1;
+ queue->send_queue_size = le16_to_cpu(req->hrqsize) + 1;
if (!queue->host_qid && queue->recv_queue_size > NVMF_AQ_DEPTH)
return NVME_RDMA_CM_INVALID_HSQSIZE;
--
2.7.4
* [PATCH v3 1/4] nvmet-rdma: +1 to *queue_size from hsqsize/hrqsize
2016-08-16 19:56 ` [PATCH v3 1/4] nvmet-rdma: +1 to *queue_size from hsqsize/hrqsize Jay Freyensee
@ 2016-08-17 0:47 ` Christoph Hellwig
2016-08-17 14:49 ` Sagi Grimberg
2016-08-17 15:47 ` J Freyensee
0 siblings, 2 replies; 10+ messages in thread
From: Christoph Hellwig @ 2016-08-17 0:47 UTC (permalink / raw)
On Tue, Aug 16, 2016 at 12:56:52PM -0700, Jay Freyensee wrote:
> /*
> - * req->hsqsize corresponds to our recv queue size
> - * req->hrqsize corresponds to our send queue size
> + * req->hsqsize corresponds to our recv queue size plus 1
> + * req->hrqsize corresponds to our send queue size plus 1
> */
> - queue->recv_queue_size = le16_to_cpu(req->hsqsize);
> - queue->send_queue_size = le16_to_cpu(req->hrqsize);
> + queue->recv_queue_size = le16_to_cpu(req->hsqsize) + 1;
> + queue->send_queue_size = le16_to_cpu(req->hrqsize) + 1;
I brought this up on the nvme-technical list and the consensus is
that hrqsize doesn't use the one off notation. hsqsize refers to
the sqsize which is marked as "0's based", while hrqsize only
has a short and not very meaningful explanation, which implies that
it's '1's based' in NVMe terms (which, btw I think are utterly misleading).
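Under that reading, a minimal sketch of the target-side parsing (not the patch as posted, just an illustration of the interpretation above) would add 1 only for the 0's based hsqsize and take hrqsize as sent:
/*
 * Sketch only, assuming the interpretation above: hsqsize is
 * 0's based, hrqsize is not, so only the recv queue size gets +1.
 */
queue->recv_queue_size = le16_to_cpu(req->hsqsize) + 1;
queue->send_queue_size = le16_to_cpu(req->hrqsize);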
* [PATCH v3 1/4] nvmet-rdma: +1 to *queue_size from hsqsize/hrqsize
2016-08-17 0:47 ` Christoph Hellwig
@ 2016-08-17 14:49 ` Sagi Grimberg
2016-08-17 15:47 ` J Freyensee
1 sibling, 0 replies; 10+ messages in thread
From: Sagi Grimberg @ 2016-08-17 14:49 UTC (permalink / raw)
> On Tue, Aug 16, 2016 at 12:56:52PM -0700, Jay Freyensee wrote:
>> /*
>> - * req->hsqsize corresponds to our recv queue size
>> - * req->hrqsize corresponds to our send queue size
>> + * req->hsqsize corresponds to our recv queue size plus 1
>> + * req->hrqsize corresponds to our send queue size plus 1
>> */
>> - queue->recv_queue_size = le16_to_cpu(req->hsqsize);
>> - queue->send_queue_size = le16_to_cpu(req->hrqsize);
>> + queue->recv_queue_size = le16_to_cpu(req->hsqsize) + 1;
>> + queue->send_queue_size = le16_to_cpu(req->hrqsize) + 1;
>
> I brought this up on the nvme-technical list and the consensus is
> that hrqsize doesn't use the one off notation. hsqsize refers to
> the sqsize which is marked as "0's based", while hrqsize only
> has a short and not very meaningful explanation, which implies that
> it's '1's based' in NVMe terms (which, btw I think are utterly misleading).
Yea... the host needs to set hrqsize to opts->queue_size and the target
should take it as is...
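A sketch of the host-side counterpart of that suggestion, for an I/O queue (not code from the posted series):
/*
 * Sketch only, per the suggestion above: hrqsize carries the full
 * queue size, while hsqsize stays a 0's based sqsize value.
 */
priv.hrqsize = cpu_to_le16(queue->queue_size);
priv.hsqsize = cpu_to_le16(queue->ctrl->ctrl.sqsize);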
* [PATCH v3 1/4] nvmet-rdma: +1 to *queue_size from hsqsize/hrqsize
2016-08-17 0:47 ` Christoph Hellwig
2016-08-17 14:49 ` Sagi Grimberg
@ 2016-08-17 15:47 ` J Freyensee
2016-08-17 18:36 ` Sagi Grimberg
1 sibling, 1 reply; 10+ messages in thread
From: J Freyensee @ 2016-08-17 15:47 UTC (permalink / raw)
On Wed, 2016-08-17 at 02:47 +0200, Christoph Hellwig wrote:
> On Tue, Aug 16, 2016 at 12:56:52PM -0700, Jay Freyensee wrote:
> >
> > /*
> > - * req->hsqsize corresponds to our recv queue size
> > - * req->hrqsize corresponds to our send queue size
> > + * req->hsqsize corresponds to our recv queue size plus 1
> > + * req->hrqsize corresponds to our send queue size plus 1
> > */
> > - queue->recv_queue_size = le16_to_cpu(req->hsqsize);
> > - queue->send_queue_size = le16_to_cpu(req->hrqsize);
> > + queue->recv_queue_size = le16_to_cpu(req->hsqsize) + 1;
> > + queue->send_queue_size = le16_to_cpu(req->hrqsize) + 1;
>
> I brought this up on the nvme-technical list and the consensus is
> that hrqsize doesn't use the one off notation. hsqsize refers to
> the sqsize which is marked as "0's based", while hrqsize only
> has a short and not very meaningful explanation, which implies that
> it's '1's based' in NVMe terms (which, btw I think are utterly
> misleading).
OK, so what is the final verdict then? Resurrect patch five again and
resubmit the series? It makes hrqsize 1-based on both the host and
target.
http://lists.infradead.org/pipermail/linux-nvme/2016-August/005804.html
I interpret the 'explanation' as allowing either solution, and I don't see
an advantage for either, so we should just pick one.
* [PATCH v3 1/4] nvmet-rdma: +1 to *queue_size from hsqsize/hrqsize
2016-08-17 15:47 ` J Freyensee
@ 2016-08-17 18:36 ` Sagi Grimberg
2016-08-17 18:51 ` J Freyensee
0 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2016-08-17 18:36 UTC (permalink / raw)
>>> /*
>>> - * req->hsqsize corresponds to our recv queue size
>>> - * req->hrqsize corresponds to our send queue size
>>> + * req->hsqsize corresponds to our recv queue size plus 1
>>> + * req->hrqsize corresponds to our send queue size plus 1
>>> */
>>> - queue->recv_queue_size = le16_to_cpu(req->hsqsize);
>>> - queue->send_queue_size = le16_to_cpu(req->hrqsize);
>>> + queue->recv_queue_size = le16_to_cpu(req->hsqsize) + 1;
>>> + queue->send_queue_size = le16_to_cpu(req->hrqsize) + 1;
>>
>> I brought this up on the nvme-technical list and the consensus is
>> that hrqsize doesn't use the one off notation. hsqsize refers to
>> the sqsize which is marked as "0's based", while hrqsize only
>> has a short and not very meaningful explanation, which implies that
>> it's '1's based' in NVMe terms (which, btw I think are utterly
>> misleading).
>
> OK, so what is the final verdict then? Resurrect patch five again and
> resubmit the series? It makes hrqsize 1-based on both the host and
> target.
> http://lists.infradead.org/pipermail/linux-nvme/2016-August/005804.html
I think you can just make the target take hrqsize as is in patch 1
and have the host send hrqsize = queue->queue_size.
One more respin should do it (sorry for the exhaustion ;))
* [PATCH v3 1/4] nvmet-rdma: +1 to *queue_size from hsqsize/hrqsize
2016-08-17 18:36 ` Sagi Grimberg
@ 2016-08-17 18:51 ` J Freyensee
0 siblings, 0 replies; 10+ messages in thread
From: J Freyensee @ 2016-08-17 18:51 UTC (permalink / raw)
On Wed, 2016-08-17 at 21:36 +0300, Sagi Grimberg wrote:
> >
> > >
> > > >
> > > > /*
> > > > - * req->hsqsize corresponds to our recv queue size
> > > > - * req->hrqsize corresponds to our send queue size
> > > > + * req->hsqsize corresponds to our recv queue size plus 1
> > > > + * req->hrqsize corresponds to our send queue size plus 1
> > > > */
> > > > - queue->recv_queue_size = le16_to_cpu(req->hsqsize);
> > > > - queue->send_queue_size = le16_to_cpu(req->hrqsize);
> > > > + queue->recv_queue_size = le16_to_cpu(req->hsqsize) + 1;
> > > > + queue->send_queue_size = le16_to_cpu(req->hrqsize) + 1;
> > >
> > > I brought this up on the nvme-technical list and the consensus is
> > > that hrqsize doesn't use the one off notation. hsqsize refers to
> > > the sqsize which is marked as "0's based", while hrqsize only
> > > has a short and not very meaningful explanation, which implies that
> > > it's '1's based' in NVMe terms (which, btw I think are utterly
> > > misleading).
> >
> > OK, so what is the final verdict then? Resurrect patch five again and
> > resubmit the series? It makes hrqsize 1-based on both the host and
> > target.
> > http://lists.infradead.org/pipermail/linux-nvme/2016-August/005804.html
>
> I think you can just make the target take hrqsize as is in patch 1
> and have the host send hrqsize = queue->queue_size.
>
> One more respin should do it (sorry for the exhaustion ;))
No worries, doesn't help when the spec isn't clear on something :-).
* [PATCH v3 2/4] fabrics: define admin sqsize min default, per spec
2016-08-16 19:56 [PATCH v3 0/4] sqsize fixes Jay Freyensee
2016-08-16 19:56 ` [PATCH v3 1/4] nvmet-rdma: +1 to *queue_size from hsqsize/hrqsize Jay Freyensee
@ 2016-08-16 19:56 ` Jay Freyensee
2016-08-16 19:56 ` [PATCH v3 3/4] nvme-rdma: fix sqsize/hsqsize " Jay Freyensee
2016-08-16 19:56 ` [PATCH v3 4/4] nvme-loop: set sqsize to 0-based value, " Jay Freyensee
3 siblings, 0 replies; 10+ messages in thread
From: Jay Freyensee @ 2016-08-16 19:56 UTC (permalink / raw)
Upon admin queue connect(), the rdma qp was being
set based on NVMF_AQ_DEPTH. However, the fabrics layer was
using the sqsize field value set for I/O queues for the admin
queue as well, which threw the nvme layer and rdma layer out of whack:
[root at fedora23-fabrics-host1 nvmf]# dmesg
[ 3507.798642] nvme_fabrics: nvmf_connect_admin_queue():admin sqsize
being sent is: 128
[ 3507.798858] nvme nvme0: creating 16 I/O queues.
[ 3507.896407] nvme nvme0: new ctrl: NQN "nullside-nqn", addr
192.168.1.3:4420
Thus, to give the admin queue its own depth, use NVMF_AQ_DEPTH,
the minimum depth specified in the NVMe-over-Fabrics 1.0 spec,
for both connect() and the RDMA private data.
Reported-by: Daniel Verkamp <daniel.verkamp at intel.com>
Signed-off-by: Jay Freyensee <james_p_freyensee at linux.intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp at intel.com>
---
drivers/nvme/host/fabrics.c | 9 ++++++++-
drivers/nvme/host/rdma.c | 13 +++++++++++--
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index dc99676..020302c 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -363,7 +363,14 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
cmd.connect.opcode = nvme_fabrics_command;
cmd.connect.fctype = nvme_fabrics_type_connect;
cmd.connect.qid = 0;
- cmd.connect.sqsize = cpu_to_le16(ctrl->sqsize);
+
+ /*
+ * fabrics spec sets a minimum of depth 32 for admin queue,
+ * so set the queue with this depth always until
+ * justification otherwise.
+ */
+ cmd.connect.sqsize = cpu_to_le16(NVMF_AQ_DEPTH - 1);
+
/*
* Set keep-alive timeout in seconds granularity (ms * 1000)
* and add a grace period for controller kato enforcement
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 3e3ce2b..168cd23 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1284,8 +1284,17 @@ static int nvme_rdma_route_resolved(struct nvme_rdma_queue *queue)
priv.recfmt = cpu_to_le16(NVME_RDMA_CM_FMT_1_0);
priv.qid = cpu_to_le16(nvme_rdma_queue_idx(queue));
- priv.hrqsize = cpu_to_le16(queue->queue_size);
- priv.hsqsize = cpu_to_le16(queue->queue_size);
+ /*
+ * set the admin queue depth to the minimum size
+ * specified by the Fabrics standard.
+ */
+ if (priv.qid == 0) {
+ priv.hrqsize = cpu_to_le16(NVMF_AQ_DEPTH - 1);
+ priv.hsqsize = cpu_to_le16(NVMF_AQ_DEPTH - 1);
+ } else {
+ priv.hrqsize = cpu_to_le16(queue->queue_size);
+ priv.hsqsize = cpu_to_le16(queue->queue_size);
+ }
ret = rdma_connect(queue->cm_id, &param);
if (ret) {
--
2.7.4
* [PATCH v3 3/4] nvme-rdma: fix sqsize/hsqsize per spec
2016-08-16 19:56 [PATCH v3 0/4] sqsize fixes Jay Freyensee
2016-08-16 19:56 ` [PATCH v3 1/4] nvmet-rdma: +1 to *queue_size from hsqsize/hrqsize Jay Freyensee
2016-08-16 19:56 ` [PATCH v3 2/4] fabrics: define admin sqsize min default, per spec Jay Freyensee
@ 2016-08-16 19:56 ` Jay Freyensee
2016-08-16 19:56 ` [PATCH v3 4/4] nvme-loop: set sqsize to 0-based value, " Jay Freyensee
3 siblings, 0 replies; 10+ messages in thread
From: Jay Freyensee @ 2016-08-16 19:56 UTC (permalink / raw)
Per NVMe-over-Fabrics 1.0 spec, sqsize is represented as
a 0-based value.
Also per spec, the RDMA binding values shall be set
to sqsize, which makes hsqsize a 0-based value as well.
Thus, the sqsize during NVMf connect() is now:
[root at fedora23-fabrics-host1 for-48]# dmesg
[ 318.720645] nvme_fabrics: nvmf_connect_admin_queue(): sqsize for
admin queue: 31
[ 318.720884] nvme nvme0: creating 16 I/O queues.
[ 318.810114] nvme_fabrics: nvmf_connect_io_queue(): sqsize for i/o
queue: 127
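For reference, a sketch of the arithmetic behind those numbers, using the queue depths shown in the logs above (a queue_size of 128 for I/O queues and NVMF_AQ_DEPTH of 32 for the admin queue):
/* Illustration only, values taken from the dmesg output above. */
ctrl->ctrl.sqsize = opts->queue_size - 1;            /* 128 - 1 = 127 (I/O) */
cmd.connect.sqsize = cpu_to_le16(NVMF_AQ_DEPTH - 1); /* 32 - 1 = 31 (admin) */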
Reported-by: Daniel Verkamp <daniel.verkamp at intel.com>
Signed-off-by: Jay Freyensee <james_p_freyensee at linux.intel.com>
---
drivers/nvme/host/rdma.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 168cd23..3210228 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -649,7 +649,8 @@ static int nvme_rdma_init_io_queues(struct nvme_rdma_ctrl *ctrl)
int i, ret;
for (i = 1; i < ctrl->queue_count; i++) {
- ret = nvme_rdma_init_queue(ctrl, i, ctrl->ctrl.sqsize);
+ ret = nvme_rdma_init_queue(ctrl, i,
+ ctrl->ctrl.opts->queue_size);
if (ret) {
dev_info(ctrl->ctrl.device,
"failed to initialize i/o queue: %d\n", ret);
@@ -1292,8 +1293,8 @@ static int nvme_rdma_route_resolved(struct nvme_rdma_queue *queue)
priv.hrqsize = cpu_to_le16(NVMF_AQ_DEPTH - 1);
priv.hsqsize = cpu_to_le16(NVMF_AQ_DEPTH - 1);
} else {
- priv.hrqsize = cpu_to_le16(queue->queue_size);
- priv.hsqsize = cpu_to_le16(queue->queue_size);
+ priv.hrqsize = cpu_to_le16(queue->ctrl->ctrl.sqsize);
+ priv.hsqsize = cpu_to_le16(queue->ctrl->ctrl.sqsize);
}
ret = rdma_connect(queue->cm_id, &param);
@@ -1818,7 +1819,7 @@ static int nvme_rdma_create_io_queues(struct nvme_rdma_ctrl *ctrl)
memset(&ctrl->tag_set, 0, sizeof(ctrl->tag_set));
ctrl->tag_set.ops = &nvme_rdma_mq_ops;
- ctrl->tag_set.queue_depth = ctrl->ctrl.sqsize;
+ ctrl->tag_set.queue_depth = ctrl->ctrl.opts->queue_size;
ctrl->tag_set.reserved_tags = 1; /* fabric connect */
ctrl->tag_set.numa_node = NUMA_NO_NODE;
ctrl->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
@@ -1916,7 +1917,7 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
spin_lock_init(&ctrl->lock);
ctrl->queue_count = opts->nr_io_queues + 1; /* +1 for admin queue */
- ctrl->ctrl.sqsize = opts->queue_size;
+ ctrl->ctrl.sqsize = opts->queue_size - 1;
ctrl->ctrl.kato = opts->kato;
ret = -ENOMEM;
--
2.7.4
* [PATCH v3 4/4] nvme-loop: set sqsize to 0-based value, per spec
2016-08-16 19:56 [PATCH v3 0/4] sqsize fixes Jay Freyensee
` (2 preceding siblings ...)
2016-08-16 19:56 ` [PATCH v3 3/4] nvme-rdma: fix sqsize/hsqsize " Jay Freyensee
@ 2016-08-16 19:56 ` Jay Freyensee
3 siblings, 0 replies; 10+ messages in thread
From: Jay Freyensee @ 2016-08-16 19:56 UTC (permalink / raw)
Signed-off-by: Jay Freyensee <james_p_freyensee at linux.intel.com>
---
drivers/nvme/target/loop.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 94e7829..77481d5 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -558,7 +558,7 @@ static int nvme_loop_create_io_queues(struct nvme_loop_ctrl *ctrl)
memset(&ctrl->tag_set, 0, sizeof(ctrl->tag_set));
ctrl->tag_set.ops = &nvme_loop_mq_ops;
- ctrl->tag_set.queue_depth = ctrl->ctrl.sqsize;
+ ctrl->tag_set.queue_depth = ctrl->ctrl.opts->queue_size;
ctrl->tag_set.reserved_tags = 1; /* fabric connect */
ctrl->tag_set.numa_node = NUMA_NO_NODE;
ctrl->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
@@ -622,7 +622,7 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev,
ret = -ENOMEM;
- ctrl->ctrl.sqsize = opts->queue_size;
+ ctrl->ctrl.sqsize = opts->queue_size - 1;
ctrl->ctrl.kato = opts->kato;
ctrl->queues = kcalloc(opts->nr_io_queues + 1, sizeof(*ctrl->queues),
--
2.7.4