* [PATCH 0/2] nvme-fabrics: sqsize bug fixes
@ 2016-08-06 0:54 Jay Freyensee
2016-08-06 0:54 ` [PATCH 1/2] nvme-rdma: tell fabrics layer admin queue depth Jay Freyensee
2016-08-06 0:54 ` [PATCH 2/2] nvme-rdma: sqsize/hsqsize/hrqsize is 0-based val Jay Freyensee
0 siblings, 2 replies; 11+ messages in thread
From: Jay Freyensee @ 2016-08-06 0:54 UTC (permalink / raw)
A few sqsize bugs were discovered during the connect()
operation; essentially, the NVMe-Fabrics layer was out of sync
with the implementation layer (RDMA).
As these bugs break the NVMe-Fabrics spec, I'd like to
request that these fixes be applied to the 4.8 kernel
(once approved).
Jay Freyensee (2):
nvme-rdma: tell fabrics layer admin queue depth
nvme-rdma: sqsize/hsqsize/hrqsize is 0-based val
drivers/nvme/host/fabrics.c | 2 +-
drivers/nvme/host/nvme.h | 1 +
drivers/nvme/host/rdma.c | 20 +++++++++++++++++---
3 files changed, 19 insertions(+), 4 deletions(-)
--
2.7.4
* [PATCH 1/2] nvme-rdma: tell fabrics layer admin queue depth
2016-08-06 0:54 [PATCH 0/2] nvme-fabrics: sqsize bug fixes Jay Freyensee
@ 2016-08-06 0:54 ` Jay Freyensee
2016-08-07 7:20 ` Sagi Grimberg
2016-08-06 0:54 ` [PATCH 2/2] nvme-rdma: sqsize/hsqsize/hrqsize is 0-based val Jay Freyensee
1 sibling, 1 reply; 11+ messages in thread
From: Jay Freyensee @ 2016-08-06 0:54 UTC (permalink / raw)
Upon an admin queue connect(), the RDMA QP was being
sized based on NVMF_AQ_DEPTH. However, the fabrics layer was
using the sqsize field value set for I/O queues for the admin
queue as well, which threw the nvme layer and rdma layer out of sync:
[root@fedora23-fabrics-host1 nvmf]# dmesg
[ 3507.798642] nvme_fabrics: nvmf_connect_admin_queue():admin sqsize
being sent is: 128
[ 3507.798858] nvme nvme0: creating 16 I/O queues.
[ 3507.896407] nvme nvme0: new ctrl: NQN "nullside-nqn", addr
192.168.1.3:4420
Thus, to allow a different admin queue depth (the fabrics
spec states, via the ASQSZ definition, that the minimum depth for a
fabrics admin queue is 32), we also need a new variable to hold
the sqsize for the fabrics admin queue.
Reported-by: Daniel Verkamp <daniel.verkamp@intel.com>
Signed-off-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
---
drivers/nvme/host/fabrics.c | 2 +-
drivers/nvme/host/nvme.h | 1 +
drivers/nvme/host/rdma.c | 18 ++++++++++++++++--
3 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index dc99676..f81d937 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -363,7 +363,7 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
cmd.connect.opcode = nvme_fabrics_command;
cmd.connect.fctype = nvme_fabrics_type_connect;
cmd.connect.qid = 0;
- cmd.connect.sqsize = cpu_to_le16(ctrl->sqsize);
+ cmd.connect.sqsize = cpu_to_le16(ctrl->admin_sqsize);
/*
* Set keep-alive timeout in seconds granularity (ms * 1000)
* and add a grace period for controller kato enforcement
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index ab18b78..32577a7 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -137,6 +137,7 @@ struct nvme_ctrl {
struct delayed_work ka_work;
/* Fabrics only */
+ u16 admin_sqsize;
u16 sqsize;
u32 ioccsz;
u32 iorcsz;
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 3e3ce2b..ff44167 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1284,8 +1284,13 @@ static int nvme_rdma_route_resolved(struct nvme_rdma_queue *queue)
priv.recfmt = cpu_to_le16(NVME_RDMA_CM_FMT_1_0);
priv.qid = cpu_to_le16(nvme_rdma_queue_idx(queue));
- priv.hrqsize = cpu_to_le16(queue->queue_size);
- priv.hsqsize = cpu_to_le16(queue->queue_size);
+ if (priv.qid == 0) {
+ priv.hrqsize = cpu_to_le16(queue->ctrl->ctrl.admin_sqsize);
+ priv.hsqsize = cpu_to_le16(queue->ctrl->ctrl.admin_sqsize);
+ } else {
+ priv.hrqsize = cpu_to_le16(queue->queue_size);
+ priv.hsqsize = cpu_to_le16(queue->queue_size);
+ }
ret = rdma_connect(queue->cm_id, &param);
if (ret) {
@@ -1907,6 +1912,15 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
spin_lock_init(&ctrl->lock);
ctrl->queue_count = opts->nr_io_queues + 1; /* +1 for admin queue */
+
+	/* as nvme_rdma_configure_admin_queue() is setting the rdma's
+	 * internal submission queue to a value other
+	 * than opts->queue_size, we need to make sure the
+	 * fabric layer uses that value upon an
+	 * NVMeoF admin connect() and not default to the more
+	 * common I/O queue size value (sqsize, opts->queue_size).
+	 */
+ ctrl->ctrl.admin_sqsize = NVMF_AQ_DEPTH-1;
ctrl->ctrl.sqsize = opts->queue_size;
ctrl->ctrl.kato = opts->kato;
--
2.7.4
* [PATCH 2/2] nvme-rdma: sqsize/hsqsize/hrqsize is 0-based val
2016-08-06 0:54 [PATCH 0/2] nvme-fabrics: sqsize bug fixes Jay Freyensee
2016-08-06 0:54 ` [PATCH 1/2] nvme-rdma: tell fabrics layer admin queue depth Jay Freyensee
@ 2016-08-06 0:54 ` Jay Freyensee
2016-08-07 7:29 ` Sagi Grimberg
1 sibling, 1 reply; 11+ messages in thread
From: Jay Freyensee @ 2016-08-06 0:54 UTC (permalink / raw)
Per the NVMe-over-Fabrics 1.0 spec, sqsize is represented as
a 0-based value.
Also per the spec, the RDMA binding values shall be set
to sqsize, which makes hsqsize and hrqsize 0-based values as well.
Thus, the sqsize at the NVMe-Fabrics level is now:
[root@fedora23-fabrics-host1 for-48]# dmesg
[ 318.720645] nvme_fabrics: nvmf_connect_admin_queue(): sqsize for
admin queue: 31
[ 318.720884] nvme nvme0: creating 16 I/O queues.
[ 318.810114] nvme_fabrics: nvmf_connect_io_queue(): sqsize for i/o
queue: 127
Reported-by: Daniel Verkamp <daniel.verkamp@intel.com>
Signed-off-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
---
drivers/nvme/host/rdma.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index ff44167..6300b10 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1288,8 +1288,8 @@ static int nvme_rdma_route_resolved(struct nvme_rdma_queue *queue)
priv.hrqsize = cpu_to_le16(queue->ctrl->ctrl.admin_sqsize);
priv.hsqsize = cpu_to_le16(queue->ctrl->ctrl.admin_sqsize);
} else {
- priv.hrqsize = cpu_to_le16(queue->queue_size);
- priv.hsqsize = cpu_to_le16(queue->queue_size);
+ priv.hrqsize = cpu_to_le16(queue->ctrl->ctrl.sqsize);
+ priv.hsqsize = cpu_to_le16(queue->ctrl->ctrl.sqsize);
}
ret = rdma_connect(queue->cm_id, &param);
@@ -1921,7 +1921,7 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
* common I/O queue size value (sqsize, opts->queue_size).
*/
ctrl->ctrl.admin_sqsize = NVMF_AQ_DEPTH-1;
- ctrl->ctrl.sqsize = opts->queue_size;
+ ctrl->ctrl.sqsize = opts->queue_size-1;
ctrl->ctrl.kato = opts->kato;
ret = -ENOMEM;
--
2.7.4
* [PATCH 1/2] nvme-rdma: tell fabrics layer admin queue depth
2016-08-06 0:54 ` [PATCH 1/2] nvme-rdma: tell fabrics layer admin queue depth Jay Freyensee
@ 2016-08-07 7:20 ` Sagi Grimberg
2016-08-08 7:05 ` Christoph Hellwig
0 siblings, 1 reply; 11+ messages in thread
From: Sagi Grimberg @ 2016-08-07 7:20 UTC (permalink / raw)
> diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
> index dc99676..f81d937 100644
> --- a/drivers/nvme/host/fabrics.c
> +++ b/drivers/nvme/host/fabrics.c
> @@ -363,7 +363,7 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
> cmd.connect.opcode = nvme_fabrics_command;
> cmd.connect.fctype = nvme_fabrics_type_connect;
> cmd.connect.qid = 0;
> - cmd.connect.sqsize = cpu_to_le16(ctrl->sqsize);
> + cmd.connect.sqsize = cpu_to_le16(ctrl->admin_sqsize);
> /*
> * Set keep-alive timeout in seconds granularity (ms * 1000)
> * and add a grace period for controller kato enforcement
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index ab18b78..32577a7 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -137,6 +137,7 @@ struct nvme_ctrl {
> struct delayed_work ka_work;
>
> /* Fabrics only */
> + u16 admin_sqsize;
> u16 sqsize;
> u32 ioccsz;
> u32 iorcsz;
No need for admin_sqsize member, it's always set to NVMF_AQ_DEPTH-1,
just use that...
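
(In nvmf_connect_admin_queue() that would be the one-liner below; a sketch
of the suggestion, untested:)

	cmd.connect.sqsize = cpu_to_le16(NVMF_AQ_DEPTH - 1);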
* [PATCH 2/2] nvme-rdma: sqsize/hsqsize/hrqsize is 0-based val
2016-08-06 0:54 ` [PATCH 2/2] nvme-rdma: sqsize/hsqsize/hrqsize is 0-based val Jay Freyensee
@ 2016-08-07 7:29 ` Sagi Grimberg
2016-08-08 7:06 ` Christoph Hellwig
2016-08-08 16:08 ` Verkamp, Daniel
0 siblings, 2 replies; 11+ messages in thread
From: Sagi Grimberg @ 2016-08-07 7:29 UTC (permalink / raw)
> Per NVMe-over-Fabrics 1.0 spec, sqsize is represented as
> a 0-based value.
Did we really decide that? The spec actually explicitly says
that a 0 sqsize will be failed:
"If the size is 0h or larger than the controller supports, then a
status value of Connect Invalid Parameters shall be returned."
And then it says: "This is a 0's based value"
Confusing....
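
(For reference, a 0's based SQSIZE stores the actual queue depth minus one,
so 0h would denote a depth of 1. A minimal sketch of that encoding, using
hypothetical helper names rather than kernel code:)

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Encode an actual queue depth as a 0's based SQSIZE field:
 * depth 32 -> 31, depth 128 -> 127. */
static inline uint16_t depth_to_sqsize(uint16_t depth)
{
	return depth - 1;
}

/* Per the quoted Connect text, SQSIZE of 0h, or anything larger than
 * the controller supports, must be rejected. */
static inline bool sqsize_is_valid(uint16_t sqsize, uint16_t max_depth)
{
	return sqsize != 0 && (uint32_t)sqsize + 1 <= max_depth;
}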
* [PATCH 1/2] nvme-rdma: tell fabrics layer admin queue depth
2016-08-07 7:20 ` Sagi Grimberg
@ 2016-08-08 7:05 ` Christoph Hellwig
2016-08-08 16:11 ` J Freyensee
0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2016-08-08 7:05 UTC (permalink / raw)
On Sun, Aug 07, 2016 at 10:20:38AM +0300, Sagi Grimberg wrote:
> No need for admin_sqsize member, it's always set to NVMF_AQ_DEPTH-1,
> just use that...
Agreed.
* [PATCH 2/2] nvme-rdma: sqsize/hsqsize/hrqsize is 0-based val
2016-08-07 7:29 ` Sagi Grimberg
@ 2016-08-08 7:06 ` Christoph Hellwig
2016-08-08 16:30 ` Verkamp, Daniel
2016-08-08 16:08 ` Verkamp, Daniel
1 sibling, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2016-08-08 7:06 UTC (permalink / raw)
On Sun, Aug 07, 2016 at 10:29:08AM +0300, Sagi Grimberg wrote:
> Did we really decide that? The spec actually explicitly says
> that a 0 sqsize will be failed:
>
> "If the size is 0h or larger than the controller supports, then a
> status value of Connect Invalid Parameters shall be returned."
>
> And then it says: "This is a 0's based value"
>
> Confusing....
Yeah, I think we'll need an erratum in the TWG. And I think the
current Linux behavior is the sensible one. Jay, do you want
to drive this in the WG or should I bring it up?
* [PATCH 2/2] nvme-rdma: sqsize/hsqsize/hrqsize is 0-based val
2016-08-07 7:29 ` Sagi Grimberg
2016-08-08 7:06 ` Christoph Hellwig
@ 2016-08-08 16:08 ` Verkamp, Daniel
1 sibling, 0 replies; 11+ messages in thread
From: Verkamp, Daniel @ 2016-08-08 16:08 UTC (permalink / raw)
On Sun, 2016-08-07 at 10:29 +0300, Sagi Grimberg wrote:
> >
> > Per NVMe-over-Fabrics 1.0 spec, sqsize is represented as
> > a 0-based value.
>
> Did we really decide that? The spec actually explicitly says
> that a 0 sqsize will be failed:
>
> "If the size is 0h or larger than the controller supports, then a
> status value of Connect Invalid Parameters shall be returned."
>
> And then it says: "This is a 0's based value"
>
> Confusing....
I think 0 is disallowed because that would mean the queue size is 1, and
NVMe requires at least a queue size of 2 to be useful at all (one slot
has to be empty to avoid confusion between queue full and queue empty
cases).
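
(The classic ring-buffer argument behind that minimum, sketched with
hypothetical names:)

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring with head/tail indices, for illustration only. */
struct ring {
	uint16_t head;	/* next entry to consume */
	uint16_t tail;	/* next free slot to produce into */
	uint16_t size;	/* total number of slots */
};

static bool ring_empty(const struct ring *r)
{
	return r->head == r->tail;
}

/* One slot is always left unused so that "full" can be told apart
 * from "empty"; with size == 1 the ring would always report full. */
static bool ring_full(const struct ring *r)
{
	return (uint16_t)((r->tail + 1) % r->size) == r->head;
}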
-- Daniel
* [PATCH 1/2] nvme-rdma: tell fabrics layer admin queue depth
2016-08-08 7:05 ` Christoph Hellwig
@ 2016-08-08 16:11 ` J Freyensee
0 siblings, 0 replies; 11+ messages in thread
From: J Freyensee @ 2016-08-08 16:11 UTC (permalink / raw)
On Mon, 2016-08-08 at 09:05 +0200, Christoph Hellwig wrote:
> On Sun, Aug 07, 2016 at 10:20:38AM +0300, Sagi Grimberg wrote:
> >
> > No need for admin_sqsize member, it's always set to NVMF_AQ_DEPTH-1,
> > just use that...
>
OK, but the separate variable lets the code grow into using the ASQSZ
field (whenever that gets implemented), which is reported during
discovery, since each newly discovered controller could advertise a
different admin queue depth.
But I'm perfectly happy using NVMF_AQ_DEPTH-1 directly; it makes the code a bit simpler.
> Agreed.
>
* [PATCH 2/2] nvme-rdma: sqsize/hsqsize/hrqsize is 0-based val
2016-08-08 7:06 ` Christoph Hellwig
@ 2016-08-08 16:30 ` Verkamp, Daniel
[not found] ` <9A947B1F6D0FF74CB5D6ACFF43B473F3A3A45C75@ORSMSX112.amr.corp.intel.com>
0 siblings, 1 reply; 11+ messages in thread
From: Verkamp, Daniel @ 2016-08-08 16:30 UTC (permalink / raw)
On Mon, 2016-08-08 at 09:06 +0200, Christoph Hellwig wrote:
> On Sun, Aug 07, 2016 at 10:29:08AM +0300, Sagi Grimberg wrote:
> >
> > Did we really decide that? The spec actually explicitly says
> > that a 0 sqsize will be failed:
> >
> > "If the size is 0h or larger than the controller supports, then a
> > status value of Connect Invalid Parameters shall be returned."
> >
> > And then it says: "This is a 0's based value"
> >
> > Confusing....
>
> Yeah, I think we'll need an erratum in the TWG. And I think the
> current Linux behavior is the sensible one. Jay, do you want
> to drive this in the WG or should I bring it up?
I would be happy if all of the 0's based values in the spec were
removed, but I think it's a little too late for that...
In this particular case, I'm pretty sure this is not a mistake; the
relevant text is copied directly from the NVMe base spec Create I/O
Submission Queue command's QSIZE parameter.
What does need to be clarified is the RDMA transport definition. The
Connect private data contains two queue size-related fields:
- HSQSIZE, which is supposed to be the RDMA QP send depth, but also says
"shall be set to the Submission Queue Size (SQSIZE)", while not saying
whether it is 0's based.
- HRQSIZE, which is the QP receive depth, which has no equivalent like
SQSIZE in the Connect command, and also does not mention being 0's
based.
These should hopefully both be 0's based or both not 0's based, but
right now, I have no idea what the intent of the spec is.
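
(For concreteness, here is a sketch of the two candidate encodings for a
host queue of depth 128. The field names mirror the priv.* usage in the
rdma.c diff earlier in the thread; the struct layout itself is only an
assumption for illustration, not a verified copy of the binding.)

#include <stdint.h>

struct cm_connect_priv {
	uint16_t recfmt;
	uint16_t qid;
	uint16_t hrqsize;	/* RDMA QP receive depth */
	uint16_t hsqsize;	/* RDMA QP send depth */
};

/* Reading 1: both fields 0's based, matching SQSIZE. */
static void fill_both_zeroes_based(struct cm_connect_priv *p, uint16_t depth)
{
	p->hsqsize = depth - 1;	/* depth 128 -> 127 */
	p->hrqsize = depth - 1;	/* depth 128 -> 127 */
}

/* Reading 2: HSQSIZE 0's based (equal to SQSIZE), HRQSIZE given as the
 * actual (1's based) receive depth. */
static void fill_mixed(struct cm_connect_priv *p, uint16_t depth)
{
	p->hsqsize = depth - 1;	/* depth 128 -> 127 */
	p->hrqsize = depth;	/* depth 128 -> 128 */
}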
Thanks,
-- Daniel
* [PATCH 2/2] nvme-rdma: sqsize/hsqsize/hrqsize is 0-based val
[not found] ` <9A947B1F6D0FF74CB5D6ACFF43B473F3A3A45C75@ORSMSX112.amr.corp.intel.com>
@ 2016-08-08 18:39 ` J Freyensee
0 siblings, 0 replies; 11+ messages in thread
From: J Freyensee @ 2016-08-08 18:39 UTC (permalink / raw)
On Mon, 2016-08-08 at 18:14 +0000, Minturn, Dave B wrote:
> >
> > >
> > > -----Original Message-----
> > > From: Linux-nvme [mailto:linux-nvme-bounces@lists.infradead.org]
> > > On Behalf Of Verkamp, Daniel
> > > Sent: Monday, August 08, 2016 9:30 AM
> > > To: hch@lst.de; sagi@grimberg.me
> > > Cc: james_p_freyensee@linux.intel.com; linux-nvme@lists.infradead.org
> > > Subject: Re: [PATCH 2/2] nvme-rdma: sqsize/hsqsize/hrqsize is 0-
> > > based val
> > >
> > > On Mon, 2016-08-08 at 09:06 +0200, Christoph Hellwig wrote:
> > > >
> > > > On Sun, Aug 07, 2016 at 10:29:08AM +0300, Sagi Grimberg wrote:
> > > > >
> > > > >
> > > > > Did we really decide that? The spec actually explicitly says
> > > > > that a 0 sqsize will be failed:
> > > > >
> > > > > "If the size is 0h or larger than the controller supports,
> > > > > then a
> > > > > status value of Connect Invalid Parameters shall be
> > > > > returned."
> > > > >
> > > > > And then it says: "This is a 0's based value"
> > > > >
> > > > > Confusing....
> > > >
> > > > Yeah, I think we'll need an erratum in the TWG. And I think the
> > > > current Linux behavior is the sensible one. Jay, do you want
> > > > to drive this in the WG or should I bring it up?
> > >
> > > I would be happy if all of the 0's based values in the spec were
> > > removed, but I think it's a little too late for that...
> > >
> > > In this particular case, I'm pretty sure this is not a mistake;
> > > the
> > > relevant text is copied directly from the NVMe base spec Create
> > > I/O
> > > Submission Queue command's QSIZE parameter.
> > >
>
> Daniel is correct, this is not a mistake in the specification.
>
> >
> > >
> > > What does need to be clarified is the RDMA transport definition.
> > > The
> > > Connect private data contains two queue size-related fields:
> > >
> > > - HSQSIZE, which is supposed to be the RDMA QP send depth, but
> > > also says
> > > "shall be set to the Submission Queue Size (SQSIZE)", while not
> > > saying
> > > whether it is 0's based.
>
> The spec states that SQSIZE is 0-based, and given that HSQSIZE must be
> equal to SQSIZE, the value is going to be 0's based.
> We can suggest adding some additional clarification to the spec, such as
> a reminder that SQSIZE is 0-based.
>
>
> >
> > >
> > > - HRQSIZE, which is the QP receive depth, which has no equivalent
> > > like
> > > SQSIZE in the Connect command, and also does not mention being
> > > 0's
> > > based.
(replying back to the list)
So with the comment Dave made, this code in the rdma.c host:

-		priv.hrqsize = cpu_to_le16(queue->queue_size);
-		priv.hsqsize = cpu_to_le16(queue->queue_size);
+		priv.hrqsize = cpu_to_le16(queue->ctrl->ctrl.sqsize);

priv.hrqsize is wrong according to the NVMe fabrics spec. At minimum
it should be implemented as:

priv.hrqsize = cpu_to_le16(queue->ctrl->ctrl.sqsize + 1);

+		priv.hsqsize = cpu_to_le16(queue->ctrl->ctrl.sqsize);

and this line remains the same, because sqsize is 0-based and the spec
clearly says hsqsize shall be set to sqsize.
I see the confusion for hrqsize, and I think we should try to get it
cleared up in errata, but I think the way sqsize and hsqsize are stated
in the spec is pretty clear.
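
Putting that together, the queue-size handling in
nvme_rdma_route_resolved() would come out roughly as below; this is a
sketch of the proposal in this thread, not a merged patch:

	if (priv.qid == 0) {
		/* Admin queue: fabrics admin depth, sent 0's based. */
		priv.hrqsize = cpu_to_le16(queue->ctrl->ctrl.admin_sqsize);
		priv.hsqsize = cpu_to_le16(queue->ctrl->ctrl.admin_sqsize);
	} else {
		/* sqsize is 0's based; HSQSIZE shall equal SQSIZE, while
		 * HRQSIZE is taken here as the actual receive queue depth. */
		priv.hrqsize = cpu_to_le16(queue->ctrl->ctrl.sqsize + 1);
		priv.hsqsize = cpu_to_le16(queue->ctrl->ctrl.sqsize);
	}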