* fix nvme sqsize one off regression
@ 2022-12-25 10:32 Christoph Hellwig
2022-12-25 10:32 ` [PATCH 1/4] nvme: fix setting the queue depth in nvme_alloc_io_tag_set Christoph Hellwig
` (4 more replies)
0 siblings, 5 replies; 20+ messages in thread
From: Christoph Hellwig @ 2022-12-25 10:32 UTC (permalink / raw)
To: Linus Torvalds, Jens Axboe
Cc: Keith Busch, Sagi Grimberg, Hugh Dickins, linux-nvme
Hi all,
this series fixes up the sqsize regression that Hugh reported. The first
two patches are the real fixes, and it might make sense if Linus just
picks them directly before -rc1. The other two just clean things up a
bit to avoid pitfalls like this in the future.
Diffstat:
apple.c | 3 +--
core.c | 21 ++++++++++++++++++---
fabrics.c | 2 +-
fc.c | 12 ++++++------
pci.c | 27 ++++++++++++---------------
rdma.c | 23 +++++++++++------------
tcp.c | 12 ++++++------
7 files changed, 55 insertions(+), 45 deletions(-)
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 1/4] nvme: fix setting the queue depth in nvme_alloc_io_tag_set
2022-12-25 10:32 fix nvme sqsize one off regression Christoph Hellwig
@ 2022-12-25 10:32 ` Christoph Hellwig
2022-12-25 10:51 ` Sagi Grimberg
2022-12-25 21:15 ` Hugh Dickins
2022-12-25 10:32 ` [PATCH 2/4] nvme-pci: update sqsize when adjusting the queue depth Christoph Hellwig
` (3 subsequent siblings)
4 siblings, 2 replies; 20+ messages in thread
From: Christoph Hellwig @ 2022-12-25 10:32 UTC (permalink / raw)
To: Linus Torvalds, Jens Axboe
Cc: Keith Busch, Sagi Grimberg, Hugh Dickins, linux-nvme
While the CAP.MQES field in NVMe is a 0s based field with a natural one
off, we also need to account for the queue wrap condition and undo the
one off again in nvme_alloc_io_tag_set. This was never properly done by
the fabrics drivers, but they don't seem to care because there is no
actual physical queue that can wrap around; it became a problem when
converting over the PCIe driver, though. Also add back the
BLK_MQ_MAX_DEPTH check that was lost in the same commit.
Fixes: 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/nvme/host/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e26b085a007aea..cda1361e6d4fbb 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4897,7 +4897,7 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
memset(set, 0, sizeof(*set));
set->ops = ops;
- set->queue_depth = ctrl->sqsize + 1;
+ set->queue_depth = min_t(unsigned, ctrl->sqsize, BLK_MQ_MAX_DEPTH - 1);
/*
* Some Apple controllers requires tags to be unique across admin and
* the (only) I/O queue, so reserve the first 32 tags of the I/O queue.
--
2.35.1
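As an illustrative sketch of the two adjustments the patch above combines (standalone C with hypothetical helper names, not the kernel code itself): CAP.MQES is 0s based, one SQ slot must stay empty for the queue wrap condition, and the result is clamped to the block layer maximum.

```c
#include <assert.h>

#define BLK_MQ_MAX_DEPTH 10240	/* block layer limit; value assumed from this era */

/* CAP.MQES is 0s based: a register value of n means n + 1 SQ entries. */
static unsigned int mqes_to_sq_entries(unsigned int mqes)
{
	return mqes + 1;
}

/*
 * One SQ entry must stay empty so a full queue can be told apart from
 * an empty one (head == tail).  With a 0s based sqsize that leaves
 * exactly sqsize usable tags, clamped to the block layer maximum.
 */
static unsigned int io_tagset_depth(unsigned int sqsize_0s_based)
{
	unsigned int depth = sqsize_0s_based;

	if (depth > BLK_MQ_MAX_DEPTH - 1)
		depth = BLK_MQ_MAX_DEPTH - 1;
	return depth;
}
```

So for an MQES of 1023 (1024 entries) the tagset depth comes out as 1023, matching the `min_t(unsigned, ctrl->sqsize, BLK_MQ_MAX_DEPTH - 1)` in the hunk.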
* [PATCH 2/4] nvme-pci: update sqsize when adjusting the queue depth
2022-12-25 10:32 fix nvme sqsize one off regression Christoph Hellwig
2022-12-25 10:32 ` [PATCH 1/4] nvme: fix setting the queue depth in nvme_alloc_io_tag_set Christoph Hellwig
@ 2022-12-25 10:32 ` Christoph Hellwig
2022-12-25 11:19 ` Sagi Grimberg
2022-12-25 21:21 ` Hugh Dickins
2022-12-25 10:32 ` [PATCH 3/4] nvme: store the actual queue size in ctrl->sqsize Christoph Hellwig
` (2 subsequent siblings)
4 siblings, 2 replies; 20+ messages in thread
From: Christoph Hellwig @ 2022-12-25 10:32 UTC (permalink / raw)
To: Linus Torvalds, Jens Axboe
Cc: Keith Busch, Sagi Grimberg, Hugh Dickins, linux-nvme
Update the core sqsize field in addition to the PCIe-specific
q_depth field as the core tagset allocation helpers rely on it.
Fixes: 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/nvme/host/pci.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 804b6a6cb43a9c..b13baccedb4a95 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2333,10 +2333,12 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
if (dev->cmb_use_sqes) {
result = nvme_cmb_qdepth(dev, nr_io_queues,
sizeof(struct nvme_command));
- if (result > 0)
+ if (result > 0) {
dev->q_depth = result;
- else
+ dev->ctrl.sqsize = result - 1;
+ } else {
dev->cmb_use_sqes = false;
+ }
}
do {
@@ -2537,7 +2539,6 @@ static int nvme_pci_enable(struct nvme_dev *dev)
dev->q_depth = min_t(u32, NVME_CAP_MQES(dev->ctrl.cap) + 1,
io_queue_depth);
- dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
dev->db_stride = 1 << NVME_CAP_STRIDE(dev->ctrl.cap);
dev->dbs = dev->bar + 4096;
@@ -2578,7 +2579,7 @@ static int nvme_pci_enable(struct nvme_dev *dev)
dev_warn(dev->ctrl.device, "IO queue depth clamped to %d\n",
dev->q_depth);
}
-
+ dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
nvme_map_cmb(dev);
--
2.35.1
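A minimal sketch of the invariant this patch restores (hypothetical types, not the driver code): every place the PCIe driver recomputes q_depth must mirror the value into the core sqsize field, because the tagset helpers read only the latter.

```c
#include <assert.h>

struct fake_ctrl {
	unsigned short sqsize;	/* core field, still 0s based at this stage */
};

struct fake_dev {
	struct fake_ctrl ctrl;
	unsigned int q_depth;	/* PCIe-specific 1's based depth */
};

/* Keep both fields in sync whenever the depth is (re)computed. */
static void fake_set_q_depth(struct fake_dev *dev, unsigned int depth)
{
	dev->q_depth = depth;
	dev->ctrl.sqsize = depth - 1;	/* 0's based queue depth */
}
```

Moving the sqsize assignment to the end of nvme_pci_enable(), as the patch does, achieves the same thing for all the quirk paths that override q_depth in between.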
* [PATCH 3/4] nvme: store the actual queue size in ctrl->sqsize
2022-12-25 10:32 fix nvme sqsize one off regression Christoph Hellwig
2022-12-25 10:32 ` [PATCH 1/4] nvme: fix setting the queue depth in nvme_alloc_io_tag_set Christoph Hellwig
2022-12-25 10:32 ` [PATCH 2/4] nvme-pci: update sqsize when adjusting the queue depth Christoph Hellwig
@ 2022-12-25 10:32 ` Christoph Hellwig
2022-12-25 11:09 ` Sagi Grimberg
2022-12-28 17:02 ` Keith Busch
2022-12-25 10:32 ` [PATCH 4/4] nvme-pci: remove the dev->q_depth field Christoph Hellwig
2022-12-26 19:11 ` (subset) fix nvme sqsize one off regression Jens Axboe
4 siblings, 2 replies; 20+ messages in thread
From: Christoph Hellwig @ 2022-12-25 10:32 UTC (permalink / raw)
To: Linus Torvalds, Jens Axboe
Cc: Keith Busch, Sagi Grimberg, Hugh Dickins, linux-nvme
Convert from the strange on-the-wire 0s based value to a regular size
when reading the field, to avoid adjustments all over.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/nvme/host/apple.c | 3 +--
drivers/nvme/host/core.c | 19 +++++++++++++++++--
drivers/nvme/host/fabrics.c | 2 +-
drivers/nvme/host/fc.c | 12 ++++++------
drivers/nvme/host/pci.c | 4 ++--
drivers/nvme/host/rdma.c | 23 +++++++++++------------
drivers/nvme/host/tcp.c | 12 ++++++------
7 files changed, 44 insertions(+), 31 deletions(-)
diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
index e36aeb50b4edc1..073a1e8c021c14 100644
--- a/drivers/nvme/host/apple.c
+++ b/drivers/nvme/host/apple.c
@@ -1080,8 +1080,7 @@ static void apple_nvme_reset_work(struct work_struct *work)
writeq(anv->ioq.tcb_dma_addr,
anv->mmio_nvme + APPLE_NVMMU_IOSQ_TCB_BASE);
- anv->ctrl.sqsize =
- APPLE_ANS_MAX_QUEUE_DEPTH - 1; /* 0's based queue depth */
+ anv->ctrl.sqsize = APPLE_ANS_MAX_QUEUE_DEPTH;
anv->ctrl.cap = readq(anv->mmio_nvme + NVME_REG_CAP);
dev_dbg(anv->dev, "Enabling controller now");
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index cda1361e6d4fbb..f2b7bb8bcc31fa 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3242,7 +3242,8 @@ int nvme_init_ctrl_finish(struct nvme_ctrl *ctrl, bool was_suspended)
return ret;
}
- ctrl->sqsize = min_t(u16, NVME_CAP_MQES(ctrl->cap), ctrl->sqsize);
+ /* CAP.MQES is 0s based */
+ ctrl->sqsize = min_t(u16, NVME_CAP_MQES(ctrl->cap) - 1, ctrl->sqsize);
if (ctrl->vs >= NVME_VS(1, 1, 0))
ctrl->subsystem = NVME_CAP_NSSRC(ctrl->cap);
@@ -3505,9 +3506,18 @@ static DEVICE_ATTR(field, S_IRUGO, field##_show, NULL);
nvme_show_int_function(cntlid);
nvme_show_int_function(numa_node);
nvme_show_int_function(queue_count);
-nvme_show_int_function(sqsize);
nvme_show_int_function(kato);
+static ssize_t sqsize_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+ /* Report the 0s based register value for historic reasons: */
+ return sysfs_emit(buf, "%d\n", ctrl->sqsize - 1);
+}
+static DEVICE_ATTR(sqsize, S_IRUGO, sqsize_show, NULL);
+
static ssize_t nvme_sysfs_delete(struct device *dev,
struct device_attribute *attr, const char *buf,
size_t count)
@@ -4897,7 +4907,12 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
memset(set, 0, sizeof(*set));
set->ops = ops;
+
+ /*
+ * Leave an empty slot in the SQ to deal with queue wrap conditions.
+ */
set->queue_depth = min_t(unsigned, ctrl->sqsize, BLK_MQ_MAX_DEPTH - 1);
+
/*
* Some Apple controllers requires tags to be unique across admin and
* the (only) I/O queue, so reserve the first 32 tags of the I/O queue.
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index ce27276f552dad..bb3c611136c90c 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -464,7 +464,7 @@ int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, u16 qid)
cmd.connect.opcode = nvme_fabrics_command;
cmd.connect.fctype = nvme_fabrics_type_connect;
cmd.connect.qid = cpu_to_le16(qid);
- cmd.connect.sqsize = cpu_to_le16(ctrl->sqsize);
+ cmd.connect.sqsize = cpu_to_le16(ctrl->sqsize - 1);
if (ctrl->opts->disable_sqflow)
cmd.connect.cattr |= NVME_CONNECT_DISABLE_SQFLOW;
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 4564f16a0b203c..81278750b8c45d 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2922,11 +2922,11 @@ nvme_fc_create_io_queues(struct nvme_fc_ctrl *ctrl)
if (ret)
return ret;
- ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
+ ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize);
if (ret)
goto out_cleanup_tagset;
- ret = nvme_fc_connect_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
+ ret = nvme_fc_connect_io_queues(ctrl, ctrl->ctrl.sqsize);
if (ret)
goto out_delete_hw_queues;
@@ -2982,11 +2982,11 @@ nvme_fc_recreate_io_queues(struct nvme_fc_ctrl *ctrl)
blk_mq_update_nr_hw_queues(&ctrl->tag_set, nr_io_queues);
}
- ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
+ ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize);
if (ret)
goto out_free_io_queues;
- ret = nvme_fc_connect_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
+ ret = nvme_fc_connect_io_queues(ctrl, ctrl->ctrl.sqsize);
if (ret)
goto out_delete_hw_queues;
@@ -3148,7 +3148,7 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
"to maxcmd\n",
opts->queue_size, ctrl->ctrl.maxcmd);
opts->queue_size = ctrl->ctrl.maxcmd;
- ctrl->ctrl.sqsize = opts->queue_size - 1;
+ ctrl->ctrl.sqsize = opts->queue_size;
}
ret = nvme_fc_init_aen_ops(ctrl);
@@ -3509,7 +3509,7 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
lport->ops->max_hw_queues);
ctrl->ctrl.queue_count++; /* +1 for admin queue */
- ctrl->ctrl.sqsize = opts->queue_size - 1;
+ ctrl->ctrl.sqsize = opts->queue_size;
ctrl->ctrl.kato = opts->kato;
ctrl->ctrl.cntlid = 0xffff;
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index b13baccedb4a95..a621dd7a0a8efa 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2335,7 +2335,7 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
sizeof(struct nvme_command));
if (result > 0) {
dev->q_depth = result;
- dev->ctrl.sqsize = result - 1;
+ dev->ctrl.sqsize = result;
} else {
dev->cmb_use_sqes = false;
}
@@ -2579,7 +2579,7 @@ static int nvme_pci_enable(struct nvme_dev *dev)
dev_warn(dev->ctrl.device, "IO queue depth clamped to %d\n",
dev->q_depth);
}
- dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
+ dev->ctrl.sqsize = dev->q_depth;
nvme_map_cmb(dev);
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index bbad26b82b56dd..26306f7b80c2e9 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -773,8 +773,7 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
}
for (i = 1; i < ctrl->ctrl.queue_count; i++) {
- ret = nvme_rdma_alloc_queue(ctrl, i,
- ctrl->ctrl.sqsize + 1);
+ ret = nvme_rdma_alloc_queue(ctrl, i, ctrl->ctrl.sqsize);
if (ret)
goto out_free_queues;
}
@@ -1059,24 +1058,24 @@ static int nvme_rdma_setup_ctrl(struct nvme_rdma_ctrl *ctrl, bool new)
goto destroy_admin;
}
- if (ctrl->ctrl.opts->queue_size > ctrl->ctrl.sqsize + 1) {
+ if (ctrl->ctrl.opts->queue_size > ctrl->ctrl.sqsize) {
dev_warn(ctrl->ctrl.device,
"queue_size %zu > ctrl sqsize %u, clamping down\n",
- ctrl->ctrl.opts->queue_size, ctrl->ctrl.sqsize + 1);
+ ctrl->ctrl.opts->queue_size, ctrl->ctrl.sqsize);
}
- if (ctrl->ctrl.sqsize + 1 > NVME_RDMA_MAX_QUEUE_SIZE) {
+ if (ctrl->ctrl.sqsize > NVME_RDMA_MAX_QUEUE_SIZE) {
dev_warn(ctrl->ctrl.device,
"ctrl sqsize %u > max queue size %u, clamping down\n",
- ctrl->ctrl.sqsize + 1, NVME_RDMA_MAX_QUEUE_SIZE);
- ctrl->ctrl.sqsize = NVME_RDMA_MAX_QUEUE_SIZE - 1;
+ ctrl->ctrl.sqsize, NVME_RDMA_MAX_QUEUE_SIZE);
+ ctrl->ctrl.sqsize = NVME_RDMA_MAX_QUEUE_SIZE;
}
- if (ctrl->ctrl.sqsize + 1 > ctrl->ctrl.maxcmd) {
+ if (ctrl->ctrl.sqsize > ctrl->ctrl.maxcmd) {
dev_warn(ctrl->ctrl.device,
"sqsize %u > ctrl maxcmd %u, clamping down\n",
- ctrl->ctrl.sqsize + 1, ctrl->ctrl.maxcmd);
- ctrl->ctrl.sqsize = ctrl->ctrl.maxcmd - 1;
+ ctrl->ctrl.sqsize, ctrl->ctrl.maxcmd);
+ ctrl->ctrl.sqsize = ctrl->ctrl.maxcmd;
}
if (ctrl->ctrl.sgls & (1 << 20))
@@ -1891,7 +1890,7 @@ static int nvme_rdma_route_resolved(struct nvme_rdma_queue *queue)
* 1's based representation of sqsize.
*/
priv.hrqsize = cpu_to_le16(queue->queue_size);
- priv.hsqsize = cpu_to_le16(queue->ctrl->ctrl.sqsize);
+ priv.hsqsize = cpu_to_le16(queue->ctrl->ctrl.sqsize - 1);
}
ret = rdma_connect_locked(queue->cm_id, ¶m);
@@ -2338,7 +2337,7 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
ctrl->ctrl.queue_count = opts->nr_io_queues + opts->nr_write_queues +
opts->nr_poll_queues + 1;
- ctrl->ctrl.sqsize = opts->queue_size - 1;
+ ctrl->ctrl.sqsize = opts->queue_size;
ctrl->ctrl.kato = opts->kato;
ret = -ENOMEM;
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index b69b89166b6b96..37a96cbd8929db 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2047,16 +2047,16 @@ static int nvme_tcp_setup_ctrl(struct nvme_ctrl *ctrl, bool new)
goto destroy_admin;
}
- if (opts->queue_size > ctrl->sqsize + 1)
+ if (opts->queue_size > ctrl->sqsize)
dev_warn(ctrl->device,
"queue_size %zu > ctrl sqsize %u, clamping down\n",
- opts->queue_size, ctrl->sqsize + 1);
+ opts->queue_size, ctrl->sqsize);
- if (ctrl->sqsize + 1 > ctrl->maxcmd) {
+ if (ctrl->sqsize > ctrl->maxcmd) {
dev_warn(ctrl->device,
"sqsize %u > ctrl maxcmd %u, clamping down\n",
- ctrl->sqsize + 1, ctrl->maxcmd);
- ctrl->sqsize = ctrl->maxcmd - 1;
+ ctrl->sqsize, ctrl->maxcmd);
+ ctrl->sqsize = ctrl->maxcmd;
}
if (ctrl->queue_count > 1) {
@@ -2562,7 +2562,7 @@ static struct nvme_ctrl *nvme_tcp_create_ctrl(struct device *dev,
ctrl->ctrl.opts = opts;
ctrl->ctrl.queue_count = opts->nr_io_queues + opts->nr_write_queues +
opts->nr_poll_queues + 1;
- ctrl->ctrl.sqsize = opts->queue_size - 1;
+ ctrl->ctrl.sqsize = opts->queue_size;
ctrl->ctrl.kato = opts->kato;
INIT_DELAYED_WORK(&ctrl->connect_work,
--
2.35.1
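After this change ctrl->sqsize holds the actual queue size, and the 0s based encoding survives only at the register and wire boundaries, roughly like this (hypothetical helper names for illustration):

```c
#include <assert.h>

/* Register -> actual size: a CAP.MQES of n encodes n + 1 entries. */
static unsigned short sqsize_from_mqes(unsigned short mqes)
{
	return mqes + 1;
}

/* Actual size -> wire: the fabrics Connect command and the RDMA CM
 * private data still carry the 0s based value, hence the "- 1" the
 * patch adds at those call sites. */
static unsigned short sqsize_to_wire(unsigned short sqsize)
{
	return sqsize - 1;
}
```

Everything in between (queue allocation, clamping against maxcmd, the tagset depth) can then work with the natural 1's based size.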
* [PATCH 4/4] nvme-pci: remove the dev->q_depth field
2022-12-25 10:32 fix nvme sqsize one off regression Christoph Hellwig
` (2 preceding siblings ...)
2022-12-25 10:32 ` [PATCH 3/4] nvme: store the actual queue size in ctrl->sqsize Christoph Hellwig
@ 2022-12-25 10:32 ` Christoph Hellwig
2022-12-25 11:19 ` Sagi Grimberg
2022-12-26 19:11 ` (subset) fix nvme sqsize one off regression Jens Axboe
4 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2022-12-25 10:32 UTC (permalink / raw)
To: Linus Torvalds, Jens Axboe
Cc: Keith Busch, Sagi Grimberg, Hugh Dickins, linux-nvme
This field duplicates the sqsize field in the common nvme_ctrl structure,
so remove it and use the common field instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/nvme/host/pci.c | 30 +++++++++++++-----------------
1 file changed, 13 insertions(+), 17 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a621dd7a0a8efa..59ef702f910efa 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -126,7 +126,6 @@ struct nvme_dev {
unsigned max_qid;
unsigned io_queues[HCTX_MAX_TYPES];
unsigned int num_vecs;
- u32 q_depth;
int io_sqes;
u32 db_stride;
void __iomem *bar;
@@ -1515,7 +1514,7 @@ static void nvme_reap_pending_cqes(struct nvme_dev *dev)
static int nvme_cmb_qdepth(struct nvme_dev *dev, int nr_io_queues,
int entry_size)
{
- int q_depth = dev->q_depth;
+ u16 q_depth = dev->ctrl.sqsize;
unsigned q_size_aligned = roundup(q_depth * entry_size,
NVME_CTRL_PAGE_SIZE);
@@ -1824,7 +1823,7 @@ static int nvme_create_io_queues(struct nvme_dev *dev)
int ret = 0;
for (i = dev->ctrl.queue_count; i <= dev->max_qid; i++) {
- if (nvme_alloc_queue(dev, i, dev->q_depth)) {
+ if (nvme_alloc_queue(dev, i, dev->ctrl.sqsize)) {
ret = -ENOMEM;
break;
}
@@ -2333,12 +2332,10 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
if (dev->cmb_use_sqes) {
result = nvme_cmb_qdepth(dev, nr_io_queues,
sizeof(struct nvme_command));
- if (result > 0) {
- dev->q_depth = result;
+ if (result > 0)
dev->ctrl.sqsize = result;
- } else {
+ else
dev->cmb_use_sqes = false;
- }
}
do {
@@ -2537,8 +2534,8 @@ static int nvme_pci_enable(struct nvme_dev *dev)
dev->ctrl.cap = lo_hi_readq(dev->bar + NVME_REG_CAP);
- dev->q_depth = min_t(u32, NVME_CAP_MQES(dev->ctrl.cap) + 1,
- io_queue_depth);
+ dev->ctrl.sqsize = min_t(u32, NVME_CAP_MQES(dev->ctrl.cap) + 1,
+ io_queue_depth);
dev->db_stride = 1 << NVME_CAP_STRIDE(dev->ctrl.cap);
dev->dbs = dev->bar + 4096;
@@ -2557,16 +2554,16 @@ static int nvme_pci_enable(struct nvme_dev *dev)
* some MacBook7,1 to avoid controller resets and data loss.
*/
if (pdev->vendor == PCI_VENDOR_ID_APPLE && pdev->device == 0x2001) {
- dev->q_depth = 2;
+ dev->ctrl.sqsize = 2;
dev_warn(dev->ctrl.device, "detected Apple NVMe controller, "
"set queue depth=%u to work around controller resets\n",
- dev->q_depth);
+ dev->ctrl.sqsize);
} else if (pdev->vendor == PCI_VENDOR_ID_SAMSUNG &&
(pdev->device == 0xa821 || pdev->device == 0xa822) &&
NVME_CAP_MQES(dev->ctrl.cap) == 0) {
- dev->q_depth = 64;
+ dev->ctrl.sqsize = 64;
dev_err(dev->ctrl.device, "detected PM1725 NVMe controller, "
- "set queue depth=%u\n", dev->q_depth);
+ "set queue depth=%u\n", dev->ctrl.sqsize);
}
/*
@@ -2574,12 +2571,11 @@ static int nvme_pci_enable(struct nvme_dev *dev)
* big enough so that we get 32 tags for the admin queue
*/
if ((dev->ctrl.quirks & NVME_QUIRK_SHARED_TAGS) &&
- (dev->q_depth < (NVME_AQ_DEPTH + 2))) {
- dev->q_depth = NVME_AQ_DEPTH + 2;
+ (dev->ctrl.sqsize < NVME_AQ_DEPTH + 2)) {
+ dev->ctrl.sqsize = NVME_AQ_DEPTH + 2;
dev_warn(dev->ctrl.device, "IO queue depth clamped to %d\n",
- dev->q_depth);
+ dev->ctrl.sqsize);
}
- dev->ctrl.sqsize = dev->q_depth;
nvme_map_cmb(dev);
--
2.35.1
* Re: [PATCH 1/4] nvme: fix setting the queue depth in nvme_alloc_io_tag_set
2022-12-25 10:32 ` [PATCH 1/4] nvme: fix setting the queue depth in nvme_alloc_io_tag_set Christoph Hellwig
@ 2022-12-25 10:51 ` Sagi Grimberg
2022-12-25 10:53 ` Sagi Grimberg
2022-12-25 21:15 ` Hugh Dickins
1 sibling, 1 reply; 20+ messages in thread
From: Sagi Grimberg @ 2022-12-25 10:51 UTC (permalink / raw)
To: Christoph Hellwig, Linus Torvalds, Jens Axboe
Cc: Keith Busch, Hugh Dickins, linux-nvme
> While the CAP.MQES field in NVMe is a 0s based field with a natural one
> off, we also need to account for the queue wrap condition and undo the
> one off again in nvme_alloc_io_tag_set. This was never properly done by
> the fabrics drivers, but they don't seem to care because there is no
> actual physical queue that can wrap around; it became a problem when
> converting over the PCIe driver, though. Also add back the
> BLK_MQ_MAX_DEPTH check that was lost in the same commit.
>
> Fixes: 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers")
This looks fine, but does this actually fix anything?
How does nvme-pci ever set sqsize anywhere close to BLK_MQ_MAX_DEPTH ?
NVME_PCI_MAX_QUEUE_SIZE is 4095...
> Reported-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/nvme/host/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index e26b085a007aea..cda1361e6d4fbb 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4897,7 +4897,7 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
>
> memset(set, 0, sizeof(*set));
> set->ops = ops;
> - set->queue_depth = ctrl->sqsize + 1;
> + set->queue_depth = min_t(unsigned, ctrl->sqsize, BLK_MQ_MAX_DEPTH - 1);
> /*
> * Some Apple controllers requires tags to be unique across admin and
> * the (only) I/O queue, so reserve the first 32 tags of the I/O queue.
* Re: [PATCH 1/4] nvme: fix setting the queue depth in nvme_alloc_io_tag_set
2022-12-25 10:51 ` Sagi Grimberg
@ 2022-12-25 10:53 ` Sagi Grimberg
2022-12-28 16:11 ` Christoph Hellwig
0 siblings, 1 reply; 20+ messages in thread
From: Sagi Grimberg @ 2022-12-25 10:53 UTC (permalink / raw)
To: Christoph Hellwig, Linus Torvalds, Jens Axboe
Cc: Keith Busch, Hugh Dickins, linux-nvme
>> While the CAP.MQES field in NVMe is a 0s based field with a natural one
>> off, we also need to account for the queue wrap condition and undo the
>> one off again in nvme_alloc_io_tag_set. This was never properly done by
>> the fabrics drivers, but they don't seem to care because there is no
>> actual physical queue that can wrap around; it became a problem when
>> converting over the PCIe driver, though. Also add back the
>> BLK_MQ_MAX_DEPTH check that was lost in the same commit.
>>
>> Fixes: 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers")
>
> This looks fine, but does this actually fix anything?
> How does nvme-pci ever set sqsize anywhere close to BLK_MQ_MAX_DEPTH ?
> NVME_PCI_MAX_QUEUE_SIZE is 4095...
Disregard this comment.
But this means that fabrics drivers automatically get one less queue
entry with this patch alone?
>
>> Reported-by: Hugh Dickins <hughd@google.com>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> ---
>> drivers/nvme/host/core.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index e26b085a007aea..cda1361e6d4fbb 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -4897,7 +4897,7 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl
>> *ctrl, struct blk_mq_tag_set *set,
>> memset(set, 0, sizeof(*set));
>> set->ops = ops;
>> - set->queue_depth = ctrl->sqsize + 1;
>> + set->queue_depth = min_t(unsigned, ctrl->sqsize, BLK_MQ_MAX_DEPTH
>> - 1);
>> /*
>> * Some Apple controllers requires tags to be unique across
>> admin and
>> * the (only) I/O queue, so reserve the first 32 tags of the I/O
>> queue.
* Re: [PATCH 3/4] nvme: store the actual queue size in ctrl->sqsize
2022-12-25 10:32 ` [PATCH 3/4] nvme: store the actual queue size in ctrl->sqsize Christoph Hellwig
@ 2022-12-25 11:09 ` Sagi Grimberg
2022-12-28 17:02 ` Keith Busch
1 sibling, 0 replies; 20+ messages in thread
From: Sagi Grimberg @ 2022-12-25 11:09 UTC (permalink / raw)
To: Christoph Hellwig, Linus Torvalds, Jens Axboe
Cc: Keith Busch, Hugh Dickins, linux-nvme
On 12/25/22 12:32, Christoph Hellwig wrote:
> Convert from the strange on-the-wire 0s based value to a regular size
> when reading the field, to avoid adjustments all over.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/nvme/host/apple.c | 3 +--
> drivers/nvme/host/core.c | 19 +++++++++++++++++--
> drivers/nvme/host/fabrics.c | 2 +-
> drivers/nvme/host/fc.c | 12 ++++++------
> drivers/nvme/host/pci.c | 4 ++--
> drivers/nvme/host/rdma.c | 23 +++++++++++------------
> drivers/nvme/host/tcp.c | 12 ++++++------
> 7 files changed, 44 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
> index e36aeb50b4edc1..073a1e8c021c14 100644
> --- a/drivers/nvme/host/apple.c
> +++ b/drivers/nvme/host/apple.c
> @@ -1080,8 +1080,7 @@ static void apple_nvme_reset_work(struct work_struct *work)
> writeq(anv->ioq.tcb_dma_addr,
> anv->mmio_nvme + APPLE_NVMMU_IOSQ_TCB_BASE);
>
> - anv->ctrl.sqsize =
> - APPLE_ANS_MAX_QUEUE_DEPTH - 1; /* 0's based queue depth */
> + anv->ctrl.sqsize = APPLE_ANS_MAX_QUEUE_DEPTH;
> anv->ctrl.cap = readq(anv->mmio_nvme + NVME_REG_CAP);
>
> dev_dbg(anv->dev, "Enabling controller now");
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index cda1361e6d4fbb..f2b7bb8bcc31fa 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3242,7 +3242,8 @@ int nvme_init_ctrl_finish(struct nvme_ctrl *ctrl, bool was_suspended)
> return ret;
> }
>
> - ctrl->sqsize = min_t(u16, NVME_CAP_MQES(ctrl->cap), ctrl->sqsize);
> + /* CAP.MQES is 0s based */
> + ctrl->sqsize = min_t(u16, NVME_CAP_MQES(ctrl->cap) - 1, ctrl->sqsize);
Shouldn't this be NVME_CAP_MQES(ctrl->cap) + 1 ?
* Re: [PATCH 2/4] nvme-pci: update sqsize when adjusting the queue depth
2022-12-25 10:32 ` [PATCH 2/4] nvme-pci: update sqsize when adjusting the queue depth Christoph Hellwig
@ 2022-12-25 11:19 ` Sagi Grimberg
2022-12-28 16:10 ` Christoph Hellwig
2022-12-25 21:21 ` Hugh Dickins
1 sibling, 1 reply; 20+ messages in thread
From: Sagi Grimberg @ 2022-12-25 11:19 UTC (permalink / raw)
To: Christoph Hellwig, Linus Torvalds, Jens Axboe
Cc: Keith Busch, Hugh Dickins, linux-nvme
> Update the core sqsize field in addition to the PCIe-specific
> q_depth field as the core tagset allocation helpers rely on it.
>
> Fixes: 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/nvme/host/pci.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 804b6a6cb43a9c..b13baccedb4a95 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2333,10 +2333,12 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
> if (dev->cmb_use_sqes) {
> result = nvme_cmb_qdepth(dev, nr_io_queues,
> sizeof(struct nvme_command));
> - if (result > 0)
> + if (result > 0) {
> dev->q_depth = result;
> - else
> + dev->ctrl.sqsize = result - 1;
> + } else {
> dev->cmb_use_sqes = false;
> + }
> }
>
> do {
> @@ -2537,7 +2539,6 @@ static int nvme_pci_enable(struct nvme_dev *dev)
>
> dev->q_depth = min_t(u32, NVME_CAP_MQES(dev->ctrl.cap) + 1,
> io_queue_depth);
> - dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
> dev->db_stride = 1 << NVME_CAP_STRIDE(dev->ctrl.cap);
> dev->dbs = dev->bar + 4096;
>
> @@ -2578,7 +2579,7 @@ static int nvme_pci_enable(struct nvme_dev *dev)
> dev_warn(dev->ctrl.device, "IO queue depth clamped to %d\n",
> dev->q_depth);
> }
> -
> + dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
>
> nvme_map_cmb(dev);
>
OK, I think I understand now.
But if you want patches 1+2 to be taken before 3+4, I'd make sqsize
a 0's based value of q_depth after we subtract one entry for queue
wraparound. So others would not get affected by patches 1+2. Then in
one batch make all assignments to sqsize 1's based like in patch 3 and
change nvme_alloc_io_tag_set at the same time.
Not super critical, but there is a small chance that if 1+2 are taken
before 3+4, someone may complain...
* Re: [PATCH 4/4] nvme-pci: remove the dev->q_depth field
2022-12-25 10:32 ` [PATCH 4/4] nvme-pci: remove the dev->q_depth field Christoph Hellwig
@ 2022-12-25 11:19 ` Sagi Grimberg
0 siblings, 0 replies; 20+ messages in thread
From: Sagi Grimberg @ 2022-12-25 11:19 UTC (permalink / raw)
To: Christoph Hellwig, Linus Torvalds, Jens Axboe
Cc: Keith Busch, Hugh Dickins, linux-nvme
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
* Re: [PATCH 1/4] nvme: fix setting the queue depth in nvme_alloc_io_tag_set
2022-12-25 10:32 ` [PATCH 1/4] nvme: fix setting the queue depth in nvme_alloc_io_tag_set Christoph Hellwig
2022-12-25 10:51 ` Sagi Grimberg
@ 2022-12-25 21:15 ` Hugh Dickins
1 sibling, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2022-12-25 21:15 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Linus Torvalds, Jens Axboe, Keith Busch, Sagi Grimberg,
Hugh Dickins, linux-nvme
On Sun, 25 Dec 2022, Christoph Hellwig wrote:
> While the CAP.MQES field in NVMe is a 0s based field with a natural one
> off, we also need to account for the queue wrap condition and undo the
> one off again in nvme_alloc_io_tag_set. This was never properly done by
> the fabrics drivers, but they don't seem to care because there is no
> actual physical queue that can wrap around; it became a problem when
> converting over the PCIe driver, though. Also add back the
> BLK_MQ_MAX_DEPTH check that was lost in the same commit.
>
> Fixes: 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers")
> Reported-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Hugh Dickins <hughd@google.com>
> ---
> drivers/nvme/host/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index e26b085a007aea..cda1361e6d4fbb 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4897,7 +4897,7 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
>
> memset(set, 0, sizeof(*set));
> set->ops = ops;
> - set->queue_depth = ctrl->sqsize + 1;
> + set->queue_depth = min_t(unsigned, ctrl->sqsize, BLK_MQ_MAX_DEPTH - 1);
> /*
> * Some Apple controllers requires tags to be unique across admin and
> * the (only) I/O queue, so reserve the first 32 tags of the I/O queue.
> --
> 2.35.1
>
>
* Re: [PATCH 2/4] nvme-pci: update sqsize when adjusting the queue depth
2022-12-25 10:32 ` [PATCH 2/4] nvme-pci: update sqsize when adjusting the queue depth Christoph Hellwig
2022-12-25 11:19 ` Sagi Grimberg
@ 2022-12-25 21:21 ` Hugh Dickins
1 sibling, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2022-12-25 21:21 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Linus Torvalds, Jens Axboe, Keith Busch, Sagi Grimberg,
Hugh Dickins, linux-nvme
On Sun, 25 Dec 2022, Christoph Hellwig wrote:
> Update the core sqsize field in addition to the PCIe-specific
> q_depth field as the core tagset allocation helpers rely on it.
>
> Fixes: 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Hugh Dickins <hughd@google.com>
(I'm not saying Tested-by, because although the equivalent of this
was in my testing last night, I don't believe it would have been
hitting the cases which this patch repairs.
And I won't be looking at 3/4 or 4/4 today: but certainly agree
that reducing the number of fields which need to be kept in sync
is a good step forward.)
> ---
> drivers/nvme/host/pci.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 804b6a6cb43a9c..b13baccedb4a95 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2333,10 +2333,12 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
> if (dev->cmb_use_sqes) {
> result = nvme_cmb_qdepth(dev, nr_io_queues,
> sizeof(struct nvme_command));
> - if (result > 0)
> + if (result > 0) {
> dev->q_depth = result;
> - else
> + dev->ctrl.sqsize = result - 1;
> + } else {
> dev->cmb_use_sqes = false;
> + }
> }
>
> do {
> @@ -2537,7 +2539,6 @@ static int nvme_pci_enable(struct nvme_dev *dev)
>
> dev->q_depth = min_t(u32, NVME_CAP_MQES(dev->ctrl.cap) + 1,
> io_queue_depth);
> - dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
> dev->db_stride = 1 << NVME_CAP_STRIDE(dev->ctrl.cap);
> dev->dbs = dev->bar + 4096;
>
> @@ -2578,7 +2579,7 @@ static int nvme_pci_enable(struct nvme_dev *dev)
> dev_warn(dev->ctrl.device, "IO queue depth clamped to %d\n",
> dev->q_depth);
> }
> -
> + dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
>
> nvme_map_cmb(dev);
>
> --
> 2.35.1
* Re: (subset) fix nvme sqsize on off regression
2022-12-25 10:32 fix nvme sqsize on off regression Christoph Hellwig
` (3 preceding siblings ...)
2022-12-25 10:32 ` [PATCH 4/4] nvme-pci: remove the dev->q_depth field Christoph Hellwig
@ 2022-12-26 19:11 ` Jens Axboe
4 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2022-12-26 19:11 UTC (permalink / raw)
To: Linus Torvalds, Jens Axboe, Christoph Hellwig
Cc: Keith Busch, Sagi Grimberg, Hugh Dickins, linux-nvme
On Sun, 25 Dec 2022 11:32:30 +0100, Christoph Hellwig wrote:
> this series fixes up the sqsize regression that Hugh reported. The first
> two patches are the real fixes, and it might make sense if Linus just
> picks them directly before -rc1. The other two just clean things up a
> bit to avoid pitfalls like this in the future.
>
> Diffstat:
> apple.c | 3 +--
> core.c | 21 ++++++++++++++++++---
> fabrics.c | 2 +-
> fc.c | 12 ++++++------
> pci.c | 27 ++++++++++++---------------
> rdma.c | 23 +++++++++++------------
> tcp.c | 12 ++++++------
> 7 files changed, 55 insertions(+), 45 deletions(-)
>
> [...]
Applied, thanks!
[1/4] nvme: fix setting the queue depth in nvme_alloc_io_tag_set
commit: 33b93727ce90c8db916fb071ed13e90106339754
[2/4] nvme-pci: update sqsize when adjusting the queue depth
commit: 88d356ca41ba1c3effc2d4208dfbd4392f58cd6d
Best regards,
--
Jens Axboe
* Re: [PATCH 2/4] nvme-pci: update sqsize when adjusting the queue depth
2022-12-25 11:19 ` Sagi Grimberg
@ 2022-12-28 16:10 ` Christoph Hellwig
2022-12-29 12:07 ` Sagi Grimberg
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2022-12-28 16:10 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Christoph Hellwig, Linus Torvalds, Jens Axboe, Keith Busch,
Hugh Dickins, linux-nvme
On Sun, Dec 25, 2022 at 01:19:43PM +0200, Sagi Grimberg wrote:
> But if you want patches 1+2 to be taken before 3+4, I'd make sqsize a
> 0's based value of q_depth after we subtract one entry for queue
> wraparound. So others would not be affected by patches 1+2. Then in
> one batch make all assignments to sqsize 1's based, like in patch 3,
> and change nvme_alloc_io_tagset at the same time.
Patch 1 does not make any difference in the 0s based vs not values,
they just reduce the queue size by 1 intentionally.
* Re: [PATCH 1/4] nvme: fix setting the queue depth in nvme_alloc_io_tag_set
2022-12-25 10:53 ` Sagi Grimberg
@ 2022-12-28 16:11 ` Christoph Hellwig
0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2022-12-28 16:11 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Christoph Hellwig, Linus Torvalds, Jens Axboe, Keith Busch,
Hugh Dickins, linux-nvme
On Sun, Dec 25, 2022 at 12:53:38PM +0200, Sagi Grimberg wrote:
> But this means that fabrics drivers automatically get 1 less queue entry?
> with this patch alone?
Yes. And also with the entire series.
* Re: [PATCH 3/4] nvme: store the actual queue size in ctrl->sqsize
2022-12-25 10:32 ` [PATCH 3/4] nvme: store the actual queue size in ctrl->sqsize Christoph Hellwig
2022-12-25 11:09 ` Sagi Grimberg
@ 2022-12-28 17:02 ` Keith Busch
1 sibling, 0 replies; 20+ messages in thread
From: Keith Busch @ 2022-12-28 17:02 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Linus Torvalds, Jens Axboe, Sagi Grimberg, Hugh Dickins,
linux-nvme
On Sun, Dec 25, 2022 at 11:32:33AM +0100, Christoph Hellwig wrote:
> Convert from the strange on the wire 0s based value to a regular size
> when reading the filed to avoid adjustments all over.
s/filed/field
> @@ -3242,7 +3242,8 @@ int nvme_init_ctrl_finish(struct nvme_ctrl *ctrl, bool was_suspended)
> return ret;
> }
>
> - ctrl->sqsize = min_t(u16, NVME_CAP_MQES(ctrl->cap), ctrl->sqsize);
> + /* CAP.MQES is 0s based */
> + ctrl->sqsize = min_t(u16, NVME_CAP_MQES(ctrl->cap) - 1, ctrl->sqsize);
Did you mean '+ 1' instead of '- 1'? The minimum MQES value per spec is
1, giving 2 slots. Subtracting 1 from that possible value would give you
sqsize 0.
But I'm confused why this is being changed. The ctrl->sqsize isn't a 0's
based value. It's normal 1-based value that tells blk-mq how many tags
to allocate.
> +static ssize_t sqsize_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
> +
> + /* Report the 0s based register value for historic reasons: */
> + return sysfs_emit(buf, "%d\n", ctrl->sqsize - 1);
> +}
Weird indentation here.
* Re: [PATCH 2/4] nvme-pci: update sqsize when adjusting the queue depth
2022-12-28 16:10 ` Christoph Hellwig
@ 2022-12-29 12:07 ` Sagi Grimberg
2022-12-29 16:34 ` Keith Busch
0 siblings, 1 reply; 20+ messages in thread
From: Sagi Grimberg @ 2022-12-29 12:07 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Linus Torvalds, Jens Axboe, Keith Busch, Hugh Dickins, linux-nvme
>> But if you want patches 1+2 to be taken before 3+4, I'd make sqsize a
>> 0's based value of q_depth after we subtract one entry for queue
>> wraparound. So others would not be affected by patches 1+2. Then in
>> one batch make all assignments to sqsize 1's based, like in patch 3,
>> and change nvme_alloc_io_tagset at the same time.
>
> Patch 1 does not make any difference in the 0s based vs not values,
> they just reduce the queue size by 1 intentionally.
Not sure who is they here...
Anyways, I guess we'll have to wait and see if someone happens to care
if his/hers effective queuing is reduced by 1 all of a sudden...
* Re: [PATCH 2/4] nvme-pci: update sqsize when adjusting the queue depth
2022-12-29 12:07 ` Sagi Grimberg
@ 2022-12-29 16:34 ` Keith Busch
2023-01-02 9:39 ` Sagi Grimberg
0 siblings, 1 reply; 20+ messages in thread
From: Keith Busch @ 2022-12-29 16:34 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Christoph Hellwig, Linus Torvalds, Jens Axboe, Hugh Dickins,
linux-nvme
On Thu, Dec 29, 2022 at 02:07:04PM +0200, Sagi Grimberg wrote:
>
> > > But if you want patches 1+2 to be taken before 3+4, I'd make sqsize a
> > > 0's based value of q_depth after we subtract one entry for queue
> > > wraparound. So others would not be affected by patches 1+2. Then in
> > > one batch make all assignments to sqsize 1's based, like in patch 3,
> > > and change nvme_alloc_io_tagset at the same time.
> >
> > Patch 1 does not make any difference in the 0s based vs not values,
> > they just reduce the queue size by 1 intentionally.
>
> Not sure who is they here...
>
> Anyways, I guess we'll have to wait and see if someone happens to care
> if his/hers effective queuing is reduced by 1 all of a sudden...
All nvme transports require the host driver leave one queue entry unused.
The driver was not spec compliant prior to patches 1 and 2. I'm not sure
if that "queue full" definition makes sense for fabrics lacking a shared
ring, but the spec says we have to work that way.
* Re: [PATCH 2/4] nvme-pci: update sqsize when adjusting the queue depth
2022-12-29 16:34 ` Keith Busch
@ 2023-01-02 9:39 ` Sagi Grimberg
2023-01-03 16:45 ` Keith Busch
0 siblings, 1 reply; 20+ messages in thread
From: Sagi Grimberg @ 2023-01-02 9:39 UTC (permalink / raw)
To: Keith Busch
Cc: Christoph Hellwig, Linus Torvalds, Jens Axboe, Hugh Dickins,
linux-nvme
>>>> But if you want patches 1+2 to be taken before 3+4, I'd make sqsize a
>>>> 0's based value of q_depth after we subtract one entry for queue
>>>> wraparound. So others would not be affected by patches 1+2. Then in
>>>> one batch make all assignments to sqsize 1's based, like in patch 3,
>>>> and change nvme_alloc_io_tagset at the same time.
>>>
>>> Patch 1 does not make any difference in the 0s based vs not values,
>>> they just reduce the queue size by 1 intentionally.
>>
>> Not sure who is they here...
>>
>> Anyways, I guess we'll have to wait and see if someone happens to care
>> if his/hers effective queuing is reduced by 1 all of a sudden...
>
> All nvme transports require the host driver leave one queue entry unused.
> The driver was not spec compliant prior to patches 1 and 2. I'm not sure
> if that "queue full" definition makes sense for fabrics lacking a shared
> ring, but the spec says we have to work that way.
fabrics already reserves one entry for a connect cmd, so effectively it
was reserving a slot already...
* Re: [PATCH 2/4] nvme-pci: update sqsize when adjusting the queue depth
2023-01-02 9:39 ` Sagi Grimberg
@ 2023-01-03 16:45 ` Keith Busch
0 siblings, 0 replies; 20+ messages in thread
From: Keith Busch @ 2023-01-03 16:45 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Christoph Hellwig, Linus Torvalds, Jens Axboe, Hugh Dickins,
linux-nvme
On Mon, Jan 02, 2023 at 11:39:35AM +0200, Sagi Grimberg wrote:
>
> > > > > But if you want patches 1+2 to be taken before 3+4, I'd make sqsize a
> > > > > 0's based value of q_depth after we subtract one entry for queue
> > > > > wraparound. So others would not be affected by patches 1+2. Then in
> > > > > one batch make all assignments to sqsize 1's based, like in patch 3,
> > > > > and change nvme_alloc_io_tagset at the same time.
> > > >
> > > > Patch 1 does not make any difference in the 0s based vs not values,
> > > > they just reduce the queue size by 1 intentionally.
> > >
> > > Not sure who is they here...
> > >
> > > Anyways, I guess we'll have to wait and see if someone happens to care
> > > if his/hers effective queuing is reduced by 1 all of a sudden...
> >
> > All nvme transports require the host driver leave one queue entry unused.
> > The driver was not spec compliant prior to patches 1 and 2. I'm not sure
> > if that "queue full" definition makes sense for fabrics lacking a shared
> > ring, but the spec says we have to work that way.
>
> fabrics already reserves one entry for a connect cmd, so effectively it
> was reserving a slot already...
nvmf_connect_io_queue() sets the cmd.connect.sqsize to ctrl->sqsize, so
doing a +1 on that value for the tagset is definitely not right. A spec
compliant target should have been halting and setting CSTS.CFS if we
actually used all tags.
Thread overview: 20+ messages
2022-12-25 10:32 fix nvme sqsize on off regression Christoph Hellwig
2022-12-25 10:32 ` [PATCH 1/4] nvme: fix setting the queue depth in nvme_alloc_io_tag_set Christoph Hellwig
2022-12-25 10:51 ` Sagi Grimberg
2022-12-25 10:53 ` Sagi Grimberg
2022-12-28 16:11 ` Christoph Hellwig
2022-12-25 21:15 ` Hugh Dickins
2022-12-25 10:32 ` [PATCH 2/4] nvme-pci: update sqsize when adjusting the queue depth Christoph Hellwig
2022-12-25 11:19 ` Sagi Grimberg
2022-12-28 16:10 ` Christoph Hellwig
2022-12-29 12:07 ` Sagi Grimberg
2022-12-29 16:34 ` Keith Busch
2023-01-02 9:39 ` Sagi Grimberg
2023-01-03 16:45 ` Keith Busch
2022-12-25 21:21 ` Hugh Dickins
2022-12-25 10:32 ` [PATCH 3/4] nvme: store the actual queue size in ctrl->sqsize Christoph Hellwig
2022-12-25 11:09 ` Sagi Grimberg
2022-12-28 17:02 ` Keith Busch
2022-12-25 10:32 ` [PATCH 4/4] nvme-pci: remove the dev->q_depth field Christoph Hellwig
2022-12-25 11:19 ` Sagi Grimberg
2022-12-26 19:11 ` (subset) fix nvme sqsize on off regression Jens Axboe