From: Christoph Hellwig <hch@infradead.org>
To: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>,
Jens Axboe <axboe@kernel.dk>, Keith Busch <kbusch@kernel.org>,
Sagi Grimberg <sagi@grimberg.me>,
Chaitanya Kulkarni <kch@nvidia.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Thorsten Leemhuis <regressions@leemhuis.info>,
linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: 6.2 nvme-pci: something wrong
Date: Fri, 23 Dec 2022 23:14:23 -0800 [thread overview]
Message-ID: <Y6amzxU7choHAXWi@infradead.org> (raw)
In-Reply-To: <572cfcc0-197a-9ead-9cb-3c5bf5e735@google.com>
On Fri, Dec 23, 2022 at 09:24:56PM -0800, Hugh Dickins wrote:
> Hi Christoph,
>
> There's something wrong with the nvme-pci heading for 6.2-rc1:
> no problem booting here on this Lenovo ThinkPad X1 Carbon 5th,
> but under load...
>
> nvme nvme0: I/O 0 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: I/O 1 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: I/O 2 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: I/O 3 (I/O Cmd) QID 2 timeout, aborting
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: Abort status: 0x0
> nvme nvme0: I/O 0 QID 2 timeout, reset controller
>
> ...and more, until I just have to poweroff and reboot.
>
> Bisection points to your
> 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers")
> And that does revert cleanly, giving a kernel which shows no problem.
>
> I've spent a while comparing old nvme_pci_alloc_tag_set() and new
> nvme_alloc_io_tag_set(), I do not know my way around there at all
> and may be talking nonsense, but it did look as if there might now
> be a difference in the queue_depth, sqsize, q_depth conversions.
>
> I'm running load successfully with the patch below, but I strongly
> suspect that the right patch will be somewhere else: over to you!
Thanks for the report! The patch is definitively wrong, ->sqsize
hold one of the awful so called 'zeroes based values' in NVMe,
where 0 means 1, and thus have a built-in one off. We should
probably convert it to a sane value at read time, but that's a
separate discussion.
I suspect your controller is one of those where we quirk the size,
and the commit you bisected fails to reflects that in the common
sqsizse value. The patch below should be the minimum fix, and in
the long term, the duplicate bookkeeping for it in the PCI driver
should go away:
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index f0f8027644bbf8..a73c0ee7bd1892 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2536,7 +2536,6 @@ static int nvme_pci_enable(struct nvme_dev *dev)
dev->q_depth = min_t(u32, NVME_CAP_MQES(dev->ctrl.cap) + 1,
io_queue_depth);
- dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
dev->db_stride = 1 << NVME_CAP_STRIDE(dev->ctrl.cap);
dev->dbs = dev->bar + 4096;
@@ -2577,7 +2576,7 @@ static int nvme_pci_enable(struct nvme_dev *dev)
dev_warn(dev->ctrl.device, "IO queue depth clamped to %d\n",
dev->q_depth);
}
-
+ dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
nvme_map_cmb(dev);
next prev parent reply other threads:[~2022-12-24 7:14 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-24 5:24 6.2 nvme-pci: something wrong Hugh Dickins
2022-12-24 7:14 ` Christoph Hellwig [this message]
2022-12-24 10:19 ` Hugh Dickins
2022-12-24 16:56 ` Linus Torvalds
2022-12-24 7:52 ` 6.2 nvme-pci: something wrong #forregzbot Thorsten Leemhuis
2023-01-04 14:02 ` Thorsten Leemhuis
2022-12-24 22:06 ` 6.2 nvme-pci: something wrong Keith Busch
2022-12-25 5:30 ` Christoph Hellwig
2022-12-25 8:33 ` Hugh Dickins
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y6amzxU7choHAXWi@infradead.org \
--to=hch@infradead.org \
--cc=axboe@kernel.dk \
--cc=hughd@google.com \
--cc=kbusch@kernel.org \
--cc=kch@nvidia.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=regressions@leemhuis.info \
--cc=sagi@grimberg.me \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox