* [PATCH v2] nvme: reserve a keep-alive admin tag for all transports
From: Chao Shi @ 2026-05-15 7:12 UTC (permalink / raw)
To: linux-nvme, Keith Busch
Cc: Christoph Hellwig, Sagi Grimberg, Jens Axboe, Tatsuya Sasaki,
	Maurizio Lombardi, linux-kernel, Sungwoo Kim, Dave Tian,
	Weidong Zhu

nvme_keep_alive_work() always allocates with BLK_MQ_REQ_RESERVED, but
nvme_alloc_admin_tag_set() only sets reserved_tags for fabrics. Since
commit b58da2d270db ("nvme: update keep alive interval when kato is
modified"), userspace can start keep-alive on any transport via Set
Features (KATO), after which the allocation trips WARN_ON_ONCE() in
blk_mq_get_tag() and fails with -EWOULDBLOCK:

  nvme nvme0: keep-alive failed: -11

Per NVMe 2.0a section 5.27.1.12 and the transport binding wording,
PCIe MAY support KATO. Reserve one admin tag on all transports so
the host is ready when a controller accepts the feature. Fabrics
keeps two, the second being for the connect command.

A quirk-based approach was considered, but we found no PCIe controller
documented to report KAS != 0 (two enterprise SSDs tested locally
report KAS=0), so an allowlist would have no entries today.

Link: https://lore.kernel.org/linux-nvme/20260428022911.1288485-1-coshi036@gmail.com/
Fixes: b58da2d270db ("nvme: update keep alive interval when kato is modified")
Found by FuzzNvme (Syzkaller with FEMU fuzzing framework).
Acked-by: Sungwoo Kim <iam@sung-woo.kim>
Acked-by: Dave Tian <daveti@purdue.edu>
Acked-by: Weidong Zhu <weizhu@fiu.edu>
Signed-off-by: Chao Shi <coshi036@gmail.com>
---
Reproducer (run as root on an unpatched kernel with a PCIe NVMe device):
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/nvme_ioctl.h>
int main(void)
{
	struct nvme_admin_cmd cmd = {0};
	int fd = open("/dev/nvme0", O_RDWR);

	if (fd < 0) { perror("open"); return 1; }
	cmd.opcode = 0x09; /* SET_FEATURES */
	cmd.cdw10 = 0x0f; /* Feature ID: KATO */
	cmd.cdw11 = 5; /* KATO (the NVMe spec defines this field in ms) */
	if (ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd) < 0) {
		perror("ioctl");
		return 1;
	}
	return 0;
}

Within ~kato/2 seconds after the program exits, dmesg shows:

  nvme nvme0: keep alive interval updated from 0 ms to 5000 ms
  WARNING: CPU: 0 PID: ... at block/blk-mq-tag.c:148 blk_mq_get_tag+...
  nvme nvme0: keep-alive failed: -11
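
For readers unfamiliar with reserved tags, the failure mode can be
pictured with a small userspace model. This is only a sketch, not
kernel code: struct toy_tag_set, toy_admin_reserved_tags() and
toy_get_reserved_tag() are hypothetical names invented for
illustration, and the model ignores everything blk-mq does beyond
counting reserved tags.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy model of a tag set; NOT the kernel's blk_mq_tag_set. */
struct toy_tag_set {
	unsigned int reserved_tags;	/* size of the reserved pool */
	unsigned int reserved_in_use;	/* reserved tags currently handed out */
};

/*
 * Policy described in this patch: one reserved tag for keep-alive on
 * every transport, plus a second one for the fabrics connect command.
 */
static unsigned int toy_admin_reserved_tags(bool fabrics)
{
	return fabrics ? 2 : 1;
}

/*
 * Simplified stand-in for a non-blocking reserved allocation.
 * With an empty reserved pool the request can never succeed; the real
 * blk_mq_get_tag() additionally fires WARN_ON_ONCE() in that case.
 * The model returns the same error when the pool is merely exhausted.
 */
static int toy_get_reserved_tag(struct toy_tag_set *set)
{
	if (set->reserved_tags == 0)
		return -EWOULDBLOCK;
	if (set->reserved_in_use >= set->reserved_tags)
		return -EWOULDBLOCK;
	return (int)set->reserved_in_use++;
}
```

In this model, a PCIe admin tag set built with reserved_tags = 0 (the
unpatched behaviour) always fails the keep-alive allocation, while one
built with toy_admin_reserved_tags(false) hands out tag 0.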

Changes since v1:
  - Add spec citation (NVMe 2.0a 5.27.1.12 + transport binding wording)
    clarifying that PCIe MAY support KATO.
  - Discuss the quirk-based alternative suggested in v1 review and
    note that no PCIe controller declaring KAS != 0 is documented
    today (two enterprise SSDs tested locally report KAS=0).
  - Add Link: to v1 thread.
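
For anyone who wants to check KAS on their own hardware, the helper
below sketches how the field could be read from a raw 4096-byte
Identify Controller buffer. It assumes the base specification layout,
where KAS sits at bytes 321:320 (little-endian) and is expressed in
100 ms units; please verify both against the spec revision you use.
The names id_ctrl_kas() and kas_to_ms() are hypothetical, and fetching
the buffer itself (e.g. via an Identify admin command) is left out.

```c
#include <stdint.h>

/* Assumed offset of KAS in the Identify Controller data structure. */
#define ID_CTRL_KAS_OFFSET 320

/*
 * Extract KAS from a raw Identify Controller buffer (at least 322
 * bytes). The field is little-endian on the wire, so assemble it
 * byte by byte to stay independent of host endianness.
 */
static uint16_t id_ctrl_kas(const uint8_t *id)
{
	return (uint16_t)(id[ID_CTRL_KAS_OFFSET] |
			  (id[ID_CTRL_KAS_OFFSET + 1] << 8));
}

/*
 * Convert KAS to milliseconds, assuming 100 ms granularity.
 * A result of 0 means the controller does not advertise keep-alive.
 */
static unsigned int kas_to_ms(uint16_t kas)
{
	return (unsigned int)kas * 100u;
}
```

A controller reporting KAS=0, as the two SSDs mentioned above do,
yields kas_to_ms() == 0 here.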
drivers/nvme/host/core.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7bf228df6001..6db02ecde6d1 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4850,8 +4850,13 @@ int nvme_alloc_admin_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
 	memset(set, 0, sizeof(*set));
 	set->ops = ops;
 	set->queue_depth = NVME_AQ_MQ_TAG_DEPTH;
+	/*
+	 * Reserve one tag for keep-alive, which is allocated with
+	 * BLK_MQ_REQ_RESERVED and can be enabled on any transport via the
+	 * KATO feature. Fabrics needs a second reserved tag for connect.
+	 */
+	set->reserved_tags = 1;
 	if (ctrl->ops->flags & NVME_F_FABRICS)
-		/* Reserved for fabric connect and keep alive */
 		set->reserved_tags = 2;
 	set->numa_node = ctrl->numa_node;
 	if (ctrl->ops->flags & NVME_F_BLOCKING)
--
2.43.0