public inbox for linux-kernel@vger.kernel.org
* [PATCH] nvme: reserve a keep-alive admin tag for all transports
@ 2026-04-28  2:29 Chao Shi
  2026-04-28  6:47 ` Keith Busch
From: Chao Shi @ 2026-04-28  2:29 UTC
  To: linux-nvme, Keith Busch
  Cc: Christoph Hellwig, Sagi Grimberg, Jens Axboe, Tatsuya Sasaki,
	linux-kernel, Sungwoo Kim, Dave Tian, Weidong Zhu

nvme_keep_alive_work() always allocates with BLK_MQ_REQ_RESERVED, but
nvme_alloc_admin_tag_set() only sets reserved_tags for fabrics.  Since
commit b58da2d270db ("nvme: update keep alive interval when kato is
modified"), userspace can start keep-alive on any transport via Set
Features (KATO), after which the allocation trips WARN_ON_ONCE() in
blk_mq_get_tag() and fails with -EWOULDBLOCK:

  nvme nvme0: keep-alive failed: -11

Reserve one admin tag for keep-alive on all transports.  Fabrics keeps
two, the second being for the connect command.

Fixes: b58da2d270db ("nvme: update keep alive interval when kato is modified")

Found by FuzzNvme (Syzkaller with the FEMU fuzzing framework).

Acked-by: Sungwoo Kim <iam@sung-woo.kim>
Acked-by: Dave Tian <daveti@purdue.edu>
Acked-by: Weidong Zhu <weizhu@fiu.edu>
Signed-off-by: Chao Shi <coshi036@gmail.com>
---

Reproducer (run as root on an unpatched kernel with a PCIe NVMe device):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/nvme_ioctl.h>

    int main(void)
    {
            struct nvme_admin_cmd cmd = {0};
            int fd, ret;

            fd = open("/dev/nvme0", O_RDWR);
            if (fd < 0) { perror("open"); return 1; }
            cmd.opcode = 0x09;       /* SET_FEATURES */
            cmd.cdw10  = 0x0f;       /* Feature ID: KATO */
            cmd.cdw11  = 5000;       /* KATO in milliseconds (5 seconds) */
            ret = ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd);
            if (ret < 0)
                    perror("ioctl");
            else if (ret > 0)        /* positive return is an NVMe status */
                    fprintf(stderr, "NVMe status 0x%x\n", ret);
            close(fd);
            return ret ? 1 : 0;
    }

Within ~kato/2 seconds after the program exits, dmesg shows:

    nvme nvme0: keep alive interval updated from 0 ms to 5000 ms
    WARNING: CPU: 0 PID: ... at block/blk-mq-tag.c:148 blk_mq_get_tag+...
    nvme nvme0: keep-alive failed: -11
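
For readers who want the failure mode in isolation, here is a minimal
userspace sketch of the reserved-tag accounting described above.  It is a
simplified model under stated assumptions, not the blk-mq implementation:
struct model_tag_set, model_get_tag(), model_admin_tags() and MODEL_DEPTH
are hypothetical names, and the depth value is arbitrary.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_DEPTH 8	/* arbitrary depth, chosen only for illustration */

/* Hypothetical stand-in for the fields of a tag set that matter here. */
struct model_tag_set {
	unsigned int queue_depth;	/* total number of tags */
	unsigned int reserved_tags;	/* tags set aside for reserved requests */
	unsigned int reserved_in_use;
	unsigned int normal_in_use;
};

/*
 * Non-blocking tag allocation in the model.  A reserved request can only
 * be served from the reserved pool, so an empty pool always fails.
 * Returns 0 on success and -EWOULDBLOCK when the pool has no free tag.
 */
static int model_get_tag(struct model_tag_set *set, bool reserved)
{
	if (reserved) {
		if (set->reserved_in_use >= set->reserved_tags)
			return -EWOULDBLOCK;
		set->reserved_in_use++;
		return 0;
	}
	if (set->normal_in_use >= set->queue_depth - set->reserved_tags)
		return -EWOULDBLOCK;
	set->normal_in_use++;
	return 0;
}

/*
 * Mirrors the reserved_tags choice before and after this patch:
 * unpatched, only fabrics reserves tags; patched, every transport
 * reserves one and fabrics keeps two.
 */
static struct model_tag_set model_admin_tags(bool fabrics, bool patched)
{
	struct model_tag_set set = { .queue_depth = MODEL_DEPTH };

	if (patched)
		set.reserved_tags = 1;
	if (fabrics)
		set.reserved_tags = 2;
	return set;
}
```

In this model, model_get_tag(&set, true) on a set built with
model_admin_tags(false, false) returns -EWOULDBLOCK, which corresponds to
the failing keep-alive allocation; with model_admin_tags(false, true) the
first reserved allocation succeeds.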

 drivers/nvme/host/core.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7bf228df6001..6db02ecde6d1 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4850,8 +4850,13 @@ int nvme_alloc_admin_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
 	memset(set, 0, sizeof(*set));
 	set->ops = ops;
 	set->queue_depth = NVME_AQ_MQ_TAG_DEPTH;
+	/*
+	 * Reserve one tag for keep-alive, which is allocated with
+	 * BLK_MQ_REQ_RESERVED and can be enabled on any transport via the
+	 * KATO feature.  Fabrics needs a second reserved tag for connect.
+	 */
+	set->reserved_tags = 1;
 	if (ctrl->ops->flags & NVME_F_FABRICS)
-		/* Reserved for fabric connect and keep alive */
 		set->reserved_tags = 2;
 	set->numa_node = ctrl->numa_node;
 	if (ctrl->ops->flags & NVME_F_BLOCKING)
-- 
2.43.0


* Re: [PATCH] nvme: reserve a keep-alive admin tag for all transports
  2026-04-28  2:29 [PATCH] nvme: reserve a keep-alive admin tag for all transports Chao Shi
@ 2026-04-28  6:47 ` Keith Busch
  2026-04-28  7:15   ` Maurizio Lombardi
From: Keith Busch @ 2026-04-28  6:47 UTC
  To: Chao Shi
  Cc: linux-nvme, Christoph Hellwig, Sagi Grimberg, Jens Axboe,
	Tatsuya Sasaki, linux-kernel, Sungwoo Kim, Dave Tian, Weidong Zhu

On Mon, Apr 27, 2026 at 10:29:11PM -0400, Chao Shi wrote:
> nvme_keep_alive_work() always allocates with BLK_MQ_REQ_RESERVED, but
> nvme_alloc_admin_tag_set() only sets reserved_tags for fabrics.  Since
> commit b58da2d270db ("nvme: update keep alive interval when kato is
> modified"), userspace can start keep-alive on any transport via Set
> Features (KATO), after which the allocation trips WARN_ON_ONCE() in
> blk_mq_get_tag() and fails with -EWOULDBLOCK:
> 
>   nvme nvme0: keep-alive failed: -11
> 
> Reserve one admin tag for keep-alive on all transports.  Fabrics keeps
> two, the second being for the connect command.
 
> Fixes: b58da2d270db ("nvme: update keep alive interval when kato is modified")
> 
> Found by FuzzNvme(Syzkaller with FEMU fuzzing framework).
> 
> Acked-by: Sungwoo Kim <iam@sung-woo.kim>
> Acked-by: Dave Tian <daveti@purdue.edu>
> Acked-by: Weidong Zhu <weizhu@fiu.edu>
> Signed-off-by: Chao Shi <coshi036@gmail.com>
> ---
> 
> Reproducer (run as root on an unpatched kernel with a PCIe NVMe device):

You have a PCI controller that doesn't return Invalid Field In Command
status for the KATO feature? That's weird, it's a fabrics-specific
feature. I think the right thing to do is simply to skip the driver's
KATO start for PCI.
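
One possible reading of that suggestion, written as a standalone sketch
rather than a kernel patch: gate the keep-alive start on the transport
type, so a non-fabrics controller never schedules the work no matter what
KATO value userspace sets.  struct model_ctrl, MODEL_F_FABRICS and
model_should_start_keep_alive() are hypothetical names used only for
illustration and are not taken from the driver.

```c
#include <stdbool.h>

#define MODEL_F_FABRICS (1u << 0)	/* hypothetical transport flag */

/* Hypothetical stand-in for the controller state relevant here. */
struct model_ctrl {
	unsigned int flags;	/* transport flags */
	unsigned int kato;	/* keep-alive timeout in seconds, 0 = off */
};

/*
 * Keep-alive is started only for fabrics transports with a nonzero
 * timeout.  For any other transport the request is ignored, so no
 * reserved tag is ever needed there.
 */
static bool model_should_start_keep_alive(const struct model_ctrl *ctrl)
{
	if (!(ctrl->flags & MODEL_F_FABRICS))
		return false;
	return ctrl->kato != 0;
}
```

The trade-off against the posted patch is that this approach leaves the
tag set untouched and instead treats keep-alive as unsupported on PCI.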

* Re: [PATCH] nvme: reserve a keep-alive admin tag for all transports
  2026-04-28  6:47 ` Keith Busch
@ 2026-04-28  7:15   ` Maurizio Lombardi
  2026-04-28  7:24     ` Keith Busch
From: Maurizio Lombardi @ 2026-04-28  7:15 UTC
  To: Keith Busch, Chao Shi
  Cc: linux-nvme, Christoph Hellwig, Sagi Grimberg, Jens Axboe,
	Tatsuya Sasaki, linux-kernel, Sungwoo Kim, Dave Tian, Weidong Zhu

On Tue Apr 28, 2026 at 8:47 AM CEST, Keith Busch wrote:
> On Mon, Apr 27, 2026 at 10:29:11PM -0400, Chao Shi wrote:
>> nvme_keep_alive_work() always allocates with BLK_MQ_REQ_RESERVED, but
>> nvme_alloc_admin_tag_set() only sets reserved_tags for fabrics.  Since
>> commit b58da2d270db ("nvme: update keep alive interval when kato is
>> modified"), userspace can start keep-alive on any transport via Set
>> Features (KATO), after which the allocation trips WARN_ON_ONCE() in
>> blk_mq_get_tag() and fails with -EWOULDBLOCK:
>> 
>>   nvme nvme0: keep-alive failed: -11
>> 
>> Reserve one admin tag for keep-alive on all transports.  Fabrics keeps
>> two, the second being for the connect command.
>  
>> Fixes: b58da2d270db ("nvme: update keep alive interval when kato is modified")
>> 
>> Found by FuzzNvme(Syzkaller with FEMU fuzzing framework).
>> 
>> Acked-by: Sungwoo Kim <iam@sung-woo.kim>
>> Acked-by: Dave Tian <daveti@purdue.edu>
>> Acked-by: Weidong Zhu <weizhu@fiu.edu>
>> Signed-off-by: Chao Shi <coshi036@gmail.com>
>> ---
>> 
>> Reproducer (run as root on an unpatched kernel with a PCIe NVMe device):
>
> You have a PCI controller that doesn't return Invalid Field In Command
> status to the KATO feature? That's weird, it's fabrics specific feature.

Are you sure that it's fabrics-only?

The NVMe spec 2.0a, section 5.27.1.12, Keep Alive Timer (Feature
Identifier 0Fh), says:

"Keep Alive Timeout (KATO):

This field specifies the timeout value for the Keep Alive feature in
milliseconds.  [...]
The default value for this field is 0h for NVMe transports that do not
require use of the Keep Alive feature (e.g., NVMe over PCIe). For NVMe
transports that require use of the Keep Alive feature (e.g., RDMA and
TCP), the default value for this field is 1D4C0h"

To me, it sounds like for nvme-pci, keep alive isn't required, but could
be activated.
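
As a quick check of the units in the quoted text: 1D4C0h is 120000
decimal, so the fabrics default is 120000 ms, i.e. 120 seconds.  A small
sketch of the conversion follows; the macro and helper names are
hypothetical and only there to make the arithmetic explicit.

```c
#include <stdint.h>

/* Default KATO for transports that require keep-alive, per the quote. */
#define MODEL_KATO_DEFAULT_FABRICS_MS 0x1D4C0u

/* The KATO field is specified in milliseconds; convert to whole seconds. */
static uint32_t model_kato_ms_to_seconds(uint32_t kato_ms)
{
	return kato_ms / 1000;
}
```

Calling model_kato_ms_to_seconds(MODEL_KATO_DEFAULT_FABRICS_MS) gives
120, while the PCIe default of 0h gives 0, meaning keep-alive is off
unless something sets a nonzero value.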


Maurizio

* Re: [PATCH] nvme: reserve a keep-alive admin tag for all transports
  2026-04-28  7:15   ` Maurizio Lombardi
@ 2026-04-28  7:24     ` Keith Busch
From: Keith Busch @ 2026-04-28  7:24 UTC
  To: Maurizio Lombardi
  Cc: Chao Shi, linux-nvme, Christoph Hellwig, Sagi Grimberg,
	Jens Axboe, Tatsuya Sasaki, linux-kernel, Sungwoo Kim, Dave Tian,
	Weidong Zhu

On Tue, Apr 28, 2026 at 09:15:10AM +0200, Maurizio Lombardi wrote:
> The spec 2.0a, at section 5.27.1.12 Keep Alive Timer (Feature Identifier
> 0Fh)
> 
> says:
> "Keep Alive Timeout (KATO):
>  
> This field specifies the timeout value for the Keep Alive feature in
> milliseconds.  [...]
> The default value for this field is 0h for NVMe transports that do not require use of the Keep Alive
> feature (e.g., NVMe over PCIe). For NVMe transports that require use of the Keep Alive feature
> (e.g., RDMA and TCP), the default value for this field is 1D4C0h "
> 
> To me, it sounds like for nvme-pci, keep alive isn't required, but could
> be activated.

The spec says that support is subject to the transport binding
specification, and the PCIe transport spec does not define it.
