* [PATCH] nvme: reserve a keep-alive admin tag for all transports
@ 2026-04-28 2:29 Chao Shi
2026-04-28 6:47 ` Keith Busch
0 siblings, 1 reply; 11+ messages in thread
From: Chao Shi @ 2026-04-28 2:29 UTC (permalink / raw)
To: linux-nvme, Keith Busch
Cc: Christoph Hellwig, Sagi Grimberg, Jens Axboe, Tatsuya Sasaki,
linux-kernel, Sungwoo Kim, Dave Tian, Weidong Zhu
nvme_keep_alive_work() always allocates with BLK_MQ_REQ_RESERVED, but
nvme_alloc_admin_tag_set() only sets reserved_tags for fabrics. Since
commit b58da2d270db ("nvme: update keep alive interval when kato is
modified"), userspace can start keep-alive on any transport via Set
Features (KATO), after which the allocation trips WARN_ON_ONCE() in
blk_mq_get_tag() and fails with -EWOULDBLOCK:
nvme nvme0: keep-alive failed: -11
Reserve one admin tag for keep-alive on all transports. Fabrics keeps
two, the second being for the connect command.
Fixes: b58da2d270db ("nvme: update keep alive interval when kato is modified")
Found by FuzzNvme(Syzkaller with FEMU fuzzing framework).
Acked-by: Sungwoo Kim <iam@sung-woo.kim>
Acked-by: Dave Tian <daveti@purdue.edu>
Acked-by: Weidong Zhu <weizhu@fiu.edu>
Signed-off-by: Chao Shi <coshi036@gmail.com>
---
Reproducer (run as root on an unpatched kernel with a PCIe NVMe device):
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/nvme_ioctl.h>
int main(void)
{
struct nvme_admin_cmd cmd = {0};
int fd = open("/dev/nvme0", O_RDWR);
if (fd < 0) { perror("open"); return 1; }
cmd.opcode = 0x09; /* SET_FEATURES */
cmd.cdw10 = 0x0f; /* Feature ID: KATO */
cmd.cdw11 = 5; /* KATO = 5 seconds */
if (ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd) < 0) {
perror("ioctl");
return 1;
}
return 0;
}
Within ~kato/2 seconds after the program exits, dmesg shows:
nvme nvme0: keep alive interval updated from 0 ms to 5000 ms
WARNING: CPU: 0 PID: ... at block/blk-mq-tag.c:148 blk_mq_get_tag+...
nvme nvme0: keep-alive failed: -11
drivers/nvme/host/core.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7bf228df6001..6db02ecde6d1 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4850,8 +4850,13 @@ int nvme_alloc_admin_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
memset(set, 0, sizeof(*set));
set->ops = ops;
set->queue_depth = NVME_AQ_MQ_TAG_DEPTH;
+ /*
+ * Reserve one tag for keep-alive, which is allocated with
+ * BLK_MQ_REQ_RESERVED and can be enabled on any transport via the
+ * KATO feature. Fabrics needs a second reserved tag for connect.
+ */
+ set->reserved_tags = 1;
if (ctrl->ops->flags & NVME_F_FABRICS)
- /* Reserved for fabric connect and keep alive */
set->reserved_tags = 2;
set->numa_node = ctrl->numa_node;
if (ctrl->ops->flags & NVME_F_BLOCKING)
--
2.43.0
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH] nvme: reserve a keep-alive admin tag for all transports
2026-04-28 2:29 [PATCH] nvme: reserve a keep-alive admin tag for all transports Chao Shi
@ 2026-04-28 6:47 ` Keith Busch
2026-04-28 7:15 ` Maurizio Lombardi
0 siblings, 1 reply; 11+ messages in thread
From: Keith Busch @ 2026-04-28 6:47 UTC (permalink / raw)
To: Chao Shi
Cc: linux-nvme, Christoph Hellwig, Sagi Grimberg, Jens Axboe,
Tatsuya Sasaki, linux-kernel, Sungwoo Kim, Dave Tian, Weidong Zhu
On Mon, Apr 27, 2026 at 10:29:11PM -0400, Chao Shi wrote:
> nvme_keep_alive_work() always allocates with BLK_MQ_REQ_RESERVED, but
> nvme_alloc_admin_tag_set() only sets reserved_tags for fabrics. Since
> commit b58da2d270db ("nvme: update keep alive interval when kato is
> modified"), userspace can start keep-alive on any transport via Set
> Features (KATO), after which the allocation trips WARN_ON_ONCE() in
> blk_mq_get_tag() and fails with -EWOULDBLOCK:
>
> nvme nvme0: keep-alive failed: -11
>
> Reserve one admin tag for keep-alive on all transports. Fabrics keeps
> two, the second being for the connect command.
> Fixes: b58da2d270db ("nvme: update keep alive interval when kato is modified")
>
> Found by FuzzNvme(Syzkaller with FEMU fuzzing framework).
>
> Acked-by: Sungwoo Kim <iam@sung-woo.kim>
> Acked-by: Dave Tian <daveti@purdue.edu>
> Acked-by: Weidong Zhu <weizhu@fiu.edu>
> Signed-off-by: Chao Shi <coshi036@gmail.com>
> ---
>
> Reproducer (run as root on an unpatched kernel with a PCIe NVMe device):
You have a PCI controller that doesn't return Invalid Field In Command
status to the KATO feature? That's weird, it's fabrics specific feature.
I think the right thing to do is simply skip the driver's KATO start for
PCI.
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] nvme: reserve a keep-alive admin tag for all transports
2026-04-28 6:47 ` Keith Busch
@ 2026-04-28 7:15 ` Maurizio Lombardi
2026-04-28 7:24 ` Keith Busch
2026-05-15 5:05 ` Chao S
0 siblings, 2 replies; 11+ messages in thread
From: Maurizio Lombardi @ 2026-04-28 7:15 UTC (permalink / raw)
To: Keith Busch, Chao Shi
Cc: linux-nvme, Christoph Hellwig, Sagi Grimberg, Jens Axboe,
Tatsuya Sasaki, linux-kernel, Sungwoo Kim, Dave Tian, Weidong Zhu
On Tue Apr 28, 2026 at 8:47 AM CEST, Keith Busch wrote:
> On Mon, Apr 27, 2026 at 10:29:11PM -0400, Chao Shi wrote:
>> nvme_keep_alive_work() always allocates with BLK_MQ_REQ_RESERVED, but
>> nvme_alloc_admin_tag_set() only sets reserved_tags for fabrics. Since
>> commit b58da2d270db ("nvme: update keep alive interval when kato is
>> modified"), userspace can start keep-alive on any transport via Set
>> Features (KATO), after which the allocation trips WARN_ON_ONCE() in
>> blk_mq_get_tag() and fails with -EWOULDBLOCK:
>>
>> nvme nvme0: keep-alive failed: -11
>>
>> Reserve one admin tag for keep-alive on all transports. Fabrics keeps
>> two, the second being for the connect command.
>
>> Fixes: b58da2d270db ("nvme: update keep alive interval when kato is modified")
>>
>> Found by FuzzNvme(Syzkaller with FEMU fuzzing framework).
>>
>> Acked-by: Sungwoo Kim <iam@sung-woo.kim>
>> Acked-by: Dave Tian <daveti@purdue.edu>
>> Acked-by: Weidong Zhu <weizhu@fiu.edu>
>> Signed-off-by: Chao Shi <coshi036@gmail.com>
>> ---
>>
>> Reproducer (run as root on an unpatched kernel with a PCIe NVMe device):
>
> You have a PCI controller that doesn't return Invalid Field In Command
> status to the KATO feature? That's weird, it's fabrics specific feature.
Are you sure that it's fabrics-only?
The spec 2.0a, at section 5.27.1.12 Keep Alive Timer (Feature Identifier
0Fh)
says:
"Keep Alive Timeout (KATO):
This field specifies the timeout value for the Keep Alive feature in
milliseconds. [...]
The default value for this field is 0h for NVMe transports that do not require use of the Keep Alive
feature (e.g., NVMe over PCIe). For NVMe transports that require use of the Keep Alive feature
(e.g., RDMA and TCP), the default value for this field is 1D4C0h "
To me, it sounds like for nvme-pci, keep alive isn't required, but could
be activated.
Maurizio
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] nvme: reserve a keep-alive admin tag for all transports
2026-04-28 7:15 ` Maurizio Lombardi
@ 2026-04-28 7:24 ` Keith Busch
2026-05-08 9:04 ` Christoph Hellwig
2026-05-15 5:05 ` Chao S
1 sibling, 1 reply; 11+ messages in thread
From: Keith Busch @ 2026-04-28 7:24 UTC (permalink / raw)
To: Maurizio Lombardi
Cc: Chao Shi, linux-nvme, Christoph Hellwig, Sagi Grimberg,
Jens Axboe, Tatsuya Sasaki, linux-kernel, Sungwoo Kim, Dave Tian,
Weidong Zhu
On Tue, Apr 28, 2026 at 09:15:10AM +0200, Maurizio Lombardi wrote:
> The spec 2.0a, at section 5.27.1.12 Keep Alive Timer (Feature Identifier
> 0Fh)
>
> says:
> "Keep Alive Timeout (KATO):
>
> This field specifies the timeout value for the Keep Alive feature in
> milliseconds. [...]
> The default value for this field is 0h for NVMe transports that do not require use of the Keep Alive
> feature (e.g., NVMe over PCIe). For NVMe transports that require use of the Keep Alive feature
> (e.g., RDMA and TCP), the default value for this field is 1D4C0h "
>
> To me, it sounds like for nvme-pci, keep alive isn't required, but could
> be activated.
The spec says the support is subject to the Transport binding
specification, which does not exist in the PCIe transport spec.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] nvme: reserve a keep-alive admin tag for all transports
2026-04-28 7:24 ` Keith Busch
@ 2026-05-08 9:04 ` Christoph Hellwig
2026-05-08 9:31 ` Keith Busch
2026-05-15 5:04 ` Chao S
0 siblings, 2 replies; 11+ messages in thread
From: Christoph Hellwig @ 2026-05-08 9:04 UTC (permalink / raw)
To: Keith Busch
Cc: Maurizio Lombardi, Chao Shi, linux-nvme, Christoph Hellwig,
Sagi Grimberg, Jens Axboe, Tatsuya Sasaki, linux-kernel,
Sungwoo Kim, Dave Tian, Weidong Zhu
On Tue, Apr 28, 2026 at 08:24:35AM +0100, Keith Busch wrote:
> > This field specifies the timeout value for the Keep Alive feature in
> > milliseconds. [...]
> > The default value for this field is 0h for NVMe transports that do not require use of the Keep Alive
> > feature (e.g., NVMe over PCIe). For NVMe transports that require use of the Keep Alive feature
> > (e.g., RDMA and TCP), the default value for this field is 1D4C0h "
> >
> > To me, it sounds like for nvme-pci, keep alive isn't required, but could
> > be activated.
>
> The spec says the support is subject to the Transport binding
> specification, which does not exist in the PCIe transport spec.
My memories from the fabrics working group back in the day is that we
explicitly intended to support it in PCIe. The wording in the spec
referring to transport specs I can find is:
The NVMe Transport binding specification for the associated NVMe Transport
defines:
o the minimum Keep Alive Timeout value, if any;
o the maximum Keep Alive Timeout value, if any; and
o if the Keep Alive Timer feature is required to be supported and enabled.
which does not read to me like there is any required language in the
transport spec to require keep alive.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] nvme: reserve a keep-alive admin tag for all transports
2026-05-08 9:04 ` Christoph Hellwig
@ 2026-05-08 9:31 ` Keith Busch
2026-05-10 20:53 ` Sagi Grimberg
2026-05-15 5:00 ` Chao S
2026-05-15 5:04 ` Chao S
1 sibling, 2 replies; 11+ messages in thread
From: Keith Busch @ 2026-05-08 9:31 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Maurizio Lombardi, Chao Shi, linux-nvme, Sagi Grimberg,
Jens Axboe, Tatsuya Sasaki, linux-kernel, Sungwoo Kim, Dave Tian,
Weidong Zhu
On Fri, May 08, 2026 at 11:04:27AM +0200, Christoph Hellwig wrote:
> On Tue, Apr 28, 2026 at 08:24:35AM +0100, Keith Busch wrote:
> > > This field specifies the timeout value for the Keep Alive feature in
> > > milliseconds. [...]
> > > The default value for this field is 0h for NVMe transports that do not require use of the Keep Alive
> > > feature (e.g., NVMe over PCIe). For NVMe transports that require use of the Keep Alive feature
> > > (e.g., RDMA and TCP), the default value for this field is 1D4C0h "
> > >
> > > To me, it sounds like for nvme-pci, keep alive isn't required, but could
> > > be activated.
> >
> > The spec says the support is subject to the Transport binding
> > specification, which does not exist in the PCIe transport spec.
>
> My memories from the fabrics working group back in the day is that we
> explicitly intended to support it in PCIe. The wording in the spec
> referring to transport specs I can find is:
>
> The NVMe Transport binding specification for the associated NVMe Transport
> defines:
>
> o the minimum Keep Alive Timeout value, if any;
> o the maximum Keep Alive Timeout value, if any; and
> o if the Keep Alive Timer feature is required to be supported and enabled.
>
> which does not read to me like there is any required language in the
> transport spec to require keep alive.
So the absence of defining a minimum means it's simply optional? I
suppose I can see it that way as the intended interpretation, but seems
counter productive to do on PCIe when you can MMIO the controller status
register to verify liveness. If the controller responds successfully to
the feature, then I have to agree we need the host to do its part.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] nvme: reserve a keep-alive admin tag for all transports
2026-05-08 9:31 ` Keith Busch
@ 2026-05-10 20:53 ` Sagi Grimberg
2026-05-15 4:55 ` Chao S
2026-05-15 5:00 ` Chao S
1 sibling, 1 reply; 11+ messages in thread
From: Sagi Grimberg @ 2026-05-10 20:53 UTC (permalink / raw)
To: Keith Busch, Christoph Hellwig
Cc: Maurizio Lombardi, Chao Shi, linux-nvme, Jens Axboe,
Tatsuya Sasaki, linux-kernel, Sungwoo Kim, Dave Tian, Weidong Zhu
On 08/05/2026 12:31, Keith Busch wrote:
> On Fri, May 08, 2026 at 11:04:27AM +0200, Christoph Hellwig wrote:
>> On Tue, Apr 28, 2026 at 08:24:35AM +0100, Keith Busch wrote:
>>>> This field specifies the timeout value for the Keep Alive feature in
>>>> milliseconds. [...]
>>>> The default value for this field is 0h for NVMe transports that do not require use of the Keep Alive
>>>> feature (e.g., NVMe over PCIe). For NVMe transports that require use of the Keep Alive feature
>>>> (e.g., RDMA and TCP), the default value for this field is 1D4C0h "
>>>>
>>>> To me, it sounds like for nvme-pci, keep alive isn't required, but could
>>>> be activated.
>>> The spec says the support is subject to the Transport binding
>>> specification, which does not exist in the PCIe transport spec.
>> My memories from the fabrics working group back in the day is that we
>> explicitly intended to support it in PCIe. The wording in the spec
>> referring to transport specs I can find is:
>>
>> The NVMe Transport binding specification for the associated NVMe Transport
>> defines:
>>
>> o the minimum Keep Alive Timeout value, if any;
>> o the maximum Keep Alive Timeout value, if any; and
>> o if the Keep Alive Timer feature is required to be supported and enabled.
>>
>> which does not read to me like there is any required language in the
>> transport spec to require keep alive.
> So the absence of defining a minimum means it's simply optional? I
> suppose I can see it that way as the intended interpretation, but seems
> counter productive to do on PCIe when you can MMIO the controller status
> register to verify liveness. If the controller responds successfully to
> the feature, then I have to agree we need the host to do its part.
Perhaps under a quirk? Would be interesting to understand which
pci devices support this...
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] nvme: reserve a keep-alive admin tag for all transports
2026-05-10 20:53 ` Sagi Grimberg
@ 2026-05-15 4:55 ` Chao S
0 siblings, 0 replies; 11+ messages in thread
From: Chao S @ 2026-05-15 4:55 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Keith Busch, Christoph Hellwig, Maurizio Lombardi, linux-nvme,
Jens Axboe, Tatsuya Sasaki, linux-kernel, Sungwoo Kim, Dave Tian,
Weidong Zhu
On Fri, May 8, 2026, Sagi Grimberg wrote:
> Perhaps under a quirk? Would be interesting to understand which
> pci devices support this...
Thanks for the comments. I thought
always reserving an admin tag for keep-alive is simpler.
I queried the Identify Controller KAS field on the two PCIe NVMe
drives I have locally (Micron 7450 MTFDKBA960TFR and
MTFDLAL3T8THG-1BP1DFCYY), both report KAS=0. I could not find
publicly documented PCIe NVMe controller that declares KAS != 0
either.
The spec (NVMe 2.0a 5.27.1.12 plus the transport binding wording
quoted upthread by Christoph) allows PCIe to support KATO; the host
should be defensive against that legal case.
The cost is a single admin tag - NVME_AQ_MQ_TAG_DEPTH stays at 30,
the regular pool goes from 30 to 29 on non-fabrics. Fabrics has been
running with 28 regular tags without issue.
If a real PCIe controller is later found to advertise KAS != 0 and
misbehave with keep-alive, narrowing the activation with a quirk on
top of this change is straightforward. I would prefer to land the
defensive default first.
I will send v2 shortly with this rationale in
the commit message.
Chao
On Sun, May 10, 2026 at 4:53 PM Sagi Grimberg <sagi@grimberg.me> wrote:
>
>
>
> On 08/05/2026 12:31, Keith Busch wrote:
> > On Fri, May 08, 2026 at 11:04:27AM +0200, Christoph Hellwig wrote:
> >> On Tue, Apr 28, 2026 at 08:24:35AM +0100, Keith Busch wrote:
> >>>> This field specifies the timeout value for the Keep Alive feature in
> >>>> milliseconds. [...]
> >>>> The default value for this field is 0h for NVMe transports that do not require use of the Keep Alive
> >>>> feature (e.g., NVMe over PCIe). For NVMe transports that require use of the Keep Alive feature
> >>>> (e.g., RDMA and TCP), the default value for this field is 1D4C0h "
> >>>>
> >>>> To me, it sounds like for nvme-pci, keep alive isn't required, but could
> >>>> be activated.
> >>> The spec says the support is subject to the Transport binding
> >>> specification, which does not exist in the PCIe transport spec.
> >> My memories from the fabrics working group back in the day is that we
> >> explicitly intended to support it in PCIe. The wording in the spec
> >> referring to transport specs I can find is:
> >>
> >> The NVMe Transport binding specification for the associated NVMe Transport
> >> defines:
> >>
> >> o the minimum Keep Alive Timeout value, if any;
> >> o the maximum Keep Alive Timeout value, if any; and
> >> o if the Keep Alive Timer feature is required to be supported and enabled.
> >>
> >> which does not read to me like there is any required language in the
> >> transport spec to require keep alive.
> > So the absence of defining a minimum means it's simply optional? I
> > suppose I can see it that way as the intended interpretation, but seems
> > counter productive to do on PCIe when you can MMIO the controller status
> > register to verify liveness. If the controller responds successfully to
> > the feature, then I have to agree we need the host to do its part.
>
> Perhaps under a quirk? Would be interesting to understand which
> pci devices support this...
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] nvme: reserve a keep-alive admin tag for all transports
2026-05-08 9:31 ` Keith Busch
2026-05-10 20:53 ` Sagi Grimberg
@ 2026-05-15 5:00 ` Chao S
1 sibling, 0 replies; 11+ messages in thread
From: Chao S @ 2026-05-15 5:00 UTC (permalink / raw)
To: Keith Busch
Cc: Christoph Hellwig, Maurizio Lombardi, linux-nvme, Sagi Grimberg,
Jens Axboe, Tatsuya Sasaki, linux-kernel, Sungwoo Kim, Dave Tian,
Weidong Zhu
On Fri, May 8, 2026, Keith Busch wrote:
> So the absence of defining a minimum means it's simply optional? I
> suppose I can see it that way as the intended interpretation, but
> seems counter productive to do on PCIe when you can MMIO the
> controller status register to verify liveness. If the controller
> responds successfully to the feature, then I have to agree we need
> the host to do its part.
Thanks Keith - I will take that last sentence as the basis for v2.
The v2 commit message will cite the spec reading (NVMe 2.0a 5.27.1.12
plus the transport binding wording) explicitly and conclude that PCIe
MAY support KATO, in which case the host needs the reserved tag.
For data points: I also tested two Micron PCIe SSDs locally and both
report KAS=0, so the bug is likely field-rare on mainstream hardware
today. The patch remains a cheap defensive change for spec-compliant
controllers that declare KAS != 0 or for emulated environments.
Chao
On Fri, May 8, 2026 at 5:31 AM Keith Busch <kbusch@kernel.org> wrote:
>
> On Fri, May 08, 2026 at 11:04:27AM +0200, Christoph Hellwig wrote:
> > On Tue, Apr 28, 2026 at 08:24:35AM +0100, Keith Busch wrote:
> > > > This field specifies the timeout value for the Keep Alive feature in
> > > > milliseconds. [...]
> > > > The default value for this field is 0h for NVMe transports that do not require use of the Keep Alive
> > > > feature (e.g., NVMe over PCIe). For NVMe transports that require use of the Keep Alive feature
> > > > (e.g., RDMA and TCP), the default value for this field is 1D4C0h "
> > > >
> > > > To me, it sounds like for nvme-pci, keep alive isn't required, but could
> > > > be activated.
> > >
> > > The spec says the support is subject to the Transport binding
> > > specification, which does not exist in the PCIe transport spec.
> >
> > My memories from the fabrics working group back in the day is that we
> > explicitly intended to support it in PCIe. The wording in the spec
> > referring to transport specs I can find is:
> >
> > The NVMe Transport binding specification for the associated NVMe Transport
> > defines:
> >
> > o the minimum Keep Alive Timeout value, if any;
> > o the maximum Keep Alive Timeout value, if any; and
> > o if the Keep Alive Timer feature is required to be supported and enabled.
> >
> > which does not read to me like there is any required language in the
> > transport spec to require keep alive.
>
> So the absence of defining a minimum means it's simply optional? I
> suppose I can see it that way as the intended interpretation, but seems
> counter productive to do on PCIe when you can MMIO the controller status
> register to verify liveness. If the controller responds successfully to
> the feature, then I have to agree we need the host to do its part.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] nvme: reserve a keep-alive admin tag for all transports
2026-05-08 9:04 ` Christoph Hellwig
2026-05-08 9:31 ` Keith Busch
@ 2026-05-15 5:04 ` Chao S
1 sibling, 0 replies; 11+ messages in thread
From: Chao S @ 2026-05-15 5:04 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Maurizio Lombardi, linux-nvme, Sagi Grimberg,
Jens Axboe, Tatsuya Sasaki, linux-kernel, Sungwoo Kim, Dave Tian,
Weidong Zhu
On Fri, May 8, 2026, Christoph Hellwig wrote:
> My memories from the fabrics working group back in the day is that
> we explicitly intended to support it in PCIe. The wording in the
> spec referring to transport specs I can find is:
>
> The NVMe Transport binding specification for the associated NVMe
> Transport defines:
>
> o the minimum Keep Alive Timeout value, if any;
> o the maximum Keep Alive Timeout value, if any; and
> o if the Keep Alive Timer feature is required to be supported and
> enabled.
>
> which does not read to me like there is any required language in
> the transport spec to require keep alive.
Thanks for recalling the working-group history and the spec citation - both
will appear in the v2 commit message. "PCIe MAY support KATO,
not required" will be the foundation for v2's frame.
Chao
On Fri, May 8, 2026 at 5:04 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Tue, Apr 28, 2026 at 08:24:35AM +0100, Keith Busch wrote:
> > > This field specifies the timeout value for the Keep Alive feature in
> > > milliseconds. [...]
> > > The default value for this field is 0h for NVMe transports that do not require use of the Keep Alive
> > > feature (e.g., NVMe over PCIe). For NVMe transports that require use of the Keep Alive feature
> > > (e.g., RDMA and TCP), the default value for this field is 1D4C0h "
> > >
> > > To me, it sounds like for nvme-pci, keep alive isn't required, but could
> > > be activated.
> >
> > The spec says the support is subject to the Transport binding
> > specification, which does not exist in the PCIe transport spec.
>
> My memories from the fabrics working group back in the day is that we
> explicitly intended to support it in PCIe. The wording in the spec
> referring to transport specs I can find is:
>
> The NVMe Transport binding specification for the associated NVMe Transport
> defines:
>
> o the minimum Keep Alive Timeout value, if any;
> o the maximum Keep Alive Timeout value, if any; and
> o if the Keep Alive Timer feature is required to be supported and enabled.
>
> which does not read to me like there is any required language in the
> transport spec to require keep alive.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] nvme: reserve a keep-alive admin tag for all transports
2026-04-28 7:15 ` Maurizio Lombardi
2026-04-28 7:24 ` Keith Busch
@ 2026-05-15 5:05 ` Chao S
1 sibling, 0 replies; 11+ messages in thread
From: Chao S @ 2026-05-15 5:05 UTC (permalink / raw)
To: Maurizio Lombardi
Cc: Keith Busch, linux-nvme, Christoph Hellwig, Sagi Grimberg,
Jens Axboe, Tatsuya Sasaki, linux-kernel, Sungwoo Kim, Dave Tian,
Weidong Zhu
On Tue, Apr 28, 2026, Maurizio Lombardi wrote:
> The spec 2.0a, at section 5.27.1.12 Keep Alive Timer (Feature
> Identifier 0Fh) says:
> "Keep Alive Timeout (KATO):
> ...
> The default value for this field is 0h for NVMe transports that do
> not require use of the Keep Alive feature (e.g., NVMe over PCIe).
> For NVMe transports that require use of the Keep Alive feature
> (e.g., RDMA and TCP), the default value for this field is 1D4C0h"
>
> To me, it sounds like for nvme-pci, keep alive isn't required, but
> could be activated.
Thanks for the spec reference - that exact wording will be quoted in
the v2 commit message. I am adding you to the v2 Cc list so you see
the next revision directly. Thanks again!
Chao
On Tue, Apr 28, 2026 at 3:15 AM Maurizio Lombardi <mlombard@arkamax.eu> wrote:
>
> On Tue Apr 28, 2026 at 8:47 AM CEST, Keith Busch wrote:
> > On Mon, Apr 27, 2026 at 10:29:11PM -0400, Chao Shi wrote:
> >> nvme_keep_alive_work() always allocates with BLK_MQ_REQ_RESERVED, but
> >> nvme_alloc_admin_tag_set() only sets reserved_tags for fabrics. Since
> >> commit b58da2d270db ("nvme: update keep alive interval when kato is
> >> modified"), userspace can start keep-alive on any transport via Set
> >> Features (KATO), after which the allocation trips WARN_ON_ONCE() in
> >> blk_mq_get_tag() and fails with -EWOULDBLOCK:
> >>
> >> nvme nvme0: keep-alive failed: -11
> >>
> >> Reserve one admin tag for keep-alive on all transports. Fabrics keeps
> >> two, the second being for the connect command.
> >
> >> Fixes: b58da2d270db ("nvme: update keep alive interval when kato is modified")
> >>
> >> Found by FuzzNvme(Syzkaller with FEMU fuzzing framework).
> >>
> >> Acked-by: Sungwoo Kim <iam@sung-woo.kim>
> >> Acked-by: Dave Tian <daveti@purdue.edu>
> >> Acked-by: Weidong Zhu <weizhu@fiu.edu>
> >> Signed-off-by: Chao Shi <coshi036@gmail.com>
> >> ---
> >>
> >> Reproducer (run as root on an unpatched kernel with a PCIe NVMe device):
> >
> > You have a PCI controller that doesn't return Invalid Field In Command
> > status to the KATO feature? That's weird, it's fabrics specific feature.
>
> Are you sure that it's fabrics-only?
>
> The spec 2.0a, at section 5.27.1.12 Keep Alive Timer (Feature Identifier
> 0Fh)
>
> says:
> "Keep Alive Timeout (KATO):
>
> This field specifies the timeout value for the Keep Alive feature in
> milliseconds. [...]
> The default value for this field is 0h for NVMe transports that do not require use of the Keep Alive
> feature (e.g., NVMe over PCIe). For NVMe transports that require use of the Keep Alive feature
> (e.g., RDMA and TCP), the default value for this field is 1D4C0h "
>
> To me, it sounds like for nvme-pci, keep alive isn't required, but could
> be activated.
>
>
> Maurizio
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2026-05-15 5:05 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-28 2:29 [PATCH] nvme: reserve a keep-alive admin tag for all transports Chao Shi
2026-04-28 6:47 ` Keith Busch
2026-04-28 7:15 ` Maurizio Lombardi
2026-04-28 7:24 ` Keith Busch
2026-05-08 9:04 ` Christoph Hellwig
2026-05-08 9:31 ` Keith Busch
2026-05-10 20:53 ` Sagi Grimberg
2026-05-15 4:55 ` Chao S
2026-05-15 5:00 ` Chao S
2026-05-15 5:04 ` Chao S
2026-05-15 5:05 ` Chao S
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.