From: Christoph Hellwig <hch@lst.de>
To: Chao Shi <coshi036@gmail.com>
Cc: linux-nvme@lists.infradead.org, Keith Busch <kbusch@kernel.org>,
Christoph Hellwig <hch@lst.de>, Sagi Grimberg <sagi@grimberg.me>,
Jens Axboe <axboe@kernel.dk>,
Tatsuya Sasaki <tatsuya6.sasaki@kioxia.com>,
Maurizio Lombardi <mlombard@arkamax.eu>,
linux-kernel@vger.kernel.org, Sungwoo Kim <iam@sung-woo.kim>,
Dave Tian <daveti@purdue.edu>, Weidong Zhu <weizhu@fiu.edu>
Subject: Re: [PATCH v5] nvme: reject passthrough of driver-managed Set Features
Date: Wed, 27 May 2026 16:16:48 +0200 [thread overview]
Message-ID: <20260527141648.GA13404@lst.de> (raw)
In-Reply-To: <20260523225629.3964037-1-coshi036@gmail.com>
On Sat, May 23, 2026 at 06:56:29PM -0400, Chao Shi wrote:
> Since commit b58da2d270db ("nvme: update keep alive interval when kato
> is modified"), userspace can start keep-alive on any transport via a
> Set Features (KATO) passthrough command. nvme_keep_alive_work() then
> allocates with BLK_MQ_REQ_RESERVED, but nvme_alloc_admin_tag_set()
> only reserves admin tags for fabrics, so the allocation trips
> WARN_ON_ONCE() in blk_mq_get_tag() and fails:
>
> nvme nvme0: keep-alive failed: -11
>
> More generally, several Set Features change controller state that the
> driver manages itself and cannot react to correctly when set behind
> its back from userspace. Reject these in nvme_cmd_allowed():
>
> - KATO on non-fabrics (keep-alive is only armed for fabrics; on PCIe
> it has no reserved tag and an active keep-alive harms idle power
> states)
> - Host Behavior Support, Host Memory Buffer, Number of Queues, and
> Autonomous Power State Transition (all driver-managed)
>
> Keep Alive on fabrics is unchanged. I/O commands are unaffected as the
> check is confined to the admin path (ns == NULL).
>
> Link: https://lore.kernel.org/linux-nvme/20260522162639.395802-1-coshi036@gmail.com/
>
> Fixes: b58da2d270db ("nvme: update keep alive interval when kato is modified")
>
> Found by FuzzNvme(Syzkaller with FEMU fuzzing framework).
>
> Acked-by: Sungwoo Kim <iam@sung-woo.kim>
> Acked-by: Dave Tian <daveti@purdue.edu>
> Acked-by: Weidong Zhu <weizhu@fiu.edu>
> Signed-off-by: Chao Shi <coshi036@gmail.com>
> ---
>
> Reproducer for the keep-alive case (run as root on a PCIe NVMe device):
>
> #include <fcntl.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/ioctl.h>
> #include <linux/nvme_ioctl.h>
>
> int main(void)
> {
> struct nvme_admin_cmd cmd = {0};
> int fd = open("/dev/nvme0", O_RDWR);
> if (fd < 0) { perror("open"); return 1; }
> cmd.opcode = 0x09; /* SET_FEATURES */
> cmd.cdw10 = 0x0f; /* Feature ID: KATO */
> cmd.cdw11 = 5; /* KATO = 5 seconds */
> if (ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd) < 0) {
> perror("ioctl");
> return 1;
> }
> return 0;
> }
>
> On an unpatched kernel, within ~kato/2 seconds after the program exits,
> dmesg shows:
>
> nvme nvme0: keep alive interval updated from 0 ms to 5000 ms
> WARNING: CPU: 0 PID: ... at block/blk-mq-tag.c:148 blk_mq_get_tag+...
> nvme nvme0: keep-alive failed: -11
>
> With this patch the ioctl fails with EACCES on non-fabrics.
>
> Changes since v4:
> - Fold the check into the existing nvme_cmd_allowed() instead of a
> separate helper, and reject additional driver-managed Set Features
> (Host Behavior, Host Memory Buffer, Number of Queues, Autonomous
> Power State Transition) in the same switch (Keith Busch). The admin
> vs I/O distinction is now structural: the switch lives in the
> ns == NULL branch, so I/O commands (e.g. Dataset Management, which
> shares opcode 0x09 with Set Features) are never inspected.
>
> Changes since v3:
> - Only inspect admin commands so a DSM I/O command is not wrongly
> rejected (Keith Busch).
>
> Changes since v2:
> - Reject the KATO passthrough on non-fabrics instead of reserving an
> admin tag for all transports (Keith Busch, Christoph Hellwig).
>
> Changes since v1:
> - v2 added a spec citation and quirk discussion, superseded by the
> reject approach.
>
> drivers/nvme/host/ioctl.c | 33 +++++++++++++++++++++++++++------
> 1 file changed, 27 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
> index a9c097dacad6..31784506e845 100644
> --- a/drivers/nvme/host/ioctl.c
> +++ b/drivers/nvme/host/ioctl.c
> @@ -14,8 +14,9 @@ enum {
> NVME_IOCTL_PARTITION = (1 << 1),
> };
>
> -static bool nvme_cmd_allowed(struct nvme_ns *ns, struct nvme_command *c,
> - unsigned int flags, bool open_for_write)
> +static bool nvme_cmd_allowed(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
> + struct nvme_command *c, unsigned int flags,
> + bool open_for_write)
> {
> u32 effects;
>
> @@ -50,6 +51,26 @@ static bool nvme_cmd_allowed(struct nvme_ns *ns, struct nvme_command *c,
> case NVME_ID_CNS_CTRL:
> return true;
> }
> + } else if (c->common.opcode == nvme_admin_set_features) {
> + /*
> + * Reject Set Features that change controller state the
> + * driver manages itself; setting them behind the driver's
> + * back from userspace leaves it unable to react correctly.
Overly long lines. I suspect we're best off splitting out the
admin and ns-command set specific parts of nvme_cmd_allowed into
separate helpers. And maybe use a switch statement on the command
as nested ifs become cumersome in the long run.
> - if (!nvme_cmd_allowed(ns, &c, 0, ioucmd->file->f_mode & FMODE_WRITE))
> + if (!nvme_cmd_allowed(ctrl, ns, &c, 0, ioucmd->file->f_mode & FMODE_WRITE))
Another overly long line here.
Otherwise this looks good.
next prev parent reply other threads:[~2026-05-27 14:17 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-23 22:56 [PATCH v5] nvme: reject passthrough of driver-managed Set Features Chao Shi
2026-05-25 15:34 ` Tokunori Ikegami
2026-05-25 15:49 ` Keith Busch
2026-05-25 16:49 ` Tokunori Ikegami
2026-05-27 14:16 ` Christoph Hellwig [this message]
2026-05-27 14:32 ` Keith Busch
2026-05-28 8:43 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260527141648.GA13404@lst.de \
--to=hch@lst.de \
--cc=axboe@kernel.dk \
--cc=coshi036@gmail.com \
--cc=daveti@purdue.edu \
--cc=iam@sung-woo.kim \
--cc=kbusch@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=mlombard@arkamax.eu \
--cc=sagi@grimberg.me \
--cc=tatsuya6.sasaki@kioxia.com \
--cc=weizhu@fiu.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.