Linux-NVME Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@lst.de>
To: Chao Shi <coshi036@gmail.com>
Cc: linux-nvme@lists.infradead.org, Keith Busch <kbusch@kernel.org>,
	Christoph Hellwig <hch@lst.de>, Sagi Grimberg <sagi@grimberg.me>,
	Jens Axboe <axboe@kernel.dk>,
	Tatsuya Sasaki <tatsuya6.sasaki@kioxia.com>,
	Maurizio Lombardi <mlombard@arkamax.eu>,
	linux-kernel@vger.kernel.org, Sungwoo Kim <iam@sung-woo.kim>,
	Dave Tian <daveti@purdue.edu>, Weidong Zhu <weizhu@fiu.edu>
Subject: Re: [PATCH v5] nvme: reject passthrough of driver-managed Set Features
Date: Wed, 27 May 2026 16:16:48 +0200	[thread overview]
Message-ID: <20260527141648.GA13404@lst.de> (raw)
In-Reply-To: <20260523225629.3964037-1-coshi036@gmail.com>

On Sat, May 23, 2026 at 06:56:29PM -0400, Chao Shi wrote:
> Since commit b58da2d270db ("nvme: update keep alive interval when kato
> is modified"), userspace can start keep-alive on any transport via a
> Set Features (KATO) passthrough command. nvme_keep_alive_work() then
> allocates with BLK_MQ_REQ_RESERVED, but nvme_alloc_admin_tag_set()
> only reserves admin tags for fabrics, so the allocation trips
> WARN_ON_ONCE() in blk_mq_get_tag() and fails:
> 
>   nvme nvme0: keep-alive failed: -11
> 
> More generally, several Set Features change controller state that the
> driver manages itself and cannot react to correctly when set behind
> its back from userspace. Reject these in nvme_cmd_allowed():
> 
>   - KATO on non-fabrics (keep-alive is only armed for fabrics; on PCIe
>     it has no reserved tag and an active keep-alive harms idle power
>     states)
>   - Host Behavior Support, Host Memory Buffer, Number of Queues, and
>     Autonomous Power State Transition (all driver-managed)
> 
> Keep Alive on fabrics is unchanged. I/O commands are unaffected as the
> check is confined to the admin path (ns == NULL).
> 
> Link: https://lore.kernel.org/linux-nvme/20260522162639.395802-1-coshi036@gmail.com/
> 
> Fixes: b58da2d270db ("nvme: update keep alive interval when kato is modified")
> 
> Found by FuzzNvme(Syzkaller with FEMU fuzzing framework).
> 
> Acked-by: Sungwoo Kim <iam@sung-woo.kim>
> Acked-by: Dave Tian <daveti@purdue.edu>
> Acked-by: Weidong Zhu <weizhu@fiu.edu>
> Signed-off-by: Chao Shi <coshi036@gmail.com>
> ---
> 
> Reproducer for the keep-alive case (run as root on a PCIe NVMe device):
> 
>     #include <fcntl.h>
>     #include <stdio.h>
>     #include <string.h>
>     #include <sys/ioctl.h>
>     #include <linux/nvme_ioctl.h>
> 
>     int main(void)
>     {
>             struct nvme_admin_cmd cmd = {0};
>             int fd = open("/dev/nvme0", O_RDWR);
>             if (fd < 0) { perror("open"); return 1; }
>             cmd.opcode = 0x09;       /* SET_FEATURES */
>             cmd.cdw10  = 0x0f;       /* Feature ID: KATO */
>             cmd.cdw11  = 5;          /* KATO = 5 seconds */
>             if (ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd) < 0) {
>                     perror("ioctl");
>                     return 1;
>             }
>             return 0;
>     }
> 
> On an unpatched kernel, within ~kato/2 seconds after the program exits,
> dmesg shows:
> 
>     nvme nvme0: keep alive interval updated from 0 ms to 5000 ms
>     WARNING: CPU: 0 PID: ... at block/blk-mq-tag.c:148 blk_mq_get_tag+...
>     nvme nvme0: keep-alive failed: -11
> 
> With this patch the ioctl fails with EACCES on non-fabrics.
> 
> Changes since v4:
> - Fold the check into the existing nvme_cmd_allowed() instead of a
>   separate helper, and reject additional driver-managed Set Features
>   (Host Behavior, Host Memory Buffer, Number of Queues, Autonomous
>   Power State Transition) in the same switch (Keith Busch). The admin
>   vs I/O distinction is now structural: the switch lives in the
>   ns == NULL branch, so I/O commands (e.g. Dataset Management, which
>   shares opcode 0x09 with Set Features) are never inspected.
> 
> Changes since v3:
> - Only inspect admin commands so a DSM I/O command is not wrongly
>   rejected (Keith Busch).
> 
> Changes since v2:
> - Reject the KATO passthrough on non-fabrics instead of reserving an
>   admin tag for all transports (Keith Busch, Christoph Hellwig).
> 
> Changes since v1:
> - v2 added a spec citation and quirk discussion, superseded by the
>   reject approach.
> 
>  drivers/nvme/host/ioctl.c | 33 +++++++++++++++++++++++++++------
>  1 file changed, 27 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
> index a9c097dacad6..31784506e845 100644
> --- a/drivers/nvme/host/ioctl.c
> +++ b/drivers/nvme/host/ioctl.c
> @@ -14,8 +14,9 @@ enum {
>  	NVME_IOCTL_PARTITION	= (1 << 1),
>  };
>  
> -static bool nvme_cmd_allowed(struct nvme_ns *ns, struct nvme_command *c,
> -		unsigned int flags, bool open_for_write)
> +static bool nvme_cmd_allowed(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
> +			     struct nvme_command *c, unsigned int flags,
> +			     bool open_for_write)
>  {
>  	u32 effects;
>  
> @@ -50,6 +51,26 @@ static bool nvme_cmd_allowed(struct nvme_ns *ns, struct nvme_command *c,
>  			case NVME_ID_CNS_CTRL:
>  				return true;
>  			}
> +		} else if (c->common.opcode == nvme_admin_set_features) {
> +			/*
> +			 * Reject Set Features that change controller state the
> +			 * driver manages itself; setting them behind the driver's
> +			 * back from userspace leaves it unable to react correctly.

Overly long lines.  I suspect we're best off splitting out the
admin and ns-command set specific parts of nvme_cmd_allowed into
separate helpers.  And maybe use a switch statement on the command
as nested ifs become cumersome in the long run.

> -	if (!nvme_cmd_allowed(ns, &c, 0, ioucmd->file->f_mode & FMODE_WRITE))
> +	if (!nvme_cmd_allowed(ctrl, ns, &c, 0, ioucmd->file->f_mode & FMODE_WRITE))

Another overly long line here.

Otherwise this looks good.


  parent reply	other threads:[~2026-05-27 14:17 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-23 22:56 [PATCH v5] nvme: reject passthrough of driver-managed Set Features Chao Shi
2026-05-25 15:34 ` Tokunori Ikegami
2026-05-25 15:49   ` Keith Busch
2026-05-25 16:49     ` Tokunori Ikegami
2026-05-27 14:16 ` Christoph Hellwig [this message]
2026-05-27 14:32 ` Keith Busch
2026-05-28  8:43   ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260527141648.GA13404@lst.de \
    --to=hch@lst.de \
    --cc=axboe@kernel.dk \
    --cc=coshi036@gmail.com \
    --cc=daveti@purdue.edu \
    --cc=iam@sung-woo.kim \
    --cc=kbusch@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=mlombard@arkamax.eu \
    --cc=sagi@grimberg.me \
    --cc=tatsuya6.sasaki@kioxia.com \
    --cc=weizhu@fiu.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox