linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Uday Shankar <ushankar@purestorage.com>
To: Jens Axboe <axboe@kernel.dk>, Kanchan Joshi <joshi.k@samsung.com>,
	Anuj Gupta <anuj20.g@samsung.com>, Christoph Hellwig <hch@lst.de>
Cc: linux-block@vger.kernel.org, Xinyu Zhang <xizhang@purestorage.com>
Subject: Re: [PATCH] block: fix sanity checks in blk_rq_map_user_bvec
Date: Wed, 23 Oct 2024 16:50:41 -0600	[thread overview]
Message-ID: <Zxl9wS2j5mUkye9o@dev-ushankar.dev.purestorage.com> (raw)
In-Reply-To: <20241023211519.4177873-1-ushankar@purestorage.com>

On Wed, Oct 23, 2024 at 03:15:19PM -0600, Uday Shankar wrote:
> From: Xinyu Zhang <xizhang@purestorage.com>
> 
> blk_rq_map_user_bvec contains a check bytes + bv->bv_len > nr_iter which
> causes unnecessary failures in NVMe passthrough I/O, reproducible as
> follows:
> 
> - register a 2 page, page-aligned buffer against a ring
> - use that buffer to do a 1 page io_uring NVMe passthrough read
> 
> The second (i = 1) iteration of the loop in blk_rq_map_user_bvec will
> then have nr_iter == 1 page, bytes == 1 page, bv->bv_len == 1 page, so
> the check bytes + bv->bv_len > nr_iter will succeed, causing the I/O to
> fail. This failure is unnecessary, as when the check succeeds, it means
> we've checked the entire buffer that will be used by the request - i.e.
> blk_rq_map_user_bvec should complete successfully. Therefore, terminate
> the loop early and return successfully when the check bytes + bv->bv_len
> > nr_iter succeeds.

For anyone interested, here are the details on how to reproduce the
issue described above:

# cat test.c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <liburing.h>
#include <stdlib.h>
#include <assert.h>
#include <linux/nvme_ioctl.h>

int main(int argc, char *argv[]) {
    struct io_uring ring;

    assert(io_uring_queue_init(1, &ring, IORING_SETUP_SQE128 | IORING_SETUP_CQE32) == 0);

    void *buf = memalign(4096, 2 * 4096);
    printf("buf %p\n", buf);
    struct iovec iov = {
        .iov_base = buf,
        .iov_len = 2 * 4096,
    };

    assert(io_uring_register_buffers(&ring, &iov, 1) == 0);

    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    assert(sqe != NULL);

    int fd = open("/dev/ng0n1", O_RDONLY);
    assert(fd > 0);
    sqe->fd = fd;
    sqe->opcode = IORING_OP_URING_CMD;
    sqe->cmd_op = NVME_URING_CMD_IO;
    sqe->buf_index = 0;
    sqe->flags = 0;
    sqe->uring_cmd_flags = IORING_URING_CMD_FIXED;

    struct nvme_passthru_cmd *cmd = &sqe->cmd;
    cmd->opcode = 2; // read
    cmd->nsid = 1;
    cmd->data_len = 1 * 4096;
    cmd->addr = buf;

    struct io_uring_cqe *cqe;
    assert(io_uring_submit(&ring) == 1);
    assert(io_uring_wait_cqe(&ring, &cqe) == 0);

    printf("res %d\n", cqe->res);

    return 0;
}
# gcc -o test -luring test.c
test.c: In function ‘main’:
test.c:15:17: warning: implicit declaration of function ‘memalign’ [-Wimplicit-function-declaration]
   15 |     void *buf = memalign(4096, 2 * 4096);
      |                 ^~~~~~~~
test.c:15:17: warning: initialization of ‘void *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
test.c:36:37: warning: initialization of ‘struct nvme_passthru_cmd *’ from incompatible pointer type ‘__u8 (*)[0]’ {aka ‘unsigned char (*)[]’} [-Wincompatible-pointer-types]
   36 |     struct nvme_passthru_cmd *cmd = &sqe->cmd;
      |                                     ^
test.c:40:15: warning: assignment to ‘__u64’ {aka ‘long long unsigned int’} from ‘void *’ makes integer from pointer without a cast [-Wint-conversion]
   40 |     cmd->addr = buf;
      |
# ./test
buf 0x406000
res -22

  parent reply	other threads:[~2024-10-23 22:50 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-23 21:15 [PATCH] block: fix sanity checks in blk_rq_map_user_bvec Uday Shankar
2024-10-23 22:31 ` Jens Axboe
2024-10-23 22:46   ` Uday Shankar
2024-10-23 22:50 ` Uday Shankar [this message]
2024-10-23 22:54   ` Bart Van Assche
2024-10-24  0:42     ` Chaitanya Kulkarni
2024-10-23 23:03 ` Jens Axboe
2024-10-24  4:56 ` Christoph Hellwig
2024-10-24  6:05   ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zxl9wS2j5mUkye9o@dev-ushankar.dev.purestorage.com \
    --to=ushankar@purestorage.com \
    --cc=anuj20.g@samsung.com \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=joshi.k@samsung.com \
    --cc=linux-block@vger.kernel.org \
    --cc=xizhang@purestorage.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).