All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sagi Grimberg <sagi@grimberg.me>
To: Christoph Hellwig <hch@lst.de>,
	marcan@marcan.st, sven@svenpeter.dev, kbusch@kernel.org,
	axboe@kernel.dk, james.smart@broadcom.com
Cc: alyssa@rosenzweig.io, asahi@lists.linux.dev,
	linux-nvme@lists.infradead.org, kch@nvidia.com
Subject: Re: [PATCH] nvme: don't set a virt_boundary unless needed
Date: Thu, 21 Dec 2023 11:30:38 +0200	[thread overview]
Message-ID: <155ec506-ede8-42c7-95f7-e8be32800a8d@grimberg.me> (raw)
In-Reply-To: <20231221084853.1175482-1-hch@lst.de>


> NVMe PRPs are a pain and force the expensive virt_boundary checking on
> block layer, prevent secure passthrough and require scatter/gather I/O
> to be split into multiple commands which is problematic for the upcoming
> atomic write support.

But is the threshold still correct? meaning for I/Os small enough the
device will have lower performance? I'm not advocating that we keep it,
but we should at least mention the tradeoff in the change log.

> Fix the NVMe core to require an opt-in from the drivers for it.
> 
> For nvme-apple it is always required as the driver only supports PRPs.
> 
> For nvme-pci when SGLs are supported we'll always use them for data I/O
> that would require a virt_boundary.
> 
> For nvme-rdma the virt boundary is always required, as RMDA MRs are just
> as dumb as NVMe PRPs.

That is actually device dependent. The driver can ask for a pool of
mrs with type IB_MR_TYPE_SG_GAPS if the device supports IBK_SG_GAPS_REG.

See from ib_srp.c:
--
        if (device->attrs.kernel_cap_flags & IBK_SG_GAPS_REG)
                 mr_type = IB_MR_TYPE_SG_GAPS;
         else
                 mr_type = IB_MR_TYPE_MEM_REG;
--

> 
> For nvme-tcp and nvme-fc I set the flags for now because I don't
> understand the drivers fully, but I suspect the flags could be lifted.

tcp can absolutely omit virt boundaries.

> For nvme-loop the flag is never set as it doesn't have any requirements
> on the I/O format.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/apple.c |  6 +++++
>   drivers/nvme/host/core.c  | 11 ++++++++-
>   drivers/nvme/host/fc.c    |  3 +++
>   drivers/nvme/host/nvme.h  |  4 +++
>   drivers/nvme/host/pci.c   | 52 ++++++++++++++++++++++-----------------
>   drivers/nvme/host/rdma.c  |  6 +++++
>   drivers/nvme/host/tcp.c   |  3 +++
>   7 files changed, 61 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
> index 596bb11eeba5a9..a1afb54e3b4da8 100644
> --- a/drivers/nvme/host/apple.c
> +++ b/drivers/nvme/host/apple.c
> @@ -1116,6 +1116,12 @@ static void apple_nvme_reset_work(struct work_struct *work)
>   		goto out;
>   	}
>   
> +	/*
> +	 * nvme-apple always uses PRPs and thus needs to set a virt boundary.
> +	 */
> +	set_bit(NVME_CTRL_VIRT_BOUNDARY_IO, &anv->ctrl.flags);
> +	set_bit(NVME_CTRL_VIRT_BOUNDARY_ADMIN, &anv->ctrl.flags);
> +

Why two flags? Why can't the core just always set the blk virt boundary
on the admin request queue?

>   	ret = nvme_init_ctrl_finish(&anv->ctrl, false);
>   	if (ret)
>   		goto out

  reply	other threads:[~2023-12-21  9:30 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-21  8:48 [PATCH] nvme: don't set a virt_boundary unless needed Christoph Hellwig
2023-12-21  9:30 ` Sagi Grimberg [this message]
2023-12-21 12:17   ` Christoph Hellwig
2023-12-21 12:32     ` Sagi Grimberg
2023-12-21 12:40       ` Christoph Hellwig
2023-12-25  9:13         ` Sagi Grimberg
2023-12-21 17:03     ` Keith Busch
2023-12-25  9:20       ` Sagi Grimberg
2023-12-22  1:16   ` Max Gurtovoy
2023-12-25 10:08     ` Sagi Grimberg
2023-12-25 10:36       ` Max Gurtovoy
2023-12-25 10:44         ` Sagi Grimberg
2023-12-25 12:31           ` Max Gurtovoy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=155ec506-ede8-42c7-95f7-e8be32800a8d@grimberg.me \
    --to=sagi@grimberg.me \
    --cc=alyssa@rosenzweig.io \
    --cc=asahi@lists.linux.dev \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=james.smart@broadcom.com \
    --cc=kbusch@kernel.org \
    --cc=kch@nvidia.com \
    --cc=linux-nvme@lists.infradead.org \
    --cc=marcan@marcan.st \
    --cc=sven@svenpeter.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.