From: Christoph Hellwig <hch@lst.de>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: Christoph Hellwig <hch@lst.de>,
marcan@marcan.st, sven@svenpeter.dev, kbusch@kernel.org,
axboe@kernel.dk, james.smart@broadcom.com, alyssa@rosenzweig.io,
asahi@lists.linux.dev, linux-nvme@lists.infradead.org,
kch@nvidia.com
Subject: Re: [PATCH] nvme: don't set a virt_boundary unless needed
Date: Thu, 21 Dec 2023 13:17:46 +0100
Message-ID: <20231221121746.GA17956@lst.de>
In-Reply-To: <155ec506-ede8-42c7-95f7-e8be32800a8d@grimberg.me>

On Thu, Dec 21, 2023 at 11:30:38AM +0200, Sagi Grimberg wrote:
>
>> NVMe PRPs are a pain and force the expensive virt_boundary checking on
>> block layer, prevent secure passthrough and require scatter/gather I/O
>> to be split into multiple commands which is problematic for the upcoming
>> atomic write support.
>
> But is the threshold still correct? meaning for I/Os small enough the
> device will have lower performance? I'm not advocating that we keep it,
> but we should at least mention the tradeoff in the change log.

Chaitanya benchmarked it on the first generation of devices that
supported SGLs. On the only SGL-enabled device I have there is no
performance penalty for using SGLs on small transfers, but I'd love
to see numbers from other setups.

>> For nvme-rdma the virt boundary is always required, as RDMA MRs are just
>> as dumb as NVMe PRPs.
>
> That is actually device dependent. The driver can ask for a pool of
> mrs with type IB_MR_TYPE_SG_GAPS if the device supports IBK_SG_GAPS_REG.
>
> See from ib_srp.c:
> --
> 	if (device->attrs.kernel_cap_flags & IBK_SG_GAPS_REG)
> 		mr_type = IB_MR_TYPE_SG_GAPS;
> 	else
> 		mr_type = IB_MR_TYPE_MEM_REG;
> --

For that we'd need to support IB_MR_TYPE_SG_GAPS first, which can
be done as an incremental improvement.
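
To make the dependency concrete, here is a minimal userspace sketch of
how a transport might tie the virt boundary to the MR type it ends up
with. It is not nvme-rdma code: every sketch_/SKETCH_ name is a
hypothetical stand-in for the kernel identifiers quoted above, and the
rule "SG_GAPS MRs need no virt boundary" is an assumption drawn from
this discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the IBK_SG_GAPS_REG capability bit. */
#define SKETCH_CAP_SG_GAPS_REG (1ULL << 0)

/* Hypothetical stand-ins for IB_MR_TYPE_MEM_REG / IB_MR_TYPE_SG_GAPS. */
enum sketch_mr_type {
	SKETCH_MR_TYPE_MEM_REG,
	SKETCH_MR_TYPE_SG_GAPS,
};

/* Mirrors the ib_srp.c snippet: prefer gap-tolerant MRs when offered. */
static enum sketch_mr_type sketch_pick_mr_type(uint64_t kernel_cap_flags)
{
	if (kernel_cap_flags & SKETCH_CAP_SG_GAPS_REG)
		return SKETCH_MR_TYPE_SG_GAPS;
	return SKETCH_MR_TYPE_MEM_REG;
}

/*
 * Assumption: only MRs that cannot describe gaps between segments
 * force the block layer to enforce a virt boundary.
 */
static bool sketch_needs_virt_boundary(enum sketch_mr_type type)
{
	return type != SKETCH_MR_TYPE_SG_GAPS;
}
```

With this shape the transport would only set the boundary flag when
sketch_needs_virt_boundary() returns true for the chosen MR type.
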
>> + /*
>> + * nvme-apple always uses PRPs and thus needs to set a virt boundary.
>> + */
>> + set_bit(NVME_CTRL_VIRT_BOUNDARY_IO, &anv->ctrl.flags);
>> + set_bit(NVME_CTRL_VIRT_BOUNDARY_ADMIN, &anv->ctrl.flags);
>> +
>
> Why two flags? Why can't the core just always set the blk virt boundary
> on the admin request queue?

It could, and given that the admin queue isn't performance critical it
probably won't hurt in reality. But why enforce a really weird limit
on the queue if there is no reason for it? The only transport that
treats the admin queue differently is PCIe, and that's just a spec
oddity.
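
As a rough illustration of what the two flags buy, the userspace
sketch below derives a per-queue virt boundary mask and applies the
usual gap test to two adjacent segments. The sketch_/SKETCH_ names,
the 4096-byte controller page size, and the helper signatures are
assumptions made for this example, not the patch's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for NVME_CTRL_VIRT_BOUNDARY_{IO,ADMIN}. */
enum sketch_ctrl_flag {
	SKETCH_VIRT_BOUNDARY_IO    = 1u << 0,
	SKETCH_VIRT_BOUNDARY_ADMIN = 1u << 1,
};

/* Assumed controller page size; the mask is page size minus one. */
#define SKETCH_CTRL_PAGE_SIZE 4096u

/* Return the virt boundary mask for one queue, or 0 for "no limit". */
static uint32_t sketch_virt_boundary_mask(unsigned int flags, bool admin_queue)
{
	unsigned int wanted = admin_queue ? SKETCH_VIRT_BOUNDARY_ADMIN
					  : SKETCH_VIRT_BOUNDARY_IO;

	return (flags & wanted) ? SKETCH_CTRL_PAGE_SIZE - 1 : 0;
}

/*
 * Two adjacent segments leave a gap if the previous one does not end
 * on the boundary or the next one does not start on it. A mask of 0
 * disables the check entirely.
 */
static bool sketch_gap_to_prev(uint32_t mask, uint32_t prev_end,
			       uint32_t next_start)
{
	return ((prev_end | next_start) & mask) != 0;
}
```

A transport that sets only the admin flag would get a mask of 0 on its
I/O queues, so the block layer never has to split or reject requests
there because of segment gaps.
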