From: mhkelley58@gmail.com
To: kbusch@kernel.org, axboe@kernel.dk, sagi@grimberg.me,
James.Bottomley@HansenPartnership.com,
martin.petersen@oracle.com, kys@microsoft.com,
haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com,
robin.murphy@arm.com, hch@lst.de, m.szyprowski@samsung.com,
petr@tesarici.cz, iommu@lists.linux.dev,
linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-scsi@vger.kernel.org, linux-hyperv@vger.kernel.org,
linux-coco@lists.linux.dev
Subject: [RFC 7/7] nvme: Enable swiotlb throttling for NVMe PCI devices
Date: Thu, 22 Aug 2024 11:37:18 -0700
Message-ID: <20240822183718.1234-8-mhklinux@outlook.com>
In-Reply-To: <20240822183718.1234-1-mhklinux@outlook.com>
From: Michael Kelley <mhklinux@outlook.com>
In a CoCo VM, all DMA-based I/O must use swiotlb bounce buffers
because DMA cannot be done to private (encrypted) portions of VM
memory. The bounce buffer memory is marked shared (decrypted) at
boot time, so I/O is done to/from the bounce buffer memory and then
copied by the CPU to/from the final target memory (i.e., "bounced").
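For illustration, the bounce step described above can be sketched in userspace C. This is not the kernel's swiotlb code; the pool, function names, and single-slot layout are purely illustrative of the "copy through a shared buffer" idea:

```c
/* Userspace sketch of the bounce-buffer copy a CoCo VM must perform:
 * the device can only DMA to/from the shared (decrypted) pool, so the
 * CPU copies between private memory and a shared slot on each I/O.
 * All names here are illustrative, not kernel APIs. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOT_SIZE 4096
static char shared_pool[SLOT_SIZE];  /* marked shared (decrypted) at boot */

/* "Map" for a device read (host write): bounce private data into the
 * shared slot and return the address the device will DMA from. */
static void *bounce_map_write(const void *private_buf, size_t len)
{
        assert(len <= SLOT_SIZE);
        memcpy(shared_pool, private_buf, len);  /* CPU copy ("bounce") */
        return shared_pool;
}

/* "Unmap" for a device write (host read): bounce what the device wrote
 * in the shared slot back into private memory. */
static void bounce_unmap_read(void *private_buf, size_t len)
{
        assert(len <= SLOT_SIZE);
        memcpy(private_buf, shared_pool, len);  /* CPU copy ("bounce") */
}
```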
Storage devices can be large consumers of bounce buffer memory because
it is possible to have large numbers of I/Os in flight across multiple
devices. Bounce buffer memory must be pre-allocated at boot time, and
it is difficult to know how much memory to allocate to handle peak
storage I/O loads. Consequently, bounce buffer memory is typically
over-provisioned, which wastes memory, and may still not avoid a peak
that exhausts bounce buffer memory and causes storage I/O errors.
For CoCo VMs running with NVMe PCI devices, update the driver to
permit bounce buffer throttling. Gate the throttling behavior
on a DMA layer check indicating that throttling is useful, so that
no change occurs in a non-CoCo VM. If throttling is useful, enable
the BLK_MQ_F_BLOCKING flag, and pass the DMA_ATTR_MAY_BLOCK attribute
into dma_map_bvec() and dma_map_sgtable() calls. With these options in
place, DMA map requests are pended when necessary to reduce the
likelihood of usage peaks caused by the NVMe driver that could exhaust
bounce buffer memory and generate errors.
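The pend-instead-of-fail behavior described above can be modeled in userspace with a counting semaphore over a fixed slot pool. This is only a sketch of the throttling idea, not the kernel's actual swiotlb data structures; `may_block` stands in for DMA_ATTR_MAY_BLOCK, and all names are assumptions:

```c
/* Userspace model of bounce-buffer throttling: a fixed pool of slots
 * guarded by a counting semaphore. A non-blocking mapper fails when the
 * pool is exhausted; a mapper that may block (cf. DMA_ATTR_MAY_BLOCK)
 * pends until a slot is released. Illustrative only. */
#include <assert.h>
#include <semaphore.h>

#define NR_SLOTS 4
static sem_t free_slots;

static void throttle_init(void)
{
        sem_init(&free_slots, 0, NR_SLOTS);
}

/* Returns 0 on success; -1 if non-blocking and the pool is exhausted. */
static int throttled_map(int may_block)
{
        if (may_block) {
                sem_wait(&free_slots);   /* pend the request under pressure */
                return 0;
        }
        return sem_trywait(&free_slots); /* immediate failure at a peak */
}

static void throttled_unmap(void)
{
        sem_post(&free_slots);           /* releasing a slot wakes a waiter */
}
```

A blocking caller therefore never sees a mapping error from pool exhaustion; it simply waits, which is why BLK_MQ_F_BLOCKING must be set so the submission context is allowed to sleep.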
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
drivers/nvme/host/pci.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 6cd9395ba9ec..2c39943a87f8 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -156,6 +156,7 @@ struct nvme_dev {
 	dma_addr_t host_mem_descs_dma;
 	struct nvme_host_mem_buf_desc *host_mem_descs;
 	void **host_mem_desc_bufs;
+	unsigned long dma_attrs;
 	unsigned int nr_allocated_queues;
 	unsigned int nr_write_queues;
 	unsigned int nr_poll_queues;
@@ -735,7 +736,8 @@ static blk_status_t nvme_setup_prp_simple(struct nvme_dev *dev,
 	unsigned int offset = bv->bv_offset & (NVME_CTRL_PAGE_SIZE - 1);
 	unsigned int first_prp_len = NVME_CTRL_PAGE_SIZE - offset;
 
-	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req), 0);
+	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req),
+					dev->dma_attrs);
 	if (dma_mapping_error(dev->dev, iod->first_dma))
 		return BLK_STS_RESOURCE;
 	iod->dma_len = bv->bv_len;
@@ -754,7 +756,8 @@ static blk_status_t nvme_setup_sgl_simple(struct nvme_dev *dev,
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 
-	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req), 0);
+	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req),
+					dev->dma_attrs);
 	if (dma_mapping_error(dev->dev, iod->first_dma))
 		return BLK_STS_RESOURCE;
 	iod->dma_len = bv->bv_len;
@@ -800,7 +803,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
 		goto out_free_sg;
 
 	rc = dma_map_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req),
-			     DMA_ATTR_NO_WARN);
+			     dev->dma_attrs | DMA_ATTR_NO_WARN);
 	if (rc) {
 		if (rc == -EREMOTEIO)
 			ret = BLK_STS_TARGET;
@@ -828,7 +831,8 @@ static blk_status_t nvme_map_metadata(struct nvme_dev *dev, struct request *req,
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 	struct bio_vec bv = rq_integrity_vec(req);
 
-	iod->meta_dma = dma_map_bvec(dev->dev, &bv, rq_dma_dir(req), 0);
+	iod->meta_dma = dma_map_bvec(dev->dev, &bv, rq_dma_dir(req),
+					dev->dma_attrs);
 	if (dma_mapping_error(dev->dev, iod->meta_dma))
 		return BLK_STS_IOERR;
 	cmnd->rw.metadata = cpu_to_le64(iod->meta_dma);
@@ -3040,6 +3044,12 @@ static struct nvme_dev *nvme_pci_alloc_dev(struct pci_dev *pdev,
 	 * a single integrity segment for the separate metadata pointer.
 	 */
 	dev->ctrl.max_integrity_segments = 1;
+
+	if (dma_recommend_may_block(dev->dev)) {
+		dev->ctrl.blocking = true;
+		dev->dma_attrs = DMA_ATTR_MAY_BLOCK;
+	}
+
 	return dev;
 
 out_put_device:
--
2.25.1