From: "Petr Tesařík" <petr@tesarici.cz>
To: mhkelley58@gmail.com
Cc: mhklinux@outlook.com, kbusch@kernel.org, axboe@kernel.dk,
sagi@grimberg.me, James.Bottomley@HansenPartnership.com,
martin.petersen@oracle.com, kys@microsoft.com,
haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com,
robin.murphy@arm.com, hch@lst.de, m.szyprowski@samsung.com,
iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-coco@lists.linux.dev
Subject: Re: [RFC 7/7] nvme: Enable swiotlb throttling for NVMe PCI devices
Date: Fri, 23 Aug 2024 10:26:42 +0200
Message-ID: <20240823102642.3f7f893a@meshulam.tesarici.cz>
In-Reply-To: <20240822183718.1234-8-mhklinux@outlook.com>
On Thu, 22 Aug 2024 11:37:18 -0700
mhkelley58@gmail.com wrote:
> From: Michael Kelley <mhklinux@outlook.com>
>
> In a CoCo VM, all DMA-based I/O must use swiotlb bounce buffers
> because DMA cannot be done to private (encrypted) portions of VM
> memory. The bounce buffer memory is marked shared (decrypted) at
> boot time, so I/O is done to/from the bounce buffer memory and then
> copied by the CPU to/from the final target memory (i.e., "bounced").
> Storage devices can be large consumers of bounce buffer memory because
> it is possible to have large numbers of I/Os in flight across multiple
> devices. Bounce buffer memory must be pre-allocated at boot time, and
> it is difficult to know how much memory to allocate to handle peak
> storage I/O loads. Consequently, bounce buffer memory is typically
> over-provisioned, which wastes memory, and may still not avoid a peak
> that exhausts bounce buffer memory and causes storage I/O errors.
>
> For CoCo VMs running with NVMe PCI devices, update the driver to
> permit bounce buffer throttling. Gate the throttling behavior
> on a DMA layer check indicating that throttling is useful, so that
> no change occurs in a non-CoCo VM. If throttling is useful, enable
> the BLK_MQ_F_BLOCKING flag, and pass the DMA_ATTR_MAY_BLOCK attribute
> into dma_map_bvec() and dma_map_sgtable() calls. With these options in
> place, DMA map requests are pended when necessary to reduce the
> likelihood of usage peaks caused by the NVMe driver that could exhaust
> bounce buffer memory and generate errors.
>
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
LGTM.
Reviewed-by: Petr Tesarik <ptesarik@suse.com>
Petr T
> ---
> drivers/nvme/host/pci.c | 18 ++++++++++++++----
> 1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 6cd9395ba9ec..2c39943a87f8 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -156,6 +156,7 @@ struct nvme_dev {
>  	dma_addr_t host_mem_descs_dma;
>  	struct nvme_host_mem_buf_desc *host_mem_descs;
>  	void **host_mem_desc_bufs;
> +	unsigned long dma_attrs;
>  	unsigned int nr_allocated_queues;
>  	unsigned int nr_write_queues;
>  	unsigned int nr_poll_queues;
> @@ -735,7 +736,8 @@ static blk_status_t nvme_setup_prp_simple(struct nvme_dev *dev,
>  	unsigned int offset = bv->bv_offset & (NVME_CTRL_PAGE_SIZE - 1);
>  	unsigned int first_prp_len = NVME_CTRL_PAGE_SIZE - offset;
>
> -	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req), 0);
> +	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req),
> +				      dev->dma_attrs);
>  	if (dma_mapping_error(dev->dev, iod->first_dma))
>  		return BLK_STS_RESOURCE;
>  	iod->dma_len = bv->bv_len;
> @@ -754,7 +756,8 @@ static blk_status_t nvme_setup_sgl_simple(struct nvme_dev *dev,
>  {
>  	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
>
> -	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req), 0);
> +	iod->first_dma = dma_map_bvec(dev->dev, bv, rq_dma_dir(req),
> +				      dev->dma_attrs);
>  	if (dma_mapping_error(dev->dev, iod->first_dma))
>  		return BLK_STS_RESOURCE;
>  	iod->dma_len = bv->bv_len;
> @@ -800,7 +803,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>  		goto out_free_sg;
>
>  	rc = dma_map_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req),
> -			     DMA_ATTR_NO_WARN);
> +			     dev->dma_attrs | DMA_ATTR_NO_WARN);
>  	if (rc) {
>  		if (rc == -EREMOTEIO)
>  			ret = BLK_STS_TARGET;
> @@ -828,7 +831,8 @@ static blk_status_t nvme_map_metadata(struct nvme_dev *dev, struct request *req,
>  	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
>  	struct bio_vec bv = rq_integrity_vec(req);
>
> -	iod->meta_dma = dma_map_bvec(dev->dev, &bv, rq_dma_dir(req), 0);
> +	iod->meta_dma = dma_map_bvec(dev->dev, &bv, rq_dma_dir(req),
> +				     dev->dma_attrs);
>  	if (dma_mapping_error(dev->dev, iod->meta_dma))
>  		return BLK_STS_IOERR;
>  	cmnd->rw.metadata = cpu_to_le64(iod->meta_dma);
> @@ -3040,6 +3044,12 @@ static struct nvme_dev *nvme_pci_alloc_dev(struct pci_dev *pdev,
>  	 * a single integrity segment for the separate metadata pointer.
>  	 */
>  	dev->ctrl.max_integrity_segments = 1;
> +
> +	if (dma_recommend_may_block(dev->dev)) {
> +		dev->ctrl.blocking = true;
> +		dev->dma_attrs = DMA_ATTR_MAY_BLOCK;
> +	}
> +
>  	return dev;
>
>  out_put_device:
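
The gating pattern in the diff above (decide once at device allocation, then OR the stored attributes into every map call) can be sketched in user-space C. This is only a mock under stated assumptions, not kernel code: every mock_* name and both MOCK_DMA_ATTR_* bit values are hypothetical placeholders, and mock_recommend_may_block() merely stands in for the dma_recommend_may_block() check that the series proposes.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values for illustration only; NOT the kernel's definitions. */
#define MOCK_DMA_ATTR_NO_WARN   (1UL << 0)
#define MOCK_DMA_ATTR_MAY_BLOCK (1UL << 1)

/* Hypothetical stand-in for the fields the patch touches in struct nvme_dev. */
struct mock_dev {
	bool throttling_useful;  /* what the DMA layer would report */
	bool blocking;           /* stands in for dev->ctrl.blocking */
	unsigned long dma_attrs; /* extra attrs passed to every map call */
};

/* Hypothetical stand-in for dma_recommend_may_block(). */
static bool mock_recommend_may_block(const struct mock_dev *dev)
{
	return dev->throttling_useful;
}

/* Mirrors the allocation-time hunk: decide once and store the result. */
static void mock_init_dev(struct mock_dev *dev, bool throttling_useful)
{
	dev->throttling_useful = throttling_useful;
	dev->blocking = false;
	dev->dma_attrs = 0;

	if (mock_recommend_may_block(dev)) {
		dev->blocking = true;
		dev->dma_attrs = MOCK_DMA_ATTR_MAY_BLOCK;
	}
}

/* Attributes a bvec-style map call receives (a literal 0 before the patch). */
static unsigned long mock_bvec_attrs(const struct mock_dev *dev)
{
	return dev->dma_attrs;
}

/* Attributes an sgtable-style map call receives; NO_WARN is always kept. */
static unsigned long mock_sgtable_attrs(const struct mock_dev *dev)
{
	return dev->dma_attrs | MOCK_DMA_ATTR_NO_WARN;
}

/* Usage: a device without throttling behaves as before; one with it opts in. */
void mock_selftest(void)
{
	struct mock_dev plain, coco;

	mock_init_dev(&plain, false);
	assert(!plain.blocking);
	assert(mock_bvec_attrs(&plain) == 0);
	assert(mock_sgtable_attrs(&plain) == MOCK_DMA_ATTR_NO_WARN);

	mock_init_dev(&coco, true);
	assert(coco.blocking);
	assert(mock_bvec_attrs(&coco) == MOCK_DMA_ATTR_MAY_BLOCK);
	assert(mock_sgtable_attrs(&coco) ==
	       (MOCK_DMA_ATTR_MAY_BLOCK | MOCK_DMA_ATTR_NO_WARN));
}
```

Because dma_attrs stays 0 unless the check succeeds, the map calls in the non-throttled case receive exactly the attributes they did before the change, which is the "no change occurs in a non-CoCo VM" property the commit message describes.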