* [PATCH v2 0/2] block: Enable proper MMIO memory handling for P2P DMA
@ 2025-10-20 17:00 Leon Romanovsky
2025-10-20 17:00 ` [PATCH v2 1/2] nvme-pci: migrate to dma_map_phys instead of map_page Leon Romanovsky
2025-10-20 17:00 ` [PATCH v2 2/2] block-dma: properly take MMIO path Leon Romanovsky
0 siblings, 2 replies; 10+ messages in thread
From: Leon Romanovsky @ 2025-10-20 17:00 UTC (permalink / raw)
To: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg
Cc: linux-block, linux-kernel, linux-nvme
Changelog:
v2:
* Added Christoph's Reviewed-by tag for the first patch.
* Squashed patches
* Stored DMA MMIO attribute in NVMe IOD flags variable instead of block layer.
v1: https://patch.msgid.link/20251017-block-with-mmio-v1-0-3f486904db5e@nvidia.com
* Reordered patches.
* Dropped patch which tried to unify unmap flow.
* Set MMIO flag separately for data and integrity payloads.
v0: https://lore.kernel.org/all/cover.1760369219.git.leon@kernel.org/
----------------------------------------------------------------------
This patch series improves block layer and NVMe driver support for MMIO
memory regions, particularly for peer-to-peer (P2P) DMA transfers that
go through the host bridge.
The series addresses a critical gap where P2P transfers through the host
bridge (PCI_P2PDMA_MAP_THRU_HOST_BRIDGE) were not properly marked as
MMIO memory, leading to potential issues with:
- Inappropriate CPU cache synchronization operations on MMIO regions
- Incorrect DMA mapping/unmapping that doesn't respect MMIO semantics
- Missing IOMMU configuration for MMIO memory handling
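
As a rough illustration of the intended behaviour (a sketch only, not
part of the patches; the helper and its name are made up for this cover
letter, while dma_map_phys(), DMA_ATTR_MMIO and
PCI_P2PDMA_MAP_THRU_HOST_BRIDGE are the real interfaces involved):

    /* sketch: choose DMA attributes from the P2P mapping type */
    static dma_addr_t example_map_payload(struct device *dev,
                    phys_addr_t paddr, size_t len,
                    enum dma_data_direction dir,
                    enum pci_p2pdma_map_type map_type)
    {
            unsigned int attrs = 0;

            /* MMIO through the host bridge: skip CPU cache maintenance */
            if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE)
                    attrs |= DMA_ATTR_MMIO;

            return dma_map_phys(dev, paddr, len, dir, attrs);
    }
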
This work is extracted from the larger DMA physical API improvement
series [1] and focuses specifically on block layer and NVMe requirements
for MMIO memory support.
Thanks
[1] https://lore.kernel.org/all/cover.1757423202.git.leonro@nvidia.com/
---
Leon Romanovsky (2):
nvme-pci: migrate to dma_map_phys instead of map_page
block-dma: properly take MMIO path
 block/blk-mq-dma.c            |  8 ++++---
 drivers/nvme/host/pci.c       | 50 +++++++++++++++++++++++++++++++------------
 include/linux/blk-integrity.h |  7 +++---
 include/linux/blk-mq-dma.h    | 12 +++++++----
4 files changed, 53 insertions(+), 24 deletions(-)
---
base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787
change-id: 20251016-block-with-mmio-02acf4285427
Best regards,
--
Leon Romanovsky <leonro@nvidia.com>
^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/2] nvme-pci: migrate to dma_map_phys instead of map_page
  2025-10-20 17:00 [PATCH v2 0/2] block: Enable proper MMIO memory handling for P2P DMA Leon Romanovsky
@ 2025-10-20 17:00 ` Leon Romanovsky
  2025-10-20 23:36   ` Chaitanya Kulkarni
  2025-10-22  6:14   ` Christoph Hellwig
  2025-10-20 17:00 ` [PATCH v2 2/2] block-dma: properly take MMIO path Leon Romanovsky
  1 sibling, 2 replies; 10+ messages in thread
From: Leon Romanovsky @ 2025-10-20 17:00 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg
  Cc: linux-block, linux-kernel, linux-nvme

From: Leon Romanovsky <leonro@nvidia.com>

After the introduction of dma_map_phys(), there is no need to convert
a physical address to a struct page in order to map it. So let's use
dma_map_phys() directly.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 block/blk-mq-dma.c         |  4 ++--
 drivers/nvme/host/pci.c    | 27 +++++++++++++++------------
 include/linux/blk-mq-dma.h |  1 +
 3 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index 449950029872..4ba7b0323da4 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -93,8 +93,8 @@ static bool blk_dma_map_bus(struct blk_dma_iter *iter, struct phys_vec *vec)
 static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
 		struct blk_dma_iter *iter, struct phys_vec *vec)
 {
-	iter->addr = dma_map_page(dma_dev, phys_to_page(vec->paddr),
-			offset_in_page(vec->paddr), vec->len, rq_dma_dir(req));
+	iter->addr = dma_map_phys(dma_dev, vec->paddr, vec->len,
+			rq_dma_dir(req), 0);
 	if (dma_mapping_error(dma_dev, iter->addr)) {
 		iter->status = BLK_STS_RESOURCE;
 		return false;
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index c916176bd9f0..91a8965754f0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -685,20 +685,20 @@ static void nvme_free_descriptors(struct request *req)
 	}
 }
 
-static void nvme_free_prps(struct request *req)
+static void nvme_free_prps(struct request *req, unsigned int attrs)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 	struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
 	unsigned int i;
 
 	for (i = 0; i < iod->nr_dma_vecs; i++)
-		dma_unmap_page(nvmeq->dev->dev, iod->dma_vecs[i].addr,
-				iod->dma_vecs[i].len, rq_dma_dir(req));
+		dma_unmap_phys(nvmeq->dev->dev, iod->dma_vecs[i].addr,
+				iod->dma_vecs[i].len, rq_dma_dir(req), attrs);
 	mempool_free(iod->dma_vecs, nvmeq->dev->dmavec_mempool);
 }
 
 static void nvme_free_sgls(struct request *req, struct nvme_sgl_desc *sge,
-		struct nvme_sgl_desc *sg_list)
+		struct nvme_sgl_desc *sg_list, unsigned int attrs)
 {
 	struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
 	enum dma_data_direction dir = rq_dma_dir(req);
@@ -707,13 +707,14 @@ static void nvme_free_sgls(struct request *req, struct nvme_sgl_desc *sge,
 	unsigned int i;
 
 	if (sge->type == (NVME_SGL_FMT_DATA_DESC << 4)) {
-		dma_unmap_page(dma_dev, le64_to_cpu(sge->addr), len, dir);
+		dma_unmap_phys(dma_dev, le64_to_cpu(sge->addr), len, dir,
+				attrs);
 		return;
 	}
 
 	for (i = 0; i < len / sizeof(*sg_list); i++)
-		dma_unmap_page(dma_dev, le64_to_cpu(sg_list[i].addr),
-				le32_to_cpu(sg_list[i].length), dir);
+		dma_unmap_phys(dma_dev, le64_to_cpu(sg_list[i].addr),
+				le32_to_cpu(sg_list[i].length), dir, attrs);
 }
 
 static void nvme_unmap_metadata(struct request *req)
@@ -723,6 +724,7 @@ static void nvme_unmap_metadata(struct request *req)
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 	struct device *dma_dev = nvmeq->dev->dev;
 	struct nvme_sgl_desc *sge = iod->meta_descriptor;
+	unsigned int attrs = 0;
 
 	if (iod->flags & IOD_SINGLE_META_SEGMENT) {
 		dma_unmap_page(dma_dev, iod->meta_dma,
@@ -734,10 +736,10 @@ static void nvme_unmap_metadata(struct request *req)
 	if (!blk_rq_integrity_dma_unmap(req, dma_dev, &iod->meta_dma_state,
 					iod->meta_total_len)) {
 		if (nvme_pci_cmd_use_meta_sgl(&iod->cmd))
-			nvme_free_sgls(req, sge, &sge[1]);
+			nvme_free_sgls(req, sge, &sge[1], attrs);
 		else
-			dma_unmap_page(dma_dev, iod->meta_dma,
-					iod->meta_total_len, dir);
+			dma_unmap_phys(dma_dev, iod->meta_dma,
+					iod->meta_total_len, dir, attrs);
 	}
 
 	if (iod->meta_descriptor)
@@ -750,6 +752,7 @@ static void nvme_unmap_data(struct request *req)
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 	struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
 	struct device *dma_dev = nvmeq->dev->dev;
+	unsigned int attrs = 0;
 
 	if (iod->flags & IOD_SINGLE_SEGMENT) {
 		static_assert(offsetof(union nvme_data_ptr, prp1) ==
@@ -762,9 +765,9 @@ static void nvme_unmap_data(struct request *req)
 	if (!blk_rq_dma_unmap(req, dma_dev, &iod->dma_state, iod->total_len)) {
 		if (nvme_pci_cmd_use_sgl(&iod->cmd))
 			nvme_free_sgls(req, iod->descriptors[0],
-					&iod->cmd.common.dptr.sgl);
+					&iod->cmd.common.dptr.sgl, attrs);
 		else
-			nvme_free_prps(req);
+			nvme_free_prps(req, attrs);
 	}
 
 	if (iod->nr_descriptors)
diff --git a/include/linux/blk-mq-dma.h b/include/linux/blk-mq-dma.h
index 51829958d872..faf4dd574c62 100644
--- a/include/linux/blk-mq-dma.h
+++ b/include/linux/blk-mq-dma.h
@@ -16,6 +16,7 @@ struct blk_dma_iter {
 	/* Output address range for this iteration */
 	dma_addr_t addr;
 	u32 len;
+	unsigned int attrs;
 
 	/* Status code. Only valid when blk_rq_dma_map_iter_* returned false */
 	blk_status_t status;
-- 
2.51.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/2] nvme-pci: migrate to dma_map_phys instead of map_page
  2025-10-20 17:00 ` [PATCH v2 1/2] nvme-pci: migrate to dma_map_phys instead of map_page Leon Romanovsky
@ 2025-10-20 23:36   ` Chaitanya Kulkarni
  2025-10-22  6:14   ` Christoph Hellwig
  1 sibling, 0 replies; 10+ messages in thread
From: Chaitanya Kulkarni @ 2025-10-20 23:36 UTC (permalink / raw)
  To: Leon Romanovsky, Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg
  Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nvme@lists.infradead.org

On 10/20/25 10:00, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> After the introduction of dma_map_phys(), there is no need to convert
> a physical address to a struct page in order to map it. So let's use
> dma_map_phys() directly.
>
> Reviewed-by: Keith Busch <kbusch@kernel.org>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/2] nvme-pci: migrate to dma_map_phys instead of map_page
  2025-10-20 17:00 ` [PATCH v2 1/2] nvme-pci: migrate to dma_map_phys instead of map_page Leon Romanovsky
  2025-10-20 23:36   ` Chaitanya Kulkarni
@ 2025-10-22  6:14   ` Christoph Hellwig
  2025-10-26 12:38     ` Leon Romanovsky
  1 sibling, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2025-10-22  6:14 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	linux-block, linux-kernel, linux-nvme

This actually has block and nvme bits, so the subject line should
say that.

> +	unsigned int attrs = 0;

attrs is always zero here, no need to start passing it for the
map_phys conversion alone.

> +	unsigned int attrs = 0;

Same here.

> +	unsigned int attrs;

And this is also entirely unused as far as I can tell.

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/2] nvme-pci: migrate to dma_map_phys instead of map_page
  2025-10-22  6:14   ` Christoph Hellwig
@ 2025-10-26 12:38     ` Leon Romanovsky
  2025-10-27  6:49       ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Leon Romanovsky @ 2025-10-26 12:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, linux-block, linux-kernel, linux-nvme

On Wed, Oct 22, 2025 at 08:14:18AM +0200, Christoph Hellwig wrote:
> This actually has block and nvme bits, so the subject line should
> say that.
>
> > +	unsigned int attrs = 0;
>
> attrs is always zero here, no need to start passing it for the
> map_phys conversion alone.
>
> > +	unsigned int attrs = 0;
>
> Same here.

It gave me a cleaner second patch where I only added the new attribute,
but if it doesn't look right to you, let's change it.

> > +	unsigned int attrs;
>
> And this is also entirely unused as far as I can tell.

Right, it is used in the second patch, will fix.

Thanks

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/2] nvme-pci: migrate to dma_map_phys instead of map_page
  2025-10-26 12:38     ` Leon Romanovsky
@ 2025-10-27  6:49       ` Christoph Hellwig
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2025-10-27  6:49 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg,
	linux-block, linux-kernel, linux-nvme

On Sun, Oct 26, 2025 at 02:38:04PM +0200, Leon Romanovsky wrote:
> On Wed, Oct 22, 2025 at 08:14:18AM +0200, Christoph Hellwig wrote:
> > This actually has block and nvme bits, so the subject line should
> > say that.
> >
> > > +	unsigned int attrs = 0;
> >
> > attrs is always zero here, no need to start passing it for the
> > map_phys conversion alone.
> >
> > > +	unsigned int attrs = 0;
> >
> > Same here.
>
> It gave me a cleaner second patch where I only added the new attribute,
> but if it doesn't look right to you, let's change it.

The usual rule is do one thing at a time.  There might be an occasional
slight bend of the rule to make life easier, but I don't think that
really fits here.

^ permalink raw reply	[flat|nested] 10+ messages in thread
* [PATCH v2 2/2] block-dma: properly take MMIO path
  2025-10-20 17:00 [PATCH v2 0/2] block: Enable proper MMIO memory handling for P2P DMA Leon Romanovsky
  2025-10-20 17:00 ` [PATCH v2 1/2] nvme-pci: migrate to dma_map_phys instead of map_page Leon Romanovsky
@ 2025-10-20 17:00 ` Leon Romanovsky
  2025-10-20 23:37   ` Chaitanya Kulkarni
  2025-10-22  6:21   ` Christoph Hellwig
  1 sibling, 2 replies; 10+ messages in thread
From: Leon Romanovsky @ 2025-10-20 17:00 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg
  Cc: linux-block, linux-kernel, linux-nvme

From: Leon Romanovsky <leonro@nvidia.com>

In commit eadaa8b255f3 ("dma-mapping: introduce new DMA attribute to
indicate MMIO memory"), the DMA_ATTR_MMIO attribute was added to
describe MMIO addresses, which must not be subject to CPU cache
flushing, as an outcome of the discussion referenced in the Link tag
below.

In the PCI_P2PDMA_MAP_THRU_HOST_BRIDGE case, the blk-mq-dma logic
treated such a transfer as a regular page and relied on the
"struct page" DMA flow. That flow performs CPU cache flushing, which
shouldn't be done here, and doesn't set the IOMMU_MMIO flag in the
DMA-IOMMU case.

Link: https://lore.kernel.org/all/f912c446-1ae9-4390-9c11-00dce7bf0fd3@arm.com/
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 block/blk-mq-dma.c            |  6 ++++--
 drivers/nvme/host/pci.c       | 23 +++++++++++++++++++++--
 include/linux/blk-integrity.h |  7 ++++---
 include/linux/blk-mq-dma.h    | 11 +++++++----
 4 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index 4ba7b0323da4..3ede8022b41c 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -94,7 +94,7 @@ static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
 		struct blk_dma_iter *iter, struct phys_vec *vec)
 {
 	iter->addr = dma_map_phys(dma_dev, vec->paddr, vec->len,
-			rq_dma_dir(req), 0);
+			rq_dma_dir(req), iter->attrs);
 	if (dma_mapping_error(dma_dev, iter->addr)) {
 		iter->status = BLK_STS_RESOURCE;
 		return false;
@@ -116,7 +116,7 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
 
 	do {
 		error = dma_iova_link(dma_dev, state, vec->paddr, mapped,
-				vec->len, dir, 0);
+				vec->len, dir, iter->attrs);
 		if (error)
 			break;
 		mapped += vec->len;
@@ -184,6 +184,8 @@ static bool blk_dma_map_iter_start(struct request *req, struct device *dma_dev,
 		 * P2P transfers through the host bridge are treated the
 		 * same as non-P2P transfers below and during unmap.
 		 */
+		iter->attrs |= DMA_ATTR_MMIO;
+		fallthrough;
 	case PCI_P2PDMA_MAP_NONE:
 		break;
 	default:
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 91a8965754f0..f45d1968611d 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -260,6 +260,12 @@ enum nvme_iod_flags {
 	/* single segment dma mapping */
 	IOD_SINGLE_SEGMENT = 1U << 2,
 
+	/* Data payload contains MMIO memory */
+	IOD_DATA_MMIO = 1U << 3,
+
+	/* Metadata contains MMIO memory */
+	IOD_META_MMIO = 1U << 4,
+
 	/* Metadata using non-coalesced MPTR */
 	IOD_SINGLE_META_SEGMENT = 1U << 5,
 };
@@ -733,8 +739,11 @@ static void nvme_unmap_metadata(struct request *req)
 		return;
 	}
 
+	if (iod->flags & IOD_META_MMIO)
+		attrs |= DMA_ATTR_MMIO;
+
 	if (!blk_rq_integrity_dma_unmap(req, dma_dev, &iod->meta_dma_state,
-					iod->meta_total_len)) {
+					iod->meta_total_len, attrs)) {
 		if (nvme_pci_cmd_use_meta_sgl(&iod->cmd))
 			nvme_free_sgls(req, sge, &sge[1], attrs);
 		else
@@ -762,7 +771,11 @@ static void nvme_unmap_data(struct request *req)
 		return;
 	}
 
-	if (!blk_rq_dma_unmap(req, dma_dev, &iod->dma_state, iod->total_len)) {
+	if (iod->flags & IOD_DATA_MMIO)
+		attrs |= DMA_ATTR_MMIO;
+
+	if (!blk_rq_dma_unmap(req, dma_dev, &iod->dma_state, iod->total_len,
+			      attrs)) {
 		if (nvme_pci_cmd_use_sgl(&iod->cmd))
 			nvme_free_sgls(req, iod->descriptors[0],
 					&iod->cmd.common.dptr.sgl, attrs);
@@ -1038,6 +1051,9 @@ static blk_status_t nvme_map_data(struct request *req)
 	if (!blk_rq_dma_map_iter_start(req, dev->dev, &iod->dma_state, &iter))
 		return iter.status;
 
+	if (iter.attrs & DMA_ATTR_MMIO)
+		iod->flags |= IOD_DATA_MMIO;
+
 	if (use_sgl == SGL_FORCED ||
 	    (use_sgl == SGL_SUPPORTED &&
 	     (sgl_threshold && nvme_pci_avg_seg_size(req) >= sgl_threshold)))
@@ -1060,6 +1076,9 @@ static blk_status_t nvme_pci_setup_meta_sgls(struct request *req)
 				       &iod->meta_dma_state, &iter))
 		return iter.status;
 
+	if (iter.attrs & DMA_ATTR_MMIO)
+		iod->flags |= IOD_META_MMIO;
+
 	if (blk_rq_dma_map_coalesce(&iod->meta_dma_state))
 		entries = 1;
 
diff --git a/include/linux/blk-integrity.h b/include/linux/blk-integrity.h
index b659373788f6..aa42172f5cc9 100644
--- a/include/linux/blk-integrity.h
+++ b/include/linux/blk-integrity.h
@@ -30,10 +30,11 @@ int blk_rq_map_integrity_sg(struct request *, struct scatterlist *);
 
 static inline bool blk_rq_integrity_dma_unmap(struct request *req,
 		struct device *dma_dev, struct dma_iova_state *state,
-		size_t mapped_len)
+		size_t mapped_len, unsigned int attrs)
 {
 	return blk_dma_unmap(req, dma_dev, state, mapped_len,
-			bio_integrity(req->bio)->bip_flags & BIP_P2P_DMA);
+			bio_integrity(req->bio)->bip_flags & BIP_P2P_DMA,
+			attrs);
 }
 
 int blk_rq_count_integrity_sg(struct request_queue *, struct bio *);
@@ -126,7 +127,7 @@ static inline int blk_rq_map_integrity_sg(struct request *q,
 }
 static inline bool blk_rq_integrity_dma_unmap(struct request *req,
 		struct device *dma_dev, struct dma_iova_state *state,
-		size_t mapped_len)
+		size_t mapped_len, unsigned int attrs)
 {
 	return false;
 }
diff --git a/include/linux/blk-mq-dma.h b/include/linux/blk-mq-dma.h
index faf4dd574c62..aab4d04e6c69 100644
--- a/include/linux/blk-mq-dma.h
+++ b/include/linux/blk-mq-dma.h
@@ -50,19 +50,21 @@ static inline bool blk_rq_dma_map_coalesce(struct dma_iova_state *state)
 * @state: DMA IOVA state
 * @mapped_len: number of bytes to unmap
 * @is_p2p: true if mapped with PCI_P2PDMA_MAP_BUS_ADDR
+ * @attrs: DMA attributes
 *
 * Returns %false if the callers need to manually unmap every DMA segment
 * mapped using @iter or %true if no work is left to be done.
 */
 static inline bool blk_dma_unmap(struct request *req, struct device *dma_dev,
-		struct dma_iova_state *state, size_t mapped_len, bool is_p2p)
+		struct dma_iova_state *state, size_t mapped_len, bool is_p2p,
+		unsigned int attrs)
 {
 	if (is_p2p)
 		return true;
 
 	if (dma_use_iova(state)) {
 		dma_iova_destroy(dma_dev, state, mapped_len, rq_dma_dir(req),
-				0);
+				attrs);
 		return true;
 	}
 
@@ -70,10 +72,11 @@ static inline bool blk_dma_unmap(struct request *req, struct device *dma_dev,
 }
 
 static inline bool blk_rq_dma_unmap(struct request *req, struct device *dma_dev,
-		struct dma_iova_state *state, size_t mapped_len)
+		struct dma_iova_state *state, size_t mapped_len,
+		unsigned int attrs)
 {
 	return blk_dma_unmap(req, dma_dev, state, mapped_len,
-			req->cmd_flags & REQ_P2PDMA);
+			req->cmd_flags & REQ_P2PDMA, attrs);
 }
 
 #endif /* BLK_MQ_DMA_H */
-- 
2.51.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] block-dma: properly take MMIO path
  2025-10-20 17:00 ` [PATCH v2 2/2] block-dma: properly take MMIO path Leon Romanovsky
@ 2025-10-20 23:37   ` Chaitanya Kulkarni
  2025-10-22  6:21   ` Christoph Hellwig
  1 sibling, 0 replies; 10+ messages in thread
From: Chaitanya Kulkarni @ 2025-10-20 23:37 UTC (permalink / raw)
  To: Leon Romanovsky, Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg
  Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nvme@lists.infradead.org

On 10/20/25 10:00, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> In commit eadaa8b255f3 ("dma-mapping: introduce new DMA attribute to
> indicate MMIO memory"), the DMA_ATTR_MMIO attribute was added to
> describe MMIO addresses, which must not be subject to CPU cache
> flushing, as an outcome of the discussion referenced in the Link tag
> below.
>
> In the PCI_P2PDMA_MAP_THRU_HOST_BRIDGE case, the blk-mq-dma logic
> treated such a transfer as a regular page and relied on the
> "struct page" DMA flow. That flow performs CPU cache flushing, which
> shouldn't be done here, and doesn't set the IOMMU_MMIO flag in the
> DMA-IOMMU case.
>
> Link: https://lore.kernel.org/all/f912c446-1ae9-4390-9c11-00dce7bf0fd3@arm.com/
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] block-dma: properly take MMIO path
  2025-10-20 17:00 ` [PATCH v2 2/2] block-dma: properly take MMIO path Leon Romanovsky
  2025-10-20 23:37   ` Chaitanya Kulkarni
@ 2025-10-22  6:21   ` Christoph Hellwig
  2025-10-26 13:30     ` Leon Romanovsky
  1 sibling, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2025-10-22  6:21 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jens Axboe, Keith Busch, Christoph Hellwig, Sagi Grimberg,
	linux-block, linux-kernel, linux-nvme

On Mon, Oct 20, 2025 at 08:00:21PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> In commit eadaa8b255f3 ("dma-mapping: introduce new DMA attribute to
> indicate MMIO memory"), the DMA_ATTR_MMIO attribute was added to
> describe MMIO addresses, which must not be subject to CPU cache
> flushing, as an outcome of the discussion referenced in the Link tag
> below.
>
> In the PCI_P2PDMA_MAP_THRU_HOST_BRIDGE case, the blk-mq-dma logic
> treated such a transfer as a regular page and relied on the
> "struct page" DMA flow. That flow performs CPU cache flushing, which
> shouldn't be done here, and doesn't set the IOMMU_MMIO flag in the
> DMA-IOMMU case.
>
> Link: https://lore.kernel.org/all/f912c446-1ae9-4390-9c11-00dce7bf0fd3@arm.com/
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
>  block/blk-mq-dma.c            |  6 ++++--
>  drivers/nvme/host/pci.c       | 23 +++++++++++++++++++++--
>  include/linux/blk-integrity.h |  7 ++++---
>  include/linux/blk-mq-dma.h    | 11 +++++++----
>  4 files changed, 36 insertions(+), 11 deletions(-)
>
> diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
> index 4ba7b0323da4..3ede8022b41c 100644
> --- a/block/blk-mq-dma.c
> +++ b/block/blk-mq-dma.c
> @@ -94,7 +94,7 @@ static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
>  		struct blk_dma_iter *iter, struct phys_vec *vec)
>  {
>  	iter->addr = dma_map_phys(dma_dev, vec->paddr, vec->len,
> -			rq_dma_dir(req), 0);
> +			rq_dma_dir(req), iter->attrs);
>  	if (dma_mapping_error(dma_dev, iter->addr)) {
>  		iter->status = BLK_STS_RESOURCE;
>  		return false;
> @@ -116,7 +116,7 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
>
>  	do {
>  		error = dma_iova_link(dma_dev, state, vec->paddr, mapped,
> -				vec->len, dir, 0);
> +				vec->len, dir, iter->attrs);
>  		if (error)
>  			break;
>  		mapped += vec->len;
> @@ -184,6 +184,8 @@ static bool blk_dma_map_iter_start(struct request *req, struct device *dma_dev,
>  		 * P2P transfers through the host bridge are treated the
>  		 * same as non-P2P transfers below and during unmap.
>  		 */
> +		iter->attrs |= DMA_ATTR_MMIO;

DMA_ATTR_MMIO is the only flag in iter->attrs, and I can't see any other
DMA mapping flag that would fit here.  So I'd rather store the
enum pci_p2pdma_map_type here, which also removes the need for REQ_P2PDMA
and BIP_P2P_DMA when propagating that to nvme.

^ permalink raw reply	[flat|nested] 10+ messages in thread
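(A minimal sketch of this suggested direction, assuming the iterator
already records the P2P mapping type in iter->p2pdma.map as noted in
the reply below; the helper name is invented for illustration only.)

    /* sketch: derive DMA attrs from the recorded P2P mapping type */
    static inline unsigned int blk_dma_iter_attrs(struct blk_dma_iter *iter)
    {
            if (iter->p2pdma.map == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE)
                    return DMA_ATTR_MMIO;
            return 0;
    }
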
* Re: [PATCH v2 2/2] block-dma: properly take MMIO path
  2025-10-22  6:21   ` Christoph Hellwig
@ 2025-10-26 13:30     ` Leon Romanovsky
  0 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2025-10-26 13:30 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, linux-block, linux-kernel, linux-nvme

On Wed, Oct 22, 2025 at 08:21:35AM +0200, Christoph Hellwig wrote:
> On Mon, Oct 20, 2025 at 08:00:21PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> >
> > In commit eadaa8b255f3 ("dma-mapping: introduce new DMA attribute to
> > indicate MMIO memory"), the DMA_ATTR_MMIO attribute was added to
> > describe MMIO addresses, which must not be subject to CPU cache
> > flushing, as an outcome of the discussion referenced in the Link tag
> > below.
> >
> > In the PCI_P2PDMA_MAP_THRU_HOST_BRIDGE case, the blk-mq-dma logic
> > treated such a transfer as a regular page and relied on the
> > "struct page" DMA flow. That flow performs CPU cache flushing, which
> > shouldn't be done here, and doesn't set the IOMMU_MMIO flag in the
> > DMA-IOMMU case.
> >
> > Link: https://lore.kernel.org/all/f912c446-1ae9-4390-9c11-00dce7bf0fd3@arm.com/
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> >  block/blk-mq-dma.c            |  6 ++++--
> >  drivers/nvme/host/pci.c       | 23 +++++++++++++++++++++--
> >  include/linux/blk-integrity.h |  7 ++++---
> >  include/linux/blk-mq-dma.h    | 11 +++++++----
> >  4 files changed, 36 insertions(+), 11 deletions(-)
> >
> > diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
> > index 4ba7b0323da4..3ede8022b41c 100644
> > --- a/block/blk-mq-dma.c
> > +++ b/block/blk-mq-dma.c
> > @@ -94,7 +94,7 @@ static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
> >  		struct blk_dma_iter *iter, struct phys_vec *vec)
> >  {
> >  	iter->addr = dma_map_phys(dma_dev, vec->paddr, vec->len,
> > -			rq_dma_dir(req), 0);
> > +			rq_dma_dir(req), iter->attrs);
> >  	if (dma_mapping_error(dma_dev, iter->addr)) {
> >  		iter->status = BLK_STS_RESOURCE;
> >  		return false;
> > @@ -116,7 +116,7 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
> >
> >  	do {
> >  		error = dma_iova_link(dma_dev, state, vec->paddr, mapped,
> > -				vec->len, dir, 0);
> > +				vec->len, dir, iter->attrs);
> >  		if (error)
> >  			break;
> >  		mapped += vec->len;
> > @@ -184,6 +184,8 @@ static bool blk_dma_map_iter_start(struct request *req, struct device *dma_dev,
> >  		 * P2P transfers through the host bridge are treated the
> >  		 * same as non-P2P transfers below and during unmap.
> >  		 */
> > +		iter->attrs |= DMA_ATTR_MMIO;
>
> DMA_ATTR_MMIO is the only flag in iter->attrs, and I can't see any other
> DMA mapping flag that would fit here.  So I'd rather store the
> enum pci_p2pdma_map_type here, which also removes the need for REQ_P2PDMA
> and BIP_P2P_DMA when propagating that to nvme.

It is already stored in iter->p2pdma.map, will reuse it.

Thanks

^ permalink raw reply	[flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-10-27  6:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-20 17:00 [PATCH v2 0/2] block: Enable proper MMIO memory handling for P2P DMA Leon Romanovsky
2025-10-20 17:00 ` [PATCH v2 1/2] nvme-pci: migrate to dma_map_phys instead of map_page Leon Romanovsky
2025-10-20 23:36   ` Chaitanya Kulkarni
2025-10-22  6:14   ` Christoph Hellwig
2025-10-26 12:38     ` Leon Romanovsky
2025-10-27  6:49       ` Christoph Hellwig
2025-10-20 17:00 ` [PATCH v2 2/2] block-dma: properly take MMIO path Leon Romanovsky
2025-10-20 23:37   ` Chaitanya Kulkarni
2025-10-22  6:21   ` Christoph Hellwig
2025-10-26 13:30     ` Leon Romanovsky