* [PATCH v2 0/2] block: fix pgmap handling for zone device pages in bio merge paths
@ 2026-04-10 15:34 Naman Jain
2026-04-10 15:34 ` [PATCH v2 1/2] block: add pgmap check to biovec_phys_mergeable Naman Jain
2026-04-10 15:34 ` [PATCH v2 2/2] block: relax pgmap check in bio_add_page for compatible zone device pages Naman Jain
From: Naman Jain @ 2026-04-10 15:34 UTC (permalink / raw)
To: Jens Axboe
Cc: Christoph Hellwig, Chaitanya Kulkarni, John Hubbard,
Logan Gunthorpe, linux-kernel, linux-block, Saurabh Sengar,
Long Li, Michael Kelley, namjain
When zone device memory is registered in multiple chunks, each chunk
gets its own dev_pagemap. A single bio can contain bvecs from different
pgmaps -- iov_iter_extract_bvecs() breaks at pgmap boundaries but the
outer loop in bio_iov_iter_get_pages() continues filling the same bio.
There are two problems with the current code:

1. biovec_phys_mergeable() has no pgmap check, so the request merge,
   DMA mapping, and integrity merge paths can coalesce physically
   contiguous bvec segments from different pgmaps. This makes it
   impossible to recover the correct pgmap for the merged segment
   via page_pgmap().

2. bio_add_page() and bio_integrity_add_page() reject pages from a
   different pgmap entirely (returning 0), rather than just skipping
   the merge and adding them as new bvec entries. This forces callers
   to start a new bio unnecessarily.

Patch 1 fixes the merge-path gap by adding a pgmap check to
biovec_phys_mergeable().
Patch 2 introduces zone_device_pages_compatible() which replaces the
blanket zone_device_pages_have_same_pgmap() rejection in bio_add_page()
and bio_integrity_add_page(). Pages that are safe to coexist as separate
bvec entries (e.g. MEMORY_DEVICE_GENERIC from different pgmaps) are now
accepted, while P2PDMA pages from different pgmaps or mixed P2PDMA and
non-P2PDMA pages are still rejected, since the DMA iterator caches the
P2PDMA mapping state from the first segment.
zone_device_pages_have_same_pgmap() is kept as a merge guard so pages
from different pgmaps are not coalesced into the same bvec segment.
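
As a reading aid, the two predicates described above can be modelled in
userspace. This is only a sketch under stated assumptions: every model_*
name is hypothetical, pgmaps are reduced to integer ids, and the body of
model_same_pgmap() is an assumption about what
zone_device_pages_have_same_pgmap() does (that helper is not part of this
series).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel page types; not the real definitions. */
enum model_type { MODEL_RAM, MODEL_GENERIC, MODEL_P2PDMA };

struct model_page {
	enum model_type type;
	int pgmap_id;	/* which dev_pagemap owns the page; unused for RAM */
};

enum model_outcome {
	MODEL_REJECT,		/* the add-page call returns 0 */
	MODEL_NEW_BVEC,		/* accepted, but never merged into the previous bvec */
	MODEL_MERGE_CANDIDATE	/* may merge if the usual contiguity checks pass */
};

/* Assumed behaviour of zone_device_pages_have_same_pgmap(). */
static bool model_same_pgmap(const struct model_page *a,
			     const struct model_page *b)
{
	bool a_zd = a->type != MODEL_RAM;
	bool b_zd = b->type != MODEL_RAM;

	if (a_zd != b_zd)
		return false;
	if (!a_zd)
		return true;
	return a->pgmap_id == b->pgmap_id;
}

/* Mirrors zone_device_pages_compatible() as added by patch 2. */
static bool model_compatible(const struct model_page *a,
			     const struct model_page *b)
{
	if (a->type == MODEL_P2PDMA || b->type == MODEL_P2PDMA)
		return model_same_pgmap(a, b);
	return true;
}

/* What happens when `next` follows `prev` in a bio being built. */
enum model_outcome model_classify(const struct model_page *prev,
				  const struct model_page *next)
{
	if (!model_compatible(prev, next))
		return MODEL_REJECT;
	if (!model_same_pgmap(prev, next))
		return MODEL_NEW_BVEC;
	return MODEL_MERGE_CANDIDATE;
}

/* Small self-check a reader can call from a test harness. */
void model_classify_selftest(void)
{
	struct model_page g1 = { MODEL_GENERIC, 1 };
	struct model_page g2 = { MODEL_GENERIC, 2 };
	struct model_page p3 = { MODEL_P2PDMA, 3 };

	assert(model_classify(&g1, &g2) == MODEL_NEW_BVEC);
	assert(model_classify(&g1, &p3) == MODEL_REJECT);
	assert(model_classify(&p3, &p3) == MODEL_MERGE_CANDIDATE);
}
```

In this model, GENERIC pages from two pgmaps land in separate bvecs, while
any pairing that involves a P2PDMA page and a differing pgmap is rejected.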
Changes since v1:
https://lore.kernel.org/all/20260401082329.1602328-1-namjain@linux.microsoft.com/
- Reworked patch 2 to introduce zone_device_pages_compatible() which
rejects P2PDMA pages from different pgmaps at the bio-building level,
not just at merge time. The previous version only moved the pgmap check
into the merge conditional without preventing incompatible pages from
being added as separate bvec entries. (Christoph Hellwig)
Naman Jain (2):
block: add pgmap check to biovec_phys_mergeable
block: relax pgmap check in bio_add_page for compatible zone device
pages
block/bio-integrity.c | 6 +++---
block/bio.c | 6 +++---
block/blk.h | 21 +++++++++++++++++++++
3 files changed, 27 insertions(+), 6 deletions(-)
--
2.43.0
* [PATCH v2 1/2] block: add pgmap check to biovec_phys_mergeable
From: Naman Jain @ 2026-04-10 15:34 UTC (permalink / raw)
To: Jens Axboe
Cc: Christoph Hellwig, Chaitanya Kulkarni, John Hubbard,
Logan Gunthorpe, linux-kernel, linux-block, Saurabh Sengar,
Long Li, Michael Kelley, namjain
biovec_phys_mergeable() is used by the request merge, DMA mapping,
and integrity merge paths to decide if two physically contiguous
bvec segments can be coalesced into one. It currently has no check
for whether the segments belong to different dev_pagemaps.
When zone device memory is registered in multiple chunks, each chunk
gets its own dev_pagemap. A single bio can legitimately contain
bvecs from different pgmaps -- iov_iter_extract_bvecs() breaks at
pgmap boundaries but the outer loop in bio_iov_iter_get_pages()
continues filling the same bio. If such bvecs are physically
contiguous, biovec_phys_mergeable() will coalesce them, making it
impossible to recover the correct pgmap for the merged segment
via page_pgmap().
Add a zone_device_pages_have_same_pgmap() check to prevent merging
bvec segments that span different pgmaps.
Fixes: 49580e690755 ("block: add check when merging zone device pages")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
---
block/blk.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/block/blk.h b/block/blk.h
index ec4674cdf2ead..50a41db039133 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -127,6 +127,8 @@ static inline bool biovec_phys_mergeable(struct request_queue *q,
 	if (addr1 + vec1->bv_len != addr2)
 		return false;
+	if (!zone_device_pages_have_same_pgmap(vec1->bv_page, vec2->bv_page))
+		return false;
 	if (xen_domain() && !xen_biovec_phys_mergeable(vec1, vec2->bv_page))
 		return false;
 	if ((addr1 | mask) != ((addr2 + vec2->bv_len - 1) | mask))
--
2.43.0
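
For readers who want to see the effect of the new check in isolation, here
is a minimal userspace sketch of the decision order in
biovec_phys_mergeable() after this patch. It is not kernel code: the
model_* names are hypothetical, the pgmap is reduced to an integer id
(0 standing for normal RAM), and the Xen check is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bio_vec, keyed by physical address. */
struct model_bvec {
	uint64_t phys;	/* physical start address of the segment */
	uint32_t len;	/* segment length in bytes */
	int pgmap_id;	/* assumption: 0 = normal RAM, >0 = a dev_pagemap id */
};

/*
 * Same order of checks as the patched function: contiguity first, then the
 * pgmap check added here, then the segment boundary mask.
 */
static bool model_phys_mergeable(const struct model_bvec *v1,
				 const struct model_bvec *v2,
				 uint64_t seg_boundary_mask)
{
	if (v1->phys + v1->len != v2->phys)
		return false;
	if (v1->pgmap_id != v2->pgmap_id)
		return false;
	if ((v1->phys | seg_boundary_mask) !=
	    ((v2->phys + v2->len - 1) | seg_boundary_mask))
		return false;
	return true;
}

/* Small self-check a reader can call from a test harness. */
void model_phys_mergeable_selftest(void)
{
	struct model_bvec a = { 0x1000, 0x1000, 1 };
	struct model_bvec same = { 0x2000, 0x1000, 1 };
	struct model_bvec other = { 0x2000, 0x1000, 2 };

	/* Contiguous and same pgmap: still mergeable. */
	assert(model_phys_mergeable(&a, &same, 0xffffffffULL));
	/* Contiguous but different pgmap: refused by the new check. */
	assert(!model_phys_mergeable(&a, &other, 0xffffffffULL));
}
```

The second assertion is the case this patch addresses: without the pgmap
comparison, the two contiguous segments would pass the remaining checks.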
* [PATCH v2 2/2] block: relax pgmap check in bio_add_page for compatible zone device pages
From: Naman Jain @ 2026-04-10 15:34 UTC (permalink / raw)
To: Jens Axboe
Cc: Christoph Hellwig, Chaitanya Kulkarni, John Hubbard,
Logan Gunthorpe, linux-kernel, linux-block, Saurabh Sengar,
Long Li, Michael Kelley, namjain
bio_add_page() and bio_integrity_add_page() reject pages from different
dev_pagemaps entirely, returning 0 even when those pages have compatible
DMA mapping requirements. This forces callers to start a new bio when
buffers span pgmap boundaries, even though the pages could safely coexist
as separate bvec entries.
This matters for guests where memory is registered through
devm_memremap_pages() with MEMORY_DEVICE_GENERIC in multiple calls,
creating separate dev_pagemaps for each chunk. When a direct I/O buffer
spans two such chunks, bio_add_page() rejects the second page, forcing an
unnecessary bio split or I/O failure.
Introduce zone_device_pages_compatible() in blk.h to check whether two
pages can coexist in the same bio as separate bvec entries. The block DMA
iterator (blk_dma_map_iter_start) caches the P2PDMA mapping state from the
first segment and applies it to all others, so P2PDMA pages from different
pgmaps must not be mixed, and neither must P2PDMA and non-P2PDMA pages.
All other combinations (MEMORY_DEVICE_GENERIC pages from different pgmaps,
or MEMORY_DEVICE_GENERIC with normal RAM) use the same dma_map_phys path
and are safe.
Replace the blanket zone_device_pages_have_same_pgmap() rejection with
zone_device_pages_compatible(), while keeping
zone_device_pages_have_same_pgmap() as a merge guard.
Pages from different pgmaps can be added as separate bvec entries but
must not be coalesced into the same segment, as that would make
it impossible to recover the correct pgmap via page_pgmap().
Fixes: 49580e690755 ("block: add check when merging zone device pages")
Cc: stable@vger.kernel.org
Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
---
block/bio-integrity.c | 6 +++---
block/bio.c | 6 +++---
block/blk.h | 19 +++++++++++++++++++
3 files changed, 25 insertions(+), 6 deletions(-)
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index e79eaf0477943..e54c6e06e1cbb 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -231,10 +231,10 @@ int bio_integrity_add_page(struct bio *bio, struct page *page,
 	if (bip->bip_vcnt > 0) {
 		struct bio_vec *bv = &bip->bip_vec[bip->bip_vcnt - 1];
 
-		if (!zone_device_pages_have_same_pgmap(bv->bv_page, page))
+		if (!zone_device_pages_compatible(bv->bv_page, page))
 			return 0;
-
-		if (bvec_try_merge_hw_page(q, bv, page, len, offset)) {
+		if (zone_device_pages_have_same_pgmap(bv->bv_page, page) &&
+		    bvec_try_merge_hw_page(q, bv, page, len, offset)) {
 			bip->bip_iter.bi_size += len;
 			return len;
 		}
diff --git a/block/bio.c b/block/bio.c
index 641ef0928d735..c52a0bd1e8993 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1048,10 +1048,10 @@ int bio_add_page(struct bio *bio, struct page *page,
 	if (bio->bi_vcnt > 0) {
 		struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
-		if (!zone_device_pages_have_same_pgmap(bv->bv_page, page))
+		if (!zone_device_pages_compatible(bv->bv_page, page))
 			return 0;
-
-		if (bvec_try_merge_page(bv, page, len, offset)) {
+		if (zone_device_pages_have_same_pgmap(bv->bv_page, page) &&
+		    bvec_try_merge_page(bv, page, len, offset)) {
 			bio->bi_iter.bi_size += len;
 			return len;
 		}
diff --git a/block/blk.h b/block/blk.h
index 50a41db039133..b998a7761faf3 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -136,6 +136,25 @@ static inline bool biovec_phys_mergeable(struct request_queue *q,
 	return true;
 }
 
+/*
+ * Check if two pages from potentially different zone device pgmaps can
+ * coexist as separate bvec entries in the same bio.
+ *
+ * The block DMA iterator (blk_dma_map_iter_start) caches the P2PDMA mapping
+ * state from the first segment and applies it to all subsequent segments, so
+ * P2PDMA pages from different pgmaps must not be mixed in the same bio.
+ *
+ * Other zone device types (FS_DAX, GENERIC) use the same dma_map_phys() path
+ * as normal RAM. PRIVATE and COHERENT pages never appear in bios.
+ */
+static inline bool zone_device_pages_compatible(const struct page *a,
+						const struct page *b)
+{
+	if (is_pci_p2pdma_page(a) || is_pci_p2pdma_page(b))
+		return zone_device_pages_have_same_pgmap(a, b);
+	return true;
+}
+
 static inline bool __bvec_gap_to_prev(const struct queue_limits *lim,
 		struct bio_vec *bprv, unsigned int offset)
 {
--
2.43.0
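
To illustrate the resulting bio_add_page() flow, the sketch below builds a
toy bio in userspace. It is a simplified model, not the kernel
implementation: all model_* names and MODEL_MAX_VECS are hypothetical, the
merge test is reduced to plain physical contiguity, and
model_same_pgmap() encodes an assumption about how
zone_device_pages_have_same_pgmap() behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_VECS 4	/* hypothetical bvec limit for the toy bio */

enum model_type { MODEL_RAM, MODEL_GENERIC, MODEL_P2PDMA };

struct model_page {
	enum model_type type;
	int pgmap_id;	/* owning dev_pagemap; unused for RAM */
	uint64_t phys;	/* physical address of the page */
};

struct model_bvec {
	struct model_page first;	/* first page of the segment */
	uint32_t len;
};

struct model_bio {
	struct model_bvec vecs[MODEL_MAX_VECS];
	int vcnt;
	uint32_t size;
};

/* Assumed behaviour of zone_device_pages_have_same_pgmap(). */
static bool model_same_pgmap(const struct model_page *a,
			     const struct model_page *b)
{
	if ((a->type != MODEL_RAM) != (b->type != MODEL_RAM))
		return false;
	return a->type == MODEL_RAM || a->pgmap_id == b->pgmap_id;
}

/* Mirrors zone_device_pages_compatible() from this patch. */
static bool model_compatible(const struct model_page *a,
			     const struct model_page *b)
{
	if (a->type == MODEL_P2PDMA || b->type == MODEL_P2PDMA)
		return model_same_pgmap(a, b);
	return true;
}

/* Returns len if the page was added (merged or as a new bvec), else 0. */
uint32_t model_bio_add_page(struct model_bio *bio,
			    const struct model_page *page, uint32_t len)
{
	if (bio->vcnt > 0) {
		struct model_bvec *bv = &bio->vecs[bio->vcnt - 1];

		if (!model_compatible(&bv->first, page))
			return 0;
		/* The merge guard: only coalesce within one pgmap. */
		if (model_same_pgmap(&bv->first, page) &&
		    bv->first.phys + bv->len == page->phys) {
			bv->len += len;
			bio->size += len;
			return len;
		}
	}
	if (bio->vcnt >= MODEL_MAX_VECS)
		return 0;
	bio->vecs[bio->vcnt].first = *page;
	bio->vecs[bio->vcnt].len = len;
	bio->vcnt++;
	bio->size += len;
	return len;
}

/* Small self-check a reader can call from a test harness. */
void model_bio_selftest(void)
{
	struct model_bio bio = { .vcnt = 0 };
	struct model_page g1 = { MODEL_GENERIC, 1, 0x1000 };
	struct model_page g2 = { MODEL_GENERIC, 2, 0x2000 };

	assert(model_bio_add_page(&bio, &g1, 0x1000) == 0x1000);
	/* Contiguous, but a different pgmap: accepted as a second bvec. */
	assert(model_bio_add_page(&bio, &g2, 0x1000) == 0x1000);
	assert(bio.vcnt == 2);
}
```

Before this patch, the second add in the self-check would correspond to a
return of 0; with it, the page is accepted as a separate entry and only the
merge is skipped.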