From: Naman Jain <namjain@linux.microsoft.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>, Chaitanya Kulkarni <kch@nvidia.com>,
John Hubbard <jhubbard@nvidia.com>,
Logan Gunthorpe <logang@deltatee.com>,
linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
Saurabh Sengar <ssengar@linux.microsoft.com>,
Long Li <longli@microsoft.com>,
Michael Kelley <mhklinux@outlook.com>
Subject: Re: [PATCH 2/2] block: allow different-pgmap pages as separate bvecs in bio_add_page
Date: Thu, 2 Apr 2026 14:25:14 +0530
Message-ID: <d48427c2-9aa3-42a2-a8d7-43a7a3b3b685@linux.microsoft.com>
In-Reply-To: <20260402053057.GA11783@lst.de>

On 4/2/2026 11:00 AM, Christoph Hellwig wrote:
> On Thu, Apr 02, 2026 at 10:51:05AM +0530, Naman Jain wrote:
>> When a direct I/O request spans pages from different chunks (different
>> pgmaps), the current code rejected the second page entirely:
>>
>> if (!zone_device_pages_have_same_pgmap(bv->bv_page, page))
>> return 0; // Rejection - forces bio split or I/O error
>>
>> Both chunks are regular RAM from the DMA perspective
>> (MEMORY_DEVICE_GENERIC, not P2PDMA). The only requirement is that they not
>> be merged into the same bvec segment, which patch 1/2 enforces by adding
>> the pgmap check to biovec_phys_mergeable().
>>
>> This patch allows pages from different pgmaps to be added as separate bvec
>> entries in the same bio, eliminating bio splits and I/O failures
>> when buffers span pgmap boundaries.
>
> Which as I said we can't do in general, as different pgmaps can have
> different DMA mapping requirements. We might be able to relax this
> if we know multiple pgmaps can be mapped in the same way. I.e.
> replace zone_device_pages_have_same_pgmap with
> zone_device_pages_compatible and add additional conditions to it.
>
>> --- a/block/bio-integrity.c
>> +++ b/block/bio-integrity.c
>> @@ -231,6 +231,9 @@ int bio_integrity_add_page(struct bio *bio, struct page *page,
>> if (bip->bip_vcnt > 0) {
>> struct bio_vec *bv = &bip->bip_vec[bip->bip_vcnt - 1];
>>
>> + if (is_pci_p2pdma_page(bv->bv_page) !=
>> + is_pci_p2pdma_page(page))
>> + return 0;
>> if (zone_device_pages_have_same_pgmap(bv->bv_page, page) &&
>
> The above is implied by not having the same pgmap.

Thanks. If I understand correctly, here is what this would look like.
Please let me know if this is what you suggested.

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index e79eaf0477943..e54c6e06e1cbb 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -231,10 +231,10 @@ int bio_integrity_add_page(struct bio *bio, struct page *page,
 	if (bip->bip_vcnt > 0) {
 		struct bio_vec *bv = &bip->bip_vec[bip->bip_vcnt - 1];

-		if (!zone_device_pages_have_same_pgmap(bv->bv_page, page))
+		if (!zone_device_pages_compatible(bv->bv_page, page))
 			return 0;
-
-		if (bvec_try_merge_hw_page(q, bv, page, len, offset)) {
+		if (zone_device_pages_have_same_pgmap(bv->bv_page, page) &&
+		    bvec_try_merge_hw_page(q, bv, page, len, offset)) {
 			bip->bip_iter.bi_size += len;
 			return len;
 		}
diff --git a/block/bio.c b/block/bio.c
index 77067fa346d35..0e70bb912338c 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1034,10 +1034,10 @@ int bio_add_page(struct bio *bio, struct page *page,
 	if (bio->bi_vcnt > 0) {
 		struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];

-		if (!zone_device_pages_have_same_pgmap(bv->bv_page, page))
+		if (!zone_device_pages_compatible(bv->bv_page, page))
 			return 0;
-
-		if (bvec_try_merge_page(bv, page, len, offset)) {
+		if (zone_device_pages_have_same_pgmap(bv->bv_page, page) &&
+		    bvec_try_merge_page(bv, page, len, offset)) {
 			bio->bi_iter.bi_size += len;
 			return len;
 		}
diff --git a/block/blk.h b/block/blk.h
index 0cb3441638284..c5710ba4c81b9 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -136,6 +136,23 @@ static inline bool biovec_phys_mergeable(struct request_queue *q,
 	return true;
 }

+/*
+ * Check if two pages from potentially different zone device pgmaps can
+ * coexist as separate bvec entries in the same bio.
+ *
+ * The block DMA iterator (blk_dma_map_iter_start) caches the P2PDMA mapping
+ * state from the first segment and applies it to all subsequent segments, so
+ * P2PDMA and non-P2PDMA pages must never be mixed in the same bio.
+ *
+ * Other zone device types (FS_DAX, GENERIC) use the same dma_map_phys() path
+ * as normal RAM. PRIVATE and COHERENT pages never appear in bios.
+ */
+static inline bool zone_device_pages_compatible(const struct page *a,
+		const struct page *b)
+{
+	return is_pci_p2pdma_page(a) == is_pci_p2pdma_page(b);
+}
+
 static inline bool __bvec_gap_to_prev(const struct queue_limits *lim,
 		struct bio_vec *bprv, unsigned int offset)
 {
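
To make the intended behaviour easier to review, here is a small userspace
model of the decision bio_add_page() would make with the change above. It is
only a sketch, not kernel code: struct model_page, the model_* helpers and the
enum names are hypothetical stand-ins, the pgmap is reduced to an integer id,
and physical contiguity (normally decided by the bvec merge helpers) is passed
in as a boolean.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the memory types discussed in this thread. */
enum model_memtype { MODEL_RAM, MODEL_GENERIC, MODEL_FS_DAX, MODEL_P2PDMA };

/* Hypothetical page: a memory type plus an opaque pgmap id (unused for RAM). */
struct model_page {
	enum model_memtype type;
	int pgmap_id;
};

/* Possible outcomes when adding a page after the last bvec of a bio. */
enum model_add_result { ADD_REJECTED, ADD_MERGED, ADD_NEW_BVEC };

static bool model_is_zone_device(const struct model_page *p)
{
	return p->type != MODEL_RAM;
}

static bool model_is_p2pdma(const struct model_page *p)
{
	return p->type == MODEL_P2PDMA;
}

/* Models zone_device_pages_compatible(): only the P2PDMA state must match. */
static bool model_pages_compatible(const struct model_page *a,
				   const struct model_page *b)
{
	return model_is_p2pdma(a) == model_is_p2pdma(b);
}

/* Simplified model of the same-pgmap check, gating merges into one bvec. */
static bool model_same_pgmap(const struct model_page *a,
			     const struct model_page *b)
{
	if (model_is_zone_device(a) != model_is_zone_device(b))
		return false;
	if (!model_is_zone_device(a))
		return true;	/* two normal RAM pages */
	return a->pgmap_id == b->pgmap_id;
}

/*
 * Two-level decision from the diff: incompatible pages are rejected,
 * same-pgmap contiguous pages are merged, everything else gets its
 * own bvec in the same bio.
 */
static enum model_add_result model_add_page(const struct model_page *last,
					    const struct model_page *page,
					    bool phys_contig)
{
	if (!model_pages_compatible(last, page))
		return ADD_REJECTED;
	if (model_same_pgmap(last, page) && phys_contig)
		return ADD_MERGED;
	return ADD_NEW_BVEC;
}
```

In this model, two GENERIC pages from different pgmaps end up as separate
bvecs in the same bio, while a GENERIC page followed by a P2PDMA page is
rejected. Two P2PDMA pages from different pgmaps also pass the compatibility
check as written and get separate bvecs.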