From: Naman Jain <namjain@linux.microsoft.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>, Chaitanya Kulkarni <kch@nvidia.com>,
John Hubbard <jhubbard@nvidia.com>,
Logan Gunthorpe <logang@deltatee.com>,
linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
Saurabh Sengar <ssengar@linux.microsoft.com>,
Long Li <longli@microsoft.com>,
Michael Kelley <mhklinux@outlook.com>
Subject: Re: [PATCH 2/2] block: allow different-pgmap pages as separate bvecs in bio_add_page
Date: Thu, 2 Apr 2026 10:51:05 +0530 [thread overview]
Message-ID: <70c82c3a-d135-4877-ab46-c15d329815f5@linux.microsoft.com> (raw)
In-Reply-To: <20260401140850.GC21703@lst.de>
On 4/1/2026 7:38 PM, Christoph Hellwig wrote:
> On Wed, Apr 01, 2026 at 08:23:29AM +0000, Naman Jain wrote:
>> bio_add_page() and bio_integrity_add_page() reject pages from a
>> different dev_pagemap entirely, returning 0 even when the page could
>> be added as a new bvec entry. The pgmap check was intended only to
>> prevent merging into the same bvec segment, not to block the page
>> from being added at all.
>>
>> This causes callers to unnecessarily start a new bio when a buffer
>> spans pages from two different pgmaps, even though the bio has room
>> for another bvec.
>
> This is not unnecessary. A single dma mapping operation can only
> map a single target pgmap. The old SG API works around this by
> doing multiple mapping operation underneath, but compared to that
> just having multiple bios is much easier and more efficient.
>
> What is your use case here?
Hello Christoph,
Thanks for reviewing these patches.
The use case driving this patch is the MSHV VTL driver
(drivers/hv/mshv_vtl_main.c) for VMs with paravisor architecture
(OpenHCL/OpenVMM: https://openvmm.dev/guide/index.html).
In this setup, the guest runs at two Virtual Trust Levels:
- VTL2 (higher privilege): runs a Linux kernel acting as "paravisor"
that handles device I/O on behalf of the guest
- VTL0 (lower privilege): runs the actual guest OS (Windows/Linux)
VTL2 Linux performs block I/O (NVMe, SCSI, etc.) using VTL0's memory as
DMA buffers. To enable this, VTL0 memory is registered into the VTL2
kernel via the MSHV_ADD_VTL0_MEMORY ioctl, which calls
devm_memremap_pages() to create MEMORY_DEVICE_GENERIC zone device pages.
The ioctl is called multiple times, by the Virtual Machine Manager
(VMM), registering VTL0's physical address space in chunks. Each call
creates a separate dev_pagemap. This chunking is necessary because:
1. Firmware/UEFI fragments the guest physical address space (MMIO holes,
reserved regions)
2. Alignment constraints: vmemmap_shift is computed from the range
alignment, and highly aligned large ranges can exceed MAX_FOLIO_ORDER,
causing devm_memremap_pages() to fail
When a direct I/O request spans pages from different chunks (different
pgmaps), the current code rejected the second page entirely:
if (!zone_device_pages_have_same_pgmap(bv->bv_page, page))
return 0; // Rejection - forces bio split or I/O error
Both chunks are regular RAM from the DMA perspective
(MEMORY_DEVICE_GENERIC, not P2PDMA). The only requirement is that they
not be merged into the same bvec segment, which patch 1/2 enforces by
adding the pgmap check to biovec_phys_mergeable().
This patch allows pages from different pgmaps to be added as separate
bvec entries in the same bio, eliminating bio splits and I/O failures
when buffers span pgmap boundaries.
I noticed this while doing kernel upgrade from 6.12 to 6.18 for OpenHCL
kernel.
There's this another concern flagged from Sashiko code review:
https://sashiko.dev/#/patchset/20260401082329.1602328-1-namjain%40linux.microsoft.com
From my code analysis, this issue would not happening as of now, so
this is future proofing the APIs after change 2/2. I would need to add a
check like this to fix this:
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 3462697331890..6f2f30a814560 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -231,6 +231,9 @@ int bio_integrity_add_page(struct bio *bio, struct
page *page,
if (bip->bip_vcnt > 0) {
struct bio_vec *bv = &bip->bip_vec[bip->bip_vcnt - 1];
+ if (is_pci_p2pdma_page(bv->bv_page) !=
+ is_pci_p2pdma_page(page))
+ return 0;
if (zone_device_pages_have_same_pgmap(bv->bv_page, page) &&
bvec_try_merge_hw_page(q, bv, page, len, offset)) {
bip->bip_iter.bi_size += len;
diff --git a/block/bio.c b/block/bio.c
index 7715e59e68613..6216a554de68b 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1034,6 +1034,9 @@ int bio_add_page(struct bio *bio, struct page *page,
if (bio->bi_vcnt > 0) {
struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
+ if (is_pci_p2pdma_page(bv->bv_page) !=
+ is_pci_p2pdma_page(page))
+ return 0;
if (zone_device_pages_have_same_pgmap(bv->bv_page, page) &&
bvec_try_merge_page(bv, page, len, offset)) {
bio->bi_iter.bi_size += len;
Please let me know what you think about this.
Thanks,
Naman
next prev parent reply other threads:[~2026-04-02 5:21 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-01 8:23 [PATCH 0/2] block: fix pgmap handling for zone device pages in bio merge paths Naman Jain
2026-04-01 8:23 ` [PATCH 1/2] block: add pgmap check to biovec_phys_mergeable Naman Jain
2026-04-01 14:07 ` Christoph Hellwig
2026-04-01 8:23 ` [PATCH 2/2] block: allow different-pgmap pages as separate bvecs in bio_add_page Naman Jain
2026-04-01 14:08 ` Christoph Hellwig
2026-04-02 5:21 ` Naman Jain [this message]
2026-04-02 5:30 ` Christoph Hellwig
2026-04-02 8:55 ` Naman Jain
2026-04-07 5:52 ` Christoph Hellwig
2026-04-07 7:08 ` Naman Jain
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=70c82c3a-d135-4877-ab46-c15d329815f5@linux.microsoft.com \
--to=namjain@linux.microsoft.com \
--cc=axboe@kernel.dk \
--cc=hch@lst.de \
--cc=jhubbard@nvidia.com \
--cc=kch@nvidia.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=logang@deltatee.com \
--cc=longli@microsoft.com \
--cc=mhklinux@outlook.com \
--cc=ssengar@linux.microsoft.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox