From: Naman Jain <namjain@linux.microsoft.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>, Chaitanya Kulkarni <kch@nvidia.com>,
John Hubbard <jhubbard@nvidia.com>,
Logan Gunthorpe <logang@deltatee.com>,
linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
Saurabh Sengar <ssengar@linux.microsoft.com>,
Long Li <longli@microsoft.com>,
Michael Kelley <mhklinux@outlook.com>
Subject: Re: [PATCH 2/2] block: allow different-pgmap pages as separate bvecs in bio_add_page
Date: Thu, 2 Apr 2026 10:51:05 +0530
Message-ID: <70c82c3a-d135-4877-ab46-c15d329815f5@linux.microsoft.com>
In-Reply-To: <20260401140850.GC21703@lst.de>

On 4/1/2026 7:38 PM, Christoph Hellwig wrote:
> On Wed, Apr 01, 2026 at 08:23:29AM +0000, Naman Jain wrote:
>> bio_add_page() and bio_integrity_add_page() reject pages from a
>> different dev_pagemap entirely, returning 0 even when the page could
>> be added as a new bvec entry. The pgmap check was intended only to
>> prevent merging into the same bvec segment, not to block the page
>> from being added at all.
>>
>> This causes callers to unnecessarily start a new bio when a buffer
>> spans pages from two different pgmaps, even though the bio has room
>> for another bvec.
>
> This is not unnecessary. A single dma mapping operation can only
> map a single target pgmap. The old SG API works around this by
> doing multiple mapping operation underneath, but compared to that
> just having multiple bios is much easier and more efficient.
>
> What is your use case here?

Hello Christoph,

Thanks for reviewing these patches.

The use case driving this patch is the MSHV VTL driver
(drivers/hv/mshv_vtl_main.c) for VMs with a paravisor architecture
(OpenHCL/OpenVMM: https://openvmm.dev/guide/index.html).
In this setup, the guest runs at two Virtual Trust Levels:
- VTL2 (higher privilege): runs a Linux kernel acting as a "paravisor"
  that handles device I/O on behalf of the guest
- VTL0 (lower privilege): runs the actual guest OS (Windows/Linux)
VTL2 Linux performs block I/O (NVMe, SCSI, etc.) using VTL0's memory as
DMA buffers. To enable this, VTL0 memory is registered into the VTL2
kernel via the MSHV_ADD_VTL0_MEMORY ioctl, which calls
devm_memremap_pages() to create MEMORY_DEVICE_GENERIC zone device pages.
The Virtual Machine Manager (VMM) calls the ioctl multiple times,
registering VTL0's physical address space in chunks. Each call
creates a separate dev_pagemap. This chunking is necessary because:
1. Firmware/UEFI fragments the guest physical address space (MMIO holes,
   reserved regions)
2. Alignment constraints: vmemmap_shift is computed from the range
   alignment, and highly aligned large ranges can exceed MAX_FOLIO_ORDER,
   causing devm_memremap_pages() to fail
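
To illustrate the alignment constraint in point 2, here is a minimal userspace sketch, not the driver code. It assumes the order is derived from the lowest set bit of the range boundaries; the exact formula used in mshv_vtl_main.c is an assumption here. MODEL_MAX_ORDER is a hypothetical stand-in for MAX_FOLIO_ORDER, whose real value depends on the kernel configuration, and both function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for MAX_FOLIO_ORDER; the real limit depends on the
 * kernel configuration. */
#define MODEL_MAX_ORDER 18u

/*
 * Largest order such that both range boundaries are multiples of 2^order.
 * end_pfn is one past the last page frame in the range.
 */
static unsigned int model_range_order(uint64_t start_pfn, uint64_t end_pfn)
{
	uint64_t bits = start_pfn | end_pfn;

	if (bits == 0)
		return 0;	/* degenerate empty range; avoid undefined ctz(0) */
	return (unsigned int)__builtin_ctzll(bits);
}

/* Nonzero if the alignment-derived order stays within the modelled limit. */
static int model_range_ok(uint64_t start_pfn, uint64_t end_pfn)
{
	return model_range_order(start_pfn, end_pfn) <= MODEL_MAX_ORDER;
}
```

Under this model, and assuming 4 KiB pages, a single 4 GiB range covering pfns 0x100000 to 0x200000 has order 20 and is rejected, while the same range registered as four 1 GiB pieces has order 18 for each piece and is accepted.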
When a direct I/O request spans pages from different chunks (different
pgmaps), the current code rejects the second page entirely:

	if (!zone_device_pages_have_same_pgmap(bv->bv_page, page))
		return 0; // Rejection - forces bio split or I/O error

Both chunks are regular RAM from the DMA perspective
(MEMORY_DEVICE_GENERIC, not P2PDMA). The only requirement is that they
not be merged into the same bvec segment, which patch 1/2 enforces by
adding the pgmap check to biovec_phys_mergeable().
This patch allows pages from different pgmaps to be added as separate
bvec entries in the same bio, eliminating bio splits and I/O failures
when buffers span pgmap boundaries.
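
The difference between the two behaviours can be sketched with a small userspace model. This is not the kernel implementation: the model_* types, MODEL_MAX_VECS, and the reject_cross_pgmap switch are all hypothetical names invented for the example, and a pgmap is reduced to an integer id.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_VECS  4u
#define MODEL_PAGE_SIZE 4096u

struct model_page { unsigned long pfn; int pgmap_id; };
struct model_bvec { struct model_page first; unsigned int offset, len; };
struct model_bio {
	struct model_bvec vecs[MODEL_MAX_VECS];
	unsigned int vcnt;	/* number of bvecs in use */
	unsigned int size;	/* total bytes added */
};

/* Merge only if the data is physically contiguous AND in the same pgmap. */
static bool model_can_merge(const struct model_bvec *bv,
			    struct model_page page, unsigned int offset)
{
	unsigned long long bv_end =
		(unsigned long long)bv->first.pfn * MODEL_PAGE_SIZE +
		bv->offset + bv->len;
	unsigned long long start =
		(unsigned long long)page.pfn * MODEL_PAGE_SIZE + offset;

	return bv->first.pgmap_id == page.pgmap_id && bv_end == start;
}

/*
 * Returns the number of bytes added, or 0 if the page was not added.
 * reject_cross_pgmap = true models the current behaviour (refuse the page);
 * false models the proposed behaviour (fall through to a new bvec).
 */
static unsigned int model_add_page(struct model_bio *bio,
				   struct model_page page, unsigned int len,
				   unsigned int offset, bool reject_cross_pgmap)
{
	if (bio->vcnt > 0) {
		struct model_bvec *bv = &bio->vecs[bio->vcnt - 1];

		if (reject_cross_pgmap && bv->first.pgmap_id != page.pgmap_id)
			return 0;
		if (model_can_merge(bv, page, offset)) {
			bv->len += len;
			bio->size += len;
			return len;
		}
	}
	if (bio->vcnt >= MODEL_MAX_VECS)
		return 0;	/* no room for another bvec */

	bio->vecs[bio->vcnt].first = page;
	bio->vecs[bio->vcnt].offset = offset;
	bio->vecs[bio->vcnt].len = len;
	bio->vcnt++;
	bio->size += len;
	return len;
}
```

In this model, three physically contiguous pages where the third belongs to a different pgmap end up as one merged bvec plus a rejected page in the first mode, and as two bvecs in the same bio in the second mode.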
I noticed this while upgrading the OpenHCL kernel from 6.12 to 6.18.

There is another concern, flagged by the Sashiko code review:
https://sashiko.dev/#/patchset/20260401082329.1602328-1-namjain%40linux.microsoft.com

From my code analysis, this issue cannot happen as of now, so addressing
it would be future-proofing the APIs after change 2/2. I would need to add
a check like this to fix it:

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 3462697331890..6f2f30a814560 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -231,6 +231,9 @@ int bio_integrity_add_page(struct bio *bio, struct page *page,
 	if (bip->bip_vcnt > 0) {
 		struct bio_vec *bv = &bip->bip_vec[bip->bip_vcnt - 1];
 
+		if (is_pci_p2pdma_page(bv->bv_page) !=
+		    is_pci_p2pdma_page(page))
+			return 0;
 		if (zone_device_pages_have_same_pgmap(bv->bv_page, page) &&
 		    bvec_try_merge_hw_page(q, bv, page, len, offset)) {
 			bip->bip_iter.bi_size += len;
diff --git a/block/bio.c b/block/bio.c
index 7715e59e68613..6216a554de68b 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1034,6 +1034,9 @@ int bio_add_page(struct bio *bio, struct page *page,
 	if (bio->bi_vcnt > 0) {
 		struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
+		if (is_pci_p2pdma_page(bv->bv_page) !=
+		    is_pci_p2pdma_page(page))
+			return 0;
 		if (zone_device_pages_have_same_pgmap(bv->bv_page, page) &&
 		    bvec_try_merge_page(bv, page, len, offset)) {
 			bio->bi_iter.bi_size += len;

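
The decision order in the diff above can be summarised with a small userspace sketch. It is only a model under stated assumptions: struct p2p_model_page, enum p2p_model_action, and p2p_model_classify() are hypothetical names, a pgmap is reduced to an integer id, and "may merge" still depends on the contiguity checks that bvec_try_merge_page() performs in the real code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduction of a page to the two properties the check uses. */
struct p2p_model_page {
	int pgmap_id;		/* which dev_pagemap the page belongs to */
	bool is_p2pdma;		/* stands in for is_pci_p2pdma_page(page) */
};

enum p2p_model_action {
	P2P_MODEL_REJECT,	/* return 0: caller must start a new bio */
	P2P_MODEL_MAY_MERGE,	/* same pgmap: merging may be attempted */
	P2P_MODEL_NEW_BVEC,	/* different pgmap: add as a separate bvec */
};

/*
 * Classify a new page against the page of the last bvec, following the
 * order of checks in the diff: P2PDMA mismatch first, then pgmap equality.
 * Note that two P2PDMA pages from different pgmaps are classified as
 * NEW_BVEC here, exactly as the diff would treat them.
 */
static enum p2p_model_action
p2p_model_classify(struct p2p_model_page last, struct p2p_model_page next)
{
	if (last.is_p2pdma != next.is_p2pdma)
		return P2P_MODEL_REJECT;
	if (last.pgmap_id == next.pgmap_id)
		return P2P_MODEL_MAY_MERGE;
	return P2P_MODEL_NEW_BVEC;
}
```

With this ordering, a bio in the model never mixes P2PDMA and non-P2PDMA pages, while two MEMORY_DEVICE_GENERIC-style pages from different pgmaps can still share a bio as separate bvecs.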
Please let me know what you think about this.
Thanks,
Naman