public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
From: Naman Jain <namjain@linux.microsoft.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>, Chaitanya Kulkarni <kch@nvidia.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Logan Gunthorpe <logang@deltatee.com>,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	Saurabh Sengar <ssengar@linux.microsoft.com>,
	Long Li <longli@microsoft.com>,
	Michael Kelley <mhklinux@outlook.com>
Subject: Re: [PATCH 2/2] block: allow different-pgmap pages as separate bvecs in bio_add_page
Date: Thu, 2 Apr 2026 10:51:05 +0530	[thread overview]
Message-ID: <70c82c3a-d135-4877-ab46-c15d329815f5@linux.microsoft.com> (raw)
In-Reply-To: <20260401140850.GC21703@lst.de>



On 4/1/2026 7:38 PM, Christoph Hellwig wrote:
> On Wed, Apr 01, 2026 at 08:23:29AM +0000, Naman Jain wrote:
>> bio_add_page() and bio_integrity_add_page() reject pages from a
>> different dev_pagemap entirely, returning 0 even when the page could
>> be added as a new bvec entry. The pgmap check was intended only to
>> prevent merging into the same bvec segment, not to block the page
>> from being added at all.
>>
>> This causes callers to unnecessarily start a new bio when a buffer
>> spans pages from two different pgmaps, even though the bio has room
>> for another bvec.
> 
> This is not unnecessary.  A single dma mapping operation can only
> map a single target pgmap.  The old SG API works around this by
> doing multiple mapping operation underneath, but compared to that
> just having multiple bios is much easier and more efficient.
> 
> What is your use case here?

Hello Christoph,

Thanks for reviewing these patches.

The use case driving this patch is the MSHV VTL driver 
(drivers/hv/mshv_vtl_main.c) for VMs with paravisor architecture 
(OpenHCL/OpenVMM: https://openvmm.dev/guide/index.html).

In this setup, the guest runs at two Virtual Trust Levels:
- VTL2 (higher privilege): runs a Linux kernel acting as "paravisor" 
that handles device I/O on behalf of the guest
- VTL0 (lower privilege): runs the actual guest OS (Windows/Linux)

VTL2 Linux performs block I/O (NVMe, SCSI, etc.) using VTL0's memory as 
DMA buffers. To enable this, VTL0 memory is registered into the VTL2 
kernel via the MSHV_ADD_VTL0_MEMORY ioctl, which calls 
devm_memremap_pages() to create MEMORY_DEVICE_GENERIC zone device pages.

The ioctl is called multiple times, by the Virtual Machine Manager 
(VMM), registering VTL0's physical address space in chunks. Each call 
creates a separate dev_pagemap. This chunking is necessary because:

1. Firmware/UEFI fragments the guest physical address space (MMIO holes,
    reserved regions)
2. Alignment constraints: vmemmap_shift is computed from the range 
alignment, and highly aligned large ranges can exceed MAX_FOLIO_ORDER, 
causing devm_memremap_pages() to fail

When a direct I/O request spans pages from different chunks (different 
pgmaps), the current code rejected the second page entirely:

     if (!zone_device_pages_have_same_pgmap(bv->bv_page, page))
         return 0;  // Rejection - forces bio split or I/O error

Both chunks are regular RAM from the DMA perspective 
(MEMORY_DEVICE_GENERIC, not P2PDMA). The only requirement is that they 
not be merged into the same bvec segment, which patch 1/2 enforces by 
adding the pgmap check to biovec_phys_mergeable().

This patch allows pages from different pgmaps to be added as separate 
bvec entries in the same bio, eliminating bio splits and I/O failures
when buffers span pgmap boundaries.

I noticed this while doing kernel upgrade from 6.12 to 6.18 for OpenHCL 
kernel.



There's this another concern flagged from Sashiko code review:
https://sashiko.dev/#/patchset/20260401082329.1602328-1-namjain%40linux.microsoft.com

 From my code analysis, this issue would not happening as of now, so 
this is future proofing the APIs after change 2/2. I would need to add a 
check like this to fix this:

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 3462697331890..6f2f30a814560 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -231,6 +231,9 @@ int bio_integrity_add_page(struct bio *bio, struct 
page *page,
         if (bip->bip_vcnt > 0) {
                 struct bio_vec *bv = &bip->bip_vec[bip->bip_vcnt - 1];

+               if (is_pci_p2pdma_page(bv->bv_page) !=
+                   is_pci_p2pdma_page(page))
+                       return 0;
                 if (zone_device_pages_have_same_pgmap(bv->bv_page, page) &&
                     bvec_try_merge_hw_page(q, bv, page, len, offset)) {
                         bip->bip_iter.bi_size += len;
diff --git a/block/bio.c b/block/bio.c
index 7715e59e68613..6216a554de68b 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1034,6 +1034,9 @@ int bio_add_page(struct bio *bio, struct page *page,
         if (bio->bi_vcnt > 0) {
                 struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];

+               if (is_pci_p2pdma_page(bv->bv_page) !=
+                   is_pci_p2pdma_page(page))
+                       return 0;
                 if (zone_device_pages_have_same_pgmap(bv->bv_page, page) &&
                     bvec_try_merge_page(bv, page, len, offset)) {
                         bio->bi_iter.bi_size += len;


Please let me know what you think about this.

Thanks,
Naman

  reply	other threads:[~2026-04-02  5:21 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-01  8:23 [PATCH 0/2] block: fix pgmap handling for zone device pages in bio merge paths Naman Jain
2026-04-01  8:23 ` [PATCH 1/2] block: add pgmap check to biovec_phys_mergeable Naman Jain
2026-04-01 14:07   ` Christoph Hellwig
2026-04-01  8:23 ` [PATCH 2/2] block: allow different-pgmap pages as separate bvecs in bio_add_page Naman Jain
2026-04-01 14:08   ` Christoph Hellwig
2026-04-02  5:21     ` Naman Jain [this message]
2026-04-02  5:30       ` Christoph Hellwig
2026-04-02  8:55         ` Naman Jain
2026-04-07  5:52           ` Christoph Hellwig
2026-04-07  7:08             ` Naman Jain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=70c82c3a-d135-4877-ab46-c15d329815f5@linux.microsoft.com \
    --to=namjain@linux.microsoft.com \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=jhubbard@nvidia.com \
    --cc=kch@nvidia.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=logang@deltatee.com \
    --cc=longli@microsoft.com \
    --cc=mhklinux@outlook.com \
    --cc=ssengar@linux.microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox